PDF Snapshot

by: Allan Thræn

A quite common request I have heard a number of times is the need to take a snapshot of a website and store it securely, so that it is later possible to prove what was stated on the site on a given date. I know a number of EPiServer customers have already implemented solutions for this, but so far there hasn't been an easy, generic solution to the problem. Until now :-)

Here is a small, handy scheduled task that you can set up to take a daily snapshot of the entire website as JPG and/or PDF files. The snapshots can be stored either on the hard drive or in a virtual path and accessed later. It uses ExpertPDF's HTML to PDF converter, for which EPiServer has bought a redistributable license.

The installation is fairly straightforward: just copy the two assemblies from the zip file into your bin folder, and you should see the task appear as a scheduled job. Before you run it for the first time, go to the Plug-in Manager in Admin mode and fill in the configuration:


  • Author name: the value written to the PDF Author metadata field.
  • Page size: which page size to use. Americans often use “Letter”, while Europeans tend to go for “A4”.
  • User name: if some of your pages require login, provide a user name that the job should impersonate in order to retrieve those pages.
  • Folder: where you want your snapshots to be stored. This can be either a local physical folder (make sure that the IIS user has write access to it) or a virtual path like “~/Global/Snapshots”.
  • Output options: check whether you want PDF files, JPG files, or both to be generated, and whether the PDFs should have a header and footer indicating where they came from and when they were generated.
  • Starting point: set the page where the generation should start, and you are done!
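To make the settings above concrete, here is a minimal Python sketch of how they might be modeled and sanity-checked before the first run. The class, its field names, the restriction to A4/Letter and the validation rules are hypothetical illustrations; the post does not describe how the plug-in actually stores or checks its configuration.

```python
from dataclasses import dataclass

# Assumed subset of page sizes; the real converter may support more.
SUPPORTED_PAGE_SIZES = ("A4", "Letter")


@dataclass
class SnapshotSettings:
    """Hypothetical model of the Plug-in Manager settings (names are illustrative)."""
    author: str                                 # written to the PDF Author field
    page_size: str = "A4"                       # "Letter" is common in the US
    impersonate_user: str = ""                  # account used for pages behind login
    output_folder: str = "~/Global/Snapshots"   # physical folder or virtual path
    generate_pdf: bool = True
    generate_jpg: bool = False
    pdf_header_footer: bool = True              # stamp source and generation date on PDFs
    start_page_id: int = 1                      # hypothetical ID of the page to start from

    def is_virtual_path(self) -> bool:
        # Virtual paths use the "~/" prefix, as in "~/Global/Snapshots".
        return self.output_folder.startswith("~/")

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the settings look usable."""
        problems = []
        if self.page_size not in SUPPORTED_PAGE_SIZES:
            problems.append(f"Unsupported page size: {self.page_size}")
        if not (self.generate_pdf or self.generate_jpg):
            problems.append("Enable PDF output, JPG output, or both")
        return problems
```

For example, `SnapshotSettings(author="Web team").validate()` returns an empty list, while switching off both PDF and JPG output produces a problem message.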

The generated files are placed in a folder hierarchy that mirrors the site structure, and files are generated for all languages. Since this can be a rather slow process, the job runs in its own separate thread, which reports back to the scheduled task log when it is done. If you try to start a new instance of the job manually while one is already running, it will simply report the progress of the existing job.
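The behavior described above (one long-running worker thread, and a second start that only reports progress) is a common single-instance job pattern. The Python sketch below illustrates it under stated assumptions: `SingleInstanceJob`, `snapshot_path`, the `render` and `log` callbacks, and the `<root>/<language>/<page path>.<ext>` layout are all hypothetical and are not the plug-in's actual code, which runs on .NET inside EPiServer.

```python
import threading
from pathlib import PurePosixPath


def snapshot_path(root: str, language: str, segments: list[str], ext: str) -> str:
    """Map a page to a file path that mirrors the site tree (hypothetical layout)."""
    # e.g. segments ["about", "team"] -> <root>/<language>/about/team.<ext>
    *folders, leaf = segments or ["start"]  # the start page has no URL segments
    return str(PurePosixPath(root, language, *folders, f"{leaf}.{ext}"))


class SingleInstanceJob:
    """Runs slow work on a background thread; a second start only reports progress."""

    def __init__(self, render, log):
        self._render = render  # callable(page): does the slow PDF/JPG work
        self._log = log        # callable(message): stands in for the job log
        self._lock = threading.Lock()
        self._thread = None
        self._done = 0
        self._total = 0

    def start(self, pages) -> str:
        with self._lock:
            if self._thread is not None and self._thread.is_alive():
                # Do not start a second crawl; just report on the running one.
                return f"Already running: {self._done} of {self._total} pages done"
            pages = list(pages)
            self._done, self._total = 0, len(pages)
            self._thread = threading.Thread(target=self._run, args=(pages,), daemon=True)
            self._thread.start()
            return f"Started: {self._total} pages queued"

    def _run(self, pages):
        for page in pages:
            self._render(page)
            with self._lock:
                self._done += 1
        # Report back once the whole run has finished.
        self._log(f"Finished: {len(pages)} pages")

    def wait(self, timeout=None):
        """Block until the current run finishes (handy for tests)."""
        if self._thread is not None:
            self._thread.join(timeout)
```

Keeping the progress counters behind the same lock that guards the "is a run active?" check means a manual restart always sees a consistent snapshot of the running job instead of launching a second crawl of the site.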


Here are a couple of examples of generated snapshots: PDF and JPG.

Note that this is released as a research prototype. No guarantees or promises; use it AS-IS. Suggestions for improvements and bug reports can be left in a comment or tweeted to me (@athraen).

Download the assemblies here. (EPiServer CMS 5 R2 SP2)

28 January 2010


Comments

  1. Wouldn't it be better if a screenshot was saved each time a page was published? Otherwise some published information could be missed. Or does it take too long to run for a single page?
  2. Markus, the reason for not taking a screenshot every time a page is published is that the publishing might affect more than that one page. All pages with menus containing links to that page might be affected. In fact, pages might be affected by things other than publishing, like changes of permissions, VPP file edits, dynamic properties, code updates, etc. And grabbing the entire site after every single change would be a big deal.
  3. This is great news on many levels. One sentence that made me super excited, though, is "It uses ExpertPDF's HTML to PDF converter, for which EPiServer has bought a redistributable license." Does it mean that we can use the library in our EPiServer projects? If that is the case, you guys all deserve a big hug! We have built some in-house converters, but they will never be as good as a product made by a company that dedicates its full resources to it.
Allan Thræn

About me

I am a product manager @ EPiServer, with a passion for the more geeky side of things. My technical interests are typically focused on user problems, user experience, search, information management, artificial intelligence and personalization.

In addition to this blog, I write the blog Allan On Technology, and I often cross-post.

DISCLAIMER: Unless otherwise stated in the posts, this blog expresses my personal opinions, experiments and views, not necessarily the views of EPiServer AB.
