The EPiCode Cache Framework

by: Steve Celius

At the EPiServer Developer Summit I demonstrated a way to speed up your lists by caching the content after it has been loaded the first time, using a small framework that helps you extract the code that loads the content. It handles the caching completely transparently, and it has been written to have as little impact on your existing code as possible.

It is called the EPiCode Cache Framework, and it is available right now on EPiCode. This article describes the cache framework in more detail. A big thanks to Nick Urry for porting the code to CMS 5.

The "Big List Problem"

In my presentation I showed that finding the last 5 news items and adding them to a NewsList from a news container with 100 pages took 45ms, while doing the same thing with a 2000-page container took 950ms on EPiServer CMS 4. On CMS 5 the numbers were 18ms vs. 400ms.

The test was done by filling a NewsList control, setting its PageLink property to point first at the 100-page container and then at the 2000-page container. It had MaxCount set to 5 and no sorting specified (the NewsList sorts on PageStartPublish by default). I measured the numbers with a rather crude timer, but the numbers in themselves are not that important; what is interesting is the relative increase in time and resources caused by the number of pages.

Remember, we're only showing the 5 latest articles. This is a very common operation, and it can be a rather expensive one when we have to work through lots and lots of pages to find the newest ones (even though we're limiting the number to 5).

Enter Caching

When a site grows, performance can degrade due to the way content is organized because EPiServer has to work its way through a lot of data to find the parts that you are interested in.

The point I tried to make in my Developer Summit talk was that caching data in scenarios like this can really help you reduce CPU usage and memory allocation/deallocation on the server. Caching is not a silver bullet, but it is one of the most useful tools you've got when you're in a tight spot performance-wise.

The aim of the EPiCode Cache Framework is to help you cache content with as little impact on your existing code as possible. Often, the real fixes to performance problems on a customer site require re-architecting, and possibly a major rewrite of central pieces of your code. You do not want to rewrite and test lots of code when you're having performance problems; you should focus on fixing the biggest (and most visible) problems and get the fixes out there as fast as possible.

Using the Cache Framework

Here is a small example of how to change the way you fill a NewsList from code, using the cache framework. This code is placed in the code-behind file of a template that has a NewsList control showing the last 5 news items directly below the current page. I have removed the PageLink property, which was previously responsible for filling the news list, from the markup:

protected void Page_Load(object sender, System.EventArgs e)
{
    // Simple filtered GetChildren
    NewsListingData lstData = new NewsListingData();
    lstData.PageLink = CurrentPage.PageLink;
    lstData.SortOrder = FilterSortOrder.PublishedDescending;
    lstData.MaxCount = 5;

    // Get pages, from db or cache
    myNewsList.DataSource = lstData.GetPages();
    myNewsList.DataBind();
}

The code instantiates a new object of type NewsListingData and sets its PageLink property to where the list should get its content, then sets a sort order, and finally gives the list a max count of 5 items. So far we have only told the NewsListingData object how to find its pages; we haven't retrieved any of them yet.

We then set the DataSource on the NewsList by calling the lstData.GetPages() method, and bind the list. The GetPages() method is responsible for actually "getting" the pages and returning them in a PageDataCollection.

As you can see, not much code is needed to fill a list with pages, with caching enabled. Of course, there is some logic in the NewsListingData class, but as you'll see, even that is simple.

This is what the NewsListingData class looks like:

public class NewsListingData : FilteredPageDataCollectionCacheBase
{
    private PageReference _pageLink;

    public PageReference PageLink
    {
        get { return _pageLink; }
        set { _pageLink = value; }
    }

    protected override PageDataCollection PopulatePages()
    {
        return Global.EPDataFactory.GetChildren(PageLink);
    }

    protected override string CreateCacheKey()
    {
        // If there are any filters,
        // we need to add to the key
        string key = base.GetPartialFilterSortKey();
        return "StartPageChildrenAnonymous_" + key + "PageLink_" + _pageLink.ID;
    }
}

The NewsListingData class is what we call a "query class" in the cache framework, because it is responsible for loading PageData objects.

The class inherits from FilteredPageDataCollectionCacheBase and has a PageLink property that tells it where to fetch pages from. Two other methods are implemented, PopulatePages and CreateCacheKey; they are required for this to work (they are abstract methods that need to be implemented).

This is what happens when we call GetPages() in the code-behind file with the news list:


  1. We create the object responsible for fetching the data (in the example above, this is NewsListingData). This is called the "query" class.
  2. Then we call GetPages() on this object. The GetPages() method is defined in a base class called PageDataCollectionCacheBase.
  3. We need a unique cache key, so we ask the query class to provide it. The query class needs to implement the abstract CreateCacheKey method and return a string that is unique for the query being performed.
  4. The base class checks the cache to see whether GetPages has been called before and its result stored in the cache. We assume this is the first call, so nothing is in the cache.
  5. The base class calls the abstract PopulatePages method, which the query class (that means you) must implement. This method returns a PageDataCollection with pages to the base class.
  6. In this case, the query class inherits from FilteredPageDataCollectionCacheBase (yes, it is a long name) instead of PageDataCollectionCacheBase directly. The FilteredPageDataCollectionCacheBase class knows how to sort the pages and limit them to MaxCount (if you tell it to), which is done in this step.
  7. The PageDataCollection is now filled, sorted and "maxcounted", and is ready to be stored in the cache. The unique key is used to store the pages in the cache, and a dependency on the global EPiServer page cache is added to it. After this is done, the collection of pages is returned to the query class.

The next call to GetPages() looks like this:


  1. The query class is created with the same query parameters.
  2. GetPages() is called, handing control to the base class.
  3. The unique cache key is created, so we can look in the cache for any existing data.
  4. Based on the key, we find a PageDataCollection in the cache.
  5. The collection is returned as it was found in the cache.

To summarize: when you call GetPages() (from the base class), it will check the cache first and hand you the cached value if it is found. If not, it will ask you to populate the pages yourself (or provide criteria if it is a search). The result is then stored in the cache for the next lookup.
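The two call sequences above can be sketched in plain C#. This is a minimal, self-contained sketch under stated assumptions, not the framework's source: PageStub, CachedPageQuery, FilteredCachedPageQuery and ChildrenQuery are hypothetical stand-ins for PageData, PageDataCollectionCacheBase, FilteredPageDataCollectionCacheBase and NewsListingData, a static Dictionary stands in for the ASP.NET cache, and the page cache dependency and the logged-on user check are left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for EPiServer's PageData (only what the sketch needs).
public class PageStub
{
    public int Id;
    public DateTime StartPublish;
}

// Hypothetical stand-in for PageDataCollectionCacheBase.
public abstract class CachedPageQuery
{
    // Stand-in for the ASP.NET cache; not thread-safe.
    static readonly Dictionary<string, List<PageStub>> Cache =
        new Dictionary<string, List<PageStub>>();

    public static void ClearCache() { Cache.Clear(); }

    protected abstract string CreateCacheKey();       // supplied by the query class
    protected abstract List<PageStub> PopulatePages(); // supplied by the query class

    // Hook for the filtering step; the plain base class applies no filters.
    protected virtual List<PageStub> ApplyFilters(List<PageStub> pages) { return pages; }

    public List<PageStub> GetPages()
    {
        string key = CreateCacheKey();                 // ask the query class for a key
        List<PageStub> cached;
        if (Cache.TryGetValue(key, out cached))        // later calls: cache hit
            return cached;

        List<PageStub> pages = ApplyFilters(PopulatePages()); // first call: load, then filter
        Cache[key] = pages;                            // store for the next lookup
        return pages;
    }
}

// Hypothetical stand-in for FilteredPageDataCollectionCacheBase: sorts and limits.
public abstract class FilteredCachedPageQuery : CachedPageQuery
{
    public int MaxCount = -1;   // -1 means "no limit"
    public bool NewestFirst;    // simplified stand-in for a sort order

    protected override List<PageStub> ApplyFilters(List<PageStub> pages)
    {
        IEnumerable<PageStub> result = pages;
        if (NewestFirst)
            result = result.OrderByDescending(p => p.StartPublish);
        if (MaxCount >= 0)
            result = result.Take(MaxCount);
        return result.ToList();
    }
}

// Hypothetical query class; counts how often the expensive load actually runs.
public class ChildrenQuery : FilteredCachedPageQuery
{
    public static int PopulateCalls;
    public int ParentId;

    protected override string CreateCacheKey()
    {
        return "Children_" + ParentId + "_MaxCount_" + MaxCount + "_NewestFirst_" + NewestFirst;
    }

    protected override List<PageStub> PopulatePages()
    {
        PopulateCalls++;
        // Simulate a container with 2000 children published one day apart.
        var pages = new List<PageStub>();
        for (int i = 1; i <= 2000; i++)
            pages.Add(new PageStub { Id = i, StartPublish = new DateTime(2008, 1, 1).AddDays(i) });
        return pages;
    }
}
```

Creating two ChildrenQuery objects with the same ParentId, MaxCount and NewestFirst and calling GetPages() on each runs PopulatePages only once; the second call gets the very same list instance from the cache. Because cached lists are shared between requests, callers should treat the returned list as read-only.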

One important thing to notice is that the cache framework depends on a unique key for storing the pages. This key is used for storing the pages in the cache and retrieving them later. If the key is not unique, you risk ending up with the wrong pages in the cache or as a result of the cache lookup.

To provide a unique key, implement the abstract CreateCacheKey() method and return a string that is built from one or more of the parameters in your class. For example, if you implement a class that lists all children of another page and limits the list to 5 pages, put the ID of the container page and the number 5 in the key, and add a string prefix to make it really unique.
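To see why every parameter belongs in the key, here is a small hypothetical sketch (not framework code). One method leaves the max count out of its key, so after a 5-item request has been cached, a 10-item request for the same container gets the 5 cached items back. The other method builds its key from a prefix, the container ID and the max count, and does not have this problem. The class, method names and key formats are illustrative assumptions.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical demo: what happens when a query parameter is missing from the cache key.
public static class KeyCollisionDemo
{
    static readonly Dictionary<string, int[]> Cache = new Dictionary<string, int[]>();

    public static void ClearCache() { Cache.Clear(); }

    // Stand-in for an expensive query: the first maxCount child IDs of a container.
    static int[] LoadChildIds(int containerId, int maxCount)
    {
        var ids = new int[maxCount];
        for (int i = 0; i < maxCount; i++)
            ids[i] = containerId * 1000 + i;
        return ids;
    }

    // Cache-aside helper: return the cached value, or load and store it.
    static int[] GetOrLoad(string key, Func<int[]> load)
    {
        int[] ids;
        if (!Cache.TryGetValue(key, out ids))
        {
            ids = load();
            Cache[key] = ids;
        }
        return ids;
    }

    // BAD: maxCount is not part of the key, so different queries share one cache entry.
    public static int[] GetWithIncompleteKey(int containerId, int maxCount)
    {
        return GetOrLoad("Children_" + containerId,
                         () => LoadChildIds(containerId, maxCount));
    }

    // GOOD: a prefix plus every parameter that changes the result.
    public static int[] GetWithFullKey(int containerId, int maxCount)
    {
        return GetOrLoad("ChildrenOfContainer_PageLink_" + containerId + "_MaxCount_" + maxCount,
                         () => LoadChildIds(containerId, maxCount));
    }
}
```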

If you inherit from the FilteredPageDataCollectionCacheBase class instead of PageDataCollectionCacheBase, you even get a helper method that adds the filter criteria (MaxCount and sorting) to the unique key, like this:

protected override string CreateCacheKey()
{
    // If there are any filters, we need to add to the key
    string key = base.GetPartialFilterSortKey();
    return "StartPageChildrenAnonymous_" + key +
        "PageLink_" + _pageLink.ID;
}

In the example above, the key would be:

StartPageChildrenAnonymous_MaxCount_5_
SortOrder_PublishedDescendingPageLink_123

(Intentionally wrapped here for readability.)
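A helper in the spirit of GetPartialFilterSortKey() might look like the following. This is a guess under stated assumptions, not the framework's implementation: FilterSettings, the PageSortOrder enum and NewsKeyDemo are hypothetical, and the segment format was chosen so that the output reproduces the example key above (note that no separator ends up between the sort segment and "PageLink_").

```csharp
using System;

// Hypothetical sort-order enum; in EPiServer the real type is FilterSortOrder.
public enum PageSortOrder { None, PublishedDescending, Alphabetical }

public class FilterSettings
{
    public int MaxCount = -1;                     // -1: no MaxCount filter
    public PageSortOrder Sort = PageSortOrder.None;

    // Sketch of a GetPartialFilterSortKey-style helper: one segment per active
    // filter, so an unfiltered query adds nothing to the key.
    public string GetPartialFilterSortKey()
    {
        string key = "";
        if (MaxCount >= 0)
            key += "MaxCount_" + MaxCount + "_";
        if (Sort != PageSortOrder.None)
            key += "SortOrder_" + Sort;           // enum concatenates as its name
        return key;
    }
}

public static class NewsKeyDemo
{
    // Mirrors the CreateCacheKey() example: prefix + filter segments + page ID.
    public static string CreateCacheKey(FilterSettings filters, int pageLinkId)
    {
        return "StartPageChildrenAnonymous_" + filters.GetPartialFilterSortKey() +
            "PageLink_" + pageLinkId;
    }
}
```

With MaxCount = 5, Sort = PublishedDescending and page ID 123 this yields StartPageChildrenAnonymous_MaxCount_5_SortOrder_PublishedDescendingPageLink_123; with no filters it yields StartPageChildrenAnonymous_PageLink_123, so filtered and unfiltered lists of the same container never share a cache entry.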

As you can see, creating your own query class is easy, and so is using it. In my talk I showed that the time to load the news list from the 2000-page container went from 950ms to 2ms. I would really call that an improvement.

Conclusion

Some things you need to know before you start using the cache framework:

  • No caching is performed for logged-on users. Editors will not get pages from the cache, nor will they generate cached items. The reason is that logged-on users could potentially see pages that anonymous users can't see, and even pages that other logged-on users are not allowed to see. Such pages cannot be put into the cache, as that would break security on your site.
    If you have logged-on users, and you know that you do not have secured pages, you could turn off this check (or limit it to editors only).
  • Each cached item has a dependency on the global EPiServer cache key, which is invalidated each time a page is published, regardless of where the page is in the page hierarchy. This means that if you cache 100 different lists, all of them are invalidated and removed from the cache when a page (any page) is published by an editor. The caches will then be rebuilt as they are requested. If you have user-generated content, like forums, posting it also counts as publishing pages and will invalidate the cache.
  • The first hit to the page needs to build the data to cache; if that is really slow and your site has many visitors, requests will queue up. Access to the GetPages method is not serialized, so if many users visit your site before the pages are in the cache, many of them will start building the collection before it is cached (and each will overwrite the existing cache item).
    Without the cache you'd be doing this constantly anyway, but you should be aware that this will happen after publishing and after app domain restarts.
  • Monitor your memory usage. If you're caching aggressively, make sure you have enough available memory. 64-bit Windows allows a lot more memory than 32-bit Windows. If you really want to cache a lot and you're seeing memory exceptions, you should think about moving to a 64-bit OS.
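Two of the points above, the logged-on user bypass and the global invalidation on publish, can be sketched without EPiServer. In this hypothetical sketch, a global version number stands in for the dependency on the EPiServer page cache key, and a boolean parameter stands in for the "is the user logged on" check. Like the framework as described above, it does not serialize concurrent loads; it is also not thread-safe.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: every entry is invalidated by any publish, and the cache
// is bypassed entirely for logged-on users.
public static class GlobalDependencyCache
{
    class Entry
    {
        public int Version;   // global version at the time the entry was stored
        public object Value;
    }

    // Stand-in for the global EPiServer page cache key: bumped on every publish.
    static int _globalVersion;
    static readonly Dictionary<string, Entry> Entries = new Dictionary<string, Entry>();

    public static void OnPagePublished()
    {
        _globalVersion++;   // every existing entry is now stale, wherever the page lives
    }

    public static T Get<T>(string key, bool userIsLoggedOn, Func<T> load)
    {
        // Logged-on users may see secured pages: never read from or write to the cache.
        if (userIsLoggedOn)
            return load();

        Entry entry;
        if (Entries.TryGetValue(key, out entry) && entry.Version == _globalVersion)
            return (T)entry.Value;

        // Miss or stale entry: rebuild on demand and overwrite whatever was stored.
        T value = load();
        Entries[key] = new Entry { Version = _globalVersion, Value = value };
        return value;
    }
}
```

The design choice here is that invalidation is lazy: publishing only bumps a counter, and each list is rebuilt the next time it is requested, which matches the "rebuilt as they are requested" behaviour described in the list.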

Download

To download the latest version of the source code, please go to the Cache Framework information page and use one of the links under the Source Code heading. If you do not have an EPiCode account yet, you need to register and then apply for membership.

If you do not want to be bothered with the EPiCode membership (even though it is free), download a snapshot of the current source code revision right here:

Source code is based on revision 531 of the EPiCode source repository.

The next article about the cache framework will show how you can write FindPagesWithCriteria calls with a cached result. The base class for doing FindPagesWithCriteria can help you write cleaner and more maintainable page searches in EPiServer.

Stay tuned!

15 June 2008


Comments

  1. Very nice swimlane diagrams!
  2. What would be the best way to narrow the results for NewsListingData to only pages of a certain page type?
Steve Celius

About me

I work for EPiServer in Norway, mostly with technical stuff. Trying to keep up with all the new stuff from the development team. I also hang out on the EPiCode project, why don't you come join us?
