A lot of the questions in the EPiServer Community forum are about how to fetch sets of entities filtered and/or sorted in a specific way from the database. I’m going to present five strategies for doing that in this post. While I wont solve any specific problem I hope that these strategies will provide an insight in how these types of problems can be approached.
The five strategies below are ordered after complexity, with the simplest first. This is also the order in which they should be tried, except perhaps for #2 and #3 as which one of those two that are best to use differs from situation to situation.
1. The native Handlers
Each module in the community framework contains a handler class, such as BlogHandler, ForumHandler and ClubHandler. These classes has methods for adding, getting, updating and removing single instances of the entities in the module. They usually also contain methods for getting collections of these entities with paging, basic filtering and sorting.
The results returned by these methods are always cached, as long as you are using them to retrieve page one, so the database, which is usually the bottleneck, will only be queried once for the same combination of method parameters. The cache will also be cleared when it should be, such as when entities in the cached collection are updated or removed or when new entities that belong in the collection are saved.
While these methods are limited to basic filtering and sorting they are usually the best choice to use in situations where it’s possible to use them.
Example usage
The native methods usually provide filtering and sorting of entities based on the entities own attributes, such as getting the last created blog entries in a specific blog:
EntrySortOrder[] entrySortOrders =
new[] { new EntrySortOrder(EntrySortField.Created,
SortingDirection.Descending) };
int totalItems;
EntryCollection lastCreatedEntries =
BlogHandler.GetEntries(blog, 1, 10,
out totalItems, entrySortOrders);
There are usually “shortcuts” to the handler methods on entities. The below code does the exact same thing as the code above.
EntrySortOrder[] entrySortOrders =
new [] { new EntrySortOrder(EntrySortField.Created,
SortingDirection.Descending) };
int totalItems;
EntryCollection lastCreatedEntries =
blog.GetEntries(1, 10, entrySortOrders);
Pros
- Easy to use and you don’t have to think about caching.
Cons
- None, but only applicable for simple queries.
2. The query system
When the handler methods are not enough the natural next step is to try to use the query system. The query system is basically a strongly typed wrapper around NHibernate. The handler classes in the community modules have methods named GetQueryResult which accepts an instance of a query class, such as ForumQuery, BlogQuery and UserQuery.
The query classes consist of a set of criteria (instances of classes that implement EPiServer.Common.Queries.ICriterion) and all contain a method named GetQuery which returns HQL (Hibernate Query Language) based on the criteria. When you invoke the GetQueryResult method in a handler with a query it will retrieve the result of the query by getting the HQL query from the query instance and translating that into a database query. The handler actually doesn’t do most of the work but forwards it to EPiServer.Common.Queries.QueryHandler which handles caching and then in turn forwards the O/R mapping and database querying to NHibernate.
Caching
The query system will cache the result of the query and the cache system will make sure that the cache is cleared if any of the cached entities are updated or removed. If a new entity that belongs in the collection is added the cache will however not be cleared. It also won’t be cleared if an existing entity which previously didn’t belong in the collection but belongs in it after being updated is updated.
If you want to clear the cache when new entities are added or for some other reason you can either clear the cache for a specific query by using the QueryHandler.RemoveQueryResultCache(IQuery query) method or you can manually handle all caching for the query. In order to do that you shouldn’t use the handlers GetQueryResult method but instead use QueryHandlers method with the same name as it has an overload which allows you to manually set the cachetime. Set the cachetime to none and instead manually cache the query result.
Example usage
The below code gets all forum topics posted by a specific user.
UserAuthorCriterion userAuthorCriterion = new UserAuthorCriterion();
userAuthorCriterion.User = new UserCriterion();
userAuthorCriterion.User.ID = new IntegerCriterion();
userAuthorCriterion.User.ID.Value = user.ID;
TopicQuery topicQuery = new TopicQuery();
topicQuery.Author = userAuthorCriterion;
Pros
-
Strongly typed and fairly easy to use as long as there are criterions that suits your needs.
-
Most caching issues are handled automatically.
Cons
-
The cache is not cleared for collections when new entities that belong in the collection is added.
3. In-memory filtering and sorting
An alternative to using the query system can be to use in-memory filtering on the webserver. This means getting all, or at least a wider set than you are after, of entities using a handler method and then filtering that collection in your code. Using LINQ this is usually very easy to do, but it has some clear limitations performance wise.
First of all the returned collection may be very big causing a lot of data to be transferred from the database server to the webserver and the webservers cache to be filled by data that we aren’t really using. It might not seems like a big deal, but remember that a community can have a lot of content and loading 500 000 blog entries into the webservers memory is probably not a viable strategy.
Second, if we need to filter the collection based on information that isn’t loaded with the initial data fetching we might end up querying the database a lot.
With that said, in-memory filtering and sorting can be very useful, and in some situations it might also provide a performance boost as the same set of entities can be filtered and ordered in many different ways without querying the database. And remember that it’s much simpler to scale up the webservers than it is to scale up the database server(s).
So, to recapitulate, use in-memory filtering when the original set of entities that you wish to filter or sort is very small or when it’s fairly small and the query system can’t help you.
Example usage
The example code below uses LINQ to achieve the same thing as the example code in the previous section, namely getting all topics created by a specific user. This is usually not a good way to use in-memory filtering as forums usually have a lot of topics. So in this case the query system is probably better, but it’s always nice to illustrate how the same thing can be achieved using different methods.
List<Message> topicsByUser = allTopics.Where(message =>
{
Topic topic = ((Topic) message);
if(!(topic.Author is UserAuthor))
return false;
UserAuthor author = ((UserAuthor) topic.Author);
return author.User.ID == user.ID;
}).ToList();
Pros
-
Simple if you use LINQ.
-
You don’t have to care about caching.
Cons
-
Only a viable solution for filtering/sorting small sets of entities.
4. Extending the query system
When in-memory filtering isn’t an option and the built in query functionality can’t do the trick you can extend the query system with your own queries or criterions.
You may also write your own custom HQL queries without using instances of a Query class at all by using the GetQueryResult method in the EPiServer.Common.Data.QueryFactory class. Using that last approach will however not assist you with caching. A nice work around for that is to inherit from a query class and override the GetQuery method. That does however create a class that violates the Liskov Substitution Principle, so it’s best to keep it’s visibility to an absolute minimum.
As for caching, the situation is the same as for strategy #2, meaning that the results of queries will be cached and that cache will be released when objects in a cached collection is updated or removed. But when new objects that belong in the collection, or objects that was not included in the collection but should be after they have been updated are updated, the cache wont be cleared. You will either have to live with that, manually clear the cache or handle all caching your self.
Example usage
The code below creates a subclass of BlogQuery which when executed returns blogs sorted by the creation date of the last entry in the blog. For an example of how to create a custom criteria, check out this thread in the EPiServer Community forum.
internal class BlogsByLatestEntryQuery : BlogQuery
{
public override string GetQuery()
{
return
"SELECT blog FROM EPiServer.Community.Blog.Blog AS blog"
+ " LEFT JOIN blog.Entries AS entry"
+ " ORDER BY MAX(entry.Created) DESC"
+ " GROUP BY blog.ID, blog.Created, blog.Name,"
+ " blog.ImageGallery, blog.Active, blog.NumEntries,"
+ " blog.PresentationImage, blog.PresentationText,"
+ " blog.Active, blog.Author, blog.Site";
}
}
Pros
-
Very flexible, especially if you write your own HQL queries, there are not many filtering or sorting problems that can’t be solved this way.
-
As long as you work with Query classes caching is done automatically.
Cons
-
Developing custom criterias is a pretty complex and time consuming task.
-
The cache is not cleared for collections when new entities that belong in the collection is added.
-
You have to violate good design principles to write your own HQL queries, or handle all caching yourself.
5. Custom factory methods
When everything else fails, there is one last solution that should always work: query the database yourself. Thanks to the frameworks entity layer this is actually quite easy (and fun) to do. Here's what you do:
- Take a look at one of the stored procedures in the database that returns the data for the same type of entities as you are going to query for to find out which columns to return and in what order.
- Create your own stored procedure that performs the filtering and/or sorting that you're after.
- Create methods for fetching collections using your stored procedure. Given that your SP returns the correct columns in the correct order you can load the result into a DataReader and load each row as an entity of the specific type using the GetEntityInstance method in the IEntityProvider interface. You can figure out which entity provider to use using the GetEntityProvider method in the EntityProviderHandler class.
The third step above should also include caching and clearing the cache when appropriate. Doing so is a bit out of scope for this post though :)
This strategy has a clear drawback in the fact that it relies on the community platforms database structure which may be changed in newer versions or even in service packs and hotfixes. Therefore this strategy should only be used as a last resort or if it can provide a performance boost so great that it’s worth the cost of having to give the solution special attention when upgrading.
Example usage
In section four we created a Query class for getting blogs sorted by last entry date. The same thing can be achieved using this strategy by creating a stored procedure such as this:
CREATE PROCEDURE spGetAllBlogsOrderedByLastEntryDate
AS
BEGIN
SELECT b.intID, b.intSiteID, b.strBlogName,
b.blnActive, b.intNumEntries,
b.datTimeStamp, b.intImageGalleryID,
b.intPresentationImageID,
b.strPresentationText, b.intAuthorID
FROM tblEPiServerCommunityBlog b
LEFT JOIN tblEPiServerCommunityBlogEntry e
ON e.intBlogID = b.intID
GROUP BY b.intID, b.intSiteID, b.strBlogName,
b.blnActive, b.intNumEntries,
b.datTimeStamp, b.intImageGalleryID,
b.intPresentationImageID,
b.strPresentationText, b.intAuthorID
ORDER BY MAX(e.datTimeStamp) DESC
END
The stored procedure can be executed using code such as this:
using (DbDataReader reader = DatabaseHandler.GetReader(
"spGetAllBlogsOrderedByLastEntryDate", new object[0]))
{
IEntityProvider blogProvider =
EntityProviderHandler.GetEntityProvider(typeof(Blog));
while (reader.Read())
blogs.Add(
blogProvider.GetEntityInstance(typeof(Blog), reader));
}
Pros
-
Extremely flexible, I can’t think of anything you can’t do using this approach.
-
You may also optimize certain queries using this approach as you use your own SP, and may for instance create and query against an indexed view.
Cons
-
You will have to do quite a lot of coding as you will have to write code for data access and won’t be assisted with caching.
-
The solution may break when the community platform is updated.