Introducing EPiSolr

by: LBi

Introduction

At LBi we build web sites on a wide range of platforms and technologies. For many .Net sites with a content management element, we find EPiServer a capable and cost-effective framework.

Web sites are increasingly dependent on search technology. Where search was once limited to locating content on the site, it is now integral to aggregating site content and providing navigational constructs.

Facetted search is an important tool that lets users explore information by navigating multiple axes independently. While LBi use a number of high-end enterprise facetted information retrieval engines, these are typically very expensive and fit poorly with the price point that makes EPiServer attractive.
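To make the idea concrete, the Python sketch below computes facet counts over a small in-memory collection and then narrows it along one axis. It only illustrates the concept. It is not how Solr or the commercial engines implement faceting, and the field names (type, year, category) are made up for the example.

```python
from collections import Counter


def as_list(value):
    """Treat single values and multi-valued fields (e.g. several categories) alike."""
    return value if isinstance(value, list) else [value]


def facet_counts(docs, fields):
    """Count, per facet field, how many documents carry each value."""
    counts = {field: Counter() for field in fields}
    for doc in docs:
        for field in fields:
            if field not in doc:
                continue
            # set() so a document is counted once per value even if repeated
            for value in set(as_list(doc[field])):
                counts[field][value] += 1
    return counts


def refine(docs, field, value):
    """Narrow the result set by choosing one value on one axis."""
    return [doc for doc in docs if value in as_list(doc.get(field, []))]


# Hypothetical sample collection
docs = [
    {"type": "report", "year": 2008, "category": ["energy", "policy"]},
    {"type": "report", "year": 2009, "category": ["energy"]},
    {"type": "news", "year": 2009, "category": ["policy"]},
]
```

If you select "report" on the type axis and then recount year and category, the remaining facets update, and you can refine each axis without fixing the others first.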

In the open source arena, Apache Lucene and Apache Solr make it possible to offer enhanced search functionality at the low end more economically than before. Many CMS products now integrate this technology as their search engine of choice. Although a Lucene integration for EPiServer already exists, it is Solr's richer functionality, including its facetted search, that is more interesting.

With this in mind LBi have explored and developed a reusable deployment of Solr tailored specifically to an EPiServer installation.

Goals

EPiSolr is the name given to the deployment package of Solr and the associated .Net integration components developed by LBi. Integrating commercial search technologies can be very expensive, and such integrations are often hard to develop, manage and deploy.

Amongst the key objectives for EPiSolr are:

  1. Simple deployment of Solr on a Windows or Unix platform
  2. Seamless integration into an EPiServer installation, including one that is already live
  3. Minimal configuration and development requirements
  4. Provide robust full text and facetted search functionality

Some of the features of Solr that make it attractive as an economical search platform include:

  1. Zoned full text search
  2. Hit highlighting
  3. Facetted search and analysis
  4. Caching
  5. Replication
  6. Pluggable architecture
  7. Real-time updates
  8. XML/HTTP and JSON APIs

EPiSolr Platform

The platform comprises a number of components.


Figure 1: Logical architecture of an EPiSolr site

Apache Solr

The core Solr package runs on the Java runtime, so it can run on Unix or Windows. Unfortunately, the Windows distribution of Solr is very basic and lacks the components needed to deploy it properly on that platform. Fortunately, those components exist in other packages, and LBi have aggregated them into a runtime that can be deployed in a production environment under Unix or Windows.

Deployment requires only copying the latest directory structure and running a service installation script. The core deployment is already configured for standard EPiServer page data, security and metadata constructs, spell checking, highlighting and auto-suggest. When developing a new site, all that is required is to add definitions for the specific EPiServer content type properties that need to be searched or have facets built against them.
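The sketch below shows one way a property definition might look once indexed: it flattens a page's properties into a Solr document. The suffix convention (_t for tokenised text, _s for untokenised strings, _i for integers, _dt for dates) is an assumption modelled on the dynamic fields in Solr's example schema. It is not necessarily EPiSolr's own schema, and the property names are hypothetical.

```python
from datetime import datetime, timezone

# Assumed suffix convention, modelled on Solr's example-schema dynamic fields:
# *_t tokenised text, *_s untokenised string, *_i integer, *_dt date.
SUFFIX_BY_TYPE = {str: "_t", int: "_i", datetime: "_dt"}


def solr_date(value):
    """Solr expects dates as UTC ISO 8601 with a trailing 'Z'."""
    return value.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def to_solr_document(page_id, properties, facet_properties=()):
    """Flatten a page's properties into a Solr document (hypothetical mapping)."""
    doc = {"id": str(page_id)}
    for name, value in properties.items():
        suffix = SUFFIX_BY_TYPE.get(type(value))
        if suffix is None:
            continue  # other types would need a custom handler
        doc[name + suffix] = solr_date(value) if suffix == "_dt" else value
        if name in facet_properties:
            # Facets count exact values, so also store an untokenised copy.
            doc[name + "_s"] = str(value)
    return doc
```

A faceted property is stored twice in this design. The text copy supports full text search, and the untokenised string copy gives facet counts whole values such as "Annual report" rather than individual words.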

SolrNet

Communication with Solr is normally via query string and XML over HTTP.

SolrNet is an open source .Net API layer for interacting with Solr. It provides an object and interface model to program against, so you do not have to compose query strings and XML requests or parse XML responses yourself.
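For comparison, the sketch below shows roughly the kind of request SolrNet saves you from composing by hand. It builds a /select URL with Solr's standard q, fq, start, rows, wt, facet and facet.field parameters. The base URL (Solr's default port 8983) and the field names are assumptions. This is plain Python, not the SolrNet API.

```python
from urllib.parse import urlencode


def build_select_url(base_url, text, facet_fields=(), filters=(), rows=10, start=0):
    """Compose a Solr /select request of the kind SolrNet builds on your behalf."""
    params = [("q", text), ("start", start), ("rows", rows), ("wt", "json")]
    # Filter queries (fq) restrict results without affecting relevance scoring.
    params += [("fq", f) for f in filters]
    if facet_fields:
        params.append(("facet", "true"))
        # facet.field may be repeated, once per field to facet on.
        params += [("facet.field", f) for f in facet_fields]
    return base_url.rstrip("/") + "/select?" + urlencode(params)
```

For example, build_select_url("http://localhost:8983/solr", "annual report", facet_fields=["type_s"]) produces a single GET request that returns both the matching documents and the facet counts for type_s.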

SolrNet is used by EPiSolr and is deployed as a DLL with the target site package.

SolrTools

Although extensions to Solr support the indexing of data files, their reliability and file type support are limited.

LBi’s SolrTools component converts data files to text streams using Microsoft’s IFilter interface. Through this, EPiSolr can extract file content and include it in the Solr index, either independently or attached to an EPiServer page.

SolrTools is also deployed as a DLL as part of the site.

EPiSolr

EPiSolr is the glue which joins EPiServer to Solr.

EPiSolr has a pluggable architecture that allows customisation and extension of content type and property indexing behaviour.

The default deployment offers extensive customisation options. However, the default content type and property handlers index most cases sensibly and need configuration only to change or enhance their behaviour. Where the default handlers cannot support the required behaviour, extensions can be deployed for individual properties or for whole content types.
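The handler model described above can be sketched as a small registry. A property-level handler overrides the default for one property, and a content-type handler takes over a whole page. The class, method and field names are hypothetical. They show only the lookup order, not EPiSolr's actual interfaces.

```python
from typing import Any, Callable, Dict

# A property handler turns one property into zero or more Solr fields.
PropertyHandler = Callable[[str, Any], Dict[str, Any]]
# A content-type handler turns a whole page's properties into Solr fields.
ContentTypeHandler = Callable[[Dict[str, Any]], Dict[str, Any]]


def default_handler(name: str, value: Any) -> Dict[str, Any]:
    """Default behaviour: index any value other than None as text."""
    return {} if value is None else {name + "_t": str(value)}


class HandlerRegistry:
    def __init__(self, default: PropertyHandler = default_handler) -> None:
        self._default = default
        self._by_property: Dict[str, PropertyHandler] = {}
        self._by_content_type: Dict[str, ContentTypeHandler] = {}

    def register_property(self, name: str, handler: PropertyHandler) -> None:
        self._by_property[name] = handler

    def register_content_type(self, content_type: str, handler: ContentTypeHandler) -> None:
        self._by_content_type[content_type] = handler

    def index(self, content_type: str, properties: Dict[str, Any]) -> Dict[str, Any]:
        # A content-type extension replaces per-property handling entirely.
        if content_type in self._by_content_type:
            return self._by_content_type[content_type](properties)
        fields: Dict[str, Any] = {}
        for name, value in properties.items():
            # Use the property-specific extension if registered, else the default.
            handler = self._by_property.get(name, self._default)
            fields.update(handler(name, value))
        return fields
```

In this design, the most specific registration wins. Most pages therefore go through the default handler untouched, and only the exceptions need code.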

All primitive EPiServer constructs are indexed, including categories and access control lists (ACLs).
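One common way to use indexed ACLs is security trimming at query time. Each document stores the roles and users allowed to read it, and every search adds a filter query that restricts results to the visitor's principals. The sketch below assumes a hypothetical multi-valued field named acl_s. EPiSolr's own field names and approach may differ.

```python
def acl_filter(field, principals):
    """Build a Solr filter query matching documents readable by any given principal."""
    if not principals:
        raise ValueError("at least one role or user name is required")
    quoted = []
    for principal in principals:
        # Escape backslashes and double quotes so each name is a safe phrase.
        escaped = principal.replace("\\", "\\\\").replace('"', '\\"')
        quoted.append(f'"{escaped}"')
    return f"{field}:({' OR '.join(quoted)})"
```

For a visitor in the Everyone and WebEditors roles, this yields acl_s:("Everyone" OR "WebEditors"). You would pass that string as an fq parameter so that Solr can cache it independently of the visitor's search terms.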

EPiSolr is responsible for hooking into EPiServer events and for populating and managing the Solr search index.

Index management is asynchronous with indexing operations running independently of editorial or publishing activity.
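The sketch below shows one way to achieve that decoupling, assuming a simple in-process queue. The publish event only enqueues an operation, and a background worker sends the updates to Solr. The send_to_solr callable and the operation names are hypothetical placeholders. A production version would also need persistence and retries, which are omitted here.

```python
import logging
import queue
import threading

log = logging.getLogger(__name__)


class IndexQueue:
    """Queue index operations so publishing never waits on Solr."""

    def __init__(self, send_to_solr):
        self._send = send_to_solr  # hypothetical callable: (operation, page_id) -> None
        self._queue = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def enqueue(self, operation, page_id):
        """Called from a publish or delete event handler; returns immediately."""
        self._queue.put((operation, page_id))

    def _run(self):
        while True:
            item = self._queue.get()
            try:
                if item is None:  # sentinel from close()
                    return
                self._send(*item)
            except Exception:
                # Log and carry on so one bad page does not stop indexing.
                log.exception("indexing failed for %r", item)
            finally:
                self._queue.task_done()

    def close(self):
        """Process everything already queued, then stop the worker."""
        self._queue.put(None)
        self._worker.join()
```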

EPiSolr has its own configuration handler and section in the web.config.


Table 1: Example EPiSolr Configuration

Deployment is achieved by including the DLL in the site binaries and including the appropriate web.config sections.

EPiSolr hooks into the EPiServer event model by being installed as an HttpHandler, which provides a convenient mechanism for controlling and instantiating the entry point for service registration.

EPiSolrAdminPlugin

The EPiSolrAdminPlugin provides the administrative interface to EPiSolr. It provides tools to selectively re-index content and execute diagnostic queries.

The component and its interfaces are implemented as EPiServer plugins and compiled into a single DLL. The DLL uses a VirtualPathProvider to deliver the admin page templates, and a PagePlugIn bootstrap to initialise that VirtualPathProvider. As a result, deployment requires only that the DLL is included with the site; no extra configuration is needed.

In Action

Executing searches and extracting facets is straightforward through SolrNet. Specific abstractions can be created for individual facets to make the implementation strongly typed. Most of the work in incorporating search and facetted functionality lies in creating the user experience.
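SolrNet maps facet results into its own object model. The sketch below shows the underlying data it works with and one way to wrap a facet in a typed abstraction. In Solr's default JSON layout, facet_counts.facet_fields holds each field as a flat list that alternates values and counts. The FacetValue type and the type_s field are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FacetValue:
    """One entry in a facet: a field value and how many results carry it."""
    value: str
    count: int


def parse_facet_field(response_json: str, field: str) -> List[FacetValue]:
    data = json.loads(response_json)
    flat = data.get("facet_counts", {}).get("facet_fields", {}).get(field, [])
    # Default JSON layout alternates value and count: [v1, c1, v2, c2, ...]
    return [FacetValue(str(v), int(c)) for v, c in zip(flat[0::2], flat[1::2])]
```

Once the pairs are wrapped like this, user interface code can bind to typed values and counts rather than indexing into raw response lists.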

Controls to commoditise user interface constructs for facetted navigation are being developed to further speed development and reduce costs.

Below is a screenshot of a site that relies almost entirely on EPiSolr to manage, segment and navigate its complex subscription repository.

As well as breaking the document collection down along multiple axes, the site can determine the differences between subscription result sets. It indicates which documents the visitor is entitled to and identifies those that require an additional subscription.


Figure 2: EPiSolr in action

Conclusions

Solr is no Endeca or FAST, and it is very important to distinguish it from the high-end facetted navigation and taxonomic analysis those products offer. However, we have been very pleasantly surprised by what can be achieved using Solr and EPiServer. Taken pragmatically, Solr and EPiSolr provide a valuable, cost-effective entry point to vastly improving the value of data and ease of use in a data-centric EPiServer deployment. Especially in these challenging times, from a cost-benefit perspective, in all but the most demanding scenarios it is hard to see how Solr could be ignored as an important option for search and facetted navigation.

Deployment of Solr and EPiSolr is even easier than that of EPiServer itself. Every saving has a cost, and the only concern is that, because Solr is open source, you may have to diagnose and fix runtime issues yourself. That said, a huge community is investing in and evolving Solr, so hopefully this will be a minor and irrelevant point.

With EPiSolr and Solr, LBi look forward to offering their clients a whole new range of richer content-managed search solutions while keeping delivery and ongoing costs highly competitive.

08 July 2009


Comments

  1. Sounds intriguing. - Is this in any form or shape available for us to see and evaluate? - Is it open source, or is it a closed solution that LBi uses as part of a value-add for its clients? - How does EpiSolr compare to EasySearch, which is available on Epicode? (It seems that they try to achieve a similar if not the same goal): https://www.coderesort.com/p/epicode/wiki/EasySearch
  2. This is really (!) interesting. I'd love to get a chance to play around with it!
  3. At this time EPiSolr has been developed as a value-add for LBi's clients, but improvements and fixes to open source components have been submitted back to the community for consideration for inclusion in the public releases. Originally we considered EasySearch for enhanced search; however, at that time it had no support for facetted navigation, which was a prerequisite for us. Given that Solr is the current platform of choice for leveraging enhanced search functionality on the Lucene platform, it made sense to develop a framework specifically geared to integrating Solr into EPiServer. I am unable to comment in detail, but since we last looked at EasySearch it appears to be maturing nicely and is well worth considering; however, it would appear that richer search functions such as facetted search have to be baselined and built bespoke. I imagine that maintaining high-end feature parity while retaining search platform interchangeability can only get harder. In this respect, EPiSolr and EasySearch are actually quite different. EPiSolr gets all of its search functionality from Solr, and is geared to leveraging the enhancements the community makes to that search stack. In turn, attention has been paid to a pluggable architecture that allows easy customisation and control of how artefacts are manufactured, represented and indexed in the search engine.
  4. This is a great piece of work and something we considered providing as an EasySearch option. However, we resisted making Solr the default indexer because of the Java dependencies and additional configuration. EasySearch does support Faceted Search (the controls are all in there but documentation is being finished. There is a sample project from Mari at BVNetwork that demonstrates all the faceted controls).
LBi

About me

LBi is an international full service digital marketing agency that provides breadth of capability, ranging from high-level strategy to heavyweight technology implementation and award-winning media. LBi employs over 1,450 professionals located primarily in the major European and American business centers, such as Amsterdam, Atlanta, Berlin, Brussels, Copenhagen, London, Madrid, Milan, Mumbai, Munich, New York, Paris and Stockholm. LBi reports annual sales of about € 160 million and had in September 2007 a combined market capitalization of € 294 million.

LBi's market leadership is based on our ability to bring together a series of disciplines. By linking marketing and communications across all digital touch points, our approach helps strengthen the position, efficiency and market share of our clients.

LBi is listed on Euronext in Amsterdam and as a Mid Cap company on the OMX Nordic Exchange in Stockholm (symbol: LBi).
