When a 404 Not Found should be a 404 Not Found

by: Svante Seleborg

The standard ASP.NET behavior with custom error pages is dubious at best. When a page is not found, it does not say so. Instead it says that the page has moved (302), and then typically either that the page was found (200) at the new location, or that the redirect target itself is missing (404). Humans will read the content of the displayed page, but the status code is wrong and may confuse search engines.

If a page is not found, that page request should return a 404 Not Found, nothing else.

Unfortunately, this is non-trivial to achieve in an EPiServer environment with friendly URLs enabled.

The solution

The solution requires quite a few steps. To get this to work right, you need at least to:

  1. Configure IIS 404 handling to refer to a page that literally does not exist, for example ThisFileDoesNotExist.aspx.
  2. Write code to hook Application_Error in Global.asax.cs or equivalent.
  3. Write a special page type to display errors, with a very special constructor.
  4. Write a custom HtmlRewriteToExternal class and a custom UrlRewriteProvider.

IIS

Because a request to a friendly URL that does not exist will be passed back to IIS, presumably because ASP.NET will return a status stating that it did not handle the request, IIS must also be configured. Another way is to write your own catch-all HttpHandler, but the problem there is that it's hard to append a catch-all to the HttpHandler chain without duplicating the entire chain in your own web.config.

Anyway, once it gets passed back to IIS, it'll be treated as a 404. To get back to ASP.NET and EPiServer to serve the friendly 404 page, configure the IIS 404 handling with a URL that is mapped to ASP.NET but truly does not exist, for example a non-existing .aspx page such as "ThisFileDoesNotExist.aspx".

This page will be called with a query string parameter consisting of the error code and a semi-colon followed by the URL encoded original URL, so you can use this in your error handling to determine the actual URL. (For the curious, this behavior was actually the basis for friendly URL handling in EPiServer 4).
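As a minimal sketch of how that query string could be picked apart, assuming it has the shape "404;http://host:80/path" described above: the class name Iis404Query and its method are hypothetical helpers, not part of ASP.NET or EPiServer.

```csharp
using System;

// Hypothetical helper: splits a query string such as
// "?404;http://www.example.com:80/sv/missing/" into its two parts.
public static class Iis404Query
{
    public static bool TryParse(string query, out int statusCode, out Uri originalUrl)
    {
        statusCode = 0;
        originalUrl = null;
        if (string.IsNullOrEmpty(query))
        {
            return false;
        }

        // Drop the leading '?' and split at the first semicolon.
        string raw = query.TrimStart('?');
        int separator = raw.IndexOf(';');
        if (separator <= 0)
        {
            return false;
        }

        if (!int.TryParse(raw.Substring(0, separator), out statusCode))
        {
            return false;
        }

        // The original URL may arrive URL encoded, so decode it before parsing.
        string url = Uri.UnescapeDataString(raw.Substring(separator + 1));
        return Uri.TryCreate(url, UriKind.Absolute, out originalUrl);
    }
}
```

In an error handler you would pass Request.Url.Query to TryParse and, if it returns true, treat originalUrl as the address the visitor actually asked for.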

Application_Error

Application_Error is called whenever there's an un-handled exception in your application. Here you should do approximately the following:

  • Check Server.GetLastError(). If it's an HttpException, get the error code from it and do any special handling you need. Then call your friendly error page using Server.Execute (not Server.Transfer, see below), set Response.StatusCode and you're done. This is also the place to check for the query string parameter indicating that you've arrived via the IIS 404 handling.
  • If it's some other error, you probably want to set Response.StatusCode to 500, and then Server.Execute a static HTML page stating that an unexpected error occurred.

There are other equivalent ways to hook the error event, use whatever method appeals.
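A rough sketch of such a handler might look like the following. It is not tested against a live EPiServer site. The two page paths are hypothetical, the base class is simplified (an EPiServer site's Global normally derives from EPiServer.Global), and the check for a literal "?404;" prefix assumes IIS passes the semicolon unencoded.

```csharp
using System;
using System.Web;

// Sketch only: in an EPiServer site, Global normally derives from EPiServer.Global.
public class Global : HttpApplication
{
    // Hypothetical paths; replace with your own error page and static fallback.
    private const string FriendlyErrorPage = "~/Templates/ErrorPage.aspx";
    private const string StaticErrorPage = "~/Error500.htm";

    protected void Application_Error(object sender, EventArgs e)
    {
        Exception error = Server.GetLastError();
        HttpException httpError = error as HttpException;

        // A request arriving via the IIS 404 setting carries "404;<original URL>"
        // in its query string (assumed here to be unencoded).
        bool viaIis404 = Request.Url.Query.StartsWith("?404;", StringComparison.Ordinal);

        Server.ClearError();   // stop ASP.NET from rendering its own error page
        Response.Clear();      // drop any partial output

        if (httpError != null)
        {
            int statusCode = viaIis404 ? 404 : httpError.GetHttpCode();

            // Flag read by the error page's constructor.
            Context.Items["InErrorHandler"] = true;
            Server.Execute(FriendlyErrorPage);
            Response.StatusCode = statusCode;
        }
        else
        {
            Server.Execute(StaticErrorPage);
            Response.StatusCode = 500;
        }
    }
}
```

Setting the status code after Server.Execute follows the order described above and relies on the response still being buffered at that point.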

Server.Transfer/Server.Execute problem

Everything seems straightforward until you try it. In this case, it all blows up in the Server.Execute call (the same goes for Server.Transfer) with an exception in ProcessRequestInternal. This is apparently due to some dependency in one of the SimplePage page extensions. This might be a bug in EPiServer, or at least fixable, but I have not had the time to reflect that deeply on the issue. So you need to disable that page extension. The problem is that page extensions are set up during object construction, so it must be done in the constructor. Even worse - you should probably not disable this page extension in edit mode...

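using System.Web;
using EPiServer;
using EPiServer.Web;   // namespaces assumed for CMS 5; adjust to your version

public partial class MyPage : SimplePage
{
  // Disable the SaveCurrentPage extension only when rendered from the error handler.
  public MyPage() : base(0, HttpContext.Current.Items["InErrorHandler"] == null ? 0 : PageExtensions.SaveCurrentPage.OptionFlag)
  {
  }
}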

To get this to work, you also have to set Items["InErrorHandler"] before calling the error page from your Application_Error code. This disables the troublesome page extension when the page is rendered as the result of an error, but leaves it in place in other cases, such as when editing the page.

The friendly URL issue

Now it's all done, right? Sorry... It'll seem OK until you try a not-found on a friendly URL with, for example, a language prefix, and you find all your style sheets gone. This is because the friendly URL rewriter gets confused when trying to rewrite relative URLs (those not starting with http(s): or /) relative to a URL that does not exist. (This is probably an EPiServer bug, also probably originally introduced by yours truly. Sorry about that.)

The quick and easy solution is to make EPiServer rewrite all URLs to be root relative (start with a slash) instead. To do this, you'll have to write a small custom HtmlRewriteToExternal class, and to get that to be used, you'll have to make a small custom UrlRewriteProvider. Something like this:

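// Namespaces assumed for CMS 5; adjust to your version.
using System.Text;
using EPiServer;
using EPiServer.Core;
using EPiServer.Web;

public class MyFriendlyUrlRewriteProvider : FriendlyUrlRewriteProvider
{
    // A commenter reports that in CMS 5 R2 the base class needs to be
    // FriendlyHtmlRewriteToExternal instead.
    private class HtmlRewriteToRootRelativeExternal : HtmlRewriteToExternal
    {
        protected override bool HtmlRewriteUrl(UrlBuilder internalUrl, UrlBuilder externalUrl, UrlBuilder url, Encoding enc, out object obj)
        {
            PageReference pr = PermanentLinkUtility.GetPageReference(url);
            // Convert to the friendly (external) form first...
            bool isModified = Global.UrlRewriteProvider.ConvertToExternal(url, pr, enc);
            // ...then force the result to be root relative.
            isModified |= url.Rebase(internalUrl, externalUrl, UrlBuilder.RebaseKind.RootRelative);
            obj = pr;
            return isModified;
        }
    }

    public override HtmlRewriteToExternal GetHtmlRewriter()
    {
        return new HtmlRewriteToRootRelativeExternal();
    }
}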

Don't forget to fix up your web.config to refer to your shiny new FriendlyUrlRewriteProvider.
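For orientation, the registration might look roughly like the fragment below. This is written from memory of the urlRewrite section in a CMS 5 web.config, so treat the element layout as an assumption and compare it with the section already present in your own file. MyNamespace and MyAssembly are placeholders.

```xml
<episerver>
  <urlRewrite defaultProvider="MyFriendlyUrlRewriteProvider">
    <providers>
      <!-- Placeholder type name: use your own namespace and assembly. -->
      <add name="MyFriendlyUrlRewriteProvider"
           type="MyNamespace.MyFriendlyUrlRewriteProvider, MyAssembly" />
    </providers>
  </urlRewrite>
</episerver>
```

Leave any other providers that are already registered in that section where they are, and only change which one is the default.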

What was learned

One major thing was an unexpected behavior of Server.Transfer, which is the reason for using Server.Execute. Apparently (possibly not under all circumstances, I have not had the time to really ascertain this), Server.Transfer behaves much like Response.Redirect in that, after the handler called by Server.Transfer has finished executing, it short-circuits the remaining events in the HttpApplication pipeline. This means that the PostRequestHandlerExecute, ReleaseRequestState, PostReleaseRequestState, UpdateRequestCache and PostUpdateRequestCache events will NOT be raised, and response filtering will not take place! This messes up all kinds of things, but most importantly it means that EPiServer rewriting of outgoing HTML (which is implemented as a response filter hooked up in PostRequestHandlerExecute) will not happen. Too bad.

Recall that if you skip UpdateRequestCache, the page won't be eligible for output caching. If you skip ReleaseRequestState, you'll probably lose session state, and so on. It's simply not a good idea to jump to the end without making these pit stops.

Disclaimer

These are essentially notes from memory. Details may be wrong, and something may be missing. But it should enable you to get started and finished quicker than I did...

If I missed the obvious trivial solution to the issue, please let me know!

02 October 2008

    Comments

    1. 1) Upgrade to R2
      2) Migrate to IIS 7
      3) Make sure site setting urlRebaseKind is set to "ToRootRelative"
      4) Make a special page type for 404 that sets status code to 404 in codebehind
      5) In IIS Manager set error page 404 to "Execute a URL on this site" to the friendly URL of your 404 page

      Voila ;-)
    2. Thanks for pointing out the difference with IIS 7. That of course is also the clincher for many of us who are stuck with IIS 6 for the foreseeable future. ;-( It's also nice to know that the handling of not-found friendly URLs has been improved in R2 to take advantage of the integrated pipeline, since that is one of the gotchas in IIS 6 - with wild-card mapping enabled for friendly URL, IIS will never realize that a given page does not exist and that's what causes the need for the trick with the 404-setting in IIS in the blog post above.
    3. Yes, the integrated pipeline is really a perfect match for friendly URLs (and EPiServer CMS in general).
    4. There's a little confusion as to which solution is best for the IIS version one is using. I'm stuck on IIS 5 in dev (IIS 6 in production) and am getting odd behaviour in my global.asax solution in the Application_Error method. I expected it never to go into Application_Error() when a page is served OK, i.e. with a 200. Some clarity on the web.config custom errors section (which I think is ignored with EPiServer) would be grand too. The IIS settings in the errors tab also seem to be ignored, as far as I can tell. Any direction on this, please?
    5. Just in case it helps anybody: private class HtmlRewriteToRootRelativeExternal : HtmlRewriteToExternal { Should be: private class HtmlRewriteToRootRelativeExternal : FriendlyHtmlRewriteToExternal { Both namespaces are valid, but I'm using r2, and the friendly one will expose your virtual method you need.
    6. In CMS 5 R2 I get the following error MyFriendlyUrlRewriteProvider.HtmlRewriteToRootRelativeExternal.HtmlRewriteUrl(EPiServer.UrlBuilder, EPiServer.UrlBuilder, EPiServer.UrlBuilder, System.Text.Encoding, out object)': no suitable method found to override Is this the right inheritance / override?
    Svante Seleborg

    About me

    When I'm not riding my bike, I keep fairly busy trying to make a living as a self-employed programmer. I'm also an EPiServer alumni, having participated in the architecture and development of EPiServer CMS - especially Friendly URL and permanent links.