Friday, May 25, 2012

Fixing Sitefinity 3.7 URL handling–Part 2

A while back I wrote about URL handling in Sitefinity 3.7. The default internal URL handling is not working well with IIS 7 Rewrite module. Over the past weeks I’ve had more problems with making Sitefinity 3.7 behave correctly, this time its about SEO friendly 404 handling.

How ASP.NET handles errors

Sitefinity is based on ASP.NET and by default any ASP.NET application will handle errors by either showing a generated error page (the dreaded Yellow Screen Of Death – YSOD) or by redirecting to a predefined error page. This is all configured in web.config:

<customErrors defaultRedirect="Error.aspx" mode="On">
   <error statusCode="404" redirect="404.aspx"/>
</customErrors>

This bit of XML instructs ASP.NET to redirect users to Error.aspx when an (unhandled) error occurs, unless it’s a 404 error in which case the user should be redirected to 404.aspx.

SEO friendly 404 handling

Redirecting is an acceptable way to help your visitors explain what’s going on. Your users get a decent explanation and can continue on their way. If you’re using a CMS like Sitefinity you can manage the 404 page within the CMS and even drop in a smart control that offers relevant suggestions based on the requested URL.

Search engines indexing the site will however have difficulty understanding what is going on. A conversation between a crawler like the Google bot and your site would look like this:

image

The conversation ends with a successful HTTP 200 status code, indicating to the search engine that the page was found… The crawler will even index the 404 page unless it’s explicitly told not to via meta tags or robots.txt.

In order for search engines to understand what’s going on the conversation should look like this:

image

Fortunately the customized Sitefinity CMS module from my previous post can provide us with the hooks needed to set this up.

Step 1 – intercept errors

Since the CMS module is a HttpModule, it can register for the ASP.NET Error event. If that event occurs we can check the type of error and lookup where to get the alternate content from the customErrors section in web.config.

var context = HttpContext.Current;

var error = context.Server.GetLastError() as HttpException;
if ( null != error && error.GetHttpCode() == 404 )
{
  // use the web.config custom errors information to 
  // decide whether to redirect
  var config = ( CustomErrorsSection )WebConfigurationManager
                  .GetSection( "system.web/customErrors" );
  if ( config.Mode == CustomErrorsMode.On ||
       ( config.Mode == CustomErrorsMode.RemoteOnly 
                        && !context.Request.IsLocal ) )
  {
    // redirect to the error page defined in web.config
    var redirectUrl = config.DefaultRedirect;
    if ( config.Errors["404"] != null )
    {
       redirectUrl = config.Errors["404"].Redirect;
    }
    // now render the content

Step 2 – render alternate content

This is where things get interesting. In IIS7 with Integrated Pipeline mode there’s a Server.TransferRequest method that makes it easy to do an internal redirect. It’ll do a full run of the request pipeline. TransferRequest will simulate an actual request and you can specify any parameters you want to pass which will be available in the request through the HttpContext.Params collection.

If not using Integrated Pipeline mode, the Server.Transfer can do an internal redirect. The redirected request will however not go through the full ASP.NET pipeline and vital events will not fire. Most notably, some events used by Sitefinity to resolve the page that needs to be rendered. The code below works around that by setting up the request the same way Sitefinity would before handing it off to the main entry point.

Both methods will however discard the HTTP status code from the original request. To work around that the status code is reset in the transferred request.

if ( HttpRuntime.UsingIntegratedPipeline )
      {
         context.Server.TransferRequest( 
                      redirectUrl, true, "GET",
                      new NameValueCollection { { "__sf__error", "404" } } );
      }
      else
      {
        var context404 = 
          CmsSiteMap.Provider.FindSiteMapNode( redirectUrl ) 
            as CmsSiteMapNode;
        if ( null != context404 )
        {
          context.Server.ClearError();
          context.Response.StatusCode = 404;
          CmsUrlContext.Current = context404;
          context.Items["cmspageid"] = context404.PageID;
          context.Server.Transfer( "~/sitefinity/cmsentrypoint.aspx" );
        }
      }

In integrated pipeline mode you have no control over the executing request. So to rest the HTTP status code it’s passed to the transferred request using a custom header. In the PostRequestHandlerExecute event handler in the Cms module the header is picked up and used to alter the status code:

private void PostRequestHandlerExecute( object sender, EventArgs e )
{
   var context = HttpContext.Current;
   // Set the error code passed in the headers when TransferRequest was invoked.
   var error = context.Request.Headers["__sf__error"];
   if ( null != error && context.Response.StatusCode == 200 )
   {
      int errorCode;
      if ( Int32.TryParse( error, out errorCode ) )
      {
         context.Response.StatusCode = errorCode;
         context.Response.TrySkipIisCustomErrors = true;
      }
   }
}

Full code and installation instructions available on GitHub.

This code has been tested with Sitefinity 3.7 SP4 and is in use on production systems.

3 comments:

mshmsh5000 said...

Marnix, thanks so much for this post -- I'm wrestling with some related issues (getting an existing SF 3.7 app to run in a subdirectory, for instance), so looking forward to integrating your path-handling improvements.

After running through the installation steps from Github, I'm seeing this runtime error:

"Could not load file or assembly 'Alanta.Sitefinity' or one of its dependencies. The system cannot find the file specified."

I also noticed these in your sample config: http://screencast.com/t/3raccNy33

Even after fixing these references, though I get the "Could not load" error. My sandbox config looks like this: http://screencast.com/t/BusT3HVM

Am I missing something obvious? Thanks for any help.

Marnix said...

Thanks for spotting the typo's. I'll fix them asap. Depending on where you put the code, the exact name for the module in the web.config file may differ.

Assuming you added the code to an assembly that you've dropped into the /bin folder you should use the full type name (including the namespace) followed by a comma and then the name of the assembly that holds the type.

If however you dropped the source code in the App_Code folder of a site, you should only specify the full type name with namespace, not the assembly name.

Let me know if that helps.

mshmsh5000 said...

That was the trick! You're right -- probably worth posting in the config notes for posterity (and so you don't have to answer this question again).

Thanks again for the code and assistance.