Aria Stewart (aredridel) wrote,

Efficiently implementing content-negotiation

The only feature of Apache that I miss using Lighttpd is content negotiation.

In a nutshell, content negotiation takes an abstract resource URL like http://example.org/2005/chart and maps it to a concrete file on the filesystem, based on the files available, their mime-types, and the mime-types listed in the requestor's Accept: header.

Given that URL, an Accept: header of image/svg+xml; q=1, image/*; q=0.5, and the files /www/example.org/2005/chart.png and /www/example.org/2005/chart.svg, the server would see that there is an image/svg+xml file, which matches the client's highest preference, and return it along with a Vary: Accept header.
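
As a sketch of that matching step (the function names here are mine, not anything from Apache or Lighttpd): parse the Accept: header into (media-type, q) pairs, then score each concrete type, letting the most specific pattern win as the HTTP spec requires.

```python
import re

def parse_accept(header):
    """Parse an Accept: header into a list of (media-type, q) pairs."""
    prefs = []
    for part in header.split(','):
        fields = part.strip().split(';')
        q = 1.0
        for param in fields[1:]:
            m = re.match(r'\s*q\s*=\s*([0-9.]+)', param)
            if m:
                q = float(m.group(1))
        prefs.append((fields[0].strip(), q))
    return prefs

def quality(mtype, prefs):
    """q-value the client assigns to a concrete type; most specific pattern wins."""
    major = mtype.split('/')[0]
    for pattern in (mtype, major + '/*', '*/*'):
        qs = [q for p, q in prefs if p == pattern]
        if qs:
            return max(qs)
    return 0.0

prefs = parse_accept('image/svg+xml; q=1, image/*; q=0.5')
print(quality('image/svg+xml', prefs))  # 1.0, exact match
print(quality('image/png', prefs))      # 0.5, via image/*
```

With those preferences, chart.svg scores 1.0 and chart.png scores 0.5, so the SVG wins.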

The efficiency problems come from needing to know the available files and their mime-types. At best, an expensive scan for available files happens once, on the first hit, and is cached for subsequent hits. However, cache consistency is a difficult problem, and many of the solutions are as inefficient as no caching at all. Recent Linux kernels support the inotify mechanism, which could monitor directories efficiently and keep the cache consistent, but it's not a generally portable solution.
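
A portable middle ground, short of inotify, is to validate a cached listing with a single stat() of the directory, since a directory's mtime changes whenever an entry is added or removed. A minimal sketch (the cache layout is illustrative, and mtime granularity on some filesystems limits how fresh this can be):

```python
import os

_cache = {}  # dirname -> (directory mtime, cached listing)

def listing(dirname):
    """Return the directory's entries, rescanning only when its mtime changes.

    Costs one stat() per hit instead of a full directory read; the cache
    can only go stale within the filesystem's mtime granularity.
    """
    mtime = os.stat(dirname).st_mtime_ns
    cached = _cache.get(dirname)
    if cached is not None and cached[0] == mtime:
        return cached[1]
    names = os.listdir(dirname)
    _cache[dirname] = (mtime, names)
    return names
```

This trades the full scan for one stat() per request, which is the same order of cost as ordinary static serving.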

The simplest implementation would take the URL and check whether it's immediately satisfiable; this costs the same as normal serving, without content negotiation. If the file isn't found, it must perform a directory listing (one open call, some read calls). This gets expensive for huge directories (over 1000 files or so, though the cost depends on the type of filesystem). Candidates are selected, their mime-types mapped, and a variant chosen according to the criteria in the HTTP spec. Unless there are very many alternatives or an absurdly large Accept: header, this selection isn't computationally intensive: O(m * n) for m candidate files and n entries in the Accept: header.
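
That simplest implementation might look like the following sketch (names are mine, not Lighttpd's; prefs is a list of (media-type, q) pairs already parsed from the Accept: header):

```python
import mimetypes
import os

def _quality(mtype, prefs):
    """q-value for a concrete type; the most specific Accept: pattern wins."""
    major = mtype.split('/')[0]
    for pattern in (mtype, major + '/*', '*/*'):
        qs = [q for p, q in prefs if p == pattern]
        if qs:
            return max(qs)
    return 0.0

def negotiate(path, prefs):
    """Pick the best on-disk variant for an extensionless URL path.

    path:  filesystem path without extension, e.g. '.../2005/chart'
    prefs: list of (media-type, q) pairs from the Accept: header
    Returns the chosen filename, or None if nothing acceptable exists.
    """
    # Fast path: the URL maps directly to a file -- same cost as normal serving.
    if os.path.isfile(path):
        return path
    dirname, base = os.path.split(path)
    best, best_q = None, 0.0
    for name in os.listdir(dirname):      # one open call, some read calls
        stem, ext = os.path.splitext(name)
        if stem != base:
            continue                      # not a variant of this resource
        mtype, _ = mimetypes.guess_type(name)
        if mtype is None:
            continue
        q = _quality(mtype, prefs)
        if q > best_q:
            best, best_q = os.path.join(dirname, name), q
    return best
```

The linear scan over the listing is exactly the cost that balloons in a huge directory; everything after it is the cheap O(m * n) selection.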

However, to send a Content-Length: header, at least one stat() call must be made, and to handle dangling symbolic links, a stat() for every file under consideration (though since dangling links are an edge case, this could be implemented as a fallback rather than normal operation).
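
That stat() does double duty, as this small sketch shows (the helper name is hypothetical):

```python
import os

def variant_size(path):
    """One stat() yields the Content-Length; a dangling symlink surfaces here.

    os.stat() follows symlinks, so a link whose target is gone raises
    FileNotFoundError; returning None lets the caller drop that variant.
    """
    try:
        return os.stat(path).st_size
    except FileNotFoundError:
        return None
```

Treating the error path as the fallback matches the edge-case reasoning above: the common case pays for exactly one stat() of the file it was going to serve anyway.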

The biggest issues remain unusually large directories, where a linear scan of the listing can take a long time, and, if caching is performed, how to keep the cache consistent while still gaining from it.

Thoughts are always welcome. I'll probably implement this in Lighttpd at some point.

Tags: unix, web