Friday, February 23, 2007

The Rise of the URL Fascist Regime

Some time within the past couple of years, something really strange happened. Most of my web developer friends turned into a bunch of very vocal url fascists.

Urls are suddenly split into two sharply divided groups: the urls that fit their views and the urls that do not. If your urls do not fit into the group supported by the fascist leaders, it doesn't matter what the reasons are for your url design or how great your system is. Your urls suck, thus your system sucks and you as a developer are at best hopelessly misguided.

So what is this all about?

The people I'm referring to are not graphic designers who do a bit of web development. The leaders of this regime are all programmers who feel right at home in emacs and vi. The natural assumption would thus be that this is one of the regular crusades for following standards, but that's not it at all... this is not about making urls more technically functional. This is about making the urls look pretty. That's actually what the whole topic is called: pretty urls. Not readable urls, not memorable urls, not semantic urls, not understandable urls, not hackable urls, not short urls.... but pretty urls.

I decided to ask around a bit, to see if I could better understand what exactly divides the ugly from the pretty. It turns out that there are varying opinions about what the ultimate pretty url looks like, but everybody seems to agree on what an ugly url looks like. One of my friends gave me the following example.

http://foo.bar/cmssystem.php?page=foo&object=bar&object_id=123123&PHPSESSIONID=123123123123123&showOnlyFemales=YESPLEASE&username=FUNKY&password=MYASS&submit.x=33&submit.y=453

Ok, even I can follow that this is not pretty. I would also say that it's not readable, memorable or understandable either. However, I do think it has some good functional sides, which I will return to later.

So I created some different example urls and passed them around to get a prettiness rating, and I finally seemed to figure out the primary difference between a pretty and an ugly URL. Take a look at the following two URLs.

http://foo.bar/profile?id=12345
http://foo.bar/profile/12345/

The ruling from the fascist regime is that the first url is ugly and the second one is pretty. My first instinct was to conclude that this was a question of keeping the urls as short as possible, but it turns out that that's not it at all. The following url combines the two by adding the name of the parameter as a part of the url path.

http://foo.bar/profile/id/12345/

Almost everybody agreed that this was the prettiest of them all. Since this is also the longest of them all, that rules out the short-is-better theory. And since the fascist leaders didn't mind the actual word id being part of the path, it's not a question of hiding the names of the variables from the users either. It's all a question of keeping the ? = and & characters out of the URL. Apparently these are really ugly characters.
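
As a rough illustration of that point, here is a minimal Python sketch using only the standard urllib.parse module. The helper name profile_id is hypothetical and not part of any framework. It only shows that the three spellings above carry the same information, a resource name and an id, and differ in nothing but the characters that separate them.

```python
from urllib.parse import urlsplit, parse_qs


def profile_id(url):
    """Return the profile id from any of the three example spellings, or None.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    # Drop empty segments caused by leading and trailing slashes.
    segments = [segment for segment in parts.path.split("/") if segment]
    if not segments or segments[0] != "profile":
        return None
    if len(segments) == 1:
        # Query parameter form: /profile?id=12345
        values = parse_qs(parts.query).get("id")
        return values[0] if values else None
    if len(segments) == 2:
        # Bare path form: /profile/12345/
        return segments[1]
    if len(segments) == 3 and segments[1] == "id":
        # Named path form: /profile/id/12345/
        return segments[2]
    return None


examples = [
    "http://foo.bar/profile?id=12345",
    "http://foo.bar/profile/12345/",
    "http://foo.bar/profile/id/12345/",
]
ids = [profile_id(url) for url in examples]
```

All three example urls yield the same id, "12345", so the choice between them is about presentation rather than about what the server is able to work out.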

Here are a few more examples of pretty urls given to me by people from the new regime. Notice how they are not designed to be short. Instead they do everything possible to keep all parameters in the path.

http://foo.bar/2007/feb/10/danielwinsthelottery/
http://foo.bar/types/hatchbacks/ford/fiesta/

Notice another thing here. None of these refer to data using an id. They use nearly readable names instead. It should be mentioned, though, that this is not a fixed rule within the pretty url regime; some people still use ids.

So, to sum it up: pretty urls must do everything possible to avoid the regular query parameters, and if it makes sense, they should be human-readable.

Unfortunately, none of the believers in the new pretty url regime can actually explain what is so evil about query parameters. Most people I asked could not come up with anything better than "it's ugly". Most of them can't even give a precise definition of a pretty url, let alone an explanation of why it's good. In fact, the following three statements were the only direct statements I could get out of any of them (it should be noted that two of these statements were given to me in Danish, so these are my translations).

  • Each element in the path should be a further refinement into the dataset.
  • The overall path should be human-readable and understandable.
  • Query parameters should not be used for navigation.

It's not that I completely disagree with these ideas, but it seems that everybody has gotten so focused on the new and wonderful world made possible by brand new technology such as mod_rewrite (been there since 1996) that everything must now be done in this way or it's just way too Web 1.0 to be acceptable.

So why am I making a big fuss about this and calling half of my friends fascists? First of all, I'm calling them fascists because they consider the system to be more important than the individuals. That is, pretty urls are more important to them than the actual websites. Secondly, I'm making a big fuss because query parameters actually do have some good things going for them that shouldn't be left out just because you think a question mark or an equals sign is ugly. As long as you use proper parameter names, they let the user know what the individual "magic" parameters actually are.

I'm not saying that you shouldn't put anything in the path. I am definitely all for the move from using the path to specify code to using the path to specify data. But religiously staying away from query parameters because they are ugly is just plain stupid. One of the uses for query parameters that I really like is view manipulation, for example holding the offset and sort order of a table of data. Let's look at an example.

http://foo.bar/users?offset=300&sortby=username

You could put these parameters into the path. But by keeping them as query parameters, it is clearly visible to the user what they are and how they change as she works with the table. For more advanced users, this will also make it easier to manipulate the url directly.
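
To make that split concrete, here is a small sketch in Python, again using only the standard urllib.parse module. The helper names split_resource_and_view and with_view_params are made up for this example. The idea is simply that the path names the data, the query string holds the view settings, and moving through the table touches only the latter.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def split_resource_and_view(url):
    """Return (path, view_params): the path names the data, while the
    query string carries view settings such as offset and sort order."""
    parts = urlsplit(url)
    return parts.path, dict(parse_qsl(parts.query))


def with_view_params(url, **changes):
    """Return the same url with some view parameters changed.

    The path, and therefore the data being pointed at, is left untouched.
    """
    parts = urlsplit(url)
    params = dict(parse_qsl(parts.query))
    params.update({key: str(value) for key, value in changes.items()})
    return urlunsplit(parts._replace(query=urlencode(params)))


table_url = "http://foo.bar/users?offset=300&sortby=username"
path, view = split_resource_and_view(table_url)
next_page = with_view_params(table_url, offset=350)
```

Here path is "/users" and view is {"offset": "300", "sortby": "username"}. Stepping forward by 50 rows gives http://foo.bar/users?offset=350&sortby=username, which is the same edit an advanced user could make by hand in the address bar.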

So let me end this looong rant with the guidelines I use when designing urls for web applications.

  1. The path should be readable and specify data, not code.
  2. Things that modify the view of the data should be placed in query parameters.
  3. Avoid using characters in ids that are easily confused, such as 0, O, I, 1, l.
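
The third guideline is easy to apply when ids are generated rather than typed in. The following is one possible sketch in Python, not a prescribed implementation. The names CONFUSABLE, ID_ALPHABET and make_id, and the default length of 8, are assumptions made for illustration.

```python
import secrets
import string

# Characters from guideline 3 that are easily mistaken for one another.
CONFUSABLE = set("0O1Il")

# Letters and digits with the confusable characters removed.
ID_ALPHABET = "".join(
    char for char in string.ascii_letters + string.digits if char not in CONFUSABLE
)


def make_id(length=8):
    """Return a random id built only from unambiguous characters.

    Hypothetical helper; the default length is an arbitrary choice.
    """
    return "".join(secrets.choice(ID_ALPHABET) for _ in range(length))


sample_id = make_id()
```

Removing the five confusable characters leaves 57 of the 62 letters and digits, so ids get only marginally longer for the same number of possible values, while becoming much safer to read aloud or retype from a printout.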

5 comments:

Anonymous said...

You should probably learn to overcome peer pressure. It's a whole lot easier than making superfluous changes that don't matter.

Anonymous said...

"pretty" urls don't over-expose the internal structure of your application and are also more portable, as in you can change that internal structure however you see fit, and not have your site die from url incompatibilities. They're also better in terms of SEO, as they look "permanent" to search engines, whereas a ?parameter=laden&var=transient url literally tells search engines that they are not permanent, or at least session dependent.

Kasper Jeppesen said...

That's not what I am against... as I write in the post, I am all for the transition to paths that point to data instead of paths that point to code. What bothers me is the way people reject any use of query parameters at all.

As long as you keep the base path of your url as a path to the data, one that will resolve and provide a view of that data, you are fine SEO-wise, but that doesn't mean you can't use query parameters for things such as view modifications.

Anonymous said...

Hey, you're right, some people go ape over the looks of a URL and don't know why you should use a URL vs. query parameters. It's a lot like people who just follow any religion without wondering how the rituals came about.

I'll give you a very simple explanation:

The URL specifies a resource, query parameters specify a question to that resource.

Now, the definition isn't crisp but if you divide things cleanly between stuff you serve and the queries made to that stuff then you get a very logical URL scheme. It's quite a bit like designing a well normalized database schema, algorithm, or OOP architecture.

In your example, /profile/12345 is more like saying "I stored the thing named 12345 in the profile directory." Yet, /profile?id=12345 is more like saying "to get a profile, you ask the profile for the profile with a specific id". That's why it has a ? mark on it, since you're asking /profile a question.

There's also an immediate practical application: caching. It's much easier to cache URLs than query parameters. URLs have a structure that matches most databases and file systems. Query parameters don't, so when you try to cache a link with a query, you don't know if that query is destructive, modifies things, returns the same results, etc. When you cache a URL it's assumed that it doesn't modify the resource since it was a single resource acquired via GET.

However, like all these things the rules lawyers (who don't understand why the rules exist or when they don't apply) tend to go crazy and overly apply them. Whenever I come to a fuzzy part of this arbitrary distinction, I just side with whatever is simplest. Rules lawyers seem to just apply the rule all the time to be fashionable and "fit in" with other rules lawyers. They even go so far as to invent new rules and terms like calling HTTP request "methods" "verbs" instead.

The best way to combat these kinds of idiots is to find out why they insist on these rules, and then do a subversive version that follows the rules but doesn't follow them. For example, /profile/new/12345 should make them wonder for a while.

Anonymous said...

I'm with Zed on this. There is a distinction between a URL (which points to a resource) and the querystring (which passes the parameters to a query). The latter are transient, the former permanent.

As to overexposing the internal structure of your application - once someone has determined which bits of your path are actually querystring, they know just as much as before. The person you are baffling is the person who has to maintain the codebase afterwards...

In addition, you have closed off the ability to use certain subdirectory names, because they are identical to pseudo-querystring 'paths'.

Overall, an acceptable idea only where there are 'permanent' querystrings.