Monday, February 26, 2007

Minimizing Id Size While Maximizing Readability

The last few posts have primarily been me bitching about things that annoy me, so I thought I would throw in a little happy-go-lucky constructive post with almost no bitching at all.

I mentioned in my last post about pretty urls that I always attempt to create my ids so they avoid using characters that are easily confused, such as I, 1, l, O and 0. Doing so is actually very easy and in this little post I will show you how I do it in Ravenous and even provide you with a ready to use implementation in Java.

The basic idea is that we convert integer ids into a string representation using only the letters and numbers that are not easily confused. For my implementation, I chose to use the following 31 characters: 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F, H, J, K, M, N, P, Q, R, S, T, U, V, W, X, Y, Z.

A positive side benefit from this, is that we will also be moving from a base 10 system to a base 31, which means that we in many cases will be seriously reducing the number of characters used for an id.

So lets start talking about the implementation by looking at the way we would convert a number into a string. First, lets build an array of chars containing our 31 character set.

char[] digits={'2', '3', '4', '5', '6',
               '7', '8', '9', 'A', 'B',
               'C', 'D', 'E', 'F', 'G',
               'H', 'J', 'K', 'M', 'N',
               'P', 'Q', 'R', 'S', 'T',
               'U', 'V', 'W', 'X', 'Y',

We use this array to directly lookup the character a number between 0 and 30 should be converted into. To convert a full number into a string, we simply keep dividing it by 31 and each time adding the character representing the remainder of the division to the string. When the remaining number is 0 we are done.

String convertToString(long num){
    StringBuilder buf=new StringBuilder();
        buf.append(digits[num % 31]);
    return buf.toString();

Easy right? Converting a string back into a number is just as easy. This time simply run through each character, convert the single character back into a number based on its position in our translation array and multiplies that with 31 raised to the power of the current character position. Sound confusing? Try imagining if we where working with a 10 based system. The value of the first character would be multiplied by 10 ^ 0 = 1, the next by 10 ^ 1 = 10, the next by 10 ^ 2 = 100 and so forth. Thus multiplying each character by the value of its position. This is the exact same thing we are doing in our base 31 system. So lets look at the code. I'm assuming that we have a method called charToInt that will convert a base 31 character into its int value from our table.

long convertToLong(String str){
    long num=0;
    for(int i=0;i<str.length();i++){
        char c=str.charAt(i);
    return num;

Easy right? So how about that charToInt method. Well, there are several strategies you can use for implementing it, but honestly the simplest is just to create a switch statement that utilizes the numeric value of the char to switch into a statement that returns the right value.

int charToInt(char c){
        case '2':return 0;
        case '3':return 1;
    return 0;

I have omitted the last 29 cases, but you should be getting the point by now :-). The last return is purely there to satisfy the Java compiler and to provide a semi rational behavior when (not if) someone attempts to feed it an invalid id string. If you want to optimize this further, you could create an array just like the one converting from numbers to characters. All you need to do is find the largest numeric value of those 31 characters, create an array of that size and fill in the right values.

So what do these ids look like and how much space will we save. Lets end this article with a set of sample ids, the strings they would be converted into and a rounded down percentage for how much space we are saving.

130 %
12E50 %
123Z533 %
1234UA325 %
123459VE20 %
123456GG6633 %
1234567SPFC328 %
12345678QQEDF37 %
123456789425QB633 %


Anonymous said...

Why are you not using your digits array in your charToInt method?

He who always bugs you

Kasper Jeppesen said...

I wanted to provide easy to understand sample code that wouldn't be completely ridiculous performance wise in case someone copy pastes the code and starts to use it as is :-)