Thursday, 15 December 2011

Reading an integer from a string

This topic is barely broached in common how-tos, and it is a point of confusion for many. It seems to be a great secret, implemented only deep within thousands of lines of code, and even there not very modularly, with support for few bases besides the standard decimal, octal, binary and hexadecimal.

The algorithm


The algorithm is simple, but one I have tried to implement without success on several occasions; only today did I write a function that successfully and flexibly reads an integral value from a string of characters.
   The function I wrote takes two arguments: an integer named 'base', which gives the base of the integer to read, and a pointer to a C-style string of characters named 'context'.
   In its scope, the function declares two variables: Value, initialised to 0, and Offset. It then sets Offset to 0 and enters a loop. Before each iteration, it checks whether the character at context[Offset] is zero (the string terminator); if it is, the loop ends. Inside the loop, the first step is to test whether the character at context[Offset] lies between '0' and '9' (48 to 57 in ASCII). If it does, the most important part of the algorithm is performed: updating Value.
   Value is assigned itself multiplied by base, plus the numeric value of the digit, which is context[Offset] minus '0'.
The pseudocode might look like

Value = Value * base + (context[Offset] - '0');

Of course, this is written for a semicolon-terminated language that follows the usual order of operations.
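To see why multiplying by the base works, it may help to trace the decimal branch on its own. The sketch below is only an illustration: accumulateDigits is a hypothetical name that does not appear in my function, and it assumes ASCII input whose digits are all valid for the base.

```cpp
// Hypothetical helper for illustration: handles only the '0'-'9' branch.
// Assumes ASCII and that every digit in text is valid for base.
int accumulateDigits(int base, const char* text) {
    int value = 0;
    // The terminating zero is below '0', so it ends the loop as well.
    for (int offset = 0; text[offset] >= '0' && text[offset] <= '9'; offset++) {
        // Shift everything read so far one place to the left in this base,
        // then add the numeric value of the new digit.
        value = value * base + (text[offset] - '0');
    }
    return value;
}
```

For "123" in base 10 the value goes 0 * 10 + 1 = 1, then 1 * 10 + 2 = 12, then 12 * 10 + 3 = 123.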
   The next part of the algorithm is an else if: if the current character was not between '0' and '9', it is compared to the ranges 'A'-'Z' and 'a'-'z'. If it falls in either range, another check is performed: the letter's digit value (its offset from 'A' or 'a', plus 10) must be less than base. A case where it is not would be a hexadecimal number with the letter 'g' in it. The number parser is in no place to throw any form of exception, so the loop simply breaks, keeping the value that has been accumulated so far. If the check passes, the function moves on to the operation: Value is assigned Value multiplied by base (as before), plus the current character converted to its digit value. The conversion compares the character with 'Z'. If the character is less than or equal to 'Z', it is uppercase, and therefore in the range 65-90 in ASCII, so 65 ('A') is subtracted to reveal its offset from 'A'. Otherwise it is assumed to be greater than 90, and therefore a lowercase character in the range 97-122 in ASCII, so 97 ('a') is subtracted to reveal its offset from 'a'. 10 is then added to that offset, and this is the value added to Value * base. The pseudocode may be:

Value = base * Value + (((context[Offset] <= 'Z') ? context[Offset] - 'A' : context[Offset] - 'a') + 10);


And there: any base from 2 to 36 can be used, whichever letters that base's notation needs, and regardless of case. The last branch is a plain else, where the loop simply breaks. After this, the function merely returns the accumulated value.
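The character-to-value mapping can also be pulled out into a helper of its own, which makes the range check easy to read. This is a sketch under assumptions rather than part of my function: digitValue and isDigitOfBase are hypothetical names, the input is assumed to be ASCII, and the base check is applied to decimal digits as well as letters.

```cpp
// Hypothetical helper: the numeric value of one digit character,
// or -1 if the character is not alphanumeric. Assumes ASCII.
int digitValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';       // 0 to 9
    if (c >= 'A' && c <= 'Z') return c - 'A' + 10;  // 10 to 35
    if (c >= 'a' && c <= 'z') return c - 'a' + 10;  // 10 to 35, case-insensitive
    return -1;
}

// Hypothetical helper: a character is a digit of the base only if its
// value is smaller than the base itself.
bool isDigitOfBase(char c, int base) {
    int value = digitValue(c);
    return value >= 0 && value < base;
}
```

With these, 'f' is a digit of base 16 but 'g' is not, and '9' is rejected in base 8.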

The code I used

Any avid programmer may already be implementing this as they read, but I felt it was necessary to include my own implementation. Keep in mind that this is C++, and merely my own take on the algorithm.


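// Char_Numeric and Char_Alpha are not shown in this post; the two
// definitions below are assumed equivalents for ASCII text.
bool Char_Numeric(char c){ return c >= '0' && c <= '9'; }
bool Char_Alpha(char c){ return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'); }

int readInt(int base, const char* context){
    if (!context)
        return 0;

    int Offset;
    int Value = 0;

    for (Offset = 0; context[Offset]; Offset++){
        // Decimal digit: its value is its distance from '0'.
        // It is not checked against base, so "9" is accepted even in octal.
        if (Char_Numeric(context[Offset]))
            Value = base * Value + (context[Offset] - '0');

        else if (Char_Alpha(context[Offset])){
            // Letter digit: 'A'/'a' is 10, 'B'/'b' is 11, and so on.
            int Digit = ((context[Offset] <= 'Z') ? context[Offset] - 'A' : context[Offset] - 'a') + 10;

            // Not a digit of this base (e.g. 'g' in hexadecimal): stop reading.
            if (Digit >= base)
                break;
            Value = base * Value + Digit;
        }
        else break;
    }

    return Value;
}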
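One way to sanity-check a hand-written parser like this is to compare its results with std::strtol from <cstdlib>, which accepts bases 2 to 36. The wrapper below is a hypothetical name used only for this comparison. Note that std::strtol differs in a few ways: it skips leading whitespace, accepts an optional sign, and in base 16 accepts an optional "0x" prefix.

```cpp
#include <cstdlib>

// Hypothetical wrapper for comparison: reads an integer with std::strtol.
long readWithStrtol(int base, const char* text) {
    char* end = nullptr;
    long value = std::strtol(text, &end, base);
    // end now points at the first character that was not consumed;
    // if end == text, no digits were read and value is 0.
    return value;
}
```

Like the function above, it stops at the first character that is not a digit of the base, so "12g" in base 16 gives 18.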
