27 02 2013
Uwe | C#
Sometimes you have to work with alot of numbers and want to save them in a file. Especially when you want to save alot of small numbers, it is inefficient to use a whole byte, or worse an int.
When i started to write a program to search a specific pattern of moves played in a database of Go-Game-Records, i came across this problem. But at first, let us have a deeper look into bits and bytes. Most of you know 8bits are a byte and a byte is the smallest unit you can save on a harddisk. With 8 bits (datatype byte) you can use 256 different numbers. If you use 2 bytes (datattype = int16) its up to 65536 and with 4 bytes (datatype int) 4294967296.
So the first thing you should do is to use the right datatype for your number. If you use, for example numbers up to 1000 you can use an int or an int16 to store them. But int uses the double amount of space. If you have to save alot of them this realy does matter.
But what datatype do you use when you need numbers up to 3? The number 3 only uses 2 Bits, which means you can store 4 of your numbers into one byte (4 * 2bits = 8bits). Oh wait, you don’t have a datatype to do that, you have to use a byte for each of them. Which means your file will be 4 times larger then it has to be.
Thats exactly what happens in my pattern serahc program. A board in the game of Go (a.k.a. Baduk, Weiqi) has 19 x 19 (in total 361) intersections where you can play a move. Each intersection has 3 states: empty, black stone, white stone. And my initial database used 100.000 games. Now do the math :), ok ok , without optimization this database uses 34MB. When we find a way to save 4 numbers in one byte its only 9MB.
Now here is the way to do it: Bit shifting.
int partA = 3; //binary: 11
int partB = 2; //binary: 10
int partC = 1; //binary: 01
int partD = 2; //binary: 10
byte b = (byte)((partA << 6) ^ (partB << 4) ^ (partC << 2) ^ partD);
Bit shifting works in many programming languages and it is very fast. Never every use code that converts a byte into a string and parse the bits (recently seen in an encryption code!). How does it work? A number as bit-stream is writen from right to left. Which means when we use only numbers up to 3 then only 2 bits are used and there are right.
Now the code shifts these 2 bits of partA 6 steps to leftResulting byte 1 1 0 0 0 0 0 0
partB is shifted 4 steps to leftResulting byte 1 1 1 0 0 0 0 0
partC is shifted 2 steps to leftResulting byte 1 1 1 0 0 0 0 0
partD don’t need to be shifted at allResulting byte 1 1 1 0 1 0 0 1
As you can see all our bits are savely moved into one byte. It’s quite easy isn’t it?
Now you can work with this value instead of 4 single values.
How to get your values back? In may case, i didn’t need a way to decode a byte into 4 single values, because i just convertet my pattern in the same way and compare the resulting byte-values. This way i only have to compare 1 value instead of 4. Which increases the search by almost 4 times.
But maybe you want your values back. Here is the way to do it: You need to right-shift your byte to undo our left-shift. But there is one more thing, bit-shifting doesn’t delete any bits, which means we will have alot of bit we dont want. The unwanted bits can be removed with a binary AND expression.
int partA = (b >> 6 & 0x03);
int partB = (b >> 4 & 0x03);
int partC = (b >> 2 & 0x03);
int partD = (b & 0x03);
Here we shift the bits to right and use and &-Expression where only the last 2 bits are used.