# Converting words to DNA segments

## The idea

By reading about base64 encoding, I learned how ASCII or UTF-8 characters can be converted to binary, and then to other characters. It immediately occurred to me that this could be done with other sets of characters, not just base64. For instance, I remember learning, in high school biology, about the Human Genome Project and the structure of DNA. My thought was this: What if I set up the four nucleotides found in human DNA (Guanine, Adenine, Thymine, and Cytosine) as a zero-indexed array, and converted ASCII characters to them? This made sense because ASCII characters are all stored as a byte (8 binary digits, or bits). Since there are four nucleotides, each one can stand for exactly two bits, so a single character converts to four nucleotides and the result is four times the length of the original text.

## The example

Let's take the word "man" and get the ASCII value for each character:

m = 109, a = 97, n = 110
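As a quick check, here is a minimal Python sketch of this lookup. It assumes Python 3 and uses the built-in `ord`; the variable names `word` and `ascii_values` are only illustrative.

```python
# Look up the ASCII value of each character in the word.
word = "man"
ascii_values = [ord(ch) for ch in word]

for ch, value in zip(word, ascii_values):
    print(ch, "=", value)
# m = 109
# a = 97
# n = 110
```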

Next, we'll convert these values to a binary string. As alluded to, this string will be three bytes long, 8 bits for each character:

01101101 01100001 01101110
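One way to produce that string in Python is sketched below. The helper name `to_binary` is my own, and the sketch assumes the input is plain ASCII (anything else raises an error when encoding).

```python
def to_binary(word):
    """Return each character of an ASCII word as an 8-bit binary string."""
    # "08b" means: binary digits, zero-padded on the left to width 8.
    return [format(byte, "08b") for byte in word.encode("ascii")]

print(" ".join(to_binary("man")))
# 01101101 01100001 01101110
```

The zero-padding matters: without it, 97 would print as 1100001 (seven digits) and the bytes would no longer line up.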

To prepare for conversion, we'll split each byte into four pairs of bits. This means that 01101101 becomes 01, 10, 11, 01.
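The split can be sketched with simple string slicing. The function name `bit_pairs` is hypothetical, and it assumes the input is an 8-character string of 0s and 1s.

```python
def bit_pairs(byte_bits):
    """Split a binary string into consecutive two-character chunks."""
    return [byte_bits[i:i + 2] for i in range(0, len(byte_bits), 2)]

print(bit_pairs("01101101"))
# ['01', '10', '11', '01']
```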

Now, here's where it gets fun.

Let's take the four nucleotides and order them (a bit arbitrarily, I admit) by mass, in descending order:

• guanine: 151.126 g/mol
• adenine: 135.127 g/mol
• thymine: 126.113 g/mol
• cytosine: 111.102 g/mol

We can throw these into a zero-indexed array:

• [0] => G,
• [1] => A,
• [2] => T,
• [3] => C

What this allows us to do is to take any two-digit binary number and find its corresponding nucleotide. To use our previous example:

• 01 becomes 1 which becomes A
• 10 becomes 2 which becomes T
• 11 becomes 3 which becomes C
• 01 becomes 1 which becomes A

Therefore, the character m can be represented, through this method, as "ATCA".
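Here is a rough Python sketch of that lookup for a single character. The names `NUCLEOTIDES` and `char_to_dna` are made up for illustration, and the ordering of the list is the mass-based ordering chosen above, not anything biological.

```python
# Index 0..3 maps to G, A, T, C (the arbitrary mass-based ordering).
NUCLEOTIDES = ["G", "A", "T", "C"]

def char_to_dna(ch):
    """Convert one ASCII character to its four-letter segment."""
    bits = format(ord(ch), "08b")  # e.g. "m" -> "01101101"
    letters = []
    for i in range(0, 8, 2):
        index = int(bits[i:i + 2], 2)  # "01" -> 1, "10" -> 2, "11" -> 3
        letters.append(NUCLEOTIDES[index])
    return "".join(letters)

print(char_to_dna("m"))
# ATCA
```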

#### Breakdown

| ASCII Character | m | a | n |
| --- | --- | --- | --- |
| Decimal value | 109 | 97 | 110 |
| Binary value | 01101101 | 01100001 | 01101110 |
| DNA segment | ATCA | ATGA | ATCT |
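Putting the steps together, a whole word can be converted in one pass. The sketch below is only one possible way to do it: it uses bit shifts instead of strings, the names `encode` and `decode` are my own, and the `decode` function is an extra I added to show that the mapping is reversible. It assumes ASCII-only input.

```python
NUCLEOTIDES = "GATC"  # index 0..3, the same ordering as the array above

def encode(word):
    """Encode an ASCII string as nucleotide letters, four per character."""
    out = []
    for byte in word.encode("ascii"):
        # Read the byte two bits at a time, leftmost pair first.
        for shift in (6, 4, 2, 0):
            out.append(NUCLEOTIDES[(byte >> shift) & 0b11])
    return "".join(out)

def decode(dna):
    """Reverse of encode: every four letters rebuild one character."""
    data = bytearray()
    for i in range(0, len(dna), 4):
        byte = 0
        for letter in dna[i:i + 4]:
            byte = (byte << 2) | NUCLEOTIDES.index(letter)
        data.append(byte)
    return data.decode("ascii")

print(encode("man"))           # ATCAATGAATCT
print(decode("ATCAATGAATCT"))  # man
```

Because each letter carries exactly two bits, nothing is lost in the conversion, which is why the round trip gets the original word back.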

#### Just for fun

Once converted, you can use Wolfram Alpha to look up words and see if their nucleotide equivalents appear as a sequence in the human genome. For example, look up "man" (ATCAATGAATCT) as a genome sequence on Wolfram Alpha.

## My takeaway

This whole experiment has no real value in terms of DNA or whether words "show up in our bodies" or not. This is merely an example of how binary works, and further, how interchangeable slices of our universe can be. I think it's really neat that God has written math into nature, and given us the knowledge and understanding to use math to read it. I'm thinking that this idea will continue to roll around in my mind a bit, and maybe lead to something more compelling, but it was a fun thought to play with.