Influenza A (H1N1) is like a Computer Virus
For those not familiar with molecular biology, DNA is information-equivalent to RNA on a 1 to 1 mapping; DNA is like a program stored on disk, and RNA is like a program loaded into RAM. When DNA is loaded, a transcription step replaces every “T” base with a “U” base. Remember, each base pair specifies one of four possible symbols (A [T/U] G C), so a single base pair corresponds to 2 bits of information.
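As a rough sketch of the disk-to-RAM step, here is the T-to-U substitution and the 2-bits-per-base accounting in Python. The function names `transcribe` and `bit_count` are my own, not from any library, and the input string is a made-up example; the sketch follows the simplified "T becomes U" model described above rather than the full biochemistry.

```python
# Hypothetical helpers for illustration; these names are not from any library.

def transcribe(dna: str) -> str:
    """Load 'disk' (DNA) into 'RAM' (RNA): every T becomes U."""
    return dna.upper().replace("T", "U")


def bit_count(sequence: str) -> int:
    """Four possible symbols per position means 2 bits per position."""
    return 2 * len(sequence)


example = "ATGAAGGCA"  # made-up 9-base input
print(transcribe(example))  # AUGAAGGCA
print(bit_count(example))   # 18
```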
Proteins are the output of running an RNA program. Proteins are synthesized according to the instructions in RNA on a 3 to 1 mapping. You can think of proteins a bit like pixels in a frame buffer. A complete protein is like an image on the screen; each amino acid in a protein is like a pixel; each pixel has a depth of 6 bits (a 3 to 1 mapping of a medium that stores 2 bits per base pair); and each pixel has to go through a color palette (the codon translation table) to transform the raw data into a final rendered color. Unlike a computer frame buffer, different biological proteins vary in amino acid count (pixel count).
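To make the 6-bit pixel depth concrete, the sketch below packs three bases into one 6-bit number. The particular 2-bit code assigned to each base is an arbitrary choice of mine for illustration (biology has no canonical binary encoding), and `codon_to_pixel` is a hypothetical name.

```python
# Assumed 2-bit code per base; the ordering is arbitrary, chosen for illustration.
BASE_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "U": 0b11}


def codon_to_pixel(codon: str) -> int:
    """Pack three RNA bases into one 6-bit value in the range 0..63."""
    if len(codon) != 3:
        raise ValueError("a codon is exactly three bases")
    value = 0
    for base in codon:
        # Shift the running value left by 2 bits and append this base's code.
        value = (value << 2) | BASE_BITS[base]
    return value


print(codon_to_pixel("AUG"))  # 14, i.e. 0b001110 under this assumed encoding
```

Three bases at 2 bits each give 64 possible raw pixel values, which the palette then maps down to a much smaller set of amino acids plus stop signals.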
To ground this in a specific example, six bits stored as “ATG” on your hard drive (DNA) is loaded into RAM (RNA) as “AUG” (remember the T->U transcription). When the RNA program in RAM is executed, “AUG” is translated to a pixel (amino acid) of color “M”, or methionine (which is incidentally the biological “start” codon, the first instruction in every valid RNA program). As a short-hand, since DNA and RNA are 1:1 equivalent, bioinformaticists represent gene sequences in DNA format, even if the biological mechanism is in RNA format (as is the case for Influenza; more on the significance of that later!).
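The whole disk-to-RAM-to-screen pipeline can be sketched in a few lines of Python. The palette here is deliberately partial: just four entries from the standard genetic code, enough to run a made-up 12-base input. `PALETTE`, `transcribe`, and `translate` are illustrative names of mine, not a real library API.

```python
# Partial codon table (the "color palette"); the full standard table has 64 entries.
PALETTE = {
    "AUG": "M",  # methionine, also the start codon
    "AAG": "K",  # lysine
    "GCA": "A",  # alanine
    "UAA": "*",  # one of the stop codons
}


def transcribe(dna: str) -> str:
    """DNA (disk) to RNA (RAM): replace T with U."""
    return dna.upper().replace("T", "U")


def translate(rna: str) -> str:
    """Read RNA three bases at a time and look each codon up in the palette."""
    protein = []
    usable = len(rna) - len(rna) % 3  # ignore a trailing partial codon
    for i in range(0, usable, 3):
        amino = PALETTE[rna[i:i + 3]]  # KeyError for codons missing from this partial table
        if amino == "*":
            break  # a stop codon ends the program
        protein.append(amino)
    return "".join(protein)


print(translate(transcribe("ATGAAGGCATAA")))  # MKA
```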
OK, back to the main point of this post. The particular RNA subroutine mentioned above codes for the HA gene which produces the Hemagglutinin protein: in particular, an H1 variety. This is the “H1” in the H1N1 designation.
If you thought of organisms as computers with IP addresses, each functional group of cells in the organism would be listening to the environment through its own active port. So, as port 25 maps specifically to SMTP services on a computer, port H1 maps specifically to the windpipe region on a human. Interestingly, the same port H1 maps to the intestinal tract on a bird. Thus, the same H1N1 virus will attack the respiratory system of a human, and the gut of a bird. In contrast, H5 — the variety found in H5N1, or the deadly “avian flu” — specifies the port for your inner lungs. As a result, H5N1 is much more deadly because it attacks your inner lung tissue, causing severe pneumonia. H1N1 is not as deadly because it is attacking a much more benign port that just causes you to blow your nose a lot and cough up loogies, instead of ceasing to breathe.
Researchers are still discovering more about the H5 port; the Nature article indicates that perhaps certain human mutants have lungs that do not listen on the H5 port. So, those of us with the mutation that causes lungs to ignore the H5 port would have a better chance of surviving an avian flu infection, whereas those of us that open port H5 on the lungs have no chance to survive make your time / all your base pairs are belong to H5N1.
So how many bits are in this instance of H1N1? The raw number of bits, by my count, is 26,022; the actual number of coding bits is approximately 25,054. I say approximately because the virus does the equivalent of self-modifying code to create two proteins out of a single gene in some places (pretty interesting stuff actually), so it’s hard to say what counts as code and what counts as the incidental non-executing NOP sleds that are required for self-modifying code.
So it takes about 25 kilobits, or roughly 3.1 kbytes (3.2 kbytes if you count all 26,022 raw bits), of data to code for a virus that has a non-trivial chance of killing a human. This is more efficient than a computer virus such as MyDoom, which rings in at around 22 kbytes.
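For anyone who wants to check the arithmetic, here is the size comparison as a short Python sketch. The bit counts are the ones quoted above; treating "around 22 kbytes" as exactly 22,000 bytes is my assumption, so the final ratio is only a ballpark.

```python
RAW_BITS = 26_022      # raw bit count quoted in the post
CODING_BITS = 25_054   # approximate coding bit count quoted in the post
MYDOOM_BYTES = 22_000  # assumption: "around 22 kbytes" taken as 22,000 bytes


def bits_to_bytes(bits: int) -> float:
    """Eight bits per byte."""
    return bits / 8


print(RAW_BITS // 2)               # 13011 bases at 2 bits each
print(bits_to_bytes(RAW_BITS))     # 3252.75
print(bits_to_bytes(CODING_BITS))  # 3131.75
print(round(MYDOOM_BYTES / bits_to_bytes(RAW_BITS), 1))  # 6.8
```

Under that assumption MyDoom is roughly seven times the size of the viral genome.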

