Sunday, December 30, 2018

Information Science: What do Digital and Analog Mean? It's Fairly Simple Really...

Motivation

People frequently use the word "digital" to describe anything that's modern.  It means something specific, and it's pretty easy to learn.  And learning is cooool.

"Digital"

In modern society, "digital" is often used in non-scientific ways, from "digital music" to vague sentiments like "the Digital Age."  Digital and analog have simple meanings, and neither has anything to do with computers.  Digital simply means you're using a limited set of distinct values, and you know exactly what each of those values is.  In computers, that set of values is 0 and 1.  It is a discrete set of values.

"Analog"

Analog simply means you're using the full, continuous range of all values between your minimum and maximum.  There are an infinite number of values between 0 and 1.  Take 0.5 and 0.6, for example: I can "go between" these numbers with the value 0.55, and I can "go between" 0.55 and 0.6 with 0.575.  I can keep doing this forever.  Analog systems do not measure, track, or store discrete values.
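A quick way to convince yourself that the "going between" never runs out is to let a program do it.  This is a minimal sketch in Python; the function name `midpoints` is just an illustrative choice, and it uses exact fractions so floating-point rounding doesn't muddy the picture:

```python
from fractions import Fraction

def midpoints(low, high, steps):
    """Repeatedly 'go between' the current lower value and high."""
    values = []
    for _ in range(steps):
        mid = (low + high) / 2
        assert low < mid < high   # there is always room for one more value
        values.append(mid)
        low = mid                 # keep narrowing in toward high
    return values

# Start from 0.5 and 0.6, stored exactly as 1/2 and 3/5.
for value in midpoints(Fraction(1, 2), Fraction(3, 5), 5):
    print(float(value))
```

Any single run only takes a finite number of steps, but no step ever fails the `low < mid < high` check, no matter how large you make `steps`.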

Examples

The simplest example of this I can think of involves musical instruments.

Imagine a trombone: you can play an E by holding the slide in a certain position.  To play an F, you move the slide a little closer in, but if you move it a little less than you should, you're between the two notes.  Since you're not playing an E or an F, it's probably going to sound bad because you'll be flat (below the F you were aiming for).  You can play an E and an F, but you can also play the full, infinite range of pitches between those two notes.

On a piano, however, you press the E key and you get an E; you press the F key and you get an F.  You cannot play the pitches between those notes.  That's nice because it's one less thing to worry about... but you also can't play the "womp womp" sound.

The trombone is analog and the piano is digital.

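A few more quick examples: a full, continuous rainbow (analog) versus a simple rainbow made of a few distinct color bands (digital), and a slider belt that can be fastened anywhere along its length (analog) versus a belt with holes (digital).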

The Physical Medium vs How It's Used

In the case of the belts above, you might notice something: the "digital" belt with the holes cannot be used in an "analog" way.  The "analog" belt, however... if you used white paint to make lines on it every inch or so, and you chose to only fasten the belt at those marks, you would effectively be using the "analog" slider belt in a "digital" way.  The same is true of cassette tapes (think 1980s).  Cassette tapes were primarily used to store music, for example MC Hammer's "Too Legit to Quit," but they were also used in computer systems (Atari, Commodore, etc.) to store binary (digital) data.
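Using the painted belt "digitally" amounts to rounding every continuous position to the nearest mark.  Here's a minimal sketch, assuming marks every inch starting at 0; `snap_to_marks` is a hypothetical name used only for illustration:

```python
def snap_to_marks(position_inches: float, spacing: float = 1.0) -> float:
    """Round a continuous belt position to the nearest painted mark.

    Assumes marks every `spacing` inches, starting at 0.  Python's round()
    sends exact ties (e.g. 2.5) to the nearest even mark.
    """
    return round(position_inches / spacing) * spacing

# Many slightly different analog positions all land on the same mark.
for position in [30.62, 30.9, 31.0, 31.37, 31.49]:
    print(position, "->", snap_to_marks(position))
```

The belt itself is still analog; only the way we choose to use it is discrete.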

I used them some in the 80s and was surprised how well they worked.  I'm guessing the error rates on these tapes weren't too bad because with binary data, there are only 2 values to represent, so you could make it pretty clear which was which.  It wasn't like playing an E versus an F on a piano (most humans couldn't tell you which was which); it was probably more like playing the very lowest note on a piano versus the very highest (most humans could easily tell you which was which).
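Here's a rough simulation of that idea, assuming the tape stores a 0 as a low signal level and a 1 as a high one, and that every read adds a random wobble no bigger than some fixed amount.  The levels, the 0.5 threshold, and the uniform noise model are all simplifying assumptions, not how any particular Atari or Commodore tape format actually worked:

```python
import random

LOW, HIGH = 0.0, 1.0               # assumed signal levels for a 0 bit and a 1 bit
THRESHOLD = (LOW + HIGH) / 2       # halfway between the two levels

def write_bits(bits):
    """Store each bit as one of two well-separated levels."""
    return [HIGH if bit else LOW for bit in bits]

def read_bits(levels, noise, rng):
    """Read each level with a random wobble of up to +/- noise, then decide
    which of the two allowed values it was closer to."""
    return [1 if level + rng.uniform(-noise, noise) > THRESHOLD else 0
            for level in levels]

rng = random.Random(1980)          # fixed seed so runs are repeatable
data = [0, 1, 1, 0, 1, 0, 0, 1]
decoded = read_bits(write_bits(data), noise=0.3, rng=rng)
print(decoded == data)             # True: 0.3 of wobble can't cross the 0.5 threshold
```

As long as the wobble stays under half the gap between the two levels, every bit reads back correctly no matter what the random numbers turn out to be.  Push `noise` well past 0.5 and bits start flipping, which is roughly the "E versus F" situation.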

Digital as "Better"

The reason people think of digital as better than analog is that you can make a perfect copy of it.  A cassette tape with music on it is analog.  If you recall, every time you made a copy of one, the quality deteriorated a little.  If you made a copy of your friend's tape, and then another friend made a copy of your copy, it sounded noticeably worse than the original.  An analogy for modern times might be taking a picture with your phone, then having someone take a picture of your phone's screen, and then having someone else take a picture of their phone's screen.  That third picture's not going to look too good.

If you made copies with a low-quality tape recorder versus a higher-quality one, it made a difference.  It might even have made a difference if you made the copy in a room that was really hot or humid.  This is because the data on the tape was analog, and the tape recorder reading that tape was not precise enough to read the data exactly.  In the same way, while the best trombone player in the world gets very, very close to hitting that E, their slide will probably be off by at least a nanometer or two, leaving the note ever so slightly sharp or flat...

However, if you make a copy of a music CD, you can make a copy of that copy, and a copy of that copy, etc. and it will be an exact copy of the original.  The original and all of the copies are perfectly identical.  The modern analogy would of course be just emailing the picture to your friend so that they have a copy.
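Here's a toy version of the copy-of-a-copy problem, assuming each analog copy picks up a little random (Gaussian) noise and each digital copy reproduces the stored numbers exactly.  The noise level and sample values are made up for illustration:

```python
import random

def analog_copy(samples, rng, noise=0.01):
    """Each re-recording adds a little random error to every sample."""
    return [s + rng.gauss(0, noise) for s in samples]

def digital_copy(data: bytes) -> bytes:
    """A digital copy reproduces exactly the same values."""
    return bytes(data)

rng = random.Random(42)                        # fixed seed for repeatable runs
original_analog = [0.0, 0.5, 1.0, 0.5, 0.0]
original_digital = bytes([0, 128, 255, 128, 0])

analog, digital = original_analog, original_digital
for generation in range(1, 4):                 # a copy of a copy of a copy
    analog = analog_copy(analog, rng)
    digital = digital_copy(digital)
    drift = max(abs(a - o) for a, o in zip(analog, original_analog))
    print(f"copy {generation}: analog drift {drift:.4f}, "
          f"digital identical: {digital == original_digital}")
```

The analog errors pile up from one generation to the next, while the digital copy is still byte-for-byte identical after any number of generations.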

Why We Care

Basically, it's simply much easier to make an exact copy of something that is stored using discrete values.  For computers to execute computer programs exactly the same way every time, digital is the way to go.  Most of the time, when a program crashes or doesn't do what you want, it's not running it wrong.  It's running the program - with all of its bugs - exactly as it was written.

Friday, December 28, 2018

Information Science: Information is Meaningless Without Context - Now With Puppies!

Motivation


Most simple text editors will open any file and try to display it as text.  That behavior is where this post came from: a junior coworker asked why an .exe looks like garbled text when viewed in a text editor.

The answer is context.

What does "10110101101000101110101110" mean?  


Well, it depends on who you ask.
Let's put those exact bits into a file called:
  • my_file.jpeg and open it with Photoshop - it's going to be interpreted as an image
  • my_file.mp3 and open it with iTunes - it's going to be interpreted as sound
  • my_file.txt and open it with Notepad - it's going to be interpreted as text

That's oversimplifying it, because some standard file types such as jpegs and mp3s must always start with a "header" that basically says "this is an mp3," but let's ignore that for a moment.  Let's just talk about the 1s and 0s that represent the song in the mp3 file.
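To make "it depends on who you ask" concrete, here's a sketch in Python that reads the same four bytes several different ways.  Treating bytes as raw 16-bit audio samples is a simplification (a real mp3 is compressed and needs a decoder), in the same spirit as ignoring the header above:

```python
import struct

raw = b"Pup!"   # four bytes: 0x50 0x75 0x70 0x21

as_text = raw.decode("ascii")              # as characters
as_number = int.from_bytes(raw, "big")     # as one unsigned 32-bit integer
as_samples = struct.unpack(">2h", raw)     # as two signed 16-bit audio samples
as_float = struct.unpack(">f", raw)[0]     # as a 32-bit floating-point number

print(as_text, as_number, as_samples, as_float)
```

Nothing about the bytes changes between those lines; only the program's assumption about what they mean does.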

How many puppies does "11" represent - A, B, or C?

A: eleven puppies
B: three puppies
C: nine puppies

The answer again is context.
  • If "11" is read as decimal (base 10, which we all use in everyday life), then it's A.
  • If "11" is read as binary (base 2), then it's B.
  • If "11" is read as octal (base 8), then it's C.
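Python's built-in `int()` accepts an optional base, so you can ask it the puppy question directly:

```python
# The same two characters, "11", read under three different bases.
bases = {"decimal": 10, "binary": 2, "octal": 8}
puppies = {name: int("11", base) for name, base in bases.items()}

for name, count in puppies.items():
    print(f'"11" read as {name} is {count} puppies')
# decimal -> 11 (A), binary -> 3 (B), octal -> 9 (C)
```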

For a quick read on this, see Information Science - A Simple Explanation of Base Number Systems: Binary, Decimal, Octal, Hexadecimal

So when a program like iTunes sees a string of 1s and 0s, it's going to read it as sound, and if it's just a random string of 1s and 0s, it'll sound pretty noisy.  An .exe file is meant to be read by the operating system, so if you open it with a text editor, the editor tries to turn the 1s and 0s into letters, and the result just looks like garbled text.

Other examples

Human Languages

To think of this yet another way, I could write "Plato" on a piece of paper and give it to someone who speaks English and they'd probably interpret it as the name of the Greek philosopher.  If I give that same piece of paper to someone who speaks Spanish, they'd probably interpret it as a flat round thing you eat food off of ("plato" is Spanish for the English word "plate").

Character encoding schemes

Similarly to the decimal vs binary example, we have different standards that make a specific set of bits represent different readable symbols.  Some examples you may have heard of are ASCII and ANSI.  Let's take this set of bits:
01001000 01000101 01011001 00100001

ASCII was designed long ago as a simple character set with a small range: it uses 7 bits, from 0000000 to 1111111, for a total of 128 unique characters.  Each character is normally stored in an 8-bit byte with a leading 0, so each of the 4 chunks above represents one character... in fact, together they represent "HEY!"
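Decoding those bits by hand is a nice exercise; here's the same thing as a short Python sketch, assuming the bits are split into 8-bit chunks exactly as written above:

```python
bits = "01001000 01000101 01011001 00100001"

# Turn each 8-bit chunk into a number, then look each number up in ASCII.
codes = [int(chunk, 2) for chunk in bits.split()]
text = bytes(codes).decode("ascii")

print(codes)  # [72, 69, 89, 33]
print(text)   # HEY!
```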

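Unicode is intended to be the last character set we'll ever need, so it has a huge range of 1,114,112 possible code points (slots for characters), of which well over 100,000 are currently assigned.  It's a much more complex system, covering the Latin, Greek, Cyrillic, Chinese, and Thai writing systems (along with many others) and all kinds of symbols for music, math, and just about anything else you can think of.  Unicode text is stored using an encoding such as UTF-8 or UTF-16, and UTF-8 was deliberately designed so that any plain ASCII text is also valid UTF-8 with the same meaning.  Suffice it to say, though, that if you take a long string of 1s and 0s, interpreting it as ASCII and interpreting it as Unicode (in, say, UTF-16) will usually not yield the same result.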
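To see the context change the meaning, here's a sketch that decodes the same four "HEY!" bytes as ASCII and with two common Unicode encodings, UTF-8 and UTF-16 (little-endian).  Both encodings are built into Python; picking these two is just for illustration:

```python
raw = bytes([0b01001000, 0b01000101, 0b01011001, 0b00100001])

as_ascii = raw.decode("ascii")
as_utf8 = raw.decode("utf-8")       # UTF-8 agrees with ASCII for these bytes
as_utf16 = raw.decode("utf-16-le")  # UTF-16 reads two bytes per character here

print(as_ascii, as_utf8)
print([hex(ord(ch)) for ch in as_utf16])  # ['0x4548', '0x2159']
```

Read as UTF-16, the four bytes become just two characters, neither of which is an H, E, Y, or !.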

To sum up, and to reiterate (to a hopefully tediously clear degree): information in a file is just like information in the real world - it can only be correctly understood if the context is correctly understood first.

Information Science - A Simple Explanation of Base Number Systems: Binary, Decimal, Octal, Hexadecimal

Motivation


I Googled for a quick, simple explanation of this and was very surprised when I failed to find one, so here we go.  Oh, and the whole puppies thing is a reference to this post.

Bases


The base is simply the number of distinct characters (digits) we have for representing numbers.
  • In binary (base two), we use characters 0, 1
  • In octal (base eight), we use characters 0, 1, 2, 3, 4, 5, 6, 7
  • In decimal (base ten), we use characters 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
  • In hexadecimal (base sixteen), we use characters 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F
In decimal, we can represent zero to nine puppies using only one character.  At that point, if we add another puppy, we've run out of characters, so we have to move over a spot: we put a 1 in the next spot to the left and start the first spot over at 0.  So we count:
  • zero 00
  • one 01
  • two 02
  • three 03
  • four 04
  • five 05
  • six 06
  • seven 07
  • eight 08
  • nine 09
  • ten 10  --  here we start reusing characters...
  • eleven 11
  • twelve 12

but for binary we only have two characters with which to represent puppies, so we have to move over a lot quicker:
  • zero 000
  • one 001
  • two 010  --  quickly we start reusing characters...
  • three 011
  • four 100
  • five 101
  • six 110

similarly for octal, we count:
  • zero 00
  • one 01
  • two 02
  • three 03
  • four 04
  • five 05
  • six 06
  • seven 07
  • eight 10  --  here we start reusing characters...
  • nine 11
  • ten 12

and finally, for hexadecimal, we have to "make up" some extra characters that we're not used to (since most humans are used to the decimal, or base 10, system), so someone long ago chose the letters A-F:
  • zero 00
  • one 01
  • two 02
  • three 03
  • four 04
  • five 05
  • six 06
  • seven 07
  • eight 08
  • nine 09
  • ten 0A
  • eleven 0B
  • twelve 0C
  • thirteen 0D
  • fourteen 0E
  • fifteen 0F
  • sixteen 10  --  finally we start reusing characters...
  • seventeen 11
  • eighteen 12
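If you'd like to check these tables (or extend them), Python's built-in `format()` can write a number in binary (`b`), octal (`o`), decimal (`d`), or uppercase hexadecimal (`X`).  The helper name `count_in_base` and the padding widths are just choices for this sketch:

```python
def count_in_base(n, spec, width):
    """Write n with a built-in format type ('b', 'o', 'd', 'X'), zero-padded."""
    return format(n, f"0{width}{spec}")

for n in range(19):
    print(f"{n:>2}:  bin {count_in_base(n, 'b', 5)}  oct {count_in_base(n, 'o', 2)}"
          f"  dec {count_in_base(n, 'd', 2)}  hex {count_in_base(n, 'X', 2)}")
```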

Hexadecimal can be a little weird to think about, but try thinking about it this way: if someone created five hundred unique characters that we used to count things, we could point to a herd of, for example, four hundred and eighty puppies in a park, and you could use a single character to represent that number.  That character would uniquely identify the number of puppies you see sniffing and playing and rolling in the dirt.  My guess is that you would spontaneously explode from how cute it was.
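The "make up more characters" idea works for any number of characters.  Here's a sketch of the usual repeated-division approach for writing a number in any base, where the base is just however many digit characters you hand it.  The function `to_base` and the 500-character alphabet (500 consecutive Chinese characters from Unicode, picked only because they're distinct and printable) are hypothetical:

```python
def to_base(n, digits):
    """Write a non-negative integer n using the given digit characters.

    The base is simply len(digits), i.e. how many distinct characters we have.
    """
    base = len(digits)
    if base < 2:
        raise ValueError("need at least two digit characters")
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return digits[0]
    spots = []
    while n > 0:
        n, remainder = divmod(n, base)
        spots.append(digits[remainder])  # the rightmost spot comes out first
    return "".join(reversed(spots))

HEX = "0123456789ABCDEF"
# Hypothetical 500-character "alphabet": 500 consecutive CJK characters,
# used only because they are distinct, printable single characters.
FIVE_HUNDRED = "".join(chr(0x4E00 + i) for i in range(500))

print(to_base(700, HEX))                  # 2BC
print(to_base(13, "01"))                  # 1101
print(len(to_base(480, FIVE_HUNDRED)))    # 1: one character for 480 puppies
```

With 500 characters, 480 fits in a single spot; 500 itself is the first number that needs two, just like ten is in decimal.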

A generic formula for figuring out what a number means in your base


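So the generic formula for figuring out how many puppies we're talking about goes like this: for each spot, take the number in that spot and multiply it by the base once for every time we had to "move over a spot" to get there (the rightmost spot isn't multiplied by the base at all).  Then add up the results for all the spots.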

So "432" in decimal:
  • ten * ten * four +
  • ten * three +
  • two =
  • Four hundred and thirty two

"432" Octal:
  • eight * eight * four +
  • eight * three +
  • two =
  • Two hundred and eighty two

"2BC" Hex:
  • sixteen * sixteen * two +
  • sixteen * eleven +
  • twelve =
  • Seven hundred

"1101" Binary:
  • two * two * two * one +
  • two * two * one  +
  • two * zero +
  • one =
  • Thirteen
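Here's the same recipe as a small Python function (`from_base` is a hypothetical name).  It walks the spots from right to left, multiplies in the base once per spot moved over, and adds up the results; Python's built-in `int(text, base)` is used only to double-check it:

```python
DIGITS = "0123456789ABCDEF"

def from_base(text, base):
    """Add up digit * (base multiplied in once per spot moved over)."""
    total = 0
    # enumerate(reversed(...)) gives the rightmost spot 0 moves, the next 1, ...
    for moves, ch in enumerate(reversed(text.upper())):
        digit = DIGITS.index(ch)          # 'B' -> 11, 'C' -> 12, ...
        if digit >= base:
            raise ValueError(f"{ch!r} is not a digit in base {base}")
        place = 1
        for _ in range(moves):
            place *= base
        total += digit * place
    return total

for text, base in [("432", 10), ("432", 8), ("2BC", 16), ("1101", 2)]:
    print(f'"{text}" in base {base} = {from_base(text, base)}')
    assert from_base(text, base) == int(text, base)
```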

The formula can also be expressed as "the base raised to the power of the number of times you've had to move over a spot, times the number in that spot."  It's exactly the same thing.  For example:

"432" in decimal:
  • ten to the second power * four +
  • ten to the first power * three +
  • ten to the zeroth power * two =
  • Four hundred and thirty two
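For anyone who prefers symbols, here's the same rule written as a formula.  The names are chosen here for illustration: b is the base, k is the number of spots, and d_i is the number in the spot reached after moving over i times (so the rightmost spot is d_0):

```latex
N = d_{k-1}\,b^{k-1} + \cdots + d_1\,b^{1} + d_0\,b^{0} = \sum_{i=0}^{k-1} d_i\,b^{i}

\text{"2BC" in hex: } 2 \cdot 16^{2} + 11 \cdot 16^{1} + 12 \cdot 16^{0} = 512 + 176 + 12 = 700
```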