Showing posts with label data. Show all posts

Wednesday, March 6, 2019

Language: Necessary and Sufficient

Motivation


This is a nice piece of language I learned in school long, long ago.  It's very useful when talking about requirements or specifications for a project, acceptance criteria for a user story, etc.  Bonus: it makes you sound smart.

Example


The long and short of it is that what you want in order to work efficiently and effectively is the necessary and sufficient information (or tools or whatever).  Since I like working on cars, I'll use the simple example of changing a tire.

If you have the following, you have the necessary and sufficient toolset:
  • a new tire
  • a lug wrench
  • a car jack

If you have the following, you have tools that are necessary, but not sufficient:
  • a new tire

If you have the following, you have tools that are sufficient, but not necessary:
  • a new tire
  • a lug wrench
  • a car jack
  • a hammer

If you have the following, you have tools that are not necessary, and not sufficient:
  • a hammer

Recap


Necessary defines the lower bounds of what you need while sufficient defines the upper bounds.  If you have both, you have the ability to do the work completely while being perfectly efficient.
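The four tire-changing cases above can be sketched with Python set comparisons. This is only a rough model, and the modeling choice is my assumption: "necessary" is treated as "nothing you have is extra" and "sufficient" as "nothing you need is missing." The names REQUIRED and classify are made up for illustration.

```python
# Hypothetical model of the tire-changing example using sets.
REQUIRED = {"new tire", "lug wrench", "car jack"}


def classify(tools):
    """Return (necessary, sufficient) for a collection of tools.

    necessary:  nothing in the toolset is extra (tools is a subset of REQUIRED)
    sufficient: nothing required is missing (tools is a superset of REQUIRED)
    """
    tools = set(tools)
    return tools <= REQUIRED, tools >= REQUIRED


print(classify({"new tire", "lug wrench", "car jack"}))
print(classify({"new tire"}))
print(classify({"new tire", "lug wrench", "car jack", "hammer"}))
print(classify({"hammer"}))
```

Each call prints a (necessary, sufficient) pair, in the same order as the four lists in the example.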

Sunday, December 30, 2018

Information Science: What do Digital and Analog Mean? It's Fairly Simple Really...

Motivation

People frequently use the word "digital" to describe anything that's modern.  It means something specific, and it's pretty easy to learn.  And learning is cooool.

"Digital"

In modern society, "digital" is often used in non-scientific ways from "digital music" to vague sentiments like "the Digital Age."  Digital and analog have simple meanings, and neither has anything to do with computers.  Digital simply means you're using a fixed set of values, each of which you know exactly.  In computers, that set of values is 0 and 1.  It is a discrete set of values.

"Analog"

Analog simply means you're using the full, continuous range of all values between your minimum and maximum.  There are an infinite number of values between 0 and 1, for example 0.5 and 0.6.  But I can "go between" these numbers with the value 0.55, and I can "go between" 0.55 and 0.6 with 0.575.  I can keep doing this infinitely.  Analog systems do not measure, track or store discrete values.

Examples

The simplest example of this I can think of involves musical instruments.

Imagine a trombone: you can play an E by holding the slide in a certain position.  To play an F, you move the slide a little closer in, but if you move it a little less than you should, you're between the two notes.  Since you're not playing an E or an F, it's probably going to sound bad because you'll be flat.  You can play an E and an F, but you can also play the full, infinite range of pitches between those 2 notes.

On piano however, you press the E key, you get an E, you press the F, you get an F.  You cannot play the pitches between those notes.  That's nice because it's one less thing to worry about... but you also can't play the "womp womp" sound.

The trombone is analog and the piano is digital.
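Here's a small sketch of that difference in Python. It assumes standard equal-temperament tuning with A4 at 440 Hz, which is my assumption and not something the instruments themselves care about. The function name snap_to_piano is made up: it takes any pitch a trombone could play and returns the nearest pitch a piano could play.

```python
import math

A4_HZ = 440.0  # assumed reference pitch for the piano's tuning


def snap_to_piano(frequency_hz):
    """Round a continuous (analog) pitch to the nearest piano (digital) pitch."""
    # How many piano keys above or below A4 is this pitch, as a whole number?
    keys_from_a4 = round(12 * math.log2(frequency_hz / A4_HZ))
    # Convert that whole number of keys back to a frequency.
    return A4_HZ * 2 ** (keys_from_a4 / 12)


trombone_pitch = 455.0  # somewhere between two piano keys
print(snap_to_piano(trombone_pitch))
```

The trombone value can be any number at all, but whatever you pass in, the result is always one of the piano's fixed pitches.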

A few more quick examples are a full-on rainbow, versus a simple rainbow:


And slider belt versus a belt with holes:

The Physical Medium vs How It's Used

In the case of the belts above, you might notice something: The "digital" belt with the holes cannot be used in an "analog" way.  The "analog" belt however... if you used white paint to make lines on it every inch or so, and you chose to only use the belt at those increments, you would effectively be using the "analog" slider belt in a "digital" way.  In fact this is true of cassette tapes (think 1980s).  Cassette tapes were primarily used to store music, for example MC Hammer's "Too Legit to Quit," but they were also used in computer systems (Atari, Commodore, etc.) to store binary (digital) data.

I used them some in the 80s and was surprised how well they worked.  I'm guessing the error rates on these tapes weren't too bad because with binary data, only having 2 values to represent meant you could make it pretty clear which was which.  It wasn't like playing an E versus an F on a piano (most humans couldn't tell you which was which); it was probably more like playing the very lowest note on a piano versus the very highest (most humans could easily tell you which was which).

Digital as "Better"

The reason people think of digital as better than analog is that you can make a perfect copy of it.  A cassette tape with music on it is analog.  If you recall, every time you made a copy of one, the quality deteriorated a little.  If you made a copy of your friend's tape, then another friend made a copy of your copy, it sounded noticeably worse than the original.  An analogy for modern times might be if you take a picture with your phone, and then someone takes a picture of your phone's screen, and then someone else takes a picture of their phone's screen.  That 3rd picture's not going to look too good.

If you made copies with a low quality tape recorder vs a higher quality one, it made a difference.  It might even make a difference if you made the copy in a room that was really hot or humid.  This is because the data on the tape was analog, and the tape recorder reading that tape was not precise enough to read the data exactly.  In the same way, while the best trombone player in the world is very very close to hitting that E on their instrument, their slide placement will probably actually be sharp or flat by at least a nanometer or two...

However, if you make a copy of a music CD, you can make a copy of that copy, and a copy of that copy, etc. and it will be an exact copy of the original.  The original and all of the copies are perfectly identical.  The modern analogy would of course be just emailing the picture to your friend so that they have a copy.
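To make the copy-of-a-copy idea concrete, here's a minimal simulation in Python. The noise level, the 0.5 threshold, and the function names are all assumptions I picked for illustration; real tape decks and CD drives are far more involved. The only point is that a digital copy can snap each slightly-off value back to a known 0 or 1, while an analog copy has to keep whatever error it picked up.

```python
import random


def analog_copy(signal, rng, noise=0.2):
    """Copy every value with a small random error, like dubbing a tape."""
    return [value + rng.uniform(-noise, noise) for value in signal]


def digital_copy(signal, rng, noise=0.2):
    """Make the same noisy copy, then snap each value back to 0 or 1."""
    return [1.0 if value > 0.5 else 0.0 for value in analog_copy(signal, rng, noise)]


rng = random.Random(0)  # fixed seed so every run behaves the same
original = [0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0]

analog = digital = original
for _ in range(10):  # a copy of a copy of a copy... ten generations deep
    analog = analog_copy(analog, rng)
    digital = digital_copy(digital, rng)

print("digital copy identical to original:", digital == original)
print("analog copy identical to original: ", analog == original)
```

Because the noise here is always smaller than the gap between 0 and 1, the digital copy can always tell which value was meant. That's the same reason the binary cassette data held up.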

Why We Care

Basically, it's simply much easier to make an exact copy of something that is stored using discrete values.  For computers to execute computer programs exactly the same way every time, digital is the way to go.  Most of the time, when a program crashes or doesn't do what you want, the computer isn't running it wrong.  It's running the program - with all of its bugs - exactly as it was written.

Friday, December 28, 2018

Information Science: Information is Meaningless Without Context - Now With Puppies!

Motivation


Most simple text editors will open any file and try to display it as text.  This phenomenon is where this post came from... a junior coworker asked why an .exe looks like garbled text when viewed with a text editor.

The answer is context.

What does "10110101101000101110101110" mean?  


Well it depends on who you ask.
Let's put those exact bits into a file called:
  • my_file.jpeg and open it with Photoshop - it's going to be interpreted as an image
  • my_file.mp3 and open it with iTunes - it's going to be interpreted as sound
  • my_file.txt and open it with Notepad - it's going to be interpreted as text

That's oversimplifying it because some standard file types such as jpegs and mp3s must always start with a "header" that basically says "this is an mp3," but let's ignore that for a moment.  Let's just talk about the 1s and 0s that represent the song in the mp3 file.

How many puppies does "11" represent - A, B, or C?

A


B


C

The answer again is context.
  • If "11" is read as decimal (base 10, which we all use in everyday life), then it's A.
  • If "11" is read as binary (base 2), then it's B.
  • If "11" is read as octal (base 8), then it's C.
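You can check this yourself with Python's built-in int(), which takes the base as its second argument. The variable names here are just for illustration.

```python
digits = "11"

# The same two characters, read in three different contexts.
readings = {
    "decimal": int(digits, 10),
    "binary": int(digits, 2),
    "octal": int(digits, 8),
}

for name, puppies in readings.items():
    print(f"{digits!r} read as {name} means {puppies} puppies")
```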

For a quick read on this, see Information Science - A Simple Explanation of Base Number Systems: Binary, Decimal, Octal, Hexadecimal

So when a program like iTunes sees a string of 1s and 0s it's going to read it as sound, and if it's just a random string of 1s and 0s, it'll be pretty noisy.  An .exe file is supposed to be read by the operating system, so if you open it with a text editor it tries to turn the 1s and 0s into a bunch of letters, so it just looks like garbled text.

Other examples

Human Languages

To think of this yet another way, I could write "Plato" on a piece of paper and give it to someone who speaks English and they'd probably interpret it as the name of the Greek philosopher.  If I give that same piece of paper to someone who speaks Spanish, they'd probably interpret it as a flat round thing you eat food off of ("plato" is Spanish for the English word "plate").

Character encoding schemes

Similarly to the decimal vs binary example, we have different standards that make a specific set of bits represent different readable symbols.  Some examples you may have heard of are ASCII and ANSI.  Let's take this set of bits:
01001000 01000101 01011001 00100001

ASCII was designed long ago as a simple 7-bit character set with only 128 unique characters.  Each character is usually stored in one 8-bit byte (a byte can hold 256 different values, from 00000000 to 11111111), so each of the 4 chunks above represents a character... in fact together they represent "HEY!"

Unicode is intended to be the last character set we'll ever need, so it has a huge range of 1,114,112 possible code points, of which well over 100,000 are currently assigned.  It's a much more complex system, including the Latin, Greek, Cyrillic, Chinese, and Thai scripts (along with others) and all kinds of symbols for music, math, just about anything you can think of.  Suffice to say, if you give a long string of 1s and 0s, interpreting it as ASCII versus a Unicode encoding such as UTF-16 will likely not yield the same result.
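Here's a quick sketch in Python that reads the four chunks of bits above in three different contexts: as ASCII text, as one big number, and as UTF-16 text. Big-endian byte order is my assumption for the last two, and the variable names are made up.

```python
bits = "01001000 01000101 01011001 00100001"

# Turn each chunk of eight 1s and 0s into one byte.
raw = bytes(int(chunk, 2) for chunk in bits.split())

as_ascii = raw.decode("ascii")                    # four 1-byte characters
as_number = int.from_bytes(raw, byteorder="big")  # one 32-bit number
as_utf16 = raw.decode("utf-16-be")                # two 2-byte characters

print(as_ascii)
print(as_number)
print([hex(ord(character)) for character in as_utf16])
```

Same bits, three very different answers, and none of them is "wrong." Each one is only right for its own context.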

To sum up...


and reiterate (to a hopefully tediously clear degree): information in a file is just like information in the real world - it can only be correctly understood if the context is correctly understood first.

Information Science - A Simple Explanation of Base Number Systems: Binary, Decimal, Octal, Hexadecimal

Motivation


I Googled for a quick, simple explanation of this and was very surprised when I failed to find one, so here we go.  Oh, and the whole puppies thing is a reference to this post.

Bases


The base is just defined by how many characters we have to represent numbers.
  • In binary (base two), we use characters 0, 1
  • In octal (base eight), we use characters 0, 1, 2, 3, 4, 5, 6, 7
  • In decimal (base ten), we use characters 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
  • In hexadecimal (base sixteen), we use characters 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F
In decimal we can represent zero to nine puppies using only one character.  At that point if we add another puppy, we have to move over a spot and start over with the one character, so we count:
  • zero 00
  • one 01
  • two 02
  • three 03
  • four 04
  • five 05
  • six 06
  • seven 07
  • eight 08
  • nine 09
  • ten 10  --  here we start reusing characters...
  • eleven 11
  • twelve 12

but for binary we only have two characters with which to represent puppies, so we have to move over a lot quicker:
  • zero 000
  • one 001
  • two 010  --  quickly we start reusing characters...
  • three 011
  • four 100
  • five 101
  • six 110

similarly for octal, we count:
  • zero 00
  • one 01
  • two 02
  • three 03
  • four 04
  • five 05
  • six 06
  • seven 07
  • eight 10  --  here we start reusing characters...
  • nine 11
  • ten 12

and finally, for hexadecimal, we have to "make up" some extra characters that we're not used to (since most humans are used to the decimal, or base 10, system), so someone long ago chose the letters A-F:
  • zero 00
  • one 01
  • two 02
  • three 03
  • four 04
  • five 05
  • six 06
  • seven 07
  • eight 08
  • nine 09
  • ten 0A
  • eleven 0B
  • twelve 0C
  • thirteen 0D
  • fourteen 0E
  • fifteen 0F
  • sixteen 10  --  finally we start reusing characters...
  • seventeen 11
  • eighteen 12
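If you'd rather let a computer do the counting, Python's built-in format() can write the same count in each base. The helper name spell and the zero padding widths are my own choices for illustration.

```python
def spell(count):
    """Return the same count written in binary, octal, decimal and hexadecimal."""
    return {
        "binary": format(count, "05b"),
        "octal": format(count, "02o"),
        "decimal": format(count, "02d"),
        "hex": format(count, "02X"),
    }


for puppies in range(19):
    row = spell(puppies)
    print(f"{row['decimal']}  binary {row['binary']}  octal {row['octal']}  hex {row['hex']}")
```

Run it and you can watch each column "move over a spot" at a different point: binary at two, octal at eight, decimal at ten, and hex at sixteen.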

Hexadecimal can be a little weird to think about but try thinking about it this way:  if someone created five hundred unique characters that we used to count things, we could just point to a herd of, for example, four hundred and eighty puppies in a park, and you could use a single character to represent that number. That character would uniquely identify the number of puppies you see sniffing and playing and rolling in the dirt.  My guess is that you would spontaneously explode from how cute it was.

A generic formula for figuring out what a number means in your base


So the generic formula for figuring out how many puppies we're talking about is to take the base multiplied by itself the number of times we've had to "move over a spot" and then multiply it by the number in that spot.

So "432" in decimal:
  • ten * ten * four +
  • ten * three +
  • two =
  • Four hundred and thirty two

"432" Octal:
  • eight * eight * four +
  • eight * three +
  • two =
  • Two hundred and eighty two

"2BC" Hex:
  • sixteen * sixteen * two +
  • sixteen * eleven +
  • twelve =
  • Seven hundred

"1101" Binary:
  • two * two * two * one +
  • two * two * one  +
  • two * zero +
  • one =
  • Thirteen

The formula can also be expressed as "the base raised to the power of the number of times you've had to 'move over a spot.'"  It's just the same.  For example:

"432" in decimal:
  • ten to the second power * four +
  • ten to the first power * three +
  • ten to the zeroth power * two =
  • Four hundred and thirty two
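The same formula can be written as a few lines of Python. This is a minimal sketch: the names DIGITS and to_count are made up, and in real code you'd just call the built-in int(text, base), which does the same job.

```python
DIGITS = "0123456789ABCDEF"  # the characters for bases up to sixteen


def to_count(text, base):
    """Work out how many puppies `text` means when read in `base`."""
    total = 0
    # Walk from the rightmost spot; moved_over is how far we've "moved over".
    for moved_over, character in enumerate(reversed(text.upper())):
        digit = DIGITS.index(character)
        if digit >= base:
            raise ValueError(f"{character!r} is not a valid character in base {base}")
        total += base ** moved_over * digit
    return total


print(to_count("432", 10))
print(to_count("432", 8))
print(to_count("2BC", 16))
print(to_count("1101", 2))
```

These are the four worked examples from above, so the printed results should match the totals in those lists.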

Thursday, September 20, 2018

Automation Strategy: Externalize All Your Strings


Motivation

We need to externalize all the strings we use in our locators, such as "User order number" or validation methods, such as "Order Created."  Why?
  1. It'll make your code more maintainable if that string changes
  2. It keeps things organized
  3. Internationalization (i18n)
  4. Multi-platform automation
From the outset, you might as well plan for i18n by including a switch for locale (not language). Language is "English."   Locale is "American English" and looks like "en_US" or "en-US" depending on who you ask.  Google it.  I like to model the strings exactly as they are, including capitalization and punctuation:
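As a rough sketch of what that could look like in Python: one dictionary of strings per locale and one lookup function with a locale switch. All the names here (STRINGS, get_string, the keys) are hypothetical, and the es_MX translations are just my illustrative guesses, not strings from a real app.

```python
# Hypothetical externalized strings, keyed by locale and then by name.
STRINGS = {
    "en_US": {
        "order_number_label": "User order number",
        "order_created": "Order Created",
    },
    "es_MX": {
        # Illustrative translations only.
        "order_number_label": "Número de pedido del usuario",
        "order_created": "Pedido creado",
    },
}


def get_string(key, locale="en_US"):
    """Look up one UI string exactly as it appears on screen for a locale."""
    return STRINGS[locale][key]


# A locator or a validation asks for the string by name, never by literal text.
print(get_string("order_number_label"))
print(get_string("order_created", locale="es_MX"))
```

If "Order Created" ever changes to something else, you change it in one place and every locator and validation picks it up.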

This works well for automating small to medium apps that themselves maximize code reuse.  But for big (or legacy) applications it may not be the best approach due to non-standardization; your mileage may vary.

And for multi-platform automation... (there is a lot more work involved, but this is part of the deal...)

Ok I'm going to be honest right now. I wrote the part above tonight and this next part 2 years ago and I don't even know what language this is. The filenames are .cs but I don't even know C# well enough to know if you can define methods like this, I might have just used .cs because gist colorization was good.  Hmm, now I'm thinking it's just pseudocode.

Anyway, the last thing I'll say about externalizing strings is about dynamic strings.
Here's option 1, strings live in the data file and you compose them in the page object class:

This is fine, but will require everyone who uses those strings to do all that string concatenation themselves, which is code duplication. It will also become a problem when the translation is weird. Maybe in American English we'd say "Order number 404 is on its way!" but in Canadian English they'd say "404 is the number of the order comin' at ya, eh!" If this were the case, our stringA + orderID + stringB wouldn't work, or would be wonky. Let's just write methods to generate those strings instead, and notice how it's used in the page class:
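Here's a minimal sketch of that idea in Python. The function, class, and locale names are hypothetical, and the two templates are the example sentences from the paragraph above. Each locale gets a whole template, so the order number can land wherever that locale's wording needs it.

```python
def order_on_its_way(order_id, locale="en_US"):
    """Build the full 'order shipped' message for one locale."""
    templates = {
        "en_US": "Order number {order_id} is on its way!",
        "en_CA": "{order_id} is the number of the order comin' at ya, eh!",
    }
    return templates[locale].format(order_id=order_id)


class OrderPage:
    """Hypothetical page object that only asks for the finished string."""

    def __init__(self, locale="en_US"):
        self.locale = locale

    def expected_banner(self, order_id):
        # No string concatenation here; the data layer owns the wording.
        return order_on_its_way(order_id, self.locale)


print(OrderPage().expected_banner(404))
print(OrderPage(locale="en_CA").expected_banner(404))
```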

And you could even get really crazy and do kind of a mix, which lets you have both the individual strings as well as the pre-composed dynamic strings. It just depends on what you need for your situation:

Monday, September 17, 2018

Automation Strategy: Positively Do Negative Testing With should_fail()

Motivation

An old adage is "don't code for the exception," which is along the lines of Keep It Simple Stupid. This doesn't mean you don't expect and handle exceptions, bad data, etc., but that you should try to avoid extensive, complex coding for very rare cases.

I try to employ this in automation development just like any other software development, therefore when I write higher-level methods, I write them for the 95% case (the rule), and avoid the 5% case (the exception). For example, if I'm writing a login() method and after logging in, there's a popup that I will dismiss in 95% of my test cases, I will include the dismissal in the login() method. If I'm using a language that supports optional parameters (God forbid I have to work in a language that doesn't), maybe I'll take an optional parameter dismiss=True.

I'm actually quite proud of this should_fail() stuff.  I always aim to simplify the test scripts so they can be written by less technical people (hey, isn't that the point of cucumber/gherkin blah blah blah) and should_fail() aids in that as well as keeping your tests user-centric instead of UI-centric. This way, you can use your great Flow methods even though they feel like they're designed for positive-path testing.

For this example, we're logging in, then verifying we're on the Home page.  In this system, if the "passed" variable isn't true by the time the test is over, the test is considered to have failed.
Here's the positive test:


Now For Negative Testing

Everything's hunky dory, which now that I type it is a very strange phrase. For the simple negative test, we'll try to log in and then verify that we're still on the login page... that we have not advanced to the homepage (we could also look for an error message such as "User not found"). We can't use login() because it will try to dismiss that modal after we log in and throw an ElementNotFoundException or something like that, so we'll have to use clicks:


Using Exceptions the Simple Way

Well that doesn't look too bad, but it was very UI-centric, and as our automation gets more complex, maintainability of these negative tests is going to become a factor. If all our positive testing is done using Flows, we've thrown it out the window and reverted to using page objects. Ok, let's actually try it with login() and just catch the exception:


Using should_fail()

That works, but it's a lot of code. Let's hide and generalize that code in our should_fail() method:


Ok, there's definitely some syntax to learn there, but not too much. Let's see how should_fail() works:
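As a minimal sketch in Python, assuming a fake login() Flow and a made-up ElementNotFoundError in place of a real UI framework, should_fail() could look something like this. The signature, the expected_message parameter, and the hardcoded message are all my assumptions for illustration.

```python
class ElementNotFoundError(Exception):
    """Stand-in for whatever exception your UI wrapper methods raise."""


def login(username, password, dismiss=True):
    """Fake Flow method: pretend to log in, then dismiss the welcome popup."""
    logged_in = (username, password) == ("alice", "s3cret")
    if dismiss and not logged_in:
        # After a failed login the popup never appears, so dismissing it fails.
        raise ElementNotFoundError("Welcome popup does not exist")
    return "Home" if logged_in else "Login"


def should_fail(action, *args, expected_message=None, **kwargs):
    """Run action and return True only if it raises the failure we expect."""
    try:
        action(*args, **kwargs)
    except Exception as error:
        if expected_message is not None and expected_message not in str(error):
            raise  # it failed, but not in the way this test is looking for
        return True
    raise AssertionError(f"{action.__name__} was expected to fail but succeeded")


# The negative test still reads like a user-centric Flow call.
passed = should_fail(login, "alice", "wrong password",
                     expected_message="does not exist")
print(passed)
```

If login() unexpectedly succeeds, should_fail() raises instead of returning, so "passed" never becomes true and the test fails.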

One thing you might notice is the hardcoding of the error messages. Those would be in some kind of dataStrings.rb file or something. The other is that I'm throwing around exceptions here quite a bit. This is not necessarily something you can shove into a pre-existing system that doesn't use exceptions very much, or doesn't use custom exception messages. All my wrapper click() and find() and type_text() and everything else methods throw exceptions with clear, relevant messages in order to make debugging easier.

Summary and Tying in Flows

So again, the whole point of this is so you can write your negative tests using Flow methods. Let's say in the example below that returning an order is a multi-step process. If we were to write this negative test using the page objects only similar to my Test_Negative_Using_Clicks() above, we would be reverting to a long test that was very tied to the UI. So by using positive testing of negative tests, we can stay user-centric and high-level, even if your task is complex. In the example below we're going to try to return a non-returnable item:


The last thing I'll say is the error I'm looking for here is very high-level. Sometimes it's difficult or tedious to make your Flow methods do this, so maybe in this example, we would have tried to click a button as part of the return process but it wasn't there or was disabled, etc. In that case, we'd just have to catch the more generic "... does not exist" exception, and this does tie us to the UI. The usefulness of this system, like all automation, depends on the context of the system you're testing.

Saturday, September 15, 2018

Automation Strategy: Flow vs Page Objects Part 2 - Test Data Abstraction

In my first post on Flow vs Page Objects, it became clear that as we move up in layers of abstraction, our tests get shorter, but the data required to execute those tests remains the same. If we were just using page objects, we could do something simple such as...


Using Flows

...but again, we want to focus on the user, and if you asked the user what they do on this page, they wouldn't say "This is the page where I put in my first name, then last name, then street, then city, then state, then zip code," they would say "This is where I put in my address." So let's do that:


Maintenance

This is going to get overwhelming quickly, on top of which it's not maintainable. You might think the structure of a user's address doesn't change much, but maybe it's a voting website and they now require you to put in your state representative district number. Yeah, it can and will change. It needs to be maintainable. Now all our address tests need to be updated.

If you ask a user what their address is, they don't say "My street is 123 Main St. My city is Austin. My state is Texas," they say "My address is 123 Main St, Austin, Texas." So instead of passing in all the pieces of information that form an address, let's just pass in an address:
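A rough sketch of that in Python, with a fake page object standing in for a real UI. The class and function names are hypothetical, and the name and zip code are made-up test data.

```python
from dataclasses import asdict, dataclass


@dataclass
class Address:
    """Everything that makes up 'my address', carried around as one thing."""
    first_name: str
    last_name: str
    street: str
    city: str
    state: str
    zip_code: str


class FakeAddressPage:
    """Stand-in page object that records what gets typed into each field."""

    def __init__(self):
        self.fields = {}

    def type_text(self, field, value):
        self.fields[field] = value


def address_flow(page, address):
    """Flow: 'put in my address', whatever pieces an address happens to have."""
    for field, value in asdict(address).items():
        page.type_text(field, value)


# The test passes one address, not six separate values.
home = Address("Pat", "Smith", "123 Main St", "Austin", "Texas", "00000")
page = FakeAddressPage()
address_flow(page, home)
print(page.fields)
```

The test itself never mentions street, city, or state, so adding a field to Address later doesn't touch it.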


Maintenance Moves from Tests to the Data Source and Related Classes

The trick here of course is that our data source is going to have to get smarter. That's up to you, whether it's coming from a file, a DB, XML, JSON, who knows. The point is, when we add the state rep district number, the things that have to change are:

  1. The data itself.  And yes, if we have 1 data file for every test, we'll have to update a bunch of files.  This will have to be done in any system, and if you reuse the same data file in N tests, then it won't be a ton of files to update.
  2. The code to get the data from the file.  If well designed, this should be trivial, and if very well designed, this might not have to change at all.
  3. The classes or containers (such as Structs in C++) that hold the data. This should be trivial.
  4. The PO class needs a new object for the new UI element. Trivial.
  5. The Flow code that actually enters the data into the UI using that PO. Trivial.

Let's say our data file looks like this, then we add the district number to the end:

and our code for holding the data looks like this, then all we have to do is update that last line:

and our Page class (the Flow doesn't do much in this trivial case):
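Here's a minimal sketch of those three pieces in Python, assuming the data comes from JSON. Every name in it (DATA_FILE, load_address, AddressPage, the "#..." locators) is hypothetical, and the district value is made-up test data. The lines marked NEW are the only ones that change when the district number is added.

```python
import json
from dataclasses import asdict, dataclass

# The data "file": the district number is added to the end.
DATA_FILE = """
{"street": "123 Main St", "city": "Austin", "state": "Texas",
 "district": "49"}
"""


@dataclass
class Address:
    """Container that holds the data."""
    street: str
    city: str
    state: str
    district: str  # NEW: one more line for the new piece of data


def load_address(text):
    """Read an address from the data source; unchanged by the new field."""
    return Address(**json.loads(text))


class AddressPage:
    """Page class: one locator per UI element."""

    LOCATORS = {
        "street": "#street",
        "city": "#city",
        "state": "#state",
        "district": "#district",  # NEW: locator for the new UI element
    }

    def __init__(self):
        self.entered = {}

    def enter_address(self, address):
        # Fake UI: record what would be typed into each located element.
        for field, value in asdict(address).items():
            self.entered[self.LOCATORS[field]] = value


page = AddressPage()
page.enter_address(load_address(DATA_FILE))
print(page.entered)
```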

Payoff

So that's it. You're going to have to update the data and how the data gets input into the UI regardless, there's no magic bullet for those. But this way, your tests don't change. Again, this is a simple example. The more complex it gets or the bigger a change to the application's workflow, the more this extra abstraction will pay off.

The last thing I'll say is this: notice we're updating the address, but the flow isn't updateAddressFlow, it's just addressFlow. I encourage grouping related functionality together. In fact, where we're doing this now, for the smaller of our dozen or so applications, there's just a single flow for each application. How you break that up is a trade-off you make between having tons of code files and having tons of code in just a few files.