The Insapience of Anti-Automationism

Last weekend, I attended the 12th annual Workshop on Teaching Software Testing (WTST 2013). This year, we focused on high-volume automated testing (and how to teach it).

High-volume automated testing (HiVAT) refers to a family of testing techniques that enable the tester to create, run and evaluate the results of arbitrarily many tests.
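
To make the idea concrete, here is a minimal sketch (in Python, with invented function names) of the skeleton that most HiVAT runs share: generate a very large number of inputs, run the software under test, apply a cheap partial oracle, and set aside anything suspicious for a human to examine. This illustrates the general pattern only; it is not any particular tool discussed at the workshop.

```python
import random

def run_sut(x):
    """Hypothetical stand-in for the software under test."""
    return x * x  # imagine something far more complicated

def partial_oracle(x, result):
    """A cheap, imperfect check: it can flag some wrong answers,
    but a pass does not prove the answer is right."""
    return result >= 0  # e.g., a square should never be negative

def high_volume_run(n_tests=1_000_000, seed=42):
    random.seed(seed)  # make the run reproducible
    suspicious = []
    for _ in range(n_tests):
        x = random.uniform(-1e6, 1e6)
        result = run_sut(x)
        if not partial_oracle(x, result):
            suspicious.append((x, result))  # a human triages these later
    return suspicious

print(f"{len(high_volume_run())} suspicious results to review")
```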

We had some great presentations and associated discussions. For example:

  • Harry Robinson showed us some of the ways that the Bing team evaluated its search recommendations. There are no definitive oracles for relevance and value, but there are many useful ideas for exposing probably-not-relevant and probably-not-valuable suggestions. These are partial oracles (a sketch of a toy oracle of this kind appears just after this list). The Bing team used these to run an enormous number of searches and gain good (if never perfect) ideas for polishing Bing’s suggestion heuristics.
  • Doug Hoffman and I gave presentations that emphasized the use of oracles in high volume automated testing. Given an oracle, even a weak oracle, you can run scrillions of tests against it, alerting a human when the test-running software flags a test result as suspicious. Both of us had uncharitable things to say about run-till-crash oracles and other oracles that focus on the state of the environment (e.g. memory leaks) rather than the state of the software under test. These are the types of oracles most often used in fuzzing, so we dissed fuzzing rather enthusiastically.
  • Mark Fioravanti and Jared Demott gave presentations on the use of high-volume test automation which included several examples of valuable information that could be (and was) exposed by fuzzing with simplistic oracles. Fioravanti illustrated context after context, question after question, that were well-addressed with hard-to-set-up-but-easy-to-evaluate high-volume tests. Doug and I ate our humble pie and backed away from some of our previous overgeneralizations.
  • Tao Xie showed us a set of fascinating ideas for implementing high-volume testing at the unit test level. I have a lot more understanding and interest in automated test data generation (and a long reading list).
  • Rob Sabourin and Vadym Tereshchenko talked about testing for multi-threading problems using FitNesse.
  • Casey Doran talked about how to evaluate open source software to determine whether it would be a good testbed for evaluating a high-volume test automation tool in development, and Carol Oliver led a discussion of quality criteria for reference implementations (teachable demonstrations) of high-volume test automation tools.
  • Finally, Thomas Vaniotis led a discussion of the adoption of high-volume techniques in the financial services industries.
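
To give a flavor of what a partial oracle for suggestion relevance might look like (this is my own hypothetical illustration, not Bing’s code or heuristics), consider flagging any suggestion that shares no words at all with the query. Passing this check proves nothing; failing it is merely grounds for a human to take a look.

```python
def probably_not_relevant(query, suggestion):
    """Hypothetical partial oracle: flag suggestions that share no
    words with the query. A pass proves nothing; a flag is only
    grounds for suspicion."""
    return not (set(query.lower().split()) & set(suggestion.lower().split()))

# A human reviews only the flagged pairs, not millions of passing ones.
pairs = [
    ("software testing workshop", "workshop on software testing"),
    ("software testing workshop", "cheap flights to miami"),
]
print([(q, s) for q, s in pairs if probably_not_relevant(q, s)])
```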

Personally, I found this very instructive. I learned new ideas, gained new insights, stretched the bounds of what I see as high-volume test automation and learned about new contexts in which the overall family can be applied (and how those contexts differ from the ones I’m most familiar with).

A few times during the meeting, we were surprised to hear echoes of an anti-automation theme that has been made popular by a few testing consultants. For example:

  • One person commented that their clients probably wouldn’t be very interested in this because they were more interested in “sapient” approaches to testing. According to the new doctrine, only manual testing can be sapient. Automated tests are done by machines and machines are not sapient. Therefore, automated testing cannot be sapient. Therefore, if a tester is enthusiastic about sapient testing (whatever that is), automated testing probably won’t be of much interest to them.
  • Another person was sometimes awkward and apologetic describing their tests. The problem was that their tests checked the program’s behavior against expected results. According to the new doctrine, “checking” is the opposite of “testing” and therefore, automated tests that check against expectations are not only not sapient, they are not tests. The campaign to make the word “checking” politically incorrect among testers might make good marketing for a few people, but it interferes with worthwhile communication in our field.

I don’t much care whether someone decides to politicize common words as part of their marketing or, more generally, to hate on automated testing. Over time, I think this is probably self-correcting. But some people want to make the additional assertion that the use of (allegedly) non-sapient (allegedly) non-tests conflicts with the essence of context-driven testing. It’s at that point that, as one of the founders of the context-driven school, I say “hooey!”.

  • All software tests are manual tests. Consider automated regression testing, allegedly the least sapient and the most scripted of the lot. We reuse a regression test several times–perhaps running it on every build. Yes, the computer executes the tests and does a simple evaluation of the results, but a human probably designed that test, a human probably wrote the test code that the computer executes, a human probably provided the test input data by coding it directly into the test or by specifying parameters in an input file, a human probably provided the expected results that the program uses to evaluate the test result, and if it appears that there might be a problem, it will be a human who inspects the results, does the troubleshooting and either writes a bug report (if the program is broken) or rewrites the test. All that work by humans is manual. As far as I can tell, it requires the humans to use their brains (i.e. be sapient). (A sketch of this kind of regression check appears after this list.)
  • All software tests are automated. When you run a manual test, you might type in the inputs and look at the outputs, but everything that happens from the acceptance of those inputs to the display of the results of processing them is done by a computer under the control of a program. That makes every one of those tests automated.
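
Here is a minimal sketch of the regression check described in the first bullet (the function and the values are invented for illustration). The comments mark which pieces a human supplied; the computer’s contribution is to execute the code and compare two values.

```python
def add_line_items(prices):        # hypothetical function under test
    return round(sum(prices), 2)

def test_add_line_items():
    inputs = [19.99, 5.01, 0.00]   # a human chose this test data
    expected = 25.00               # a human supplied this expected result
    actual = add_line_items(inputs)
    # The computer compares; a human troubleshoots if this ever fires.
    assert actual == expected, f"expected {expected}, got {actual}"

test_add_line_items()
print("regression check passed")
```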

The distinction between manual and automated is a false dichotomy. We are talking about a matter of degree (how much automation, how much manual), not a distinction of principle.

I think there is a distinction to be made between tests that testers use to learn things that they want to know versus tests that some people create or run even though they have no plan or expectation for getting information value from them. We could call this a distinction of sapience. But to tie this to the use of technology is misguided.

Thinking within the world of context-driven testing, there are plenty of contexts that cry out for HiVAT support. If we teach testers to be suspicious of test automation, to treat it as second-class (or worse), to think that only manual tests have the secret sauce of sapience, we will be preparing those testers to fail in any contexts that are best served with intense automation. That is not what we should teach as context-driven testing.

And then there is that strange dichotomization of testing and checking. As I understand the notion of checking, when I run a test I can check whether the program’s behavior conforms to behavior I expect in response to the test. Thus I can say, “Let’s check whether this adds these numbers correctly,” and “Let’s check the display,” and even “Let’s check for memory leaks.” Personally, I think these look like things I might want to find out while testing. I think testing is a search for quality-related information about a product or service under test, and if I execute the program with the goal of learning some quality-related information, that execution seems to me to be a test. Checking is something that I often do when I am doing what I think is good testing.

Suppose further that I have a question about the quality of the software that is well-enough formed, and my programming skills are strong enough, that I can write a program to do the checking for me and tell me the result. That is, suppose that on my instruction, my program runs the software under test in order to quickly provide an answer to my question. To me, this still looks like a test, even though it is automated checking.

But not everyone agrees. Instead, some people assert that checking is antithetical to testing (checking versus testing). They say that testing is a sapient activity done by humans but that checking is not testing.

If comparison of the program’s behavior to an anticipated result makes something a not-test, then what if we check whether the program’s behavior today is consistent with what it did last month? Is that a not-test? What if we check whether a program’s calculations are consistent with those done by a respected competitor? Or consistent with claims made in a specification? These are just more examples of cases in which testers might (often) have a reasonably clear set of expectations for the results of a test. When they have those expectations, they can assess the test result in terms of those expectations. Clearly, this is checking. But in the land of Sapient Testing, these particular expectations are enumerated as Consistency Heuristics that sapient testers rely on (e.g. http://www.developsense.com/blog/2012/07/few-hiccupps/). Apparently, at least some sapient testing is checking.
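
Consistency checks of this kind are easy to automate. The sketch below (hypothetical function and data) compares today’s behavior with results saved from an earlier build; pointing the same comparison at a respected competitor, a reference implementation, or values claimed in a specification works the same way.

```python
import math

def sut_total(prices):
    """Hypothetical stand-in for the calculation under test."""
    return round(sum(prices), 2)

# Results saved from an earlier build (normally loaded from a file that
# a previous run wrote; inlined here so the sketch is self-contained).
last_month = {"two_items": 25.00, "refund": -4.50}

cases = {"two_items": [19.99, 5.01], "refund": [-4.50]}

inconsistent = [name for name, prices in cases.items()
                if not math.isclose(sut_total(prices), last_month[name])]
print(inconsistent)  # anything listed here deserves a human look
```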

Let me try to clarify the distinction:

  • A test is checking to the extent that it is designed to produce results that can be compared, without ambiguity, to results anticipated by the test designer.
  • A test is not-checking to the extent that it can produce a result that is not anticipated by the test designer but is noticeable and informative to the person who interprets the test results.
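
A single automated test can do both at once. In this sketch (all of the names are invented), the test checks one anticipated result and also reports anything it was not built to anticipate, such as an exception or a suspiciously slow run, for a human to interpret.

```python
import time

def sut_parse(text):
    """Hypothetical stand-in for the function under test."""
    return text.split(",")

def run_one_test(text, expected):
    surprises = []                     # the "not-checking" channel
    start = time.perf_counter()
    try:
        actual = sut_parse(text)
    except Exception as exc:           # nobody anticipated this outcome
        surprises.append(f"unexpected exception: {exc!r}")
        actual = None
    elapsed = time.perf_counter() - start
    if elapsed > 0.5:                  # suspicious, though not "wrong"
        surprises.append(f"took {elapsed:.2f}s")
    checked_ok = (actual == expected)  # the "checking" part
    return checked_ok, surprises

print(run_one_test("a,b,c", ["a", "b", "c"]))  # (True, [])
```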

From this, let me suggest a conclusion:

Most tests are checking. Most tests are also not-checking. Checking versus testing is another false dichotomy.

The value of high-volume automation is that it gives testers a set of tools for hunting for bugs that are too hard to hunt without tools. It lets us hunt for bugs that are very hard to find (think of calculation errors that occur on fewer than 0.00000001% of a function’s inputs, like Hoffman’s MASPAR bug). It lets us hunt for bugs that live in hard-to-reach places (think of race conditions). It lets us hunt for bugs that cause intermittent failures that we (incorrectly) don’t think are possible or have no idea how to hunt with traditional methods (including manual exploration).
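
Here is a minimal sketch of that kind of hunt, with a deliberately planted rare bug standing in for the real thing (the function and the planted bug are invented for illustration, not Hoffman’s actual MASPAR example): compare a fast implementation against a trusted reference over a million random inputs and keep only the disagreements. Only volume makes a failure rate this low visible.

```python
import random

def fast_recip(x):
    """Hypothetical optimized routine with a planted bug that fires on
    roughly one input in 100,000."""
    if 999_990.0 <= x < 1_000_010.0:
        return 0.0                       # the rare wrong answer
    return 1.0 / x

def hunt(n=1_000_000, seed=7):
    random.seed(seed)                    # make the hunt reproducible
    failures = []
    for _ in range(n):
        x = random.uniform(1.0, 2_000_000.0)
        reference = 1.0 / x              # trusted reference result
        if abs(fast_recip(x) - reference) > 1e-12 * abs(reference):
            failures.append(x)           # a human investigates these
    return failures

print(f"{len(hunt())} failing inputs found out of 1,000,000")
```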

We do almost all of the high-volume test work with tools. In most cases, we compare test results with expected values. And yet, this is purposeful work being done by humans to answer quality-related questions. It is automated. It is checking. It is testing.

Tests are tests whether they are automated or not, whether they are checking or not. Tests can be excellent tests, and fully appropriate for their contexts, whether they are automated or not, whether they are checking or not. And tests can be worthless, or completely unsuited to their context, whether they are automated checks or intentionally-designed manual tests.

The basic premise of context-driven testing is that a test approach that works well in one context may fail utterly in some other context. There are no universal best practices (not even manual exploratory testing organized into testing sessions).

Years ago, one of the common pieces of advice we gave to testers who were interested in context-driven testing was to look for three counter-examples. If you were thinking about a really good practice, look for three situations in which an alternative would work much better. If you were thinking about a practice that you didn’t like (especially one that other people took more seriously than you considered reasonable), look for three situations in which that practice would be appropriate and valuable. I think we learned this from Brian Marick but I remember it taking hold with most (or all) of the folks who were early adopters of the context-driven approach. I came to see it as a core skill of context-driven testers. Drawing sharp lines, condemning common and often-valuable testing activities as non-testing or non-sapient, is completely out of step with this. It denies the importance of “context” in “context-driven.”

Comments

    • Jesper:

      As an example for Session-Based exploration, I want you to imagine a program that has gone through basic touring and UI-level attacks. Now we have to understand whether it provides value. In moderately complex applications, it can take a tester a lot of time to research the product and devise scenarios that will be challenging for the program but persuasive for the stakeholder if the scenario fails. One of the most talented testers I ever worked with, David Farmer, could take a week to design this type of test for a database management system, ultimately arriving at a brilliant test that justified the time taken. The artificial time-chunking of the sessions would constrain this type of work.

      As an example for manual exploration generally, suppose that you guarantee a product will meet a verifiable standard and that your guarantee subjects you to liability (contract, regulatory, or fraud, doesn’t matter). There is value in carefully planning a set of tests that will test the program against that standard thoroughly enough that (a) you are confident the standard is met, (b) a reasonable observer would be confident that the standard was met, and (c) if a problem was in fact discovered in the field, a legal decisionmaker would be very likely to agree that you had made reasonable and extensive efforts to ensure that you met the standard. If potential liability was high enough, I would want to see such a set of tests, I would want them to be easily replicated, and I would want the rationale behind them to be explicit and explicable to third parties (e.g. judge and jury). I might use manual exploration to guide the development of the test suite or I might not. If we were dealing with a standard that could be verified through extensive automated testing (think of a standard for accuracy of calculations), then for assessment of standards-compliance, I might plan the tests theoretically (from a numerical analysis perspective), execute the tests against a reference program that would be widely recognized as a trustworthy oracle, and (in terms of testing the accuracy of the calculations) not run a single manual exploratory test.

      As another example, I am a big fan of writing “unit” tests when I code (“unit” includes tests that assess individual methods and the cofunctioning of several units). These are automated. If I write them test-first, they initially have a lot of information value (they tell me when I screw up the code) but as the code settles down, they are less informative — until either I refactor (and in the process, use the unit tests as an oracle) or I do something that causes an unexpected side-effect. I don’t care about the vocabulary we use to describe these. They are tests when I initially write and run them and they are useful later. Now, I personally happen to think of TDD as an exploratory testing/programming activity, at least to some degree, but it doesn’t fit within the framework of manual system-level exploratory testing and it serves needs that manual system-level exploratory testing cannot touch.

      As to whether context-driven testing itself is sometimes a bad practice, I don’t think I said that. I don’t think context-driven testing is a practice or a set of practices. I think it’s a mindset. I think we have some controversy over what fits in the mindset (which is what led me to refocus the context-driven website) but I think that’s a broader question than I can address in a reply to a comment. So here’s an exercise for you — imagine what YOU take the “context-driven mindset” to be and then try to imagine circumstances in which applying THAT MINDSET would be unhelpful. I suspect that any particular mindset is unhelpful for at least some situations, so if you get to a specific one, I suspect you can find a counter-example for using it.


  1. Thanks for this. I do see and appreciate the notion of sapience, and do share some people’s desire to separate rote execution of a human or computer script from intense thoughtful exploratory testing. At the same time, the drawing of a bright line where I saw none has troubled me.

    This article invites me to look at the question from different angles, and the message that I get is that there is no bright line anywhere to be seen. That matches my deepest intuitions and gives me a kind of comfort.

    Thanks!

  2. Cem: Thank you very much for the elaboration; I recognize the counter-examples from my experience. Also keeping in mind that “context-driven testing is about doing the best we can with what we get.”

    Finding where the context-driven mindset is unhelpful is tricky, as I expected. It could be a context where IEEE 829-1998 is used everywhere “just because” and no one challenges it. Personally, I would challenge it, and in doing so reveal the context, but again, if I accept it as “context” defined by the stakeholder, I’m at best context-imperial. Similarly with context-oblivious and context-specific.

    Much appreciated the brain exercise. Thank you.

  3. This was really good. I wish I had something of value to add, but I don’t. I just loved it, and I also appreciated Ron’s comment above.

  4. Hello,

    Interesting and controversial write-up, Cem. It has definitely stirred up some discussions and I think that is a very good thing. I haven’t seen much discussion against automation and/or tools, but just this week I heard a very intellectual friend of mine say he is against processes. Surely he is not against processes as such; he was reacting to something he associated with the term at that moment and gave a hasty comment. I think the underlying problem here could be the same; the people against automation are (at that moment) thinking of something specific they have seen, heard or done in their past.

    Jesper, for me, the context-driven mindset implies questioning what is told to us (what we see, feel, etc.), thus sometimes slowing us down before we dig in to the action. This can be a problem when extremely fast response time is needed – especially if that situation comes as a surprise. Another example I was thinking of is when someone is responsible for the quality of work done instead of the one who does the work. Have you ever been in a situation like that? I can’t recall a testing situation now, but I am thinking of a traffic police officer guiding the cars in an intersection. For his situation, it’s important that people move as he tells them to move. The third example could be a case where questions are not allowed. I was initially thinking about something completely different from software testing, but realised I’ve seen quite a few of those also. This being a public forum, I don’t want to go into more details on that one.

    Enjoy the weekend everyone!

    Best regards,
    Jari


