Schools of Software Testing: Useful or Negative Influence?

In April 2014, Cem Kaner and Rex Black debated on stage at STPCon Spring 2014. The debate, titled “Schools of Software Testing: Useful Paradigm or Negative Influence?”, addressed the impact of the schools of software testing, such as context-driven testing.

For historical reference, the debate was transcribed:

The Introduction

Chairperson: Welcome to the STPCon Spring 2014 recordings.

We’re going to talk a little bit about the schools of testing, and I don’t know how many of you have been following the debates out there, but we thought it would be interesting to bring a couple of the gentlemen that have been at the heart of the debate to come here and have a conversation about their points of view, a little point, counterpoint, and then we’re going to have a little bit of audience participation as part of that.

So I’m going to be out there while they kind of lay the foundation of their arguments and go back and forth, and I’ll be close by. So if I have to wrestle them to the ground, I will. And I’m looking for a couple of the big guys that can help me out, because these guys could probably take me, but I’ll be out there with a microphone. So I think we’re gonna have a good time here enjoying this debate and interacting. We thought it would be a nice way to interact.

So I’m going to invite up onto the stage first, Rex Black. Rex, come on up. Give Rex a hand. He’s the president of RBCS. His popular book, Managing the Testing Process, has sold over 100,000 copies, and he has 11 other books, and he’s a great all-around guy. I sat through his risk-based testing last week and his Agile Tester. He’s introducing the agile testing certification coming up. And he’s a great guy and really a great trainer.

I’d like to also welcome up to the stage Mr. Cem Kaner. Another great guy. He’s a J.D., Ph.D. He’s a professor of software engineering at Florida Institute of Technology. He’s also the author of several books, including Lessons Learned in Software Testing and the Domain Testing Workbook. He’s also a Software Test Luminary Award winner. And we appreciate that.

And I am going to hand over the stage to them. They’ve agreed on the rules of engagement. They are a little bit close for my comfort. Anybody watch the arm wrestling? I think there may be a little bit of that going back and forth. But they’ve agreed that Rex is going to open it up and then Cem is going to discuss and then they’re going to go back and forth and then we’ll engage you guys.

Take it away, Rex.

The Opening Pitch


Please sign the Petition to Stop ISO 29119

Please sign this petition: http://www.ipetitions.com/petition/stop29119

I rarely rant about software engineering standards because it seems like a waste of time. Many of the participants in the software engineering standards movement are fine people who I respect. Some of them I call friends, or would be happy to have as friends. But several groups stand to benefit from being able to claim that they are following, selling, or training a collection of processes that are simplistic and easily described, even if they are ineffective and enormously wasteful. These groups can afford to invest a lot of money dominating the standards committees that, in turn, have come to serve their interests. They can also afford to invest a lot in public relations to promote the perceived legitimacy of those committees’ work.

My experiences with the IEEE software engineering standards (which are the main basis for these ISO standards) began when I first came to Silicon Valley in 1983. They have been uniformly negative. I finally left IEEE in 2010 or 2011, at that point a Senior Member who had been recognized by IEEE for my work on their standards and even been appointed by Congress to the United States’ Election Assistance Commission’s Technical Guidelines Development Committee at IEEE’s request. (TGDC wrote technical standards and much of its work was guided by an IEEE standard that I had worked on.) I left IEEE as a protest against a software engineering standards process that I see as a closed vehicle that serves the interests of a relatively small portion of the software engineering community.

Context-driven testing developed as the antithesis of what is being pushed through ISO. They represent opposite points of view.

Standards are political documents and sometimes legal ones. The existence of a standard makes it easier for a court (or a regulator) to rule that the standard-approved approach is the professionally correct one, and the non-approved approaches (or the ones that conflict with the approved one) are professionally incorrect and therefore improper. Imposing a standard that forces practices and views on a community that would not otherwise agree to them is a political power play.

If ISO 29119 is adopted and then broadly accepted as a legitimate description of good professional practice in software testing, then context-driven testing will be an example of something you should not do, a way of thinking you should avoid.

I don’t think it will do much good to sign a petition against ISO 29119, but I would rather say that I protested against it than simply accept its consequences in silence.

I recommend that you do the same.

On the Petition to ISTQB


I don’t usually let myself get dragged into the TwitterStorms orchestrated by the Rabid Software Testing crowd. But I decided to make an exception for this one because it has sucked some good people into endorsing what I think is a very bad petition.

In essence, the petition asks ISTQB to open its quality-control records. ISTQB is involved in training and certifying software testers, so I will write this to you (the reader) as if you are a software tester. The basic question is whether ISTQB has seen any problems in their exams and what they have done about it. (Underneath that are some more specifically-pointed details, which I’ll come to later.)

Let’s Start On This From The Basic Principle

In many countries (such as the United States), companies have a right to study the quality of their goods or services in private. That’s what I understand that ISTQB has been doing.

By the way, that’s what companies do when they test their software. Those companies (ISTQB, and software companies) have a right to hire employees and consultants and to require those people to keep what they’ve learned private. When you test your company’s or client’s software, it is normally a condition of your job or your consultancy that you keep your results private.

This petition asks ISTQB to do what (for the most part) the companies you work for would not do, and would not dream of letting you do with their quality-control data.

When I studied law, I learned that the public policy of the United States (and many other countries) favors investigative privilege. That is, we encourage companies to conduct aggressive, detailed internal investigations, to find their own problems and to improve their products, their services and their advertising based on what they learn from the investigation. We want them to investigate because we want the improvement.

These types of studies expose the companies to risk–people who get their hands on the investigative data, and don’t like the company that investigates itself, can use any report of any problem, whether they understand the problem or not, whether the problem is actually major or not, whether the company has dealt with the problem in a reasonable way or not–to attack the company. Therefore, to encourage companies to do the studies, society grants them privacy–we treat the studies as trade secret–because the companies won’t do serious studies without privacy.

When I practiced commercial law, my most significant project involved drafting legislation about the law of software quality. My most significant contribution was a very narrow disclosure rule that would preserve software vendors’ incentives to do internal investigations (i.e. test their own software harshly) while holding them accountable for significant defects that they knew about but did not make public. It took 6 years to craft the language for this. It took 6 years because the idea of almost-forced disclosure of (some) test results triggered allergic reactions among lawyers who normally defended companies, among lawyers who normally attacked companies, and among judges. Finding a way to balance the interests here was enormously difficult because the support for allowing companies to investigate their products and services in private runs so very deep. And, in my view, it should.

And if you want to be able to do significant work as testers, you should support that privilege too.

And that means you should show some respect for that privilege, even when the company asserting it is a company that you (or some people who impress you by screaming at others in public) don’t like.

Now, we have a petition from someone who frequently attacks ISTQB, asking them to open their private quality-control records associated with their certification exams. Will I sign it?

No, Keith. I will not sign that petition.

A Little More Background

Back in 2010, I saw some materials that allegedly summarized unauthorized disclosures from an ISTQB meeting. I suspect that this (2010) material underlies the specificity of the questions in this (2013) petition to ISTQB.

The materials were not authenticated and there might not be any truth in them at all. The gist of the materials was that ISTQB had commissioned a psychometric study of one or more of their tests, that a “test reliability coefficient” was a bit low, and that ISTQB was planning to use this information to improve the quality of their exams.

I believe that this is the type of research that we, as a society (and as a profession of testers), want to see done. Look for problems. When you find problems, raise them with other decision-makers in the organization and look for ways to improve them.

To keep this going, society respects these studies as trade secret–grants them privacy. There is no expectation, zero, none, that companies will tell us about the studies or what they did to address the problems they found.

(There are some exceptions to that rule. I will note them later, to suggest that they are probably not relevant here.)

Certain people encouraged me to get involved in attacking ISTQB with these allegations. I refused.

One thing I said to those people, back in 2010, was:

“I’m not a huge fan of attacking competitors. I fall into it, but I think it is a distraction. I think it adds to the personality circus of the field and detracts from the content of the field.”

Another of the things I said was:

“As to the ethics, I am slightly outraged by the fact that ISTQB apparently will not make this public. But I am also slightly outraged that this was leaked, if it was. You’re being offered the opportunity to play the role of one hostile competitor viciously attacking another competitor. Your attack will not look ethical.”

A little later, I looked more carefully at this “test reliability coefficient”. I am not an expert in this area of statistics, but I know a little bit.

  • My impression is that the theory underlying the meaning and application of the statistic is not fully developed.
  • My impression is that the cutoff that separates “good” values from not-good-enough values is arbitrary.
  • My impression is that something like this is probably useful for internal studies, because it can raise flags. A not-excellent number can motivate people to improve, even if that number is not necessarily very bad, and even if the other aspects of the meaning of the number are a little uncertain.

I’ve dealt with lots of metrics like this in my career. They are good enough for private, informal, self-evaluation or evaluation by a friendly coach. But the idea of attacking a company based on not-terrible-but-not-great values of these statistics, well, I don’t think that’s a valid use of this statistic. Not without a whole lot of other converging evidence.

You may have seen me quoted about these types of metrics once or twice. The quote goes:

“Metrics that are not valid are dangerous.”

So, once I studied the statistic a little, I was no longer even a little bit outraged that ISTQB wasn’t making the data public. It was not only their privilege to keep it private. Keeping it private was also, in my view, reasonable.

I think that using metrics like this to publicly attack people (or companies) is unethical. Even if they are people or companies that I disagree with, or don’t like.

So, Keith, No, Keith. I will not sign that petition.

The Big Public Attack on ISTQB over this didn’t happen in 2010 and I expected never to hear of these allegations again.

Should We Ever Force Companies To Disclose Private Data?

Sometimes, society forces companies to disclose their private data.

For example, if a company commits crimes (you know, like rigging LIBOR, or engaging in tax evasion, or laundering money), its victims (and prosecutors representing society as a whole) can demand to see relevant records and the courts will enforce the demand.

Suppose that ISTQB made specific comments about the reliability of their tests, and they were in possession of data that indicated that those specific comments were false. That would be fraud. In that case, it should be possible to obtain ISTQB’s internal data.

I’m not a big fan of ISTQB’s syllabus or exams, or of their marketing. My attitudes are not as extreme as some other people’s. Some twits seem to believe that ISTQB’s materials are worthless, that everyone finds them worthless, and that no reasonable person thinks the exams or the certifications are worth anything.

  • I have been told by some reasonable people that they believed they learned a great deal from ISTQB courses.
  • I have been told by some reasonably bright people that they found an ISTQB exam challenging.
  • I have been told by some teachers that they see value in the material.
  • I have been told by some hiring managers who have testing experience that they believe that this should be a positive hiring factor.

That’s not exactly my evaluation, but I have no reason to doubt the integrity of these other people, or doubt their experience.

But because I’m skeptical of ISTQB, every now and again, I look at their marketing materials. I haven’t looked at everything. I haven’t even looked at most of the materials. But in what I have seen, even though I don’t agree with the views being expressed or the conclusions they would have me draw:

  • I have not seen evidence of fraud
  • I have not seen claims that could be refuted by mediocre results in test reliability scores.

So. Keith. No, Keith. Absolutely not, Keith. I will not sign that petition.

I encourage you to withdraw it.

 

The Insapience of Anti-Automationism

Last weekend, I attended the 12th annual Workshop on Teaching Software Testing (WTST 2013). This year, we focused on high-volume automated testing (and how to teach it).

High-volume automated testing (HiVAT) refers to a family of testing techniques that enable the tester to create, run and evaluate the results of arbitrarily many tests.

We had some great presentations and associated discussions. For example:

  • Harry Robinson showed us some of the ways that the Bing team evaluated its search recommendations. There are no definitive oracles for relevance and value but there are many useful ideas for exposing probably-not-relevant and probably-not-valuable suggestions. These are partial oracles. The Bing team used these to run an enormous number of searches and gain good (if never perfect) ideas for polishing Bing’s suggestion heuristics.
  • Doug Hoffman and I gave presentations that emphasized the use of oracles in high volume automated testing. Given an oracle, even a weak oracle, you can run scrillions of tests against it, alerting a human when the test-running software flagged a test result as suspicious. Both of us had uncharitable things to say about run-till-crash oracles and other oracles that focus on the state of the environment (e.g. memory leaks) rather than the state of the software under test. These are the types of oracles most often used in fuzzing, so we dissed fuzzing rather enthusiastically.
  • Mark Fioravanti and Jared Demott gave presentations on the use of high-volume test automation which included several examples of valuable information that could be (and was) exposed by fuzzing with simplistic oracles. Fioravanti illustrated context after context, question after question, that were well-addressed with hard-to-set-up-but-easy-to-evaluate high-volume tests. Doug and I ate our humble pie and backed away from some of our previous overgeneralizations.
  • Tao Xie showed us a set of fascinating ideas for implementing high-volume testing at the unit test level. I have a lot more understanding and interest in automated test data generation (and a long reading list).
  • Rob Sabourin and Vadym Tereshchenko talked about testing for multi-threading problems using FitNesse.
  • Casey Doran talked about how to evaluate open source software, to determine whether it would be a good testbed for evaluating a high-volume test automation tool in development and Carol Oliver led a discussion of quality criteria for reference implementations (teachable demonstrations) of high-volume test automation tools.
  • Finally, Thomas Vaniotis led a discussion of the adoption of high-volume techniques in the financial services industries.
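The idea of running many tests against a partial oracle and alerting a human only for suspicious results can be sketched in a few lines. This is a minimal illustration, not any team’s actual tooling: the `suggest` function, its catalog, and its planted flaw are all hypothetical stand-ins for the software under test, and the oracle only flags results that are probably not relevant (it cannot confirm that a suggestion is good).

```python
import random

# Hypothetical catalog for a toy suggestion feature (an assumption, not real data).
CATALOG = ["cheap flights", "flight status", "weather today",
           "weather radar", "pizza near me"]

def suggest(query):
    """Hypothetical software under test: suggest catalog entries that share
    a word with the query. Planted flaw: queries containing 'radar' also
    get an unrelated suggestion appended."""
    words = set(query.split())
    hits = [entry for entry in CATALOG if words & set(entry.split())]
    if "radar" in words:
        hits.append("pizza near me")
    return hits

def partial_oracle(query, suggestions):
    """Partial oracle: return reasons a result looks suspicious.
    An empty list means 'nothing flagged', not 'proven correct'."""
    reasons = []
    words = set(query.split())
    if len(suggestions) != len(set(suggestions)):
        reasons.append("duplicate suggestions")
    for s in suggestions:
        if not words & set(s.split()):
            reasons.append(f"no shared word with query: {s!r}")
    return reasons

def run_high_volume(n_tests, seed=0):
    """Generate random queries, run them, and collect only flagged results
    for a human to inspect."""
    rng = random.Random(seed)
    vocabulary = sorted({w for entry in CATALOG for w in entry.split()})
    flagged = []
    for _ in range(n_tests):
        query = " ".join(rng.sample(vocabulary, rng.randint(1, 3)))
        reasons = partial_oracle(query, suggest(query))
        if reasons:
            flagged.append((query, reasons))
    return flagged

if __name__ == "__main__":
    for query, reasons in run_high_volume(1000)[:5]:
        print(query, "->", reasons)
```

The human still decides what counts as suspicious, writes the oracle, and interprets whatever it flags; the machine only supplies the volume.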

Personally, I found this very instructive. I learned new ideas, gained new insights, stretched the bounds of what I see as high-volume test automation and learned about new contexts in which the overall family can be applied (and how those contexts differ from the ones I’m most familiar with).

A few times during the meeting, we were surprised to hear echoes of an anti-automation theme that has been made popular by a few testing consultants. For example:

  • One person commented that their clients probably wouldn’t be very interested in this because they were more interested in “sapient” approaches to testing. According to the new doctrine, only manual testing can be sapient. Automated tests are done by machines and machines are not sapient. Therefore, automated testing cannot be sapient. Therefore, if a tester is enthusiastic about sapient testing (whatever that is), automated testing probably won’t be of much interest to them.
  • Another person was sometimes awkward and apologetic describing their tests. The problem was that their tests checked the program’s behavior against expected results. According to the new doctrine, “checking” is the opposite of “testing” and therefore, automated tests that check against expectations are not only not sapient, they are not tests. The campaign to make the word “checking” politically incorrect among testers might make good marketing for a few people, but it interferes with worthwhile communication in our field.

I don’t much care whether someone decides to politicize common words as part of their marketing or, more generally, to hate on automated testing. Over time, I think this is probably self-correcting. But some people want to make the additional assertion that the use of (allegedly) non-sapient (allegedly) non-tests conflicts with the essence of context-driven testing. It’s at that point that, as one of the founders of the context-driven school, I say “hooey!”.

  • All software tests are manual tests. Consider automated regression testing, allegedly the least sapient and the most scripted of the lot. We reuse a regression test several times–perhaps running it on every build. Yes, the computer executes the tests and does a simple evaluation of the results, but a human probably designed that test, a human probably wrote the test code that the computer executes, a human probably provided the test input data by coding it directly into the test or by specifying parameters in an input file, a human probably provided the expected results that the program uses to evaluate the test result, and if it appears that there might be a problem, it will be a human who inspects the results, does the troubleshooting and either writes a bug report (if the program is broken) or rewrites the test. All that work by humans is manual. As far as I can tell, it requires the humans to use their brains (i.e. be sapient).
  • All software tests are automated. When you run a manual test, you might type in the inputs and look at the outputs, but everything that happens from the acceptance of those inputs to the display of the results of processing them is done by a computer under the control of a program. That makes every one of those tests automated.

The distinction between manual and automated is a false dichotomy. We are talking about a matter of degree (how much automation, how much manual), not a distinction of principle.

I think there is a distinction to be made between tests that testers use to learn things that they want to know versus tests that some people create or run even though they have no plan or expectation for getting information value from them. We could call this a distinction of sapience. But to tie this to the use of technology is misguided.

Thinking within the world of context-driven testing, there are plenty of contexts that cry out for HiVAT support. If we teach testers to be suspicious of test automation, to treat it as second-class (or worse), to think that only manual tests have the secret sauce of sapience, we will be preparing those testers to fail in any contexts that are best served with intense automation. That is not what we should teach as context-driven testing.

And then there is that strange dichotomization of testing and checking. As I understand the notion of checking, when I run a test I can check whether the program’s behavior conforms to behavior I expect in response to the test. Thus I can say, Let’s check whether this adds these numbers correctly and Let’s check the display and even Let’s check for memory leaks. Personally, I think these look like things I might want to find out while testing. I think testing is a search for quality-related information about a product or service under test, and if I execute the program with the goal of learning some quality-related information, that execution seems to me to be a test. Checking is something that I often do when I am doing what I think is good testing.

Suppose further that I have a question about the quality of the software that is well-enough formed, and my programming skills are strong enough, that I can write a program to do the checking for me and tell me the result. That is, suppose that on my instruction, my program runs the software under test in order to quickly provide an answer to my question. To me, this still looks like a test, even though it is automated checking.
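As a minimal sketch of that situation, take the question “does this add these numbers correctly?” and hand the checking to a small program. Everything here is assumed for illustration: `add_amounts` is a hypothetical function under test, and the cases and expected values are ones a tester might choose by hand.

```python
from decimal import Decimal

def add_amounts(a, b):
    """Hypothetical function under test: add two currency amounts given as
    decimal strings and return the sum as a string."""
    return str(Decimal(a) + Decimal(b))

# Inputs and expected results chosen by a human who decided these
# questions were worth asking.
CASES = [
    ("0.10", "0.20", "0.30"),
    ("1.00", "2.50", "3.50"),
    ("-5.25", "5.25", "0.00"),
]

def run_checks(cases):
    """Compare actual results with expected ones and return the mismatches
    as (a, b, expected, actual) tuples for a human to investigate."""
    failures = []
    for a, b, expected in cases:
        actual = add_amounts(a, b)
        if actual != expected:
            failures.append((a, b, expected, actual))
    return failures

if __name__ == "__main__":
    failures = run_checks(CASES)
    print("all checks passed" if not failures else failures)
```

The program only compares; choosing the cases, deciding what “correct” means, and following up on any mismatch remain human work.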

But not everyone agrees. Instead, some people assert that checking is antithetical to testing (checking versus testing). They say that testing is a sapient activity done by humans but that checking is not testing.

If comparison of the program’s behavior to an anticipated result makes something a not-test, then what if we check whether the program’s behavior today is consistent with what it did last month? Is that a not-test? What if we check whether a program’s calculations are consistent with those done by a respected competitor? Or consistent with claims made in a specification? These are just more examples of cases in which testers might (often) have a reasonably clear set of expectations for the results of a test. When they have those expectations, they can assess the test result in terms of those expectations. Clearly, this is checking. But in the land of Sapient Testing, these particular expectations are enumerated as Consistency Heuristics that sapient testers rely on (e.g. http://www.developsense.com/blog/2012/07/few-hiccupps/). Apparently, at least some sapient testing is checking.

Let me try to clarify the distinction:

  • A test is checking to the extent that it is designed to produce results that can be compared, without ambiguity, to results anticipated by the test designer.
  • A test is not-checking to the extent that it can produce a result that is not anticipated by the test designer but is noticeable and informative to the person who interprets the test results.

From this, let me suggest a conclusion:

Most tests are checking. Most tests are also not-checking. Checking versus testing is another false dichotomy.

The value of high-volume automation is that it gives testers a set of tools for hunting for bugs that are too hard to hunt without tools. It lets us hunt for bugs that are very hard to find (think of calculation errors that occur on fewer than 0.00000001% of a function’s inputs, like Hoffman’s MASPAR bug). It lets us hunt for bugs that live in hard-to-reach places (think of race conditions). It lets us hunt for bugs that cause intermittent failures that we (incorrectly) don’t think are possible or have no idea how to hunt with traditional methods (including manual exploration).
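To make the rare-calculation-error case concrete, here is a minimal sketch of an exhaustive sweep against a reference oracle. It does not reproduce any real bug: `fast_isqrt` is a hypothetical implementation under test with a planted, purely illustrative defect, and Python’s `math.isqrt` is assumed to be a trusted reference.

```python
import math

def fast_isqrt(n):
    """Hypothetical implementation under test: integer square root by
    Newton's method, with a planted defect on a tiny fraction of inputs."""
    x = n
    y = (x + 1) // 2
    while y < x:
        x = y
        y = (x + n // x) // 2
    if n % 50_000 == 12_345:  # planted defect, for illustration only
        return x + 1
    return x

def sweep(limit):
    """Run every input below `limit` through the implementation and the
    reference, collecting (input, expected, actual) for each mismatch."""
    mismatches = []
    for n in range(limit):
        expected = math.isqrt(n)  # reference oracle (assumed trustworthy)
        actual = fast_isqrt(n)
        if actual != expected:
            mismatches.append((n, expected, actual))
    return mismatches

if __name__ == "__main__":
    found = sweep(200_000)
    print(f"{len(found)} mismatches in 200,000 inputs:", found)
```

A handful of hand-picked inputs would almost certainly miss a defect this sparse; the sweep finds every instance in its range because it tries every input.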

We do almost all of the high-volume test work with tools. In most cases, we compare test results with expected values. And yet, this is purposeful work being done by humans to answer quality-related questions. It is automated. It is checking. It is testing.

Tests are tests whether they are automated or not, whether they are checking or not. Tests can be excellent tests, and fully appropriate for their contexts, whether they are automated or not, whether they are checking or not. And tests can be worthless, or completely unsuited to their context, whether they are automated checks or intentionally-designed manual tests.

The basic premise of context-driven testing is that a test approach that works well in one context may fail utterly in some other context. There are no universal best practices (not even manual exploratory testing organized into testing sessions).

Years ago, one of the common pieces of advice we gave to testers who were interested in context-driven testing was to look for three counter-examples. If you were thinking about a really good practice, look for three situations in which an alternative would work much better. If you were thinking about a practice that you didn’t like (especially one that other people took more seriously than you considered reasonable), look for three situations in which that practice would be appropriate and valuable. I think we learned this from Brian Marick but I remember it taking hold with most (or all) of the folks who were early adopters of the context-driven approach. I came to see it as a core skill of context-driven testers. Drawing sharp lines, condemning common and often-valuable testing activities as non-testing or non-sapient, is completely out of step with this. It denies the importance of “context” in “context-driven.”

Censure people for disagreeing with us?

A few weeks ago, a colleague of mine tweeted an assertion that the context-driven testing school “censures” people who advocate for best practices.

I responded: “No one has the authority to censure people on behalf of the school” and “Surely we can disagree with people without censuring them.”

The discussion resurfaced on the “software-testing” listserv on yahoogroups, with statements like these:

  • “Context-driven is important as an antidote to the nonsense that otherwise pervades the industry. Consulting companies the world over are promoting best-practices. But there *are* no best practices. Therefore they are lying. They are promoting cynical, self-serving practices, actually.”
  • “I ‘know’ they are deliberately committing fraud for gain, as much as I can know anything in this industry.”
  • “To say there are best practices is to say something that is not true, and that any competent person knows is not true, and that any incompetent person is not qualified to be saying at all. It is indefensible, except that people who try to defend it generally say that they DIDN’T REALLY MEAN IT (that’s what Rex Black once told me) which is to say they were telling lies.”

A few people on software-testing–most notably Fiona Charles–stated their disagreement with these generalizations. But not many. Some agreed. Most were silent.

I think that calling people liars for advocating a commonly-espoused point of view is beyond unreasonable. To me, it seems like hate speech: an unfair and inaccurate characterization of many of the criticized group, likely to stir up negative emotions and to reinforce negative stereotypes.

Context-driven testing is about finding ways to do excellent testing in the actual context of the project. We adapt to the project. That includes adapting to the views and practices of the people doing that project. To do this, it is a fundamental skill for context-driven testers to be able to listen sympathetically, find common ground, and work effectively with people who have other points of view. Slamming groups of people as liars is antithetical to this. It promotes closed-mindedness, which for a context-driven tester, means ineffectiveness.

The underlying disagreement

Suppose that Joe says that in a certain situation, using Testing Technique X is a “best practice.” What does this mean?

1. The straw man

One interpretation is that Technique X is the genuinely best thing to do in this situation. That we know this to be true because we have done the enormous amount of research that would be needed to demonstrate its superiority.

Unfortunately, we don’t do this caliber of research in software engineering. It is difficult and expensive to do empirical research in our field, especially the type of research that can be credibly generalized to complex, real-world projects.

I think it should be obvious that under this definition, there are no best practices in software testing.

So when someone says that we should adopt some of Testing’s best practices, should we really think that this is what they mean?

Sometimes, some individuals give every indication of actually meaning exactly this. I see them peddle “best practices” to executives, government officials, and other people who have influence or authority but not enough technical sophistication. I think those individuals, when they do this, are behaving badly. Readers who know me will probably remember situations in which I have demonstrated that I’m not shy about confronting individuals and telling them that I think they are behaving badly.

But I don’t read these statements as being about a few, specific, unprincipled people. What I think I’m seeing is a general claim that anyone who talks about best practices is either a liar or a fool (incompetent, ignorant, take your pick).

I don’t think it is productive to assume that a popular assertion is made only by liars or fools. I think it is more likely that most of the people who make the assertion mean something else, something that is not obviously incorrect.

Context-driven testers often set aside the idea that any term in testing has One True Definition. I think you have to get past the assumption (or insistence) that any term has One True Definition or you can’t do context-driven testing. People use the same words to mean different things.

For example, many advocates of context-driven testing point out that “test case” has lots of different meanings. When someone tells you that they want you to create “test cases”, there is no value in putting your favorite meaning into their mouth. You have to ask them what they mean or you will give them the “wrong” thing.

We learned this lesson as necessary for guiding our technical practice. Why is it so hard to learn it more generally?

Rather than putting a One True Definition into the mouth of someone who says “best practice”, I think the context-driven way is to ask them what they mean by that term, and if they use an unexpected definition, to say “Oh, now I know how to interpret what you are saying” instead of saying “If you use that term with that meaning, you are telling lies.”

2. An alternative interpretation

When Joe says that in this situation, using Technique X is a best practice, Joe might mean that he has used Technique X and it worked well, that he has seen Technique X used by others and it worked well, and/or he has read reports from people he trusts that Technique X worked well. If, in Joe’s knowledge and experience, Technique X has worked better, or more reliably, than anything else he’s seen tried, Joe is likely to call Technique X “a best practice.”

At least, that’s my experience. When I’ve asked software practitioners what they mean by “best practice”, they tell me about examples that they have experienced, seen or read about. When I press practitioners for scientific evidence, they don’t have any, they typically don’t claim to have any, and they often think I’m unreasonable for expecting them to have any. (It is difficult and expensive to do well-controlled scientific research in our field, so of course they don’t have any.)

So when Joe says X is a best practice, we could interpret what Joe says as an honestly-intended assertion that X is the best thing he is aware of for this situation and he has seen enough indications that it works well that he recommends it. In my experience, that’s probably what Joe actually means.

As a specific example (because his name has come up in this discussion), my impression from talks with Rex Black is that this is what he typically means by “best practice.” He certainly has more experience with some practices than others (so some of his recommendations are better-grounded than others), but I don’t doubt that he believes that he is giving good advice.

I don’t agree with all of Rex’s recommendations. He favors some practices that I think are pretty terrible. But I can disagree with the man without thinking he is a liar or a fool.

And if I am willing to set aside the straw man (the version of the definition that, if true, would make Rex a liar or a fool), then I can give myself the opportunity to learn from him.

The price of prejudice

Software development is a young field. Our knowledge of the field is evolving.

Let me put this more strongly. 100 years from now, when people look back at our beliefs and generalizations, they will call them “quaint” (which is a gentle way of saying “obviously wrong”).

We all work under conditions of uncertainty. We don’t know the “truth”, so we muddle through as well as we can.

This isn’t unique to software development. Professor Billy V. Koen (a winner of ASEE’s Sterling Olmsted Award) writes about the pervasiveness of heuristics in all areas of engineering. See his books Discussion of the Method: Conducting the Engineer’s Approach to Problem Solving (2003) and the shorter, but to me more powerfully written, Definition of the Engineering Method (1985). Or read his historical article, The engineering method and the heuristic: A personal history (“This was the beginning of a 37 year quest to find one thing that was not a heuristic.”)

When we work under uncertainty, we accept and apply ideas that will later turn out to be wrong. And we reject ideas that will later turn out to be reasonable.

I have personally experienced plenty of both. For example, I advocated for exploratory testing back when it was deeply unfashionable. As to my mistakes, read The Ongoing Revolution in Software Testing for some examples.

For the field to advance, we have to be willing to learn from people we disagree with–to accept the possibility that in some ways, they might be right.

When we reject people who disagree with us as liars and fools, we close our minds and we shut down discussion. We cut ourselves off from a kind of debate, and a kind of critical thinking, that I think is essential to the personal growth of senior practitioners and to the progress of the field.

So where is the difference between context-driven testing and best practices?

In my experience, people who espouse best practices (whatever they mean by that) start from the practice when they are deciding what to do. Whether they are fans of automated GUI-level test scripts or session-based test management, they start from the position that their preferred thing is probably the solution to your problem. Then they analyze the situation and probably adapt to it, which might mean modifying the practice or picking an alternative if the first choice won’t work in your situation.

In contrast, the context-driven tester starts from the situation. What’s going on here? What do people want? What are their constraints? After they have an understanding of much of the context, they decide what to do.

Some of my colleagues would disagree with my summary (I admit that it is unflattering), but as I see it:

  • They (best practice advocates) work from the solution to the problem (and perhaps try to change the problem to fit the solution)
  • We (context-driven advocates) work from the problem to the solution (trying to change the solution to fit the problem).

I think this is an important difference.

But we don’t have to demonize each other over it.

Who speaks for the context-driven school?

To this point, I’ve argued that it is wrong (meaning “bad”) to censure people who advocate for “best practices.”

There is another issue: Does the context-driven school censure these people?

My answer is no. The context-driven school doesn’t censure anyone and it misrepresents the school to claim otherwise.

So, who gets to say that “the context-driven community censures… anything?”

  • The founders of the school? — I think the most widely recognized founders would be Brian Marick, Bret Pettichord, James Bach and me. (Brian, James and I co-founded and co-moderated the software-testing list on egroups as the first “home” of the school. Bret, James and I wrote Lessons Learned in Software Testing.) Of these four, I think that Bret, Brian and I would flatly refuse to condemn groups of people for this view and that we would flatly refuse to call someone a liar or a fraud for believing something that we don’t believe or that we consider unsupportable.
  • The elected representatives of the school? — None exist.
  • The professional society that is the home of the school? — maybe the Association for Software Testing could claim this role. I strongly doubt they would issue such a condemnation. In particular, I think that at all but one of the Conferences of the Association for Software Testing (the exception was in the year James Bach chaired the conference), AST welcomed ASTQB as a corporate sponsor. Having been an executive of AST for much of that time, I can say that we would accept sponsorships from people we disagreed with, but not from people who we thought were liars and frauds. In general, AST has stood for the principle of honoring the diversity of ideas in our field even as it espouses a particular viewpoint.
  • The software-testing email listserv? We started that list as a safe place for context-driven sympathizers to think together. But we did not intend it to represent the context-driven community. There are no processes in place for making decisions based on the consensus of the list. And there has been no effort to ensure that everyone who holds a context-driven view feels welcome on the list. Some people came to the list and left because they didn’t like our style. Others were kicked off. Many never joined. Of the people who are on the list, relatively few write messages to the list, partially because some of them feel intimidated. At this point, I think the list is more a collection of fans of Bach & Bolton’s Rapid Software Testing approach than of context-driven testing. Therefore, I do not think this group has the authority to damn people on behalf of “the context-driven community” either.

I don’t think anyone has the authority to censure anyone else in the name of the context-driven school.

I think it’s fair to make statements like, “I don’t think X is consistent with context-driven views” or “I think these people are liars” — but those statements put the words in the mouth of the speaker, not the mouth of the school. That’s where those words belong.

Metrics, Ethics, & Context-Driven Testing (Part 2)

My last post responded to Michael Bolton’s “Why Pass vs. Fail Rates Are Unethical”. Michael argued that calculating the ratio of passing tests to failing tests is irresponsible, unethical, unprofessional, unscientific and inhumane. I think this is an example of a growing problem in the rhetoric of context-driven testing–I think it considers too little the value of tailoring what we do to the project’s context. Instead, too often, I see a moralistic insistence on adoption of preferred practices or rejection of practices that we don’t like.

I think it’s easy to convert any disagreement about policy or practice into a disagreement about ethics. I think this is characteristic of movements that are maturing into orthodox rigidity. Unfortunately, I think that’s fundamentally incompatible with a contextualist approach. My post advocated for dialing the rhetoric back, for a stronger distinction between disagreeing with someone and morally condemning them.

Michael responded with a restatement that I think is even more extreme.

I think the best way to answer this is with a series of posts (and perhaps some discussion) rather than one excessively long screed.

(Added 3/22/12) Michael and I plan to discuss this soon. My next post will be informed by that discussion.

The core messages of this first post are fairly simple:

Executives are Entitled and Empowered to Choose their Metrics

Several years ago, I had a long talk about metrics with Hung Quoc Nguyen. Hung runs LogiGear, a successful test lab. He was describing to me some of the metrics that his clients expected. I didn’t like some of these metrics and I asked why he was willing to provide them. Hung explained that he’d discussed this with several executives. They understood that the metrics were imperfect. But they felt that they needed ways to summarize what the organization knew about projects. They felt they needed ways to compare progress, costs, priorities, and risks. They felt they needed ways to organize the information so that they could compare several projects or groups at the same time. And they felt they needed to compare what was happening now to what had happened in the past. Hung then made three points:

  1. These are perfectly legitimate management goals.
  2. Quantification (metrics) is probably necessary to achieve these goals.
  3. The fact that there is no collection of metrics that will do this perfectly (or even terribly well) doesn’t eliminate the need. Without a better alternative, managers will do the best they can with what they’ve got.

Hung concluded that his clients were within their rights to ask for this type of information and that he should provide it to them.

If I remember correctly, Hung also gently chided me for being a bit of a perfectionist. It’s easy to refuse to provide something that isn’t perfect. But that’s not helpful when the perfect isn’t available. He also suggested that when it comes to testers or consultants offering a “better alternative”, every executive has both the right and the responsibility to decide which alternative is the better one for her or his situation.

By this point, I had joined Florida Tech and wasn’t consulting to clients who needed metrics, so I had the luxury of letting this discussion settle in my mind for a while before acting on it.

Finance Metrics Illustrate the Executives’ Context

A few years later, I started studying quantitative finance. I am particularly interested in the relationship between model evaluation in quantitative finance and exploratory testing. I also have a strong personal interest–I apply what I learn to managing my family’s investments.

The biggest surprise for me was how poor a set of core business metrics the investors have to work with. I’m thinking of the numbers in balance sheets, statements of cash flow, and income statements, and the added details in most quarterly and most annual investment reports. These paint an incomplete, often inaccurate picture of the company. The numbers are so subject to manipulation, and present such an incomplete view, that it can be hard to tell whether a company was actually profitable last year or how much their assets are actually worth.

Investors often supplement these numbers with qualitative information about the company (information that may or may not present a more trustworthy picture than the numbers). However, despite the flaws of the metrics, most investors pay careful attention to financial reports.

I suppose I should have expected these problems. My only formal studies of financial metrics (courses on accounting for lawyers and commercial law) encouraged a strong sense of skepticism. And of course, I’ve seen plenty of problems with engineering metrics.

But it was still a surprise that people actually rely on these numbers. People invest enormous amounts of money on the basis of these metrics.

It would be easy to rant against using these numbers. They are imperfect. They can be misleading. Sometimes severely, infuriatingly, expensively misleading. So we could gather together and have a nice chant that using these numbers would be irresponsible, unethical, unprofessional, unscientific and inhumane.

But in the absence of better data, when I make financial decisions (literally, every day), these numbers guide my decisions. It’s not that I like them. It’s that I don’t have better alternatives to them.

If someone insisted that I ignore the financial statistics, that using them would be irresponsible, unethical, unprofessional, unscientific, and inhumane, I would be more likely to lose respect for that person than to stop using the data.

Teaching Metrics

I teach software metrics at Florida Tech. These days, I start the course with chapters from Tockey’s Return on Software: Maximizing the Return on Your Software Investment. We study financial statistics and estimate future cost of a hypothetical project. The students see a fair bit of uncertainty. (They experience a fair bit of uncertainty–it can be a difficult experience.) I do this to help my students gain a broader view of their context.

When an executive asks them for software engineering metrics, they are being asked to provide imperfect metrics to managers who are swimming in a sea of imperfect metrics.

It is important (I think very important) to pay attention to the validity of our metrics. It is important to improve them, to find ways to mitigate the risks of using them, and to advise our clients about the characteristics and risks of the data/statistics we supply to them. I think it’s important to use metrics in ways that don’t abuse people. There are ethical issues here, but I think the blanket condemnation of metrics like pass/fail ratios does not begin to address the ethical issues.

The Principles

In the context-driven principles, we wrote (more precisely, I think, I wrote) “Metrics that are not valid are dangerous.” I still mostly (*) agree with these words but I think it is too easy to extend the statement into a position that is dogmatic and counterproductive. If I were writing the Principles today, I would reword this statement in a way that acknowledges the difficulty of the problem and the importance of the context.

(*) The phrase “Metrics that are not valid” is inaccurately absolute. It is not proper to describe a metric as simply valid or not valid (see Trochim and Shadish, Cook & Campbell, for example). Rather, we should talk about metrics as more valid or less valid (shades of gray). The wording “not valid” was a simplification at the time, and in retrospect, should be seen as an oversimplification.

Contexts differ: Recognizing the difference between wrong and Wrong

Contexts differ.

  • Testers provide information to our clients (stakeholders) about the product, about how we tested it, and about what we found.
  • Our clients get to decide what information they want. We don’t get to decide that for them.
  • Testers provide services to software projects. We don’t run the projects. We don’t control those projects’ contexts.

In context-driven testing, we respect the fact that contexts differ.

What Does “contexts differ” Really Mean?

I think it means that in different contexts, the people who are our clients:

  • are going to want different types of information
  • are going to want us to prioritize our work differently
  • are going to want us to test differently, to mitigate different risks, and to report our results in different ways.

Contexts don’t just differ for the testers. They differ for the project managers too. The project managers have to report to other people who want whatever information they want.

We don’t manage the project managers. We don’t decide what information they have to give to the people they report to.

Sometimes, Our Clients Want Metrics

Sometimes, a client will ask how many test cases the testers have run:

  • I don’t think this is a very useful number. It can be misleading. And if I organize my testing with this number in mind, I might do worse testing.
  • So if a client asks me for this number, I might have a discussion with her or him about why s/he thinks s/he needs this statistic and whether s/he could be happy with something else.
  • But if my client says, “No, really, I need that number”, I say, OK and give the number.

Sometimes a client will ask about defect removal efficiency:

  • I think this is a poor excuse for a metric. I have a nice rant about it when I teach my graduate course in software metrics. Bad metric. BAD!
  • If a client asks for it, I am likely to ask, Are you sure? If they’re willing to listen, I explain my concerns.

But defect removal efficiency (DRE) is a fairly popular metric. It’s in lots of textbooks. People talk about it at conferences. So no matter what I say about it, my client might still want that number. Maybe my client’s boss wants it. Maybe my client’s customer wants it. Maybe my client’s regulator wants it. This is my client’s management context. I don’t think I’m entitled to know all the details of my client’s working situation, so maybe my client will explain why s/he needs this number and maybe s/he won’t.

So if the client says, “No, really, I need the DRE”, I accept that statement as a description of my client’s situation and I say, OK and give the number.

One more example: ratio of passing to failing tests. Michael Bolton presents several reasons for disliking this metric and I generally agree with them. In particular, I don’t know what the ratio measures (it has no obvious construct validity). And if the goal is to make the number big, there are lots of ways to achieve this that yield weak testing (see Austin on measurement dysfunction for discussion of this type of problem).
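As a minimal sketch of how easily such a number can be inflated, consider the following Python fragment. Everything in it is hypothetical: the `TestResult` type, the `pass_ratio` helper, and the two made-up suites are illustrations only, not taken from Michael’s post, from Austin, or from any real project. It computes the share of passing tests, which moves the same way the pass/fail ratio does.

```python
from dataclasses import dataclass


@dataclass
class TestResult:
    """One hypothetical test outcome."""
    name: str
    passed: bool


def pass_ratio(results):
    """Return the fraction of tests that passed (0.0 for an empty list)."""
    if not results:
        return 0.0
    return sum(r.passed for r in results) / len(results)


# Hypothetical suite: four tests, two of which expose real bugs.
strong_suite = [
    TestResult("boundary_values", False),
    TestResult("concurrent_updates", False),
    TestResult("basic_login", True),
    TestResult("basic_logout", True),
]

# The same suite padded with trivial checks that can hardly fail.
padded_suite = strong_suite + [
    TestResult(f"trivial_check_{i}", True) for i in range(16)
]

print(f"strong suite: {pass_ratio(strong_suite):.0%}")
print(f"padded suite: {pass_ratio(padded_suite):.0%}")
```

In this made-up case the number climbs from 50% to 90% even though the same two failures are still there and nothing about the product or the quality of the testing has improved.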

But Michael takes it a step further and says that using this metric, or providing it to a client, is unethical.

UNETHICAL?

Really? UNETHICAL?!?

If you give this metric to someone (after they ask, and you say it’s not very good, and they say, really-I-want-it):

  • Are you lying?
  • Are you taking something that doesn’t belong to you?
  • Are you oppressing someone? Intimidating them?
  • Are you hurting someone?
  • Are you cheating anyone?
  • Are you pretending to skills or knowledge that you don’t have?
  • Are you helping someone else lie, cheat, steal, intimidate, or cause harm?

I used to associate shrill accusations of unethicalness with conservatives who were losing control of the hearts and minds of the software development community and didn’t like it, or who were pushing a phony image of community consensus as part of their campaigns to get big contracts, especially big government contracts, or who were using the accusation of unethical as a way of shutting down discussion of whether an idea (unethical!) was any good or not.

Maybe you’ve met some of these people. They said things like:

  • It is unethical to write code if you don’t have formal, written requirements
  • It is unethical to test a program if you don’t have written specifications
  • It is unethical to do exploratory testing
  • It is unethical to manage software projects without formal measurement programs
  • It is unethical to count lines of code instead of using function points
  • It is unethical to count function points instead of lines of code
  • It is unethical to not adopt best practices
  • It is unethical to write or design code if you don’t have the right degree or certificate
  • It should be unethical to write code if you don’t have a license

It seemed to me that some of the people (but not all of the people) who said these things were trying to prop up a losing point of view with fear, uncertainty, doubt — they were using demagoguery as their marketing technique. That I saw as unethical.

Much of my contribution to the social infrastructure of software testing was a conscious rebellion against a closed old boys network that defended itself with dogma and attacked non-conformers as unethical.

wrong versus Wrong

So what’s with this “Using a crummy metric is unethical”?

Over the past couple of years, I’ve seen a resurgence of ethics-rhetoric. A new set of people have a new set of bad things to condemn:

  • Now it seems to be unethical to have a certification in software testing that someone doesn’t like
  • Now it seems to be unethical to follow a heavyweight (heavily documented, scripted) style of testing
  • Now it seems to be unethical to give a client some data that the client asks for, like a ratio of passing tests to failing ones.

I don’t think these are usually good ideas. In fact, most of the time, I think they’re wrong.

But UNETHICAL?!?

I’m not a moral relativist. I think there is evil in the world and I sometimes protest loudly against it. But I think it is essential to differentiate between:

  • someone is wrong (mistaken)
  • someone is wrong (attempting something that won’t work), and
  • someone is Wrong (unethical).

Let me illustrate the difference. Michael Bolton is a friend of mine. I have a lot of respect for him as a person, including as an ethical being. His blog post is a convenient example of something I think is a broader problem, but please read my comments on his article as an assertion that I think Michael is wrong (not Wrong).

To the extent that we lose track of the difference between wrong and Wrong, I think we damage our ability to treat people who disagree with us with respect. I think we damage our ability to communicate about our professional differences. I think we damage our ability to learn, because the people we most agree with probably have fewer new things to teach us than the people who see the world a little differently.

The difference between wrong and Wrong is especially important for testers who want to think of ourselves (or market ourselves) as context-driven.

Because we understand that what is wrong in some contexts is right in some others.

Contexts differ.

Context-driven testing is not a religion

So, did I really say that context-driven testing is dead? No, that was some other guy (Scott Barber) who’s using the buzz to launch a different idea. It’s effective marketing, and Scott has interesting ideas. But that’s his assertion, not mine.

What I wrote a few days ago was this:

If there ever was one context-driven school, there is not one now.

A “school” provides an organizing social structure for a body of attitudes and knowledge. Schools are often led by one or a few highly visible people.

Over the past few years, several people have gained visibility in the testing community who express ideas and values that sound context-driven to me. Some call themselves context-driven, some don’t. My impression is that some are being told they are not welcome. Others are uncomfortable with a perceived orthodoxy. They like the approach but not the school. They like the ideas, but not the politics.

The context-driven school appeared for years to operate with unified leadership. This appearance was a strength. But it was never quite true: Brian and Bret left early (but they left quietly). I’ve repeatedly raised concerns about the context-driven rhetoric, but relatively quietly. James and I haven’t collaborated successfully for years–this is old news–but for most of that time, our public disagreements were pretty quiet.

I think it is time to go beyond the past illusion of unity, to welcome a new generation of leadership. Not just a new generation of followers. A new generation of leaders. And to embrace their diversity.

There is not one school. There might be none. There might be several. I’m not sure what our real status is today. There will be an evolution and I look forward to seeing the result.

For now, I continue to be enthusiastic about the approach. I still endorse the principles. But what I understand to be the meanings and implications of the principles might not be exactly the same as what you understand. I think that’s OK.

In terms of the politics of The One School, my perception is of an exclusionary tone that has become more emphatic over time. I think this can make good marketing–entertaining presentations, lots of excitement. But does it serve its community? What is the impact on the people who are actually doing the testing: looking for work; looking for advancement in their own careers; striving to increase their skills and professionalism?

For many people, the impact is minimal–they follow their own way.

But for people who align themselves with the school, I think there are risks.

I wasn’t able to travel to CAST last year (health problem), so I watched sessions on video. Watching remotely let me look at things with a different perspective. One of the striking themes in what I saw was a mistrust of test automation. Hey, I agree that regression test automation is a poor basis for an effective comprehensive testing strategy, but the mistrust went beyond that. Manual (session-based, of course) exploratory testing had become a Best Practice.

In the field of software development, I think that people who don’t know much about how to develop software are on a path to lower pay and less job security. Testing-process consultants can be very successful without staying current in these areas of knowledge and skill. But the people they consult to? Not so much.

It was not the details that concerned me. It was the tone. I felt as though I was watching the closing of minds.

I have been concerned about this ever since people in our community (not just our critics–us!) started drawing an analogy between context-driven testing and religion.

As James put it in 2008, “I have my own testing religion (the Context-Driven School).” I objected to it back then, and I have kept objecting since. This is deeply inconsistent with what I signed up for when we declared a school.

An analogy to religion often carries baggage: Divine sources of knowledge; Knowledge of The Truth; Public disagreement with The Truth is Heresy; An attitude that alternative views are irrelevant; An attitude that alternative views are morally wrong.

Here’s an illustration from James’ most recent post:

“One of the things that concerns Cem is the polarization of the craft. He doesn’t like it, anymore. I suppose he wants more listening to people who have different views about whether there are best practices or not. To me, that’s unwise. It empties the concept of much of its power. And frankly, it makes a mockery of what we have stood for. To me, that would be like a Newtonian physicist in the 1690’s wistfully wishing to “share ideas” with the Aristotelians. There’s no point. The Aristotelians were on a completely different path.”

This illustrates exactly what troubles me. In my view, there are legitimate differences in the testing community. I think that each of the major factions in the testing community has some very smart people, of high integrity, who are worth paying attention to. I’ve learned a lot from people who would never associate themselves with context-driven testing.

Let me illustrate that with some notes on my last week (Feb 27 to March 2):

  • My students and I reviewed Raza Abbas Syed’s M.Sc. thesis in Computer Science: Investigating Intermittent Software Failures. The supervisor of this work was Dr. Laurie Williams. If she identified herself with any school of software testing, it would probably be Agile, not Context-Driven. But, not surprisingly, the work presented some useful data and suggested interesting ideas. I learned things from it. Should I really stop paying attention to Laurie Williams?
  • Yesterday, Dr. Keith Gallagher gave a guest lecture in my programmer-testing course on program slicing (see Gallagher & Lyle, 1991 and Gallagher & Binkley, 1996). This is a cluster of testing/maintenance techniques that haven’t achieved widespread adoption. The tools needed to support that adoption don’t exist yet. Creating them will be very difficult. This is classic Analytical School stuff. But his lecture made me want to learn more about it because it presents a glimpse of an interesting future.
  • This evening, I’m reading Kasurinen, Taipale & Smolander’s paper, Software Test Automation in Practice: Empirical Observations. One of my students and I will work through it tomorrow. I’m not sure how these folks would classify themselves (or if they would). Probably if they had to self-classify, it would be Analytical School. Comparing myself to a modern Newtonian physicist and them to outdated Aristotelians strikes me as one part arrogant and five parts wrong.

I think it’s a Bad Idea to alienate, ignore, or marginalize people who do hard work on interesting problems.

James says later in his post,

“We must have the stomach to keep moving along with our program regardless of the huddled masses who Don’t Get It.”

I respect the right of any individual to seek his or her own level of ignorance.

But I see it as a disservice to the craft when thought-leaders encourage narrow-mindedness in the people who look to them for guidance.

When I was an undergraduate, I studied mainly math and philosophy. Of the philosophy, I studied mainly Indian philosophy, about 5 semesters’ worth. My step-grandmother was a Buddhist. Friends of mine had consistent views. I was motivated to take the ideas seriously.

One of the profound ideas in those courses was a rejection of the law of the excluded middle. According to that law, if A is a proposition, then A must be true or Not-A must be true (but not both). Some of the Indian texts rejected that. They demanded that the reader consider {A and Not-A} and {neither A nor Not-A}. In terms of the logic of mathematics, this makes no sense (and it is not a view I associate with Indian logicians). But in terms of human affairs, I think the rejection of the law of the excluded middle is a powerful cognitive tool.

I have thought that for about 40 years. I brought that with me in my part of the crafting of the context-driven principles. Something can be the right thing for your context and its opposite can be the right thing for my context.

I think we need to look more sympathetically at more contexts and more solutions. To ask more about what is right with alternative ideas and what we can learn from them. And to develop batteries of skills to work with them. For that, I think we need to get past the politics of The One School of context-driven testing.