Last year, Rex Black and I debated the legitimacy of the idea of schools of software testing.
We now have an audio recording of the debate and have merged slides with it to create a video; see my notes at kaner.com for both.
Please sign this petition: http://www.ipetitions.com/petition/stop29119
I rarely rant about software engineering standards because it seems like a waste of time. Many of the participants in the software engineering standards movement are fine people who I respect. Some of them I call friends, or would be happy to have as friends. But several groups stand to benefit from being able to claim that they are following, selling, or training a collection of processes that are simplistic and easily described, even if they are ineffective and enormously wasteful. These groups can afford to invest a lot of money dominating the standards committees that, in turn, have come to serve their interests. They can also afford to invest a lot in public relations to promote the perceived legitimacy of those committees’ work.
My experiences with the IEEE software engineering standards (which are the main basis for these ISO standards) began when I first came to Silicon Valley in 1983. They have been uniformly negative. I finally left IEEE in 2010 or 2011. At that point I was a Senior Member who had been recognized by IEEE for my work on their standards and had even been appointed by Congress to the United States’ Election Assistance Commission’s Technical Guidelines Development Committee at IEEE’s request. (TGDC wrote technical standards and much of its work was guided by an IEEE standard that I had worked on.) I left IEEE as a protest against a software engineering standards process that I see as a closed vehicle that serves the interests of a relatively small portion of the software engineering community.
Context-driven testing developed as the antithesis of what is being pushed through ISO. They represent opposite points of view.
Standards are political documents and sometimes legal ones. The existence of a standard makes it easier for a court (or a regulator) to rule that the standard-approved approach is the professionally correct one, and the non-approved approaches (or the ones that conflict with the approved one) are professionally incorrect and therefore improper. Imposing a standard that forces practices and views on a community that would not otherwise agree to them is a political power play.
If ISO 29119 is adopted and then broadly accepted as a legitimate description of good professional practice in software testing, then context-driven testing will be an example of something you should not do, a way of thinking you should avoid.
I don’t think it will do much good to sign a petition against ISO 29119, but I would rather say that I protested against it than simply accept its consequences in silence.
I recommend that you do the same.
At STPCon last week, Rex Black and I had a debate on “Schools of Testing: Useful Paradigm or Negative Influence?” The STP folks will be posting an audio recording of the debate one of these days. In the meantime, here are my notes (the materials I put together to prepare for the debate) (http://kaner.com/wp-content/uploads/2014/04/SchoolsStpConSlides.pdf).
Not surprisingly, Rex and I found a lot to disagree about, but we also found some common ground.
I think there’s a lot of potential value in having more of these public discussions. One of the key reasons that we originally proposed the idea of “schools of thought” in software testing was to clarify the differences and provide structure for discussions about those.
In the long run, the best ideas of the context-driven school will prove wrong. That’s how it works in scientific/empirical enterprises. We come up with great ideas that get replaced by better ones. (Similar note: https://flowchainsensei.wordpress.com/2014/03/11/i-dont-want-agile-back/.)
Evolution of ideas happens
I don’t usually let myself get dragged into the TwitterStorms orchestrated by the Rabid Software Testing crowd. But I decided to make an exception for this one because it has sucked some good people into endorsing what I think is a very bad petition.
In essence, the petition asks ISTQB to open its quality-control records. ISTQB is involved in training and certifying software testers, so I will write this to you (the reader) as if you are a software tester. The basic question is whether ISTQB has seen any problems in their exams and what they have done about it. (Underneath that are some more specifically-pointed details, which I’ll come to later.)
In many countries (such as the United States), companies have a right to study the quality of their goods or services in private. That’s what I understand ISTQB has been doing.
By the way, that’s what companies do when they test their software. Those companies (ISTQB, and software companies) have a right to hire employees and consultants and to require those people to keep what they’ve learned private. When you test your company’s or client’s software, it is normally a condition of your job or your consultancy that you keep your results private.
This petition asks ISTQB to do what (for the most part) the companies you work for would not do, and would not dream of letting you do with their quality-control data.
When I studied law, I learned that the public policy of the United States (and many other countries) favors investigative privilege. That is, we encourage companies to conduct aggressive, detailed internal investigations, to find their own problems and to improve their products, their services and their advertising based on what they learn from the investigation. We want them to investigate because we want the improvement.
These types of studies expose the companies to risk. People who get their hands on the investigative data, and who don’t like the company that investigated itself, can use any report of any problem to attack the company, whether or not they understand the problem, whether or not the problem is actually major, and whether or not the company has dealt with it in a reasonable way. Therefore, to encourage companies to do the studies, society grants them privacy (we treat the studies as trade secrets), because the companies won’t do serious studies without privacy.
When I practiced commercial law, my most significant project involved drafting legislation about the law of software quality. My most significant contribution was a very narrow disclosure rule that would preserve software vendors’ incentives to do internal investigations (i.e. test their own software harshly) while holding them accountable for significant defects that they knew about but did not make public. It took 6 years to craft the language for this. It took 6 years because the idea of almost-forced disclosure of (some) test results triggered allergic reactions among lawyers who normally defended companies, among lawyers who normally attacked companies, and among judges. Finding a way to balance the interests here was enormously difficult because the support for allowing companies to investigate their products and services in private runs so very deep. And, in my view, it should.
And if you want to be able to do significant work as testers, you should support that privilege too.
And that means you should show some respect for that privilege, even when the company asserting it is a company that you (or some people who impress you by screaming at others in public) don’t like.
Now, we have a petition from someone who frequently attacks ISTQB, asking them to open their private quality-control records associated with their certification exams. Will I sign it?
No, Keith. I will not sign that petition.
Back in 2010, I saw some materials that allegedly summarized unauthorized disclosures from an ISTQB meeting. I suspect that this (2010) material underlies the specificity of the questions in this (2013) petition to ISTQB.
The materials were not authenticated and there might not be any truth in them at all. The gist of the materials was that ISTQB had commissioned a psychometric study of one or more of their tests, that a “test reliability coefficient” was a bit low, and that ISTQB was planning to use this information to improve the quality of their exams.
I believe that this is the type of research that we, as a society (and as a profession of testers), want to see done. Look for problems. When you find problems, raise them with other decision-makers in the organization and look for ways to fix them.
To keep this going, society respects these studies as trade secrets and grants them privacy. There is no expectation, zero, none, that companies will tell us about the studies or about what they did to address the problems they found.
(There are some exceptions to that rule. I will note them later, to suggest that they are probably not relevant here.)
Certain people encouraged me to get involved in attacking ISTQB with these allegations. I refused.
One thing I said to those people, back in 2010, was:
“I’m not a huge fan of attacking competitors. I fall into it, but I think it is a distraction. I think it adds to the personality circus of the field and detracts from the content of the field.”
Another of the things I said was:
“As to the ethics, I am slightly outraged by the fact that ISTQB apparently will not make this public. But I am also slightly outraged that this was leaked, if it was. You’re being offered the opportunity to play the role of one hostile competitor viciously attacking another competitor. Your attack will not look ethical.”
A little later, I looked more carefully at this “test reliability coefficient”. I am not an expert in this area of statistics, but I know a little bit.
I’ve dealt with lots of metrics like this in my career. They are good enough for private, informal, self-evaluation or evaluation by a friendly coach. But the idea of attacking a company based on not-terrible-but-not-great values of these statistics, well, I don’t think that’s a valid use of this statistic. Not without a whole lot of other converging evidence.
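The materials, as described, do not say which reliability coefficient was computed. One common internal-consistency coefficient for exams is Cronbach’s alpha (equivalent to KR-20 when every item is scored right or wrong): for k items, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The sketch below assumes that coefficient and uses an invented score matrix; it is not a reconstruction of any ISTQB analysis. It does show how little a single value carries: it summarizes how consistently the items agree with one another, not whether the exam measures anything worth measuring.

```python
from statistics import pvariance

def cronbach_alpha(scores):
    """Cronbach's alpha for scores[candidate][item] (KR-20 when items are scored 0/1)."""
    k = len(scores[0])  # number of items on the exam
    if k < 2:
        raise ValueError("need at least two items")
    # Population variance of each item's scores across candidates.
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    # Population variance of candidates' total scores.
    total_var = pvariance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance; alpha is undefined")
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Invented data: 5 candidates x 4 right/wrong items (1 = correct).
HYPOTHETICAL_SCORES = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(cronbach_alpha(HYPOTHETICAL_SCORES), 3))  # prints 0.8
```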
You may have seen me quoted about these types of metrics once or twice. The quote goes:
“Metrics that are not valid are dangerous.”
So, once I studied the statistic a little, I was no longer even a little bit outraged that ISTQB wasn’t making the data public. It was not only their privilege to keep it private. Keeping it private was also, in my view, reasonable.
I think that using metrics like this to publicly attack people (or companies) is unethical. Even if they are people or companies that I disagree with, or don’t like.
So, Keith. No, Keith. I will not sign that petition.
The Big Public Attack on ISTQB over this didn’t happen in 2010 and I expected never to hear of these allegations again.
Sometimes, society forces companies to disclose their private data.
For example, if a company commits crimes (you know, like rigging LIBOR, or engaging in tax evasion, or laundering money), its victims (and prosecutors representing society as a whole) can demand to see relevant records and the courts will enforce the demand.
Suppose that ISTQB made specific comments about the reliability of their tests, and they were in possession of data that indicated that those specific comments were false. That would be fraud. In that case, it should be possible to obtain ISTQB’s internal data.
I’m not a big fan of ISTQB’s syllabus or exams, or of their marketing. My attitudes are not as extreme as some other people’s. Some twits seem to believe that ISTQB’s materials are worthless, that everyone finds them worthless, and that no reasonable person thinks the exams or the certifications are worth anything.
That’s not exactly my evaluation, but I have no reason to doubt the integrity of these other people, or doubt their experience.
But because I’m skeptical of ISTQB, every now and again, I look at their marketing materials. I haven’t looked at everything. I haven’t even looked at most of the materials. But in what I have seen, even though I don’t agree with the views being expressed or the conclusions they would have me draw:
So. Keith. No, Keith. Absolutely not, Keith. I will not sign that petition.
I encourage you to withdraw it.
Last weekend, I attended the 12th annual Workshop on Teaching Software Testing (WTST 2013). This year, we focused on high-volume automated testing (and how to teach it).
High-volume automated testing (HiVAT) refers to a family of testing techniques that enable the tester to create, run and evaluate the results of arbitrarily many tests.
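To make the definition concrete, here is a minimal sketch of one member of the family: generate random inputs, run each through the code under test, and let a programmed oracle evaluate every result. The sort routine, the input ranges, and the function names are assumptions for illustration; Python’s built-in `sorted` stands in for the product code.

```python
import random
from collections import Counter

def oracle(inp, out):
    """Pass if `out` is in non-decreasing order and holds exactly the elements of `inp`."""
    in_order = all(a <= b for a, b in zip(out, out[1:]))
    return in_order and Counter(out) == Counter(inp)

def hivat_run(sort_under_test, n_tests, seed=0, max_len=50):
    """Create, run and evaluate n_tests random tests; return the failing ones."""
    rng = random.Random(seed)  # fixed seed so any failure can be replayed
    failures = []
    for i in range(n_tests):
        inp = [rng.randint(-1000, 1000) for _ in range(rng.randint(0, max_len))]
        # Hand over a copy so the code under test cannot alter our record of the input.
        out = sort_under_test(list(inp))
        if not oracle(inp, out):
            failures.append((i, inp, out))
    return failures

# `sorted` stands in for a hypothetical product sort routine.
failures = hivat_run(sorted, 10_000)
print(f"{len(failures)} failing tests out of 10000")
```

The fixed seed is a deliberate choice: with thousands of generated tests, a failure is only useful if the tester can reproduce it.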
We had some great presentations and associated discussions. For example:
Personally, I found this very instructive. I learned new ideas, gained new insights, stretched the bounds of what I see as high-volume test automation and learned about new contexts in which the overall family can be applied (and how those contexts differ from the ones I’m most familiar with).
A few times during the meeting, we were surprised to hear echoes of an anti-automation theme that has been made popular by a few testing consultants. For example:
I don’t much care whether someone decides to politicize common words as part of their marketing or, more generally, to hate on automated testing. Over time, I think this is probably self-correcting. But some people want to make the additional assertion that the use of (allegedly) non-sapient (allegedly) non-tests conflicts with the essence of context-driven testing. It’s at that point that, as one of the founders of the context-driven school, I say “hooey!”.
The distinction between manual and automated is a false dichotomy. We are talking about a matter of degree (how much automation, how much manual), not a distinction of principle.
I think there is a distinction to be made between tests that testers use to learn things that they want to know versus tests that some people create or run even though they have no plan or expectation for getting information value from them. We could call this a distinction of sapience. But to tie this to the use of technology is misguided.
Thinking within the world of context-driven testing, there are plenty of contexts that cry out for HiVAT support. If we teach testers to be suspicious of test automation, to treat it as second-class (or worse), to think that only manual tests have the secret sauce of sapience, we will be preparing those testers to fail in any contexts that are best served with intense automation. That is not what we should teach as context-driven testing.
And then there is that strange dichotomization of testing and checking. As I understand the notion of checking, when I run a test I can check whether the program’s behavior conforms to behavior I expect in response to the test. Thus I can say, “Let’s check whether this adds these numbers correctly,” “Let’s check the display,” and even “Let’s check for memory leaks.” Personally, I think these look like things I might want to find out while testing. I think testing is a search for quality-related information about a product or service under test, and if I execute the program with the goal of learning some quality-related information, that execution seems to me to be a test. Checking is something that I often do when I am doing what I think is good testing.
Suppose further that my question about the quality of the software is well-formed enough, and my programming skills are strong enough, that I can write a program to do the checking for me and tell me the result. That is, suppose that on my instruction, my program runs the software under test in order to quickly provide an answer to my question. To me, this still looks like a test, even though it is automated checking.
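A minimal sketch of such a program, for the “does this add these numbers correctly” question. The `add` function, the case table, and the 32-bit boundary case are hypothetical; each case encodes a question the tester chose to ask, and the program only automates the asking.

```python
def add(a, b):
    """Hypothetical function under test."""
    return a + b

# Each case pairs inputs with the result the tester expects.
CASES = [
    ((2, 3), 5),
    ((-7, 7), 0),
    ((0, 0), 0),
    ((2**31 - 1, 1), 2**31),  # boundary that would overflow a 32-bit signed integer
]

def check_addition(fn, cases):
    """Run every case; return those whose actual result differs from the expected one."""
    mismatches = []
    for args, expected in cases:
        actual = fn(*args)
        if actual != expected:
            mismatches.append((args, expected, actual))
    return mismatches

print(check_addition(add, CASES) or "all checks passed")
```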
But not everyone agrees. Instead, some people assert that checking is antithetical to testing (checking versus testing). They say that testing is a sapient activity done by humans but that checking is not testing.
If comparison of the program’s behavior to an anticipated result makes something a not-test, then what if we check whether the program’s behavior today is consistent with what it did last month? Is that a not-test? What if we check whether a program’s calculations are consistent with those done by a respected competitor? Or consistent with claims made in a specification? These are just more examples of cases in which testers might (often) have a reasonably clear set of expectations for the results of a test. When they have those expectations, they can assess the test result in terms of those expectations. Clearly, this is checking. But in the land of Sapient Testing, these particular expectations are enumerated as Consistency Heuristics that sapient testers rely on (e.g. http://www.developsense.com/blog/2012/07/few-hiccupps/). Apparently, at least some sapient testing is checking.
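The consistency heuristics above translate directly into code. The sketch below assumes two hypothetical implementations of a compound-interest calculation: `interest_v1` plays the role of last month’s build (or a respected competitor’s result), and `interest_v2` is today’s build. It feeds both the same random inputs and reports any disagreement.

```python
import math
import random

def interest_v1(principal, rate, years):
    """Hypothetical reference: last month's version (or a competitor's result)."""
    return principal * (1 + rate) ** years

def interest_v2(principal, rate, years):
    """Hypothetical current version under test, rewritten to use a loop."""
    total = principal
    for _ in range(years):
        total *= 1 + rate
    return total

def consistency_check(new, reference, n_tests=1000, seed=1, rel_tol=1e-9):
    """Return the inputs on which `new` and `reference` disagree beyond rel_tol."""
    rng = random.Random(seed)
    inconsistencies = []
    for _ in range(n_tests):
        args = (rng.uniform(0, 1e6), rng.uniform(0, 0.2), rng.randint(0, 40))
        a, b = new(*args), reference(*args)
        if not math.isclose(a, b, rel_tol=rel_tol):
            inconsistencies.append((args, a, b))
    return inconsistencies

print(len(consistency_check(interest_v2, interest_v1)), "inconsistencies")
```

The relative tolerance is an assumption: two correct implementations can round floating-point results differently, so demanding exact equality would flag noise rather than bugs.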
Let me try to clarify the distinction:
From this, let me suggest a conclusion:
Most tests are checking. Most tests are also not-checking. Checking versus testing is another false dichotomy.
The value of high-volume automation is that it gives testers a set of tools for hunting for bugs that are too hard to hunt without tools. It lets us hunt for bugs that are very hard to find (think of calculation errors that occur on fewer than 0.00000001% of a function’s inputs, like Hoffman’s MASPAR bug). It lets us hunt for bugs that live in hard-to-reach places (think of race conditions). It lets us hunt for bugs that cause intermittent failures, bugs that we (incorrectly) believe are impossible or that we have no idea how to hunt with traditional methods (including manual exploration).
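A sketch of the exhaustive style of search described above, in the spirit of (not a reconstruction of) the MASPAR example. The integer-square-root function and its planted single-input defect are hypothetical. The oracle uses the defining property r*r <= n < (r+1)*(r+1), so no table of expected values is needed. A few thousand hand-picked or random inputs would almost certainly miss a defect like this one; sweeping every input in the range finds it.

```python
import math

def isqrt_under_test(n):
    """Hypothetical function under test, with a deliberately planted rare bug."""
    r = math.isqrt(n)
    if n == 1_234_567:  # planted defect: wrong on exactly one input
        return r + 1
    return r

def is_correct_isqrt(n, r):
    """Property oracle: r is the integer square root of n iff r*r <= n < (r+1)*(r+1)."""
    return r * r <= n < (r + 1) * (r + 1)

def sweep(fn, start, stop):
    """Exhaustively test every input in [start, stop); return the failing inputs."""
    return [n for n in range(start, stop) if not is_correct_isqrt(n, fn(n))]

print(sweep(isqrt_under_test, 0, 2_000_000))  # prints [1234567]
```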
We do almost all of the high-volume test work with tools. In most cases, we compare test results with expected values. And yet, this is purposeful work being done by humans to answer quality-related questions. It is automated. It is checking. It is testing.
Tests are tests whether they are automated or not, whether they are checking or not. Tests can be excellent tests, and fully appropriate for their contexts, whether they are automated or not, whether they are checking or not. And tests can be worthless, or completely unsuited to their context, whether they are automated checks or intentionally-designed manual tests.
The basic premise of context-driven testing is that a test approach that works well in one context may fail utterly in some other context. There are no universal best practices (not even manual exploratory testing organized into testing sessions).
Years ago, one of the common pieces of advice we gave to testers who were interested in context-driven testing was to look for three counter-examples. If you were thinking about a really good practice, look for three situations in which an alternative would work much better. If you were thinking about a practice that you didn’t like (especially one that other people took more seriously than you considered reasonable), look for three situations in which that practice would be appropriate and valuable. I think we learned this from Brian Marick, but I remember it taking hold with most (or all) of the folks who were early adopters of the context-driven approach. I came to see it as a core skill of context-driven testers. Drawing sharp lines, condemning common and often-valuable testing activities as non-testing or non-sapient, is completely out of step with this. It denies the importance of “context” in “context-driven.”
A few weeks ago, a colleague of mine tweeted an assertion that the context-driven testing school “censures” people who advocate for best practices.
I responded: “No one has the authority to censure people on behalf of the school” and “Surely we can disagree with people without censuring them.”
The discussion resurfaced on the “software-testing” listserv on yahoogroups, with statements like these:
A few people on software-testing (most notably Fiona Charles) stated their disagreement with these generalizations. But not many. Some agreed. Most were silent.
I think that calling people liars for advocating a commonly-espoused point of view is beyond unreasonable. To me, it seems like hate speech: an unfair and inaccurate characterization of many of the criticized group, likely to stir up negative emotions and to reinforce negative stereotypes.
Context-driven testing is about finding ways to do excellent testing in the actual context of the project. We adapt to the project. That includes adapting to the views and practices of the people doing that project. To do this, it is a fundamental skill for context-driven testers to be able to listen sympathetically, find common ground, and work effectively with people who have other points of view. Slamming groups of people as liars is antithetical to this. It promotes closed-mindedness, which for a context-driven tester, means ineffectiveness.
Suppose that Joe says that in a certain situation, using Testing Technique X is a “best practice.” What does this mean?
One interpretation is that Technique X is the genuinely best thing to do in this situation. That we know this to be true because we have done the enormous amount of research that would be needed to demonstrate its superiority.
Unfortunately, we don’t do this caliber of research in software engineering. It is difficult and expensive to do empirical research in our field, especially the type of research that can be credibly generalized to complex, real-world projects.
I think it should be obvious that under this definition, there are no best practices in software testing.
So when someone says that we should adopt some of Testing’s best practices, should we really think that this is what they mean?
Sometimes, some individuals give every indication of actually meaning exactly this. I see them peddle “best practices” to executives, government officials, and other people who have influence or authority but not enough technical sophistication. I think those individuals, when they do this, are behaving badly. Readers who know me will probably remember situations in which I have demonstrated that I’m not shy about confronting individuals and telling them that I think they are behaving badly.
But I don’t read these statements as being about a few, specific, unprincipled people. What I think I’m seeing is a general claim that anyone who talks about best practices is either a liar or a fool (incompetent, ignorant, take your pick).
I don’t think it is productive to assume that a popular assertion is made only by liars or fools. I think it is more likely that most of the people who make the assertion mean something else, something that is not obviously incorrect.
Context-driven testers often set aside the idea that any term in testing has One True Definition. I think you have to get past the assumption (or insistence) that any term has One True Definition or you can’t do context-driven testing. People use the same words to mean different things.
For example, many advocates of context-driven testing point out that “test case” has lots of different meanings. When someone tells you that they want you to create “test cases”, there is no value in putting your favorite meaning into their mouth. You have to ask them what they mean or you will give them the “wrong” thing.
We learned this lesson as necessary for guiding our technical practice. Why is it so hard to learn it more generally?
Rather than putting a One True Definition into the mouth of someone who says “best practice”, I think the context-driven way is to ask them what they mean by that term, and if they use an unexpected definition, to say “Oh, now I know how to interpret what you are saying” instead of saying “If you use that term with that meaning, you are telling lies.”
When Joe says that in this situation, using Technique X is a best practice, Joe might mean that he has used Technique X and it worked well, that he has seen Technique X used by others and it worked well, and/or he has read reports from people he trusts that Technique X worked well. If, in Joe’s knowledge and experience, Technique X has worked better, or more reliably, than anything else he’s seen tried, Joe is likely to call Technique X “a best practice.”
At least, that’s my experience. When I’ve asked software practitioners what they mean by “best practice”, they tell me about examples that they have experienced, seen or read about. When I press practitioners for scientific evidence, they don’t have any, they typically don’t claim to have any, and they often think I’m unreasonable for expecting them to have any. (It is difficult and expensive to do well-controlled scientific research in our field, so of course they don’t have any.)
So when Joe says X is a best practice, we could interpret what Joe says as an honestly-intended assertion that X is the best thing he is aware of for this situation and he has seen enough indications that it works well that he recommends it. In my experience, that’s probably what Joe actually means.
As a specific example (because his name has come up in this discussion), my impression from talks with Rex Black is that this is what he typically means by “best practice.” He certainly has more experience with some practices than others (so some of his recommendations are better-grounded than others), but I don’t doubt that he believes that he is giving good advice.
I don’t agree with all of Rex’s recommendations. I think he favors some practices that I think are pretty terrible. But I can disagree with the man without thinking he is a liar or a fool.
And if I am willing to set aside the straw man (the version of the definition that, if true, would make Rex a liar or a fool), then I can give myself the opportunity to learn from him.
Software development is a young field. Our knowledge of the field is evolving.
Let me put this more strongly. 100 years from now, when people look back at our beliefs and generalizations, they will call them “quaint” (which is a gentle way of saying “obviously wrong”).
We all work under conditions of uncertainty. We don’t know the “truth”, so we muddle through as well as we can.
This isn’t unique to software development. Professor Billy V. Koen (a winner of ASEE’s Sterling Olmsted Award) writes about the pervasiveness of heuristics in all areas of engineering. See his books Discussion of the Method: Conducting the Engineer’s Approach to Problem Solving (2003) and the shorter, but to me more powerfully written, Definition of the Engineering Method (1985). Or read his historical article, “The engineering method and the heuristic: A personal history” (“This was the beginning of a 37 year quest to find one thing that was not a heuristic.”)
When we work under uncertainty, we accept and apply ideas that will later turn out to be wrong. And we reject ideas that will later turn out to be reasonable.
I have personally experienced plenty of both. For example, I advocated for exploratory testing back when it was deeply unfashionable. As to my mistakes, read The Ongoing Revolution in Software Testing for some examples.
For the field to advance, we have to be willing to learn from people we disagree with, and to accept the possibility that in some ways they might be right.
When we reject people who disagree with us as liars and fools, we close our minds and we shut down discussion. We cut ourselves off from a kind of debate, and a kind of critical thinking, that I think is essential to the personal growth of senior practitioners and to the progress of the field.
In my experience, people who espouse best practices (whatever they mean by that) start from the practice when they are deciding what to do. Whether they are fans of automated GUI-level test scripts or session-based test management, they start from the position that their preferred thing is probably the solution to your problem. Then they analyze the situation and probably adapt to it, which might mean modifying the practice or picking an alternative if the first choice won’t work in your situation.
In contrast, the context-driven tester starts from the situation. What’s going on here? What do people want? What are their constraints? After they have an understanding of much of the context, they decide what to do.
Some of my colleagues would disagree with my summary (I admit that it is unflattering), but as I see it:
I think this is an important difference.
But we don’t have to demonize each other over it.
To this point, I’ve argued that it is wrong (meaning “bad”) to censure people who advocate for “best practices.”
There is another issue: Does the context-driven school censure these people?
My answer is no. The context-driven school doesn’t censure anyone and it misrepresents the school to claim otherwise.
So, who gets to say that “the context-driven community censures… anything?”
I don’t think anyone has the authority to censure anyone else in the name of the context-driven school.
I think it’s fair to make statements like, “I don’t think X is consistent with context-driven views” or “I think these people are liars” — but those statements put the words in the mouth of the speaker, not the mouth of the school. That’s where those words belong.
My last post responded to Michael Bolton’s “Why Pass vs. Fail Rates Are Unethical”, in which Michael argued that calculating the ratio of passing tests to failing tests is irresponsible, unethical, unprofessional, unscientific and inhumane. I think this is an example of a growing problem in the rhetoric of context-driven testing: I think it gives too little weight to the value of tailoring what we do to the project’s context. Instead, too often, I see a moralistic insistence on adopting preferred practices or rejecting practices that we don’t like.
I think it’s easy to convert any disagreement about policy or practice into a disagreement about ethics. I think this is characteristic of movements that are maturing into orthodox rigidity. Unfortunately, I think that’s fundamentally incompatible with a contextualist approach. My post advocated for dialing the rhetoric back, for a stronger distinction between disagreeing with someone and morally condemning them.
Michael responded with a restatement that I think is even more extreme.
I think the best way to answer this is with a series of posts (and perhaps some discussion) rather than one excessively long screed.
(Added 3/22/12) Michael and I plan to discuss this soon. My next post will be informed by that discussion.
The core messages of this first post are fairly simple:
Several years ago, I had a long talk about metrics with Hung Quoc Nguyen. Hung runs LogiGear, a successful test lab. He was describing to me some of the metrics that his clients expected. I didn’t like some of these metrics and I asked why he was willing to provide them. Hung explained that he’d discussed this with several executives. They understood that the metrics were imperfect. But they felt that they needed ways to summarize what the organization knew about projects. They felt they needed ways to compare progress, costs, priorities, and risks. They felt they needed ways to organize the information so that they could compare several projects or groups at the same time. And they felt they needed to compare what was happening now to what had happened in the past. Hung then made three points:
Hung concluded that his clients were within their rights to ask for this type of information and that he should provide it to them.
If I remember correctly, Hung also gently chided me for being a bit of a perfectionist. It’s easy to refuse to provide something that isn’t perfect. But that’s not helpful when the perfect isn’t available. He also suggested that when it comes to testers or consultants offering a “better alternative”, every executive has both the right and the responsibility to decide which alternative is the better one for her or his situation.
By this point, I had joined Florida Tech and wasn’t consulting to clients who needed metrics, so I had the luxury of letting this discussion settle in my mind for a while before acting on it.
A few years later, I started studying quantitative finance. I am particularly interested in the relationship between model evaluation in quantitative finance and exploratory testing. I also have a strong personal interest–I apply what I learn to managing my family’s investments.
The biggest surprise for me was how poor a set of core business metrics the investors have to work with. I’m thinking of the numbers in balance sheets, statements of cash flow, and income statements, and the added details in most quarterly and annual investment reports. These paint an incomplete, often inaccurate picture of the company. The numbers are so subject to manipulation, and present such an incomplete view, that it can be hard to tell whether a company was actually profitable last year or how much its assets are actually worth.
Investors often supplement these numbers with qualitative information about the company (information that may or may not present a more trustworthy picture than the numbers). However, despite the flaws of the metrics, most investors pay careful attention to financial reports.
I suppose I should have expected these problems. My only formal studies of financial metrics (courses on accounting for lawyers and commercial law) encouraged a strong sense of skepticism. And of course, I’ve seen plenty of problems with engineering metrics.
But it was still a surprise that people actually rely on these numbers. People invest enormous amounts of money on the basis of these metrics.
It would be easy to rant against using these numbers. They are imperfect. They can be misleading. Sometimes severely, infuriatingly, expensively misleading. So we could gather together and have a nice chant that using these numbers would be irresponsible, unethical, unprofessional, unscientific and inhumane.
But in the absence of better data, when I make financial decisions (literally, every day), these numbers guide my decisions. It’s not that I like them. It’s that I don’t have better alternatives to them.
If someone insisted that I ignore the financial statistics, that using them would be irresponsible, unethical, unprofessional, unscientific, and inhumane, I would be more likely to lose respect for that person than to stop using the data.
I teach software metrics at Florida Tech. These days, I start the course with chapters from Tockey’s Return on Software: Maximizing the Return on Your Software Investment. We study financial statistics and estimate the future cost of a hypothetical project. The students see a fair bit of uncertainty. (They experience a fair bit of uncertainty–it can be a difficult experience.) I do this to help my students gain a broader view of their context.
When an executive asks them for software engineering metrics, they are being asked to provide imperfect metrics to managers who are swimming in a sea of imperfect metrics.
It is important (I think very important) to pay attention to the validity of our metrics. It is important to improve them, to find ways to mitigate the risks of using them, and to advise our clients about the characteristics and risks of the data/statistics we supply to them. I think it’s important to use metrics in ways that don’t abuse people. There are ethical issues here, but I think the blanket condemnation of metrics like pass/fail ratios does not begin to address the ethical issues.
In the context-driven principles, we wrote (more precisely, I think, I wrote) “Metrics that are not valid are dangerous.” I still mostly (*) agree with these words but I think it is too easy to extend the statement into a position that is dogmatic and counterproductive. If I were writing the Principles today, I would reword this statement in a way that acknowledges the difficulty of the problem and the importance of the context.
(*) The phrase “Metrics that are not valid” is inaccurately absolute. It is not proper to describe a metric as simply valid or not valid (see Trochim, and Shadish, Cook & Campbell, for example). Rather, we should talk about metrics as more valid or less valid (shades of gray). The wording “not valid” was a simplification at the time and, in retrospect, should be seen as an oversimplification.
In context-driven testing, we respect the fact that contexts differ.
I think it means that in different contexts, the people who are our clients:
Contexts don’t just differ for the testers. They differ for the project managers too. The project managers have to report to other people who want whatever information they want.
We don’t manage the project managers. We don’t decide what information they have to give to the people they report to.
Sometimes, a client will ask how many test cases the testers have run:
Sometimes a client will ask about defect removal efficiency:
But defect removal efficiency (DRE) is a fairly popular metric. It’s in lots of textbooks. People talk about it at conferences. So no matter what I say about it, my client might still want that number. Maybe my client’s boss wants it. Maybe my client’s customer wants it. Maybe my client’s regulator wants it. This is my client’s management context. I don’t think I’m entitled to know all the details of my client’s working situation, so maybe my client will explain why s/he needs this number and maybe s/he won’t.
So if the client says, “No, really, I need the DRE,” I accept that statement as a description of my client’s situation, say OK, and give the number.
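For readers who have not met DRE, one common textbook formulation divides the defects found before release by all defects counted, including those reported after release. The sketch below assumes that definition; the function name and the counts are hypothetical, chosen only for illustration. It also hints at why the number is slippery: the same pre-release work yields a different DRE depending on how many field defects happen to get reported.

```python
def defect_removal_efficiency(found_before_release: int, found_after_release: int) -> float:
    """DRE under one common definition: pre-release defects / all defects counted.

    The post-release count depends on how long, and how thoroughly, field
    reports are collected; this sketch simply takes that count as given.
    """
    total = found_before_release + found_after_release
    if total == 0:
        raise ValueError("DRE is undefined when no defects were counted")
    return found_before_release / total


# Hypothetical counts: identical testing, different field-reporting outcomes.
quiet_field = defect_removal_efficiency(90, 10)
noisy_field = defect_removal_efficiency(90, 30)
print(f"{quiet_field:.0%} vs {noisy_field:.0%}")  # 90% vs 75%
```

Nothing about the testing changed between the two calls; only the denominator did. That is one of the characteristics worth explaining to a client who asks for the number.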
One more example: ratio of passing to failing tests. Michael Bolton presents several reasons for disliking this metric and I generally agree with them. In particular, I don’t know what the ratio measures (it has no obvious construct validity). And if the goal is to make the number big, there are lots of ways to achieve this that yield weak testing (see Austin on measurement dysfunction for a discussion of this type of problem).
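To make the measurement-dysfunction point concrete, here is a minimal sketch with hypothetical numbers. It assumes the ratio is simply passing tests divided by failing tests; the function name is illustrative. Padding the suite with checks that cannot fail raises the number while telling us nothing new about the product.

```python
def pass_fail_ratio(passed: int, failed: int) -> float:
    """Passing tests divided by failing tests (infinite if nothing failed)."""
    if failed == 0:
        return float("inf")
    return passed / failed


# Hypothetical suite: 80 passing tests, 20 failing tests.
baseline = pass_fail_ratio(passed=80, failed=20)

# Pad the suite with 120 trivial checks that always pass. The failing
# tests, and the risks behind them, are unchanged.
padded = pass_fail_ratio(passed=80 + 120, failed=20)

print(baseline, padded)  # 4.0 10.0
```

The ratio more than doubles, yet the testing is no stronger: the same 20 failures remain.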
Really? UNETHICAL?!?
If you give this metric to someone (after they ask, and you say it’s not very good, and they say, really-I-want-it):
I used to associate shrill accusations of unethicalness with conservatives who were losing control of the hearts and minds of the software development community and didn’t like it, or who were pushing a phony image of community consensus as part of their campaigns to get big contracts, especially big government contracts, or who were using the accusation of unethical as a way of shutting down discussion of whether an idea (unethical!) was any good or not.
Maybe you’ve met some of these people. They said things like:
It seemed to me that some of the people (but not all of the people) who said these things were trying to prop up a losing point of view with fear, uncertainty, and doubt: they were using demagoguery as their marketing technique. That I saw as unethical.
Much of my contribution to the social infrastructure of software testing was a conscious rebellion against a closed old boys network that defended itself with dogma and attacked non-conformers as unethical.
So what’s with this “Using a crummy metric is unethical”?
Over the past couple of years, I’ve seen a resurgence of ethics-rhetoric. A new set of people have a new set of bad things to condemn:
I don’t think these are usually good ideas. In fact, most of the time, I think they’re wrong.
But _U_N_E_T_H_I_C_A_L_?_!_?
I’m not a moral relativist. I think there is evil in the world and I sometimes protest loudly against it. But I think it is essential to differentiate between:
Let me illustrate the difference. Michael Bolton is a friend of mine. I have a lot of respect for him as a person, including as an ethical being. His blog post is a convenient example of something I think is a broader problem, but please read my comments on his article as an assertion that I think Michael is wrong (not Wrong).
To the extent that we lose track of the difference between wrong and Wrong, I think we damage our ability to treat people who disagree with us with respect. I think we damage our ability to communicate about our professional differences. I think we damage our ability to learn, because the people we most agree with probably have fewer new things to teach us than the people who see the world a little differently.
The difference between wrong and Wrong is especially important for testers who want to think of ourselves (or market ourselves) as context-driven.
Because we understand that what is wrong in some contexts is right in some others.
Contexts differ.
So, did I really say that context-driven testing is dead? No, that was some other guy (Scott Barber) who’s using the buzz to launch a different idea. It’s effective marketing, and Scott has interesting ideas. But that’s his assertion, not mine.
What I wrote a few days ago was this:
If there ever was one context-driven school, there is not one now.
A “school” provides an organizing social structure for a body of attitudes and knowledge. Schools are often led by one or a few highly visible people.
Over the past few years, several people have gained visibility in the testing community who express ideas and values that sound context-driven to me. Some call themselves context-driven, some don’t. My impression is that some are being told they are not welcome. Others are uncomfortable with a perceived orthodoxy. They like the approach but not the school. They like the ideas, but not the politics.
The context-driven school appeared for years to operate with unified leadership. This appearance was a strength. But it was never quite true: Brian and Bret left early (but they left quietly). I’ve repeatedly raised concerns about the context-driven rhetoric, but relatively quietly. James and I haven’t collaborated successfully for years–this is old news–but for most of that time, our public disagreements were pretty quiet.
I think it is time to go beyond the past illusion of unity, to welcome a new generation of leadership. Not just a new generation of followers. A new generation of leaders. And to embrace their diversity.
There is not one school. There might be none. There might be several. I’m not sure what our real status is today. There will be an evolution and I look forward to seeing the result.
For now, I continue to be enthusiastic about the approach. I still endorse the principles. But what I understand to be the meanings and implications of the principles might not be exactly the same as what you understand. I think that’s OK.
In terms of the politics of The One School, my perception is of an exclusionary tone that has become more emphatic over time. I think this can make good marketing–entertaining presentations, lots of excitement. But does it serve its community? What is the impact on the people who are actually doing the testing: looking for work; looking for advancement in their own careers; striving to increase their skills and professionalism?
For many people, the impact is minimal–they follow their own way.
But for people who align themselves with the school, I think there are risks.
I wasn’t able to travel to CAST last year (I had a health problem), so I watched sessions on video. Watching remotely let me look at things with a different perspective. One of the striking themes in what I saw was a mistrust of test automation. Hey, I agree that regression test automation is a poor basis for an effective comprehensive testing strategy, but the mistrust went beyond that. Manual (session-based, of course) exploratory testing had become a Best Practice.
In the field of software development, I think that people who don’t know much about how to develop software are on a path to lower pay and less job security. Testing-process consultants can be very successful without staying current in these areas of knowledge and skill. But the people they consult to? Not so much.
It was not the details that concerned me. It was the tone. I felt as though I was watching the closing of minds.
I have been concerned about this ever since people in our community (not just our critics–us!) started drawing an analogy between context-driven testing and religion.
As James put it in 2008, “I have my own testing religion (the Context-Driven School).” I objected to it then, and I have objected to it since. This is deeply inconsistent with what I signed up for when we declared a school.
An analogy to religion often carries baggage: Divine sources of knowledge; Knowledge of The Truth; Public disagreement with The Truth is Heresy; An attitude that alternative views are irrelevant; An attitude that alternative views are morally wrong.
Here’s an illustration from James’ most recent post:
This illustrates exactly what troubles me. In my view, there are legitimate differences in the testing community. I think that each of the major factions in the testing community has some very smart people, of high integrity, who are worth paying attention to. I’ve learned a lot from people who would never associate themselves with context-driven testing.
Let me illustrate that with some notes on my last week (Feb 27 to March 2):
I think it’s a Bad Idea to alienate, ignore, or marginalize people who do hard work on interesting problems.
James says later in his post,
I respect the right of any individual to seek his or her own level of ignorance.
But I see it as a disservice to the craft when thought-leaders encourage narrow-mindedness in the people who look to them for guidance.
When I was an undergraduate, I studied mainly math and philosophy. Of the philosophy, I studied mainly Indian philosophy, about 5 semesters’ worth. My step-grandmother was a Buddhist. Friends of mine held similar views. I was motivated to take the ideas seriously.
One of the profound ideas in those courses was a rejection of the law of the excluded middle. According to that law, if A is a proposition, then either A or Not-A must be true (and, by the companion law of non-contradiction, not both). Some of the Indian texts rejected that. They demanded that the reader consider {A and Not-A} and {neither A nor Not-A}. In terms of the logic of mathematics, this makes no sense (and it is not a view I associate with Indian logicians). But in terms of human affairs, I think the rejection of the law of the excluded middle is a powerful cognitive tool.
I have thought that for about 40 years. I brought that with me in my part of the crafting of the context-driven principles. Something can be the right thing for your context and its opposite can be the right thing for my context.
I think we need to look more sympathetically at more contexts and more solutions. To ask more about what is right with alternative ideas and what we can learn from them. And to develop batteries of skills to work with them. For that, I think we need to get past the politics of The One School of context-driven testing.