Metrics, Ethics, & Context-Driven Testing (Part 2)

My last post responded to Michael Bolton’s “Why Pass vs. Fail Rates Are Unethical,” in which Michael argued that calculating the ratio of passing tests to failing tests is irresponsible, unethical, unprofessional, unscientific, and inhumane. I think this is an example of a growing problem in the rhetoric of context-driven testing: it considers too little the value of tailoring what we do to the project’s context. Instead, too often, I see a moralistic insistence on adopting preferred practices or rejecting practices that we don’t like.

I think it’s easy to convert any disagreement about policy or practice into a disagreement about ethics. I think this is characteristic of movements that are maturing into orthodox rigidity. Unfortunately, I think that’s fundamentally incompatible with a contextualist approach. My post advocated for dialing the rhetoric back, for a stronger distinction between disagreeing with someone and morally condemning them.

Michael responded with a restatement that I think is even more extreme.

I think the best way to answer this is with a series of posts (and perhaps some discussion) rather than one excessively long screed.

(Added 3/22/12) Michael and I plan to discuss this soon. My next post will be informed by that discussion.

The core messages of this first post are fairly simple:

Executives are Entitled and Empowered to Choose their Metrics

Several years ago, I had a long talk about metrics with Hung Quoc Nguyen. Hung runs LogiGear, a successful test lab. He was describing to me some of the metrics that his clients expected. I didn’t like some of these metrics and I asked why he was willing to provide them. Hung explained that he’d discussed this with several executives. They understood that the metrics were imperfect. But they felt that they needed ways to summarize what the organization knew about projects. They felt they needed ways to compare progress, costs, priorities, and risks. They felt they needed ways to organize the information so that they could compare several projects or groups at the same time. And they felt they needed to compare what was happening now to what had happened in the past. Hung then made three points:

  1. These are perfectly legitimate management goals.
  2. Quantification (metrics) is probably necessary to achieve these goals.
  3. The fact that there is no collection of metrics that will do this perfectly (or even terribly well) doesn’t eliminate the need. Without a better alternative, managers will do the best they can with what they’ve got.

Hung concluded that his clients were within their rights to ask for this type of information and that he should provide it to them.

If I remember correctly, Hung also gently chided me for being a bit of a perfectionist. It’s easy to refuse to provide something that isn’t perfect. But that’s not helpful when the perfect isn’t available. He also suggested that when it comes to testers or consultants offering a “better alternative”, every executive has both the right and the responsibility to decide which alternative is the better one for her or his situation.

By this point, I had joined Florida Tech and wasn’t consulting to clients who needed metrics, so I had the luxury of letting this discussion settle in my mind for a while before acting on it.

Finance Metrics Illustrate the Executives’ Context

A few years later, I started studying quantitative finance. I am particularly interested in the relationship between model evaluation in quantitative finance and exploratory testing. I also have a strong personal interest–I apply what I learn to managing my family’s investments.

The biggest surprise for me was how poor a set of core business metrics investors have to work with. I’m thinking of the numbers in balance sheets, statements of cash flow, and income statements, and the added details in most quarterly and annual reports. These paint an incomplete, often inaccurate picture of the company. The numbers are so subject to manipulation, and present such an incomplete view, that it can be hard to tell whether a company was actually profitable last year or how much its assets are actually worth.

Investors often supplement these numbers with qualitative information about the company (information that may or may not present a more trustworthy picture than the numbers). However, despite the flaws of the metrics, most investors pay careful attention to financial reports.

I suppose I should have expected these problems. My only formal studies of financial metrics (courses on accounting for lawyers and commercial law) encouraged a strong sense of skepticism. And of course, I’ve seen plenty of problems with engineering metrics.

But it was still a surprise that people actually rely on these numbers. People invest enormous amounts of money on the basis of these metrics.

It would be easy to rant against using these numbers. They are imperfect. They can be misleading. Sometimes severely, infuriatingly, expensively misleading. So we could gather together and have a nice chant that using these numbers would be irresponsible, unethical, unprofessional, unscientific and inhumane.

But in the absence of better data, when I make financial decisions (literally, every day), these numbers guide my decisions. It’s not that I like them. It’s that I don’t have better alternatives to them.

If someone insisted that I ignore the financial statistics, that using them would be irresponsible, unethical, unprofessional, unscientific, and inhumane, I would be more likely to lose respect for that person than to stop using the data.

Teaching Metrics

I teach software metrics at Florida Tech. These days, I start the course with chapters from Tockey’s Return on Software: Maximizing the Return on Your Software Investment. We study financial statistics and estimate the future costs of a hypothetical project. The students see a fair bit of uncertainty. (They experience a fair bit of uncertainty; it can be a difficult experience.) I do this to help my students gain a broader view of their context.
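
To give a feel for that exercise, here is a minimal sketch of the kind of uncertainty the students run into. It is my illustration, not Tockey’s, and the numbers are hypothetical: a classic three-point (PERT-style) estimate, in which even modest disagreement about the optimistic and pessimistic inputs produces a wide band of plausible costs.

    def pert_estimate(optimistic: float, likely: float, pessimistic: float):
        """Classic three-point (PERT) estimate: weighted mean and a rough standard deviation."""
        mean = (optimistic + 4 * likely + pessimistic) / 6
        std_dev = (pessimistic - optimistic) / 6
        return mean, std_dev

    # Hypothetical project costs, in thousands of dollars.
    mean, sd = pert_estimate(optimistic=200, likely=350, pessimistic=900)
    print(f"Expected cost: ${mean:.0f}K, +/- ${sd:.0f}K")
    # Prints: Expected cost: $417K, +/- $117K

A single number comes out, but the band around it is wide. The students learn what such a number does, and does not, carry.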

When an executive asks them for software engineering metrics, they are being asked to provide imperfect metrics to managers who are swimming in a sea of imperfect metrics.

It is important (I think very important) to pay attention to the validity of our metrics. It is important to improve them, to find ways to mitigate the risks of using them, and to advise our clients about the characteristics and risks of the data/statistics we supply to them. I think it’s important to use metrics in ways that don’t abuse people. There are ethical issues here, but I think the blanket condemnation of metrics like pass/fail ratios does not begin to address the ethical issues.
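
To make that concrete, here is a minimal sketch of supplying a pass/fail ratio together with the qualifying context that mitigates some of its risks. The suite names, numbers, and the SuiteRun helper are hypothetical, mine for illustration only: two suites can report the identical pass rate while telling very different stories about risk.

    from dataclasses import dataclass

    @dataclass
    class SuiteRun:
        """One test-suite run, carrying context that the ratio alone hides."""
        name: str
        passed: int
        failed: int
        notes: str  # qualitative caveats that should travel with the number

        def pass_rate(self) -> float:
            total = self.passed + self.failed
            return self.passed / total if total else 0.0

    # Hypothetical runs: identical pass rates, very different risk pictures.
    runs = [
        SuiteRun("Payment flows", passed=90, failed=10,
                 notes="All 10 failures are in refund handling; release-blocking."),
        SuiteRun("UI cosmetics", passed=900, failed=100,
                 notes="Failures are minor layout glitches on one legacy browser."),
    ]

    for run in runs:
        # The number the executive asked for...
        print(f"{run.name}: {run.pass_rate():.0%} passing "
              f"({run.passed} passed, {run.failed} failed)")
        # ...and the qualitative context that keeps it from misleading.
        print(f"  Caveat: {run.notes}")

Both suites report 90% passing; the caveats, not the ratio, carry most of what a decision maker needs to weigh.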

The Principles

In the context-driven principles, we wrote (more precisely, I think, I wrote) “Metrics that are not valid are dangerous.” I still mostly (*) agree with these words, but I think it is too easy to extend the statement into a position that is dogmatic and counterproductive. If I were writing the Principles today, I would reword this statement in a way that acknowledges the difficulty of the problem and the importance of the context.

(*) The phrase “metrics that are not valid” is inaccurately absolute. It is not proper to describe a metric as simply valid or not valid (see, for example, Trochim, or Shadish, Cook & Campbell). Rather, we should talk about metrics as more valid or less valid (shades of gray). The wording “not valid” was a simplification at the time and, in retrospect, should be seen as an oversimplification.

13 thoughts on “Metrics, Ethics, & Context-Driven Testing (Part 2)”

  1. Hi Cem, I like the comparison of testing metrics to the quantitative data collected in other fields and used for decision-making. Regarding the statement “Metrics that are not valid are dangerous,” why don’t you update it, then? As written, it does seem absolute and not very open to context. In fact, it may even conflict with Principle #2.

    That statement is carved in reused electrons, not in stone. It’s not even a main principle; it’s an illustration or example. Change it. Many testers are still discovering context-driven testing through practice and experience and have yet to visit this site. What is preventing you from rewording that example?

    Cheers. Paul.

  2. Paul – I read in this site’s About that Cem is leaving the landing page as it was originally written:

    When you land on this site, you see the context-driven-testing.com landing page (the Principles) as it was when we originally published it. I’ll keep it that way (with the same set of Principles), because several people have found it useful.

  3. Fabulous post & perspective, Cem! I am in complete agreement regarding the Executive’s “Entitlement & Empowerment”.

    I like to think about topics like this in terms of “rights and responsibilities” — it’s not a perfect metaphor, but I find it helps me clarify my thoughts.

    I’ve used that metaphor on my blog to share my thoughts regarding the “rights and responsibilities” of both Executives and the suppliers of the data/metrics they request, in case anyone is interested.

    http://scott-barber.blogspot.com/2012/03/business-value-of-software-test-metrics.html

    Scott

  4. Hi Cem and all,
    I certainly recognize the problem; I’ve often been asked to provide metrics such as pass/fail ratio, number of test cases executed per day, et cetera, which I feel are very badly flawed and can be misleading. Usually I talk with the executive concerned to find out what he or she really needs to know, and whether there’s an alternative he or she will accept for reporting on what’s happening. Sometimes that works, but sometimes executives insist on certain metrics, and I have to choose between providing them or moving on to somewhere else.

    So what I will usually do is provide the metrics, but accompany them with qualitative information about the testing activities and a brief explanation of the pitfalls of the metrics, and I let people know that they need to read both the ‘quantitative’ metrics AND the written, ‘qualitative’ information to get the real picture. It’s not a complete solution to people using poor metrics, which happens in almost every line of business I know of, but for me at least it means I feel that I’ve acted responsibly and tried to inform my client properly and as fully as possible. I don’t feel that it’s unethical or unprofessional to provide flawed metrics SO LONG AS you qualify those metrics and make the pitfalls clear. After all, as testers we provide a service to decision makers; we’re not the decision makers ourselves.

    all the best, Michael

  5. Pingback: Five Blogs – 22 March 2012 « 5blogs

  6. Cem,

    I have worked for an organization that requested and acted upon flawed data (and it caused the opposite effect from the one they intended). When I read Michael’s post, I did not feel that he was challenging that situation. I did not think he was calling out my former employer. He was challenging the reader, me. And I think the reader was/is a software tester or somebody slightly up the chain. He didn’t ask anybody to quit their job. He provided an alternative.

    I met James Bach a couple years ago. Aside from acting giddy from meeting one of my test heroes, I told him that I respected his views but I couldn’t always choose to go that path because I am the breadwinner for my family. He said that he understood.

    Sometimes words are too strong. Hyperbole seems to be the norm these days. Still, we should give Michael the benefit of the doubt.

    Thanks,

    Dave

    Thanks for your note, Dave. I’m not trying to attack Michael. Michael is a friend of mine. It’s not a matter of giving (or not giving) him personally the benefit of the doubt. I’m criticizing the words, the theme, and the underlying assumptions.

    I think the rhetoric of context-driven testing (including this particular series by Michael) has drifted in ways that need to be dialed back and refocused on the “context” in context-driven testing. My intention was to raise that flag in my keynote at CAST last year. The design of that talk was a little sharper in tone than these posts, but it addressed themes, not specific statements by specific people. As live entertainment, it might have gained the attention and discussion that I think we need. Unfortunately, I couldn’t travel to CAST (I got sick). I offered to do the keynote remotely (a videotaped talk with an online discussion), but the program chair preferred a live presentation, which I couldn’t deliver. So, no keynote.

    After pondering alternatives for six months, I decided that raising the themes through a set of blog posts was my best way to air some concerns that have been bothering me since about 2007. This medium is different from keynotes. I think the better way to use it is to identify current, credible presentations of ideas and draw contrasts with them. It’s my best shot at fostering a debate.

    — Cem

  7. Pingback: Ledge Psychology | Exploring Uncertainty

  8. This is excellent food for thought. Management has made a blanket request to my team for metrics. Part of the hesitation on our part is determining which metrics are valid and informative. This provides me with a different perspective when choosing which metrics to pass along. Although no one on our team has made the argument against the validity or ethical use of metrics, I think the sentiment is always there on a good QA team. Differentiating between perfect and necessary can help us move along in the decision-making process.

    Natalie

  9. Pingback: Like It Or Not – Metrics are there | About life

  10. Cem,

    You are also one of my testing heroes, so I want to explain what I meant when I said that Michael’s words were strong. In my eyes, you were not attacking him. I just saw his series of articles differently from how you characterized it. Clearly he and James have strong views about metrics (and test automation, and certifications, and on and on). The industry needs those kinds of strong beliefs because of the management pressure on testers.

    Dave

  11. Cem

    I agree with you when you say that, without alternatives, we will have to make do with the status quo. Over the years, I have been moving away from mathematical/statistical models applied to a primarily human endeavor (software development).

    Managers and executives have real pressure on them to justify spending money on testers – they need some tangible accounting of progress and state of testing. The onus is then on testers to show value AND show progress towards completion.

    However, managers need to stop attributing metrics to people, which is the root of all problems. Elisabeth Hendrickson has a good post on alternative metrics that can be used, but these need more acceptance across industries, and publishers of software metrics benchmarks (Capers Jones, for example) need to use such metrics.

    Looking forward to more discussion on this from you.

  12. Dear Cem,

    I revisited this post after many months and decided to add a comment. I consider myself a context-driven tester because I like to ‘listen sympathetically, find common ground, and work effectively with people who have other points of view’.

    So what made me come back and share my thoughts on testing metrics? A real-life scenario! This response is just one way of telling people why metrics are important too. I am not a great fan of them either, and in certain scenarios I have also opposed the idea of providing pass-fail ratio and test-case-coverage metrics to stakeholders.

    In the last few years I have started managing large and ultra-large programs of work in which testing is one of the activities. My definition of large is a project below $50 million; ultra-large is above $50 million. One of the last projects I managed as a test manager was to ensure that a new aircraft my company had bought (costing nearly $300 million) complied with FAA and Hong Kong Civil Aviation Department requirements and all other applicable aviation policies. I was asked to coordinate the effort and to ensure we tested all channels of communication from the moment the aircraft took off from Seattle and connected to the Hong Kong office until it reached Hong Kong. Did we provide a pass-fail ratio for all test cases? Yes, we did, because it was important to provide those not only to executives but possibly also to regulators. Does that make my metrics and me irresponsible, unethical, unprofessional, unscientific, and inhumane? I don’t think so! At the least, those metrics helped me keep my job. Failing to provide them might have cost me my job, if not gotten me arrested.

    My current program, close to $100 million, aims to provide real-time connectivity to aircraft even when they are far from their base location or above 40,000 feet in the air. As you said, testing is an important activity, but it is a small piece of a much larger network of activities. Testing is very important when we make changes to an airline’s systems, and even more important when the changes touch an aircraft’s design. The executives want to verify each possible scenario the new design may bring, and they want me to ensure we test all of those. Obviously, these tests will be performed at many stages, including a test lab, a simulator, an automated environment, operational tests over many months, and so on. Executives still want to see a quantitative presentation of the testing activities, including test-case counts, pass-fail ratios, test-case coverage, defect metrics, and many more.

    Can I tell my executives that some people in my trade think these metrics are irresponsible, unethical, unprofessional, unscientific, and inhumane? They are not, because they will be scientifically collected by trained (skilled) testers who are supported by trained professionals such as engineers and pilots.

    The executives have every right to ask for the data they believe is important for making decisions, especially when it concerns something mission- and life-critical that costs millions of dollars. It is my (a test manager’s) duty to make sure that the metrics are based on facts, are quantitative, and are scientific. It is also my duty to raise a concern if I do not agree with a metric or find that a metric is deeply flawed.

  13. Pingback: Metric Madness | Road Less Tested

Comments are closed.