The NPS Trap: When Measuring Satisfaction Hurts Service

I’ve started noticing the same broken pattern everywhere, and I mean everywhere. My car service center does it. My credit card issuer does it. I’d bet your bank, your ISP, and your last food delivery order do it too.

Somewhere along the way, service organizations stopped measuring whether customers are happy and started measuring whether customers gave them a good number. Those are not the same thing, and the gap between them has gotten wide enough that the score itself has stopped meaning much.

This isn’t an argument against measuring customer sentiment. It’s an argument that the way most of us have operationalized one specific metric, the Net Promoter Score, has quietly defeated its own purpose.

What NPS Was Actually Built To Do

NPS came out of Fred Reichheld’s 2003 work at Bain & Company, built on a fairly simple finding: when you ask customers “how likely are you to recommend us to a friend or colleague” on a 0–10 scale, the answer correlates with real growth behavior (repurchase, referrals, lifetime value) better than most other satisfaction questions tested at the time.

The now-familiar buckets, Detractors (0–6), Passives (7–8), Promoters (9–10), weren’t picked because they felt intuitive. They came out of correlation analysis: people who scored 9–10 behaved like genuine advocates, 7–8 scorers behaved like fence-sitters, and 0–6 scorers showed churn risk and a tendency to warn others away.
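For readers who have only ever seen the final number on a dashboard, here is a minimal sketch of how the buckets above turn into a score. NPS is conventionally reported as the percentage of Promoters minus the percentage of Detractors, on a scale from -100 to +100. The function names and the sample responses are made up for illustration.

```python
from collections import Counter


def nps_bucket(score: int) -> str:
    """Map a 0-10 recommendation score to its standard NPS bucket."""
    if not 0 <= score <= 10:
        raise ValueError(f"score must be between 0 and 10, got {score}")
    if score >= 9:
        return "promoter"
    if score >= 7:
        return "passive"
    return "detractor"


def net_promoter_score(scores: list[int]) -> float:
    """Return % promoters minus % detractors, on a -100 to +100 scale."""
    if not scores:
        raise ValueError("need at least one response")
    counts = Counter(nps_bucket(s) for s in scores)
    return 100.0 * (counts["promoter"] - counts["detractor"]) / len(scores)


# Made-up sample: 4 promoters, 3 passives, 3 detractors
sample_scores = [10, 9, 9, 10, 8, 7, 8, 6, 3, 5]
sample_nps = net_promoter_score(sample_scores)
print(sample_nps)  # 10.0
```

Notice that Passives only appear in the denominator, and that a 0 and a 6 count exactly the same. The arithmetic throws away most of the individual response, which is fine for an aggregate and a poor fit for judging one interaction.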

The important detail practitioners tend to skip: this was validated as a population-level predictive tool. Across thousands of responses, the buckets predict aggregate growth reasonably well. They were never built to be explained back to an individual respondent, and they were never built to sit inside one technician’s or one agent’s performance scorecard.
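One way to see why the population-level point matters is to look at sampling noise. The sketch below is my own back-of-the-envelope illustration, not part of the original NPS research: it treats each response as +1 (Promoter), 0 (Passive), or -1 (Detractor), assumes responses are independent, and uses the usual normal approximation for a 95% margin of error. The function name, the bucket shares, and the sample sizes are all hypothetical.

```python
import math


def nps_margin_of_error(promoter_share: float, detractor_share: float,
                        n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for NPS, in NPS points.

    Treats each response as +1 (promoter), 0 (passive), or -1 (detractor)
    and assumes independent responses (simple random sampling).
    """
    if n <= 0:
        raise ValueError("n must be positive")
    mean = promoter_share - detractor_share
    # Var(X) = E[X^2] - E[X]^2, where E[X^2] = promoter_share + detractor_share
    variance = promoter_share + detractor_share - mean ** 2
    return 100.0 * z * math.sqrt(variance / n)


# Hypothetical population: 50% promoters, 20% detractors (NPS = +30)
technician_moe = nps_margin_of_error(0.5, 0.2, n=15)    # one technician's month
portfolio_moe = nps_margin_of_error(0.5, 0.2, n=3000)   # the whole portfolio
print(round(technician_moe, 1), round(portfolio_moe, 1))
```

Under these assumptions the portfolio-level number is good to within about 3 points, while a technician with 15 responses carries a margin of error of nearly 40 points. A month-to-month swing of that size on an individual scorecard is mostly noise.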

That’s exactly where things start to break.

[Figure: comparison table of NPS as designed (aggregate customer insight) versus NPS as often deployed (individual employee performance metric). Caption: NPS was designed to measure customer sentiment across large populations, not to evaluate individual technicians or agents.]

The Scale Doesn’t Have the Resolution We Pretend It Does

Ask yourself honestly: what separates a 3 from a 4 on a recommendation scale? What separates an 8 from a 9? Most customers can’t tell you, because the scale was never designed to carry that kind of precision at the individual response level.

It gets worse when companies start explaining the scale back to the customer. I’ve seen surveys that spell it out directly: 0–6 is bad, 7–8 is okay, 9–10 is good. One I received recently had pushed the goalposts further: 9 was labeled “average” and only 10 counted as “good.” A credit card survey I got asked about likelihood to recommend, with 5 marked “neutral” and 10 marked “most likely.”

[Image: side-by-side screen grabs showing how scale labels drift from one survey to the next, including the credit card example from outside aftersales.]

The moment a company tells a customer what the numbers mean, the exercise stops being a spontaneous gut reaction and becomes a comprehension test. You’re no longer capturing sentiment; you’re capturing whether the customer read and followed the instructions correctly.

The Hounding Problem and How It Poisons the Data

Here’s the part that doesn’t get talked about enough: what happens after a customer gives a low score.

Anyone below a 9 gets followed up. Called. Messaged. Sometimes more than once. The stated goal is “service recovery.” The actual behavior, from where I sit, looks a lot more like score recovery.

This produces two effects that quietly corrupt the entire dataset:

A 9 or 10 can easily be a former detractor who rated high just to make the hounding stop. Once that happens, your Promoter bucket is contaminated with appeasement scores that have nothing to do with advocacy.
Some customers simply stop responding. They’ve seen where this goes, and they opt out of the survey altogether rather than deal with the follow-up calls. Those customers, possibly your most genuinely dissatisfied ones, vanish from the data entirely instead of showing up as Detractors.

By the time these scores get aggregated into a dashboard, the number isn’t measuring customer sentiment anymore. It’s measuring how skilled your customers have become at avoiding a phone call.
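To make the two effects concrete, here is a small worked example with entirely made-up numbers. It starts from the scores 20 hypothetical customers would give honestly, then applies both distortions: two detractors re-rate as 9 to end the follow-up calls, and two more stop answering surveys. The constants `APPEASED` and `SILENT` are assumptions chosen for illustration, not measured rates.

```python
def nps(scores: list[int]) -> float:
    """Percentage of promoters (9-10) minus percentage of detractors (0-6)."""
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100.0 * (promoters - detractors) / len(scores)


# Hypothetical honest scores: 8 promoters, 6 passives, 6 detractors
true_scores = [10] * 5 + [9] * 3 + [8] * 4 + [7] * 2 + [5, 5, 5, 3, 3, 1]

APPEASED = 2  # assumption: detractors who re-rate as 9 after follow-up calls
SILENT = 2    # assumption: detractors who stop answering surveys

non_detractors = [s for s in true_scores if s >= 7]
detractors = [s for s in true_scores if s <= 6]

observed_scores = (
    non_detractors
    + [9] * APPEASED                  # appeasement scores land in the Promoter bucket
    + detractors[APPEASED + SILENT:]  # silent detractors leave the data entirely
)

true_nps = nps(true_scores)
observed_nps = nps(observed_scores)
print(f"true NPS: {true_nps:.1f}, observed NPS: {observed_nps:.1f}")
# true NPS: 10.0, observed NPS: 44.4
```

Nobody’s experience improved in this example, yet the dashboard number more than quadrupled. Only four of the twenty customers changed their behavior.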

Naming the Actual Disease

What’s happening here has a name: Goodhart’s Law. Once a measure becomes a target, it stops being a good measure.

NPS was designed for aggregate prediction and got weaponized for individual accountability. The moment a single agent’s bonus, a single technician’s scorecard, or a single team’s monthly review depends on moving one customer from a 7 to a 9, gaming the number isn’t a failure of integrity. It’s the rational response to the incentive you built. Anyone in that seat would do the same thing.

This is the same pattern I wrote about in When the Dashboard Becomes the Project. Different costume, identical disease. Give people a visible proxy and tie consequences to it, and the proxy becomes the actual job. The work the proxy was supposed to represent quietly stops mattering.

It’s Not Just NPS

This problem isn’t unique to Net Promoter Score. The same fate awaits almost any proxy metric your team tracks the moment it gets tied to compensation or performance review:

First-time-fix rate gets gamed by reclassifying repeat visits, not by actually fixing more on the first try
Average handle time gets gamed by rushing calls or transferring them, not by resolving issues faster
CSAT suffers the identical scale and hounding problems as NPS, just with different question wording

If your team has a number that everyone seems strangely good at hitting while the underlying complaints don’t change, this is probably why. Part of the reason these metrics get embedded so deep into team operations in the first place is that they arrive pre-built into the FSM platform’s dashboard, presented in a polished, real-time, and convincing sales demo. Nobody in the room asks whether the metric is being used at the level it was designed for. I’ve written before about how this plays out at the FSM platform-selection stage, where the demo appears to have every capability but nobody has stress-tested it against real business complexity.

What Could Actually Fix This

I won’t pretend there’s a clean answer here, but a few things move in the right direction:

Decouple the score from individual compensation. The moment one person’s pay depends on one customer’s number, that number stops being trustworthy. Use NPS at the aggregate, portfolio level it was designed for, not as a line item in someone’s monthly review.
Pair the number with qualitative follow-up that resists gaming. A score alone tells you nothing about why. Open-ended comments, reviewed in aggregate rather than chased individually, hold more signal than the number itself.
Track outcome durability instead of a single point-in-time score. Repeat issue rate, recurrence of the same complaint, whether the customer actually returns: these are harder to game and closer to what you actually care about.
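As a rough illustration of the durability idea in the last point, the sketch below computes a repeat issue rate from a list of service visits. Everything in it is an assumption made for the example: the `Ticket` fields, the 30-day window, and the rule that "repeat" means the same customer returning with the same issue code. A real FSM export will look different, and the issue codes themselves can be gamed if reclassification goes unchecked.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Ticket:
    # Hypothetical fields; adapt to whatever your service system exports
    customer_id: str
    issue_code: str
    visited_on: date


def repeat_issue_rate(tickets: list[Ticket], window_days: int = 30) -> float:
    """Share of visits followed by the same customer returning with the
    same issue within window_days."""
    if not tickets:
        raise ValueError("need at least one ticket")
    window = timedelta(days=window_days)
    ordered = sorted(tickets, key=lambda t: t.visited_on)
    repeats = 0
    for i, first in enumerate(ordered):
        for later in ordered[i + 1:]:
            if later.visited_on - first.visited_on > window:
                break  # sorted by date, so nothing further can be in the window
            same_customer = later.customer_id == first.customer_id
            same_issue = later.issue_code == first.issue_code
            if same_customer and same_issue:
                repeats += 1
                break  # count each original visit at most once
    return repeats / len(ordered)


# Made-up visits
tickets = [
    Ticket("c1", "brake-noise", date(2024, 1, 5)),
    Ticket("c1", "brake-noise", date(2024, 1, 20)),  # repeat within 30 days
    Ticket("c2", "oil-leak", date(2024, 1, 10)),
    Ticket("c3", "ac-fault", date(2024, 1, 12)),
    Ticket("c3", "ac-fault", date(2024, 3, 15)),     # same issue, outside window
]
rate = repeat_issue_rate(tickets)
print(rate)  # 0.2
```

The point of a measure like this is that it comes from what customers did, not from what they were persuaded to type into a survey.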

None of this guarantees a perfect system. But it stops rewarding the exact behavior that is currently undermining the whole exercise: chasing the number instead of the problem.

What You See	What It Might Actually Mean
Customer rates 9 after multiple follow-up calls	Appeasement, not advocacy
Customer doesn’t respond to the survey at all	Possible silent detractor, not a neutral non-event
Team’s NPS stays high while complaint volume stays flat	Score is being managed, not the experience
Customer rates 7–8 consistently	Quiet churn risk being misread as “passive but fine”

Table: What Your NPS Score Might Actually Be Telling You

The Part That Should Bother You as a Practitioner

The next time a survey lands in your own inbox, whether from your car service center, bank, airline, or anyone else, pay attention to what you actually do with it. Do you rate honestly? Do you rate high just to avoid a callback? Do you skip it entirely?

You’ve lived this from both sides. That’s exactly why it should bother you when you’re the one designing the survey.

I’d love to hear how this shows up in your org!

Share your experience via the contact form. Practitioner inputs shape the content on this site.