Read January 2023 · 30 highlights · Science, Business

Kahneman, Sibony, and Sunstein explore how cognitive biases and inconsistencies in judgment affect decision-making in organizations and society.


From the perspective of noise reduction, a singular decision is a recurrent decision that happens only once. Whether you make a decision only once or a hundred times, your goal should be to make it in a way that reduces both bias and noise. And practices that reduce error should be just as effective in your one-of-a-kind decisions as in your repeated ones.

· · ·

A matter of judgment is one with some uncertainty about the answer and where we allow for the possibility that reasonable and competent people might disagree.

· · ·

System noise is inconsistency, and inconsistency damages the credibility of the system.

· · ·

The advantages of Gauss’s approach were quickly recognized by other mathematicians. Among his many feats, Gauss used MSE (and other mathematical innovations) to solve a puzzle that had defeated the best astronomers of Europe: the rediscovery of Ceres, an asteroid that had been traced only briefly before it disappeared into the glare of the sun in 1801. The astronomers had been trying to estimate Ceres’s trajectory, but the way they accounted for the measurement error of their telescopes was wrong, and the planet did not reappear anywhere near the location their results suggested. Gauss redid their calculations, using the least squares method. When the astronomers trained their telescopes on the spot that he had indicated, they found Ceres!
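
The least squares principle is easy to demonstrate on a toy problem: choose the parameters that minimize the mean squared error of the residuals. A minimal sketch with invented data (a straight-line fit, nothing like Gauss’s orbital calculation):

```python
import numpy as np

# Toy illustration of least squares: pick the line that minimizes the
# mean squared error. (Gauss applied the principle to orbital elements,
# not a straight line; the data here are invented.)
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0 + rng.normal(0.0, 1.0, size=x.size)  # true line + noise

# Solve A @ [slope, intercept] ≈ y in the least-squares sense.
A = np.column_stack([x, np.ones_like(x)])
slope, intercept = np.linalg.lstsq(A, y, rcond=None)[0]
print(f"slope ≈ {slope:.2f}, intercept ≈ {intercept:.2f}")
```

The recovered slope and intercept land close to the true values (2 and 1) despite the noise, which is exactly why the method beat the astronomers’ ad hoc error handling.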

· · ·

A widely accepted maxim of good decision making is that you should not mix your values and your facts. Good decision making must be based on objective and accurate predictive judgments that are completely unaffected by hopes and fears, or by preferences and values.

· · ·

“Level noise is when judges show different levels of severity. Pattern noise is when they disagree with one another on which defendants deserve more severe or more lenient treatment. And part of pattern noise is occasion noise—when judges disagree with themselves.”
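
The decomposition in this quote can be sketched on a small judge-by-case matrix. With hypothetical ratings (all numbers invented), level noise is the variance of the judges’ average severity, and pattern noise is what remains once judge and case averages are removed; occasion noise is buried inside the pattern term, since a single matrix cannot isolate it:

```python
import numpy as np

# Hypothetical sentencing ratings: rows = judges, columns = cases.
ratings = np.array([
    [5.0, 7.0, 3.0, 9.0],   # judge A
    [7.0, 8.0, 6.0, 10.0],  # judge B: harsher overall -> level noise
    [4.0, 9.0, 2.0, 7.0],   # judge C: ranks cases differently -> pattern noise
])

grand_mean = ratings.mean()
judge_means = ratings.mean(axis=1, keepdims=True)
case_means = ratings.mean(axis=0, keepdims=True)

# Level noise: variance of the judges' average severity.
level_noise = ((judge_means - grand_mean) ** 2).mean()

# Pattern noise: residual variance after removing judge and case means
# (occasion noise is hidden inside this term).
residual = ratings - judge_means - case_means + grand_mean
pattern_noise = (residual ** 2).mean()

system_noise = level_noise + pattern_noise
print(f"level={level_noise:.2f} pattern={pattern_noise:.2f} system={system_noise:.2f}")
```

The two components add up exactly to the variance of the judgments once case averages are stripped out, which is the book’s system noise.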

· · ·

First, assume that your first estimate is off the mark. Second, think about a few reasons why that could be. Which assumptions and considerations could have been wrong? Third, what do these new considerations imply? Was the first estimate rather too high or too low? Fourth, based on this new perspective, make a second, alternative estimate.
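
A quick simulation shows why this second-estimate routine helps, under the simplifying (and optimistic) assumption that the errors of the two estimates are independent; in practice a person’s two estimates are correlated, so the real gain is smaller:

```python
import numpy as np

# "Crowd within" sketch: averaging a first and a second, deliberately
# different estimate cancels part of the noise. Purely illustrative.
rng = np.random.default_rng(42)
truth = 100.0
n = 10_000

first = truth + rng.normal(0.0, 10.0, n)   # first estimates
second = truth + rng.normal(0.0, 10.0, n)  # revised second estimates
combined = (first + second) / 2

print(np.abs(first - truth).mean(), np.abs(combined - truth).mean())
```

The average absolute error of the combined estimate is noticeably smaller than that of the first estimate alone.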

· · ·

Mood has a measurable influence on what you think: what you notice in your environment, what you retrieve from your memory, how you make sense of these signals. But mood has another, more surprising effect: it also changes how you think.

· · ·

People who are in a good mood are more likely to let their biases affect their thinking.

· · ·

The illusion of validity is found wherever predictive judgments are made, because of a common failure to distinguish between two stages of the prediction task: evaluating cases on the evidence available and predicting actual outcomes. You can often be quite confident in your assessment of which of two candidates looks better, but guessing which of them will actually be better is an altogether different kettle of fish.

· · ·

The robust finding that the model of the judge is more valid than the judge conveys an important message: the gains from subtle rules in human judgment—when they exist—are generally not sufficient to compensate for the detrimental effects of noise. You may believe that you are subtler, more insightful, and more nuanced than the linear caricature of your thinking. But in fact, you are mostly noisier.
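
This finding is straightforward to reproduce on synthetic data: fit a linear model to a simulated judge’s own ratings, never touching the outcomes, and the model predicts those outcomes better than the judge does, because it keeps the judge’s cue weights while discarding the judge’s noise. All data below are invented:

```python
import numpy as np

# "Model of the judge" sketch on synthetic data.
rng = np.random.default_rng(1)
n = 2_000
cues = rng.normal(size=(n, 3))                       # predictive cues
outcome = cues @ np.array([1.0, 0.5, 0.3]) + rng.normal(0.0, 1.0, n)

# The judge weights the same cues, but adds judgment noise.
judge = cues @ np.array([0.9, 0.6, 0.2]) + rng.normal(0.0, 1.5, n)

# Fit a linear model OF THE JUDGE (outcome data never used in the fit).
X = np.column_stack([cues, np.ones(n)])
weights = np.linalg.lstsq(X, judge, rcond=None)[0]
model_of_judge = X @ weights

r_judge = np.corrcoef(judge, outcome)[0, 1]
r_model = np.corrcoef(model_of_judge, outcome)[0, 1]
print(f"judge r={r_judge:.2f}, model-of-judge r={r_model:.2f}")
```

The noise-free linear caricature of the judge correlates with the outcome substantially better than the judge’s own noisy ratings do.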

· · ·

“People believe they capture complexity and add subtlety when they make judgments. But the complexity and the subtlety are mostly wasted—usually they do not add to the accuracy of simple models.”

· · ·

“There is so much noise in judgment that a noise-free model of a judge achieves more accurate predictions than the actual judge does.”

· · ·

“When there is a lot of data, machine-learning algorithms will do better than humans and better than simple models. But even the simplest rules and algorithms have big advantages over human judges: they are free of noise, and they do not attempt to apply complex, usually invalid insights about the predictors.”

· · ·

Pundits should not be blamed for the failures of their distant predictions. They do, however, deserve some criticism for attempting an impossible task and for believing they can succeed in it.

· · ·

There is essentially no evidence of situations in which people do very poorly and models do very well with the same information.

· · ·

The Delphi method has worked well in many situations, but it can be challenging to implement. A simpler version, mini-Delphi, can be deployed within a single meeting. Also called estimate-talk-estimate, it requires participants first to produce separate (and silent) estimates, then to explain and justify them, and finally to make a new estimate in response to the estimates and explanations of others. The consensus judgment is the average of the individual estimates obtained in that second round.
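
The mechanics of estimate-talk-estimate fit in a few lines. A minimal sketch, with invented participants and numbers:

```python
# Mini-Delphi (estimate-talk-estimate) sketch; names and numbers invented.
round_one = {"ana": 120, "ben": 180, "chloe": 95}   # silent first estimates

# ... each participant explains and justifies their estimate, then revises ...
round_two = {"ana": 130, "ben": 150, "chloe": 125}  # revised estimates

# The consensus judgment is the average of the second-round estimates.
consensus = sum(round_two.values()) / len(round_two)
print(consensus)  # 135.0
```

The averaging happens only after the discussion, so the group gets the benefit of shared reasoning without anchoring the first estimates on one another.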

· · ·

(A well-known response to this criticism, sometimes attributed to John Maynard Keynes, is, “When the facts change, I change my mind. What do you do?”)

· · ·

Rather than form a holistic judgment about a big geopolitical question (whether a nation will leave the European Union, whether a war will break out in a particular place, whether a public official will be assassinated), they break it up into its component parts. They ask, “What would it take for the answer to be yes? What would it take for the answer to be no?” Instead of offering a gut feeling or some kind of global hunch, they ask and try to answer an assortment of subsidiary questions.

· · ·

Forced ranking was advocated by Jack Welch when he was CEO of General Electric, as a way to stop inflation in ratings and to ensure “candor” in performance reviews. Many companies adopted it, only to abandon it later, citing undesirable side effects on morale and teamwork.

· · ·

You would not sneer at the leniency of the National Aeronautics and Space Administration’s performance management procedures if you heard that all the astronauts on a successful space mission have fully met expectations.

· · ·

For example, relative ratings might make sense when, regardless of people’s absolute performance, only a fixed percentage of them can be promoted—think of colonels being evaluated for promotion to general. But forcing a relative ranking on what purports to measure an absolute level of performance, as many companies do, is illogical. And mandating that a set percentage of employees be rated as failing to meet (absolute) expectations is not just cruel; it is absurd. It would be foolish to say that 10% of an elite unit of the army must be graded “unsatisfactory.”

· · ·

One meta-analysis estimates a correlation of .74 (PC = 76%). This means that you and another interviewer, after seeing the same two candidates in the same panel interview, will still disagree about which candidate is better about one-quarter of the time.
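
The PC figure can be reproduced if percent concordance relates to the correlation the way it does for bivariate normal judgments, PC = 1/2 + arcsin(r)/π; that this is the book’s exact computation is my assumption:

```python
import math

# Percent concordant (PC): chance that two judgments agree on which of
# two cases ranks higher, given their correlation r. Formula assumes
# bivariate normal judgments.
def percent_concordant(r: float) -> float:
    return 0.5 + math.asin(r) / math.pi

pc = percent_concordant(0.74)
print(f"{pc:.1%}")  # prints 76.5%
```

A correlation of .74 therefore implies agreement roughly 76% of the time, i.e. disagreement about one time in four, as the passage says.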

· · ·

As we can often find an imaginary pattern in random data or imagine a shape in the contours of a cloud, we are capable of finding logic in perfectly meaningless answers.

· · ·

Research has shown that work sample tests are among the best predictors of on-the-job performance.

· · ·

“We are all reasonable people and we disagree, so this must be a subject on which reasonable people can disagree.”

· · ·

Joan did not explain her logic, but she had learned this lesson the hard way. She knew that, particularly with important decisions, people reject schemes that tie their hands and do not let them use their judgment. She had seen how decision makers game the system when they know that a formula will be used. They change the ratings to arrive at the desired conclusion—which defeats the purpose of the entire exercise.

· · ·

1. At the beginning of the process, structure the decision into mediating assessments. (For recurring judgments, this is done only once.)
2. Ensure that whenever possible, mediating assessments use an outside view. (For recurring judgments: use relative judgments, with a case scale if possible.)
3. In the analytical phase, keep the assessments as independent of one another as possible.
4. In the decision meeting, review each assessment separately.
5. On each assessment, ensure that participants make their judgments individually; then use the estimate-talk-estimate method.
6. To make the final decision, delay intuition, but don’t ban it.

· · ·

If algorithms make fewer mistakes than human experts do and yet we have an intuitive preference for people, then our intuitive preferences should be carefully examined.

· · ·

Because rules have clear edges, people can evade them by engaging in conduct that is technically exempted but that creates the same or analogous harms. (Every parent of a teenager knows this!) When we cannot easily design rules that ban all conduct that ought to be prohibited, we have a distinctive reason to tolerate noise, or so the objection goes.