by Douglas W. Hubbard


Outline

Measuring is not counting: anything that lowers the level of uncertainty counts as measurement.

  • The first step of measurement is establishing what needs to be measured - what is valuable is usually not easy to measure (this is quantified through the value of information). If something is not directly measurable, it can often be decomposed into more measurable pieces.
  • The second step is to determine what precision is required - sometimes a rough range already gives most of the information.
  • The next step is to choose a measuring instrument and use it. The book presents plenty, including calibrated estimates, Monte Carlo simulations, sampling, Bayesian statistics, a few human-based measurements (willingness to pay, risk tolerance, subjective trade-offs, Rasch models, the Lens model and linear models), and a few new-technology methods.

Brilliant ideas

  • To have value, estimates must be calibrated: we must determine how good we are at estimating by measuring how good we are at estimating. (See the calibration exercise.)
  • Information has value. The Expected Value of Information (EVI) is the reduction in Expected Opportunity Loss (EOL): EVI = EOL before info - EOL after info, where EOL = chance of being wrong x cost of being wrong. (A small numeric sketch follows this list.)
  • The vast majority of variables have an information value of zero (the current level of uncertainty about them is acceptable) - no further measurement is justified. It is usually only worth measuring further if the confidence the new measurement provides is higher than the confidence we already have.
  • Start by researching what already exists - others have most probably already thought about and worked on measuring it.
  • When sampling, beware of observer bias (the Heisenberg and Hawthorne effects) - observing causes changes in behavior. Expectancy and selection biases also lower the quality of results.
  • When sampling, if samples are completely random from a homogeneous population, a very small sample (5 or fewer) already gives a useful measure: with 5 random samples there is a ~93% chance that the population median lies between the smallest and largest of them (simple statistics). Samples also help a lot with calibrated estimates (e.g. if asked to estimate the average jelly bean weight, knowing that the first two weigh 1 gram each is good information).
  • If information does not exist to confirm or refute a hypothesis, create it by experimentation.
  • Scores and weighted scorecards are usually very bad at measuring. They are often used where actual quantitative measurements could be done and would be much better suited. One way to improve them is to calculate the standard deviation of each criterion and adjust the scores to make them better calibrated.
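
A minimal numeric sketch of the EVI/EOL arithmetic, in Python (the probabilities and dollar figures are invented for illustration, not from the book):

```python
# Minimal EVI/EOL sketch. All figures are invented for illustration.

def eol(chance_of_being_wrong: float, cost_of_being_wrong: float) -> float:
    """EOL = chance of being wrong x cost of being wrong."""
    return chance_of_being_wrong * cost_of_being_wrong

# Before measuring: 40% chance the decision is wrong, losing $500,000 if so.
eol_before = eol(0.40, 500_000)   # $200,000

# A proposed measurement would cut the chance of being wrong to 10%.
eol_after = eol(0.10, 500_000)    # $50,000

evi = eol_before - eol_after      # $150,000
print(f"EVI = ${evi:,.0f}")       # measuring is justified if it costs less than this
```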

Other Ideas

  • Determine whether to measure at all by asking: is there any measurement method that can reduce the uncertainty enough to justify the cost of measurement?
  • Calibration exercise: estimate a 90% confidence range for a set of questions and check how frequently the true value falls inside; the gap between 90% and the actual hit rate shows how well calibrated we are. Look at each bound independently and ask: am I 95% sure the value is over/under that estimate? The same works for binary tests: judge whether statements are true and give a confidence that you are right; comparing the average confidence with the actual results tells whether you are under- or over-confident.
  • Risk paradox: when an organization uses risk analysis at all, it is usually for routine operational decisions, rarely for the riskiest ones.
  • Don’t start researching a brand-new topic with Google; start with Wikipedia.
  • Observation - does the information we’re searching for leave a trail of any kind? If not, can we observe it directly? If not, can we observe it through proxies? If not, can we “force it” through experiments?
  • Release-recatch (mark-recapture) is how animal populations are estimated: catch 1,000 birds, tag them, release them, wait some time for them to blend in again, then catch 1,000 again. If 50 of them carry tags, the 1,000 tagged birds are about 5% of the population, i.e. roughly 20,000 birds. Turn this into a range by computing the standard error of the recapture proportion and combining it with the 90% z-score of 1.645, then converting the proportion bounds back into population bounds (see the sketch after this list).
  • Correlation in data is expressed by a number between -1 and 1. A value of 1 means the variables move together in a direct linear relationship (there is effectively a transfer function from one to the other), -1 means a perfect inverse relationship, and 0 means no linear relation at all. Plotted on a graph, correlations can be easy to spot.
  • Rasch - to be completed
  • New methods include using markets (e.g. prediction markets) or large portions of the internet to get opinions.
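
A small sketch of the release-recatch range, assuming a normal approximation for the recapture proportion and the 90% z-score of 1.645 (the bird counts match the example above):

```python
import math

# Release-recapture sketch: 1,000 birds tagged; 1,000 recaught, 50 of them tagged.
tagged, recaught, tagged_in_recatch = 1_000, 1_000, 50

p = tagged_in_recatch / recaught   # 0.05: estimated fraction of the population that is tagged
population_estimate = tagged / p   # 20,000

# 90% interval on the proportion (normal approximation, z = 1.645),
# converted back into population bounds.
se = math.sqrt(p * (1 - p) / recaught)
p_low, p_high = p - 1.645 * se, p + 1.645 * se
pop_low, pop_high = tagged / p_high, tagged / p_low

print(f"Point estimate: {population_estimate:,.0f} birds")
print(f"90% CI: roughly {pop_low:,.0f} to {pop_high:,.0f} birds")
```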

Errors in measurement

  • Systemic error: a tendency toward a consistent error in one direction
  • Random error: error that is not predictable or consistent
  • Accuracy: characteristic of a measurement having low systemic error
  • Precision: characteristic of a measurement having low random error

Estimates

Estimates usually have value only if the estimator is calibrated (actually knows what 90% confidence means and has verified being right about 90% of the time).

1) Range estimates come with a confidence level: 90% confidence means you think the actual value has a 95% chance of being higher than the lower bound and a 95% chance of being lower than the upper bound.
2) Binary estimates are statements plus a confidence indicator (CI) that the statement is true.
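
A minimal sketch of how the binary calibration test could be scored (the confidences and outcomes below are invented for illustration):

```python
# Scoring sketch for the binary calibration test: each entry is
# (stated confidence that the statement is true, whether it actually was).
# The data below is invented for illustration.
answers = [
    (0.9, True), (0.7, True), (0.6, False), (0.8, True),
    (0.5, False), (0.9, False), (0.7, True), (0.6, True),
]

avg_confidence = sum(conf for conf, _ in answers) / len(answers)
hit_rate = sum(correct for _, correct in answers) / len(answers)

print(f"Average stated confidence: {avg_confidence:.0%}")
print(f"Actual hit rate:           {hit_rate:.0%}")
# A hit rate well below the stated confidence means overconfidence;
# well above it means underconfidence.
```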

Simple Monte-Carlo

  • Decompose the value to measure into smaller variables
  • Generate random normally-distributed values, e.g. with Excel: =NORMINV(RAND(), mean, (upper bound - lower bound) / 3.29) - for a calibrated 90% CI the range spans about 3.29 standard deviations (a Python equivalent follows this list).
  • Observe results and look at the distribution
  • Conclude
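
A Python equivalent of the Excel approach, assuming two invented cost ranges expressed as calibrated 90% CIs and the (upper - lower) / 3.29 std-dev approximation:

```python
import random
import statistics

# Monte Carlo sketch: a total cost decomposed into two variables, each given
# as a calibrated 90% CI (the ranges are invented for illustration).
# Assumption: for a 90% CI, std-dev ~ (upper - lower) / 3.29, mirroring
# =NORMINV(RAND(), mean, (upper - lower)/3.29) in Excel.

def draw(lower: float, upper: float) -> float:
    mean = (lower + upper) / 2
    std_dev = (upper - lower) / 3.29
    return random.gauss(mean, std_dev)

N = 10_000
totals = [draw(10_000, 20_000) + draw(5_000, 15_000) for _ in range(N)]

print(f"Mean total: {statistics.mean(totals):,.0f}")
print(f"Share of runs above 30,000: {sum(t > 30_000 for t in totals) / N:.1%}")
```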

Assemble an instrument

If no measurement instrument is directly obvious:

  • Imagine what observable consequences the thing would have, even at absurd extremes (thought experiment)
  • How would others do it?
  • Iterate
  • Just do it.

T-statistics and samples

Samples are great at reducing uncertainty when it is very large, and usually help narrow down calibrated estimates.

To calculate the 90% CI error margin for a small sample, multiply the standard error (sample std-dev / √n) by the t-score for that sample size:

| Sample size | t-score |
|-------------|---------|
| 2           | 6.38    |
| 3           | 2.92    |
| 4           | 2.35    |
| 5           | 2.13    |
| 6           | 2.02    |
| 8           | 1.89    |
| 12          | 1.80    |
| 16          | 1.75    |
| 28          | 1.70    |
| More        | 1.645   |
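
A small sketch of that computation, with five invented observations and the t-score from the table:

```python
import math
import statistics

# 90% CI from a small sample, using the t-score from the table above.
# The five observations are invented for illustration.
values = [12.1, 14.3, 11.8, 13.5, 12.9]

n = len(values)
mean = statistics.mean(values)
std_dev = statistics.stdev(values)   # sample standard deviation
t_score = 2.13                       # table value for a sample size of 5

margin = t_score * std_dev / math.sqrt(n)
print(f"90% CI: {mean - margin:.2f} to {mean + margin:.2f}")
```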

Bayesian stats

P(A|B) = P(A) x P(B|A) / P(B). This can be combined with calibrated estimates: give calibrated estimates for the prior and the conditional probabilities, then apply the rule to get a refined estimate. The useful “Bayesian inversion” is that it is often easier to estimate the chance of seeing X if the truth is Y, and Bayes’ rule turns that into the chance that the truth is Y given that we see X.
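
A minimal sketch of that inversion, with invented calibrated estimates:

```python
# Bayesian inversion sketch: estimate how likely we are to SEE evidence X
# if hypothesis Y is true, then invert to get P(Y | X).
# All probabilities are invented calibrated estimates.

p_y = 0.3              # prior: chance that hypothesis Y is true
p_x_given_y = 0.8      # chance of seeing evidence X if Y is true
p_x_given_not_y = 0.2  # chance of seeing X even if Y is false

# Total probability of seeing X, then Bayes' rule.
p_x = p_x_given_y * p_y + p_x_given_not_y * (1 - p_y)
p_y_given_x = p_x_given_y * p_y / p_x

print(f"P(Y | X) = {p_y_given_x:.2f}")   # ~0.63: seeing X raises 30% to ~63%
```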

Willingness to pay

Measuring happiness can be done by establishing correlations between income, life events, and happiness. (A. Oswald did this and calculated that a healthy marriage contributes happiness equivalent to an additional $100k/year of income.)

Measurements just help make enlightened decisions; they don’t necessarily call for picking the most cost-effective option → e.g. you may prefer paying more at a local business than at a big competitor. That is the art-buying problem.

Risk tolerance and trade-offs

It is possible to measure trade-offs and acceptable thresholds by measuring several variables using different predetermined setups. The goal is to find points on the “boundary”:

(Figure: the trade-off boundary)

It is possible to take additional parameters into account by determining several boundaries (e.g. one for $100k investments, one for $120k investments, …). Points on the same boundary are equally valuable. This can be very useful when comparing options with different strong points, such as performance comparisons across people.

Using human judges

A good way to get estimates is gathering people with know-how who can naturally estimate things through their experience. Beware of biases:

  • anchoring (being influenced by other unrelated numbers)
  • halo effect (if somebody favours a solution, she might interpret every new piece of information about it in a positive light) - respectively the horns effect for disliked solutions
  • bandwagon effect - we are influenced by other people’s opinions
  • emerging preferences - post-rationalizing judgement criteria because we like a solution