Sampled (Underfoot)

Some interesting statistics from the American sociologist Elizabeth Wrigley-Field:

Here are three puzzles.

• American fertility fluctuated dramatically in the decades surrounding the Second World War. Parents created the smallest families during the Great Depression, and the largest families during the postwar Baby Boom. Yet children born during the Great Depression came from larger families than those born during the Baby Boom. How can this be?

• About half of the prisoners released in any given year in the United States will end up back in prison within five years. Yet the proportion of prisoners ever released who will ever end up back in prison, over their whole lifetime, is just one third. How can this be?

• People whose cancers are caught early by random screening often live longer than those whose cancers are detected later, after they are symptomatic. Yet those same random screenings might not save any lives. How can this be?

And here is a twist: these are all the same puzzle.

• Answers here: Length-Biased Sampling by Elizabeth Wrigley-Field
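For anyone who wants to poke at the first puzzle before clicking through, here is a minimal sketch of length-biased sampling. The two family-size distributions are invented purely for illustration (they are not historical data): the "Depression" one mixes many small or childless families with a few very large ones, and the "Baby Boom" one clusters around three children.

```python
from fractions import Fraction

# Hypothetical data: {children per mother: number of mothers}. Not real statistics.
DEPRESSION = {0: 40, 1: 20, 2: 20, 7: 20}
BABY_BOOM = {2: 30, 3: 40, 4: 30}


def mean_size_per_mother(families):
    """Average family size when you sample mothers."""
    mothers = sum(families.values())
    children = sum(size * count for size, count in families.items())
    return Fraction(children, mothers)


def mean_size_per_child(families):
    """Average family size when you sample children.

    A family of size n is counted n times, once per child,
    so large families are over-represented.
    """
    children = sum(size * count for size, count in families.items())
    weighted = sum(size * size * count for size, count in families.items())
    return Fraction(weighted, children)


for name, families in (("Depression", DEPRESSION), ("Baby Boom", BABY_BOOM)):
    print(name,
          float(mean_size_per_mother(families)),
          float(mean_size_per_child(families)))
```

With these made-up numbers the Depression mothers average 2 children against 3 for the Baby Boom, yet the Depression children come from families averaging 5.4 against 3.2. Same families, different sampling unit.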


Proxi-Performative Post-Scriptum

The title of this post is, of course, a radical reference to core Led Zeppelin track “Trampled Underfoot” (1975).

Get Your Tox Off

There’s only one word for it: toxic. The proliferation of this word is an incendiarily irritating abjectional aspect of contemporary culture. My visit to Google Ngram has confirmed my worst suspicions:

Toxic in English

Toxic in English fiction

“Feral” isn’t irritating in quite the same way, but has similarly proliferated:

Feral in English

Feral in English fiction

Noxious note: In terms of majorly maximal members of the Maverick Messiah community (such as myself), it goes without saying that when we deploy such items of Guardianese, we are being ironic dot dot dot


Previously pre-posted (please peruse):

Septics vs Dirties
Ex-term-in-ate!
Reds Under the Thread
Titus Graun

Live and Let Dice

How many ways are there to die? The answer is actually five, if by “die” you mean “roll a die” and by “rolled die” you mean “Platonic polyhedron”. The Platonic polyhedra are the solid shapes in which each polygonal face and each vertex (meeting-point of the edges) are the same. There are surprisingly few. Search as long and as far as you like: you’ll find only five of them in this or any other universe. The standard cubic die is the most familiar: each of its six faces is square and each of its eight vertices is the meeting-point of three edges. The other four Platonic polyhedra are the tetrahedron, with four triangular faces and four vertices; the octahedron, with eight triangular faces and six vertices; the dodecahedron, with twelve pentagonal faces and twenty vertices; and the icosahedron, with twenty triangular faces and twelve vertices. Note the symmetries of face- and vertex-number: the dodecahedron can be created inside the icosahedron, and vice versa. Similarly, the cube, or hexahedron, can be created inside the octahedron, and vice versa. The tetrahedron is self-spawning and pairs itself. Plato wrote about these shapes in his Timaeus (c. 360 B.C.) and based a mathemystical cosmology on them, which is why they are called the Platonic polyhedra.
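The "only five" claim can be checked by brute force. The sketch below relies on a standard argument that is not spelled out in this post: if each face has p sides and q faces meet at each vertex, the angles around a vertex must add up to less than a full turn, which works out to 1/p + 1/q > 1/2. The function name and the search limit are my own inventions.

```python
def platonic_solids(limit=12):
    """List every (p, q) pair that can form a regular convex polyhedron.

    p = sides per face, q = faces meeting at each vertex.
    The condition 1/p + 1/q > 1/2 is equivalent to 2p + 2q - pq > 0.
    """
    solids = []
    for p in range(3, limit):
        for q in range(3, limit):
            denom = 2 * p + 2 * q - p * q
            if denom > 0:
                edges = 2 * p * q // denom   # follows from Euler's V - E + F = 2
                faces = 2 * edges // p       # each edge borders two faces
                vertices = 2 * edges // q    # each edge joins two vertices
                solids.append({"p": p, "q": q, "faces": faces,
                               "vertices": vertices, "edges": edges})
    return solids


for solid in platonic_solids():
    print(solid)
```

Raising the limit changes nothing, because the condition fails for every larger p or q. Swapping p and q swaps faces with vertices, which is the cube/octahedron and dodecahedron/icosahedron pairing described above; the tetrahedron, with p = q = 3, swaps into itself.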

An animated gif of a tetrahedron

Tetrahedron


An animated gif of a hexahedron

Hexahedron

An animated gif of an octahedron

Octahedron


An animated gif of a dodecahedron

Dodecahedron

An animated gif of an icosahedron

Icosahedron

They make good dice because they have no preferred way to fall: each face has the same relationship with the other faces and the centre of gravity, so no face is likelier to land uppermost. Or downmost, in the case of the tetrahedron, which is why it is the basis of the caltrop. This is a spiked weapon, used for many centuries, that always lands with a sharp point pointing upwards, ready to wound the feet of men and horses or damage tyres and tracks. The other four Platonic polyhedra don’t have a particular role in warfare, as far as I know, but all five might have a role in jurisprudence and might raise an interesting question about probability. Suppose, in some strange Tycholatric, or fortune-worshipping, nation, that one face of each Platonic die represents death. A criminal convicted of a serious offence has to choose one of the five dice. The die is then rolled f times, or as many times as it has faces. If the death-face is rolled, the criminal is executed; if not, he is imprisoned for life.

The question is: Which die should he choose to minimize, or maximize, his chance of getting the death-face? Or doesn’t it matter? After all, for each die, the chance of rolling the death-face on any one roll is 1/f and the die is rolled f times. Each face of the tetrahedron has a 1/4 chance of being rolled, but the tetrahedron is rolled only four times. For the icosahedron, it’s a much smaller 1/20 chance, but the die is rolled twenty times. Well, it does matter which die is chosen. The criminal escapes execution only if every one of the f rolls misses the death-face, and the rolls are independent of each other. So, to see which die offers the best chance, you have to raise the chance of missing the death-face on a single roll, (f-1)/f, to the power of f, like this:

3/4 × 3/4 × 3/4 × 3/4 = (3/4)^4 = 81/256 = 0·316…

(5/6)^6 = 15,625 / 46,656 = 0·335…

(7/8)^8 = 5,764,801 / 16,777,216 = 0·344…

(11/12)^12 = 3,138,428,376,721 / 8,916,100,448,256 = 0·352…

(19/20)^20 = 37,589,973,457,545,958,193,355,601 / 104,857,600,000,000,000,000,000,000 = 0·358…

Those are the chances of avoiding the death-face on every roll. Criminals who want to avoid execution should choose the icosahedron. For the chance of rolling the death-face at least once, simply subtract the avoidance-chance from 1, like this:

1 – (3/4)^4 = 0·684…

1 – (5/6)^6 = 0·665…

1 – (7/8)^8 = 0·656…

1 – (11/12)^12 = 0·648…

1 – (19/20)^20 = 0·642…
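Those figures are easy to reproduce with exact fractions. This is only a sketch: the dictionary and function names are mine, and it assumes fair dice and independent rolls.

```python
from fractions import Fraction

# Number of faces on each Platonic die.
FACE_COUNTS = {
    "tetrahedron": 4,
    "cube": 6,
    "octahedron": 8,
    "dodecahedron": 12,
    "icosahedron": 20,
}


def chance_of_avoiding_death(faces):
    """Chance that none of `faces` rolls shows the single death-face."""
    return Fraction(faces - 1, faces) ** faces


for name, faces in FACE_COUNTS.items():
    avoid = chance_of_avoiding_death(faces)
    print(f"{name:13} avoid {float(avoid):.3f}  death {float(1 - avoid):.3f}")
```

The avoidance-chance creeps upwards as the faces multiply, but it never gets far: for a die with ever more faces, (1 – 1/f)^f approaches 1/e, or about 0·368.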

So criminals who prefer execution to life-imprisonment should choose the tetrahedron. If the Tycholatric nation offers freedom to every criminal who rolls the same face of the die f times, then the tetrahedron is also clearly best. The chance of rolling a single specified face f times in a row is (1/f)^f:

1/4 × 1/4 × 1/4 × 1/4 = (1/4)^4 = 1 / 256

(1/6)^6 = 1 / 46,656

(1/8)^8 = 1 / 16,777,216

(1/12)^12 = 1 / 8,916,100,448,256

(1/20)^20 = 1 / 104,857,600,000,000,000,000,000,000

But there are f faces on each polyhedron and any of them will do, so the chance of rolling the same face f times, whichever face it is, is f × (1/f)^f = (1/f)^(f-1). On average, of every sixty-four (256/4) criminals who choose to roll the tetrahedron, one will roll the same face four times and be reprieved. If a hundred criminals face the death-penalty each year and all choose to roll the tetrahedron, one criminal will be reprieved roughly every eight months. But if all criminals choose to roll the icosahedron and they have been rolling since the Big Bang, just under fourteen billion years ago, it is very, very, very unlikely that any have yet been reprieved.
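Here is a rough check of the reprieve arithmetic. The hundred criminals a year comes from the paragraph above; the figure of 13.8 billion years is my own stand-in for "just under fourteen billion", and the names are invented.

```python
from fractions import Fraction

CRIMINALS_PER_YEAR = 100        # figure used in the text
YEARS_SINCE_BIG_BANG = 13.8e9   # assumption standing in for "just under fourteen billion"


def chance_of_reprieve(faces):
    """Chance of rolling the same face, whichever it is, on all `faces` rolls.

    The first roll can be anything; each of the other faces - 1 rolls
    must match it, giving (1/faces) ** (faces - 1).
    """
    return Fraction(1, faces ** (faces - 1))


# Tetrahedron: average gap between reprieves, in months.
months_between_reprieves = 12 / (CRIMINALS_PER_YEAR * chance_of_reprieve(4))

# Icosahedron: expected number of reprieves since the Big Bang.
expected_icosahedron_reprieves = (
    CRIMINALS_PER_YEAR * YEARS_SINCE_BIG_BANG * float(chance_of_reprieve(20))
)

print(float(months_between_reprieves))
print(expected_icosahedron_reprieves)
```

The first figure comes out a little under eight months. The second is so far below one that the icosahedral criminals would need many more universes' worth of rolling before a reprieve became likely.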

Pest Test

Health warning: I am not a mathematician. That said, here is a mathematical question:

Suppose there is a 99% accurate test for a medical condition – say a symptomless infection. You take the test and get a positive result. What are your chances of having the infection?

The obvious answer might seem to be 99%. But the obvious answer is wrong. The accuracy of the test is only half the information you need to answer the question. You also need to know how common the infection is. Say it occurs once in every hundred people. On average, then, if you test a hundred people, one of whom has the infection, you will get about two positive results: one that is accurate and one that is inaccurate, i.e., a false positive. Under those circumstances, a positive result means that you have a ½, or 50%, chance of having the infection (see appendix for further discussion). Under some other circumstances, a positive result on an 80% or 90% accurate test would mean that you have a higher chance of having the infection. Here’s a graphic to illustrate this apparent paradox:
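The reasoning above boils down to: true positives divided by all positives. Here is a minimal sketch, assuming that "accuracy" means the test is right that often for infected and uninfected people alike (the post doesn't distinguish the two). The function name is mine.

```python
from fractions import Fraction


def chance_infected_given_positive(accuracy, infection_rate):
    """Share of positive results that come from genuinely infected people.

    Assumes the test is correct with probability `accuracy`
    whether or not the person tested is infected.
    """
    true_positives = accuracy * infection_rate
    false_positives = (1 - accuracy) * (1 - infection_rate)
    return true_positives / (true_positives + false_positives)


# 99% accurate test, infection in 1 person per 100.
print(float(chance_infected_given_positive(Fraction(99, 100), Fraction(1, 100))))

# 90% accurate test, infection in 20 people per 100.
print(float(chance_infected_given_positive(Fraction(9, 10), Fraction(20, 100))))
```

The first case gives exactly one half. The second gives roughly 69%, so the less accurate test is the more convincing one when the infection is common enough.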

Graph illustrating confidence rates for medical tests of various accuracy

The x-axis represents infection rate per 10,000 of the population; the y-axis represents one’s chance of being infected after a positive result, from 0%, for no chance, to 100%, for complete certainty. The coloured curves represent tests of different accuracy: 1% accurate, for the bottom curve, and 99% accurate, for the uppermost curve. The curves between the two represent tests of 10% to 90% accuracy. Note how the curves mirror each other: the 99% accurate test rises towards certainty very quickly, but takes a long time to finally get there. The 1% accurate test stays near zero for a long time, then finally rises rapidly towards certainty. In other words, a positive result on a 99% accurate test is equivalent to a negative result on a 1% accurate test, and vice versa. Ditto for the 90% and 10% accurate tests, and so on. But a positive (or negative) result on a 50% accurate test is useless, because it never tells you anything new: your chance of being infected, given a positive result, is the same as the rate of infection in the population. And when exactly half the population is infected, your chance of being infected, given a positive result, is the same as the accuracy of the test, whether it’s 1%, 50%, or 99%.
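The three patterns just described (the mirroring, the useless 50% test and the half-infected population) can be checked exactly. As before, this is a sketch that assumes a test is equally accurate for infected and uninfected people; the function names and the sample of rates and accuracies are mine.

```python
from fractions import Fraction


def infected_given_positive(accuracy, rate):
    """Chance of infection after a positive result."""
    return accuracy * rate / (accuracy * rate + (1 - accuracy) * (1 - rate))


def infected_given_negative(accuracy, rate):
    """Chance of infection after a negative result."""
    missed = (1 - accuracy) * rate
    return missed / (missed + accuracy * (1 - rate))


rates = [Fraction(n, 100) for n in (1, 10, 50, 90, 99)]
accuracies = [Fraction(n, 100) for n in (1, 10, 50, 90, 99)]

# A positive on an a-accurate test equals a negative on a (1 - a)-accurate test.
mirror = all(
    infected_given_positive(a, r) == infected_given_negative(1 - a, r)
    for a in accuracies for r in rates
)

# A 50% accurate test leaves you with the population's infection rate.
coin_flip = all(infected_given_positive(Fraction(1, 2), r) == r for r in rates)

# With half the population infected, the answer is the test's accuracy.
half_infected = all(infected_given_positive(a, Fraction(1, 2)) == a for a in accuracies)

print(mirror, coin_flip, half_infected)
```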

Here is a table illustrating the same points:

Infection rate ↓ / Accuracy of test →

Rate     1%     10%    20%    30%    40%    50%    60%    70%    80%    90%    99%
1/100    <0.1%  0.1%   0.3%   0.4%   0.7%   1%     1.5%   2.3%   3.9%   8.3%   50%
10/100   0.1%   1.2%   2.7%   4.5%   6.9%   10%    14.3%  20.6%  30.8%  50%    91.7%
20/100   0.3%   2.7%   5.9%   9.7%   14.3%  20%    27.3%  36.8%  50%    69.2%  96.1%
30/100   0.4%   4.5%   9.7%   15.5%  22.2%  30%    39.1%  50%    63.2%  79.4%  97.7%
40/100   0.7%   6.9%   14.3%  22.2%  30.8%  40%    50%    60.9%  72.7%  85.7%  98.5%
50/100   1%     10%    20%    30%    40%    50%    60%    70%    80%    90%    99%
60/100   1.5%   14.3%  27.3%  39.1%  50%    60%    69.2%  77.8%  85.7%  93.1%  99.3%
70/100   2.3%   20.6%  36.8%  50%    60.9%  70%    77.8%  84.5%  90.3%  95.5%  99.6%
80/100   3.9%   30.8%  50%    63.2%  72.7%  80%    85.7%  90.3%  94.1%  97.3%  99.7%
90/100   8.3%   50%    69.2%  79.4%  85.7%  90%    93.1%  95.5%  97.3%  98.8%  99.9%
99/100   50%    91.7%  96.1%  97.7%  98.5%  99%    99.3%  99.6%  99.7%  99.9%  >99.9%
100/100  100%   100%   100%   100%   100%   100%   100%   100%   100%   100%   100%

Appendix

We’ve seen that we have to take false positives into account, but what about false negatives? Suppose that the rate of infection is 1 in 100 and the accuracy of the test is 99%. If the population is 10,000, then 100 people will have the infection and 9,900 will not. If the population is tested, on average 100 × 99% = 99 of the infected people will get an accurate positive result and 100 × 1% = 1 will get an inaccurate negative result, i.e., a false negative. Similarly, 9,900 × 1% = 99 of the non-infected people will get a false positive. So there will be 99 + 99 = 198 positive results, of which 99 are accurate. 99/198 = 1/2 = 50%.
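The same head-count can be done in a few lines, which also shows where the single false negative ends up. This is a sketch with invented names, and it uses whole-number division, so it assumes (as in the example above) that the population divides evenly.

```python
def screen_population(population, infected_per_100, accuracy_percent):
    """Sort a tested population into the four possible outcomes.

    Uses integer arithmetic, so the inputs should divide evenly,
    as they do for 10,000 people, 1 in 100 infected and 99% accuracy.
    """
    infected = population * infected_per_100 // 100
    healthy = population - infected
    true_positives = infected * accuracy_percent // 100
    false_negatives = infected - true_positives
    false_positives = healthy * (100 - accuracy_percent) // 100
    true_negatives = healthy - false_positives
    return {
        "true_positives": true_positives,
        "false_negatives": false_negatives,
        "false_positives": false_positives,
        "true_negatives": true_negatives,
    }


counts = screen_population(10_000, 1, 99)
positives = counts["true_positives"] + counts["false_positives"]
negatives = counts["true_negatives"] + counts["false_negatives"]

print(counts)
print(counts["true_positives"] / positives)   # chance of infection after a positive
print(counts["false_negatives"] / negatives)  # chance of infection after a negative
```

The one false negative is lost among 9,802 negative results, so under these assumptions a negative result is far more trustworthy than a positive one: the chance of being infected despite testing negative is about 1 in 10,000.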