# counting is hard, part 2

In the previous post on this topic, we suggested that counting is hard and used the example of counting words in a book. Three heuristics for communicating uncertainty (3Q) will help us organize our thoughts as we explore a solution to the word-counting problem.

1. Express the outcome both qualitatively and quantitatively.
2. Describe the range of possible outcomes.
3. Assess the frequency of potential outcomes.

## Express the outcome both qualitatively and quantitatively.
Express the outcome quantitatively? If I could do that, we wouldn’t be having this discussion. So instead, let me take a moment to express what I know qualitatively. It is possible to count the exact number of words in the book, but that would take a lot of time, and we believe that an estimate will give us an answer that is approximately correct. Furthermore, we believe that an estimate will suffice given how we intend to use the data. If I had to guess…well, I’m not very good at guessing how many words are in a book. (This kind of statement may be particularly true when evaluating flight test data.)

## Describe the range of possible outcomes.
When we say the “range of possible outcomes”, we are really asking two related questions: 1) What is the largest possible number of words? and 2) What is the smallest possible number of words?

One source states that the longest book in the world is 38 million words long in 106 volumes (https://themillions.com/2007/09/world-longest-novel.html).  That’s more than 358,000 words per volume.  This seems like a reasonable upper bound.

Rather than lean on that figure, I flip through the pages of my book and stop on page 177. It looks particularly dense. I count the words in each paragraph, which takes me 4 minutes and 8 seconds. I am tired of the task after two paragraphs, but I finish all 8 paragraphs on the page. When I finish, I notice that I counted two words for every word hyphenated across a line break. There were three such instances on the page. I didn’t think I would make many mistakes, but if accuracy is key, that is three mistakes on the first page counted. It takes me one more minute to put the data in a spreadsheet. The figures I recorded sum to 426 words, but after correcting for the hyphens, I conclude there are 423 words. There are 304 pages in the book, so a more reasonable upper bound for the range of possible outcomes for *this* book is 423 × 304 = 128,592 words. (My guess was 75,000 before I computed this estimate. I was way off.)
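The arithmetic behind that upper bound is small enough to sketch in a few lines of Python. This is only an illustration of the reasoning above, not the spreadsheet I actually used; the function and variable names are made up for the example, and it assumes that no page in the book holds more words than the dense page I counted.

```python
# Figures taken from the page-177 count described in the post.
RAW_COUNT = 426            # words as first tallied on page 177
HYPHEN_DOUBLE_COUNTS = 3   # line-end hyphenated words that were counted twice
PAGES_IN_BOOK = 304


def corrected_page_count(raw: int, double_counted: int) -> int:
    """Remove one word for each hyphenated word that was counted twice."""
    return raw - double_counted


def upper_bound(words_on_dense_page: int, pages: int) -> int:
    """Assume (hypothetically) that every page is as full as the dense page."""
    return words_on_dense_page * pages


words_on_page = corrected_page_count(RAW_COUNT, HYPHEN_DOUBLE_COUNTS)
bound = upper_bound(words_on_page, PAGES_IN_BOOK)

print(words_on_page)  # 423
print(bound)          # 128592
```

The bound is only as good as the assumption that page 177 really is among the densest pages in the book.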

What lower bound is reasonable? Zero is a reasonable lower bound, but I think I can do better without much effort. One word per page would give me 304 words, and even though some pages have no words, being filled instead with illustrations, I’m certain that the word count would not dip below this number. All of a sudden, I realize that I counted 426 words on one page (423 after the hyphen correction), and that would make an even more reasonable lower bound. (Is “more reasonable” a thing?)
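Each of those candidates is a valid lower bound, so the most useful one is simply the largest. A minimal Python sketch of that comparison follows. The names are hypothetical, and it uses the corrected page-177 count of 423 (the raw tally was 426) together with the 128,592-word upper bound computed earlier.

```python
PAGES_IN_BOOK = 304
WORDS_ON_PAGE_177 = 423  # corrected count; the raw tally was 426

# Every entry here is a number the true word count cannot fall below.
lower_candidates = {
    "zero": 0,
    "one word per page": 1 * PAGES_IN_BOOK,
    "words on page 177 alone": WORDS_ON_PAGE_177,
}

# The tightest lower bound is the largest of the valid candidates.
tightest_label = max(lower_candidates, key=lower_candidates.get)
tightest_lower = lower_candidates[tightest_label]

# Upper bound from assuming every page is as dense as page 177.
upper = WORDS_ON_PAGE_177 * PAGES_IN_BOOK

print(tightest_label, tightest_lower)  # words on page 177 alone 423
print(upper - tightest_lower)          # width of the range: 128169
```

A range more than 128,000 words wide is not very informative, which is part of why the next question, how well the range needs to be estimated, matters.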

How well do I need to estimate the range? It depends, but you already knew that. The nature of the problem will give us a hint about how to answer this question.

Before I started writing this series, I would have guessed that 65,000 words (approximately half of my upper bound) was a reasonable estimate for the lower bound of the range, but exploring the problem yielded some interesting findings. For example, I found that of the 304 pages in the book, only 128 are “normal pages” like the one I counted above (page 177). The first page of every chapter is shorter than a “normal page.” The last page of each chapter varies in length but is generally shorter than normal. There are also 69 pages in the book with illustrations, some of which are full-page, and some of which contain words.

One of the major points here is this: the cognitive exercise of actually forcing myself to think through each of the 3Q is worthwhile in ways we don’t expect. I’ve learned things, not only about how many words are in the book but also about my “engineering judgment” (which I had a lot of faith in before starting this task).

## Assess the frequency of potential outcomes.
If we estimated the number of words in the book several different ways, how often would we get 128,592 words?

This is an important question, and it deserves its own, full-length explanation. So…next time?