On my desk is a novel written by one of my favorite authors. It has a black cover with an image that appears to be the x-ray of a starfish. There are two hundred and forty-three pages in the book, which seems quite lengthy for a children’s book. I’m not certain it’s a children’s book.

A quote on the back cover catches my eye: “…tense with hurry and frustration, the story rushes ahead, never losing momentum.” Intriguing, but something else grabs my attention and spurs me to pull out a notepad and pen and start counting the words on each line.

There is a fixed number of words printed on the pages of this book, and if I had the time or a computer program, I could count them all.

I pause to think and another, smaller book catches my eye.  The paperback book is about the size of A5 paper. It is thin, but not too thin.  I get the impression that reading it would feel like an accomplishment and would not take an inordinate amount of time.  Each chapter is short, which makes it the kind of book one can pick up and read even if there is only a moment to spare. All of these things make it emotionally satisfying, but none of the data I’ve described answers the question that first distracted me from the words on the page.

How many words are in the book?

(The astute reader will recall that counting is hard, something I argued in the December Flight Test Safety Fact. I also suggested that the comment applies to flight test safety, a point I will make shortly.)

Many of the early chapters are only two pages long, and I could easily count the words in one such chapter and multiply by the number of chapters to estimate the total words in the book. Several of the later chapters are longer, though, so this rudimentary estimate would undercount them. I could also count all of the words in the book. That method could arrive at the exact answer, but I believe there is a chance I would lose count.

I quickly decide to count the words on a page and use that together with the total number of pages to estimate the word count. This is a heuristic approach. I don’t know if it’s the best way to accomplish the task, but I don’t believe I need the best way to accomplish this task. As I begin the task, I change my strategy. I count the number of words per line and multiply by the number of lines. I won’t count every line, and I attempt to randomize the lines I choose for this counting task.

The number of words in this book is a knowable quantity, but we introduce uncertainty when we count or estimate the number of words.

Statistics has something to say about this process, and when understood properly, classical statistics makes a lot of sense in this case.  At the end of my counting, I will have an estimate for the total number of words together with a confidence interval that may contain the true number of words in the book. The five hundred words I’ve used to describe this problem should suggest to the reader what I’ve been saying: “Counting is hard.”

1. It may be too hard to conduct an exhaustive count.
2. There are many methods and choosing a method is not simple.
3. Estimating the number of words in the book introduces error.
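For readers who would rather use a computer program than a notepad, here is a minimal sketch of the line-sampling heuristic described above. It is not the only way to do this, and probably not the best. The function names, the figure of 30 lines per page, and the sample counts are all hypothetical. The interval uses a normal approximation with a finite population correction, which is one reasonable choice among several.

```python
import math
import random
import statistics


def pick_lines(total_lines, sample_size, seed=None):
    """Choose which line numbers to count, at random and without repeats."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, total_lines + 1), sample_size))


def estimate_total_words(sampled_line_counts, total_lines, z=1.96):
    """Estimate the total word count from words counted on sampled lines.

    Returns (estimate, low, high). The interval is a rough one based on a
    normal approximation; z=1.96 corresponds to roughly 95 percent.
    """
    n = len(sampled_line_counts)
    if n < 2:
        raise ValueError("need at least two sampled lines")
    mean_per_line = statistics.mean(sampled_line_counts)
    sd_per_line = statistics.stdev(sampled_line_counts)
    # Finite population correction: sampling without replacement from a
    # fixed number of lines reduces the uncertainty slightly.
    fpc = math.sqrt(max(0.0, 1.0 - n / total_lines))
    estimate = total_lines * mean_per_line
    margin = z * total_lines * sd_per_line / math.sqrt(n) * fpc
    return estimate, estimate - margin, estimate + margin


# Hypothetical numbers: 243 pages at an assumed 30 lines per page.
total_lines = 243 * 30
lines_to_count = pick_lines(total_lines, sample_size=10, seed=1)
# Hypothetical tallies from the notepad, one per sampled line.
words_on_sampled_lines = [9, 11, 10, 12, 8, 10, 11, 9, 10, 10]
estimate, low, high = estimate_total_words(words_on_sampled_lines, total_lines)
print(f"count these lines: {lines_to_count}")
print(f"estimated words: {estimate:.0f} (rough interval {low:.0f} to {high:.0f})")
```

Even this small sketch forces choices: how many lines to sample, whether blank or partial lines count as lines, and how to express the uncertainty.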

### A Flight Test Example

How much does aircraft heading deviate from runway centerline? I don’t think that many of us have an intuition for this, but suppose for the sake of argument that I need to count the number of times that this difference, |heading_aircraft – heading_runway|, exceeds 5 degrees.

I might need to count it because it indicates some anomaly with the nose wheel steering or gives clues about the crosswind limit.
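As a rough sketch of what that count might look like in code, assume the headings arrive as a list of samples in degrees. The function names and the sample data are hypothetical. The wrap-around handling (so that 359 and 1 degrees are 2 degrees apart, not 358) is my assumption about how the absolute difference should be computed.

```python
def heading_deviation(aircraft_deg, runway_deg):
    """Smallest absolute angle between two headings, in degrees (0 to 180)."""
    diff = abs(aircraft_deg - runway_deg) % 360.0
    return min(diff, 360.0 - diff)


def count_exceedances(aircraft_headings, runway_heading, threshold_deg=5.0):
    """Count individual samples whose deviation exceeds the threshold."""
    return sum(
        1
        for h in aircraft_headings
        if heading_deviation(h, runway_heading) > threshold_deg
    )


def count_excursions(aircraft_headings, runway_heading, threshold_deg=5.0):
    """Count distinct excursions: runs of consecutive samples over the threshold."""
    count = 0
    above = False
    for h in aircraft_headings:
        now_above = heading_deviation(h, runway_heading) > threshold_deg
        if now_above and not above:
            count += 1  # a new excursion starts here
        above = now_above
    return count


# Hypothetical heading samples from one takeoff roll on runway heading 360.
samples = [358, 2, 6, 7, 3, 354, 353, 359]
print(count_exceedances(samples, 360))  # samples beyond 5 degrees
print(count_excursions(samples, 360))   # separate excursions beyond 5 degrees
```

The two functions give different answers for the same data, because "the number of times" can mean samples or excursions. Choosing between them is one more way in which counting is hard.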

In this example, each sortie (or flight) is like a chapter in the book described above. The takeoff is one page of the chapter, and the landing is another page. Counting words on the page is like finding instances of a 5-degree heading deviation.

As enumerated above, it may be too difficult to conduct an exhaustive count. (Many test programs have thousands of hours of flight time and as many sorties.) There may be many ways to estimate the count, but each of these methods will introduce error. And just as before, statistics has something to say about this process.

The next post will explain just what it is that statistics has to say.

