Evaluating readability measures

4/9/2014

This piece first appeared on RealClearEducation.com on March 26.

How do you know that whether a book is at the right level of difficulty for a particular child? Or when thinking about learning standards for a state or district, how do we make a judgment about the text difficulty that, say, a sixth-grader ought to be able to handle?

It would seem obvious that an experienced teacher would use her judgment to make such decisions. But naturally such judgments will vary from individual to individual. Hence the apparent need for something more objective. Readability formulas are intended as just such a solution. You plug some characteristics of a text into a formula and it combines them into a number, a point on a reading difficulty scale. Sounds like an easy way to set grade-level standards and to pick appropriate texts for kids.

Of course, we’d like to know that the numbers generated are meaningful, that they really reflect “difficulty.”

Educators are often uneasy with readability formulas; the text characteristics are things like “words per sentence,” and “word frequency” (i.e., how many rare words are in the text). These seem far removed from the comprehension processes that would actually make a text more appropriate for third grade rather than fourth.

To put it another way, there’s more to reading than simple properties of words and sentences. There’s building meaning across sentences, and connecting meaning of whole paragraphs into arguments, and into themes. Readability formulas represent a gamble. The gamble is that the word- and sentence-level metrics will be highly correlated with the other, more important characteristics.

It’s not a crazy gamble, but a new study (Begeny & Greene, 2014) offers discouraging data to those who have been banking on it.

The authors evaluated 9 metrics, summarized in this table:

The dependent measure was student oral reading fluency, which boils down to number of words correctly read per minute. Oral fluency is sometimes used as a convenient proxy for overall reading skill. Although it obviously depends heavily on decoding fluency, there is also a contribution from higher-level meaning processing; if you are understanding what you’re reading, that primes expectations as you read, which makes reading more fluent.

In this experiment, second, third, fourth, and fifth graders each read six passages taken from the DIBELS test: two passages each from below, at, and above their grade level, for a total of six passages.

Previous research has shown that the various readability formulas actually disagree about grade levels (e.g., Ardoin et al, 2005). In this experiment, oral reading fluency was to referee the disagreement. Suppose that according to PSK, passage A is appropriate for second graders and passage B is appropriate for third graders. Meanwhile Spache says both are third-grade passages. If oral reading fluency is better for passage A than passage B, that supports the PSK. (“Faster” was not evaluated only in absolute terms, but accounted for the standard error of the mean).

The researchers used an analytic scheme to evaluate how good a job each metric did of predicting the patterns of student oral reading fluency. Each prediction was considered binary: the grade level assignment predicted that there should be a difference (or not) in oral reading fluency: was a difference observed? Chance, therefore, would be 50%. The data are summarized in the Table

All of the readability formulas were more accurate for higher ability than lower ability students. But only one—the Dale-Chall—was consistently above chance.

So (excepting the Dale-Chall), this study offers no evidence that standard readability formulas provide reliable information for teachers as they select appropriate texts for their students. As always, one study is not definitive, least of all for a broad and complex issue. This work ought to be replicated with other students, and with outcome measures other than fluency. Still, it contributes to what is, overall, a discouraging picture.

References

Ardoin, S. P., Suldo, S. M., Witt, J., Aldrich, S., & McDonald, E. (2005). Accuracy of readability estimates’ predictions of CBM performance. School Psychology Quarterly, 20, 1 – 22.

Begeny, J. C., & Greene, D. J. (2014). Can readability formuas be used to successfully gauge difficulty of reading materials? Psychology in the Schools, 51(2), 198-215.

Tom Berend

4/9/2014 12:21:28 pm

Readability and decoding only makes sense up to grade 4 books. Grade 5 books require a huge jump in comprehension skills that is not captured in readability.

The earlier books tend to be self-contained, directly-told, and plot-driven. Grade-5 books start to use imagery, setting, themes, and writing style to tell the story. Characters become more complex, their motivations more ambiguous. World knowledge starts to become more important.

We ran into that in an intensive reading intervention for grade-8 impaired readers. For the first six weeks we focused on the basics of word recognition: blending and segmenting, phonics, and morphology. We practiced finger-point reading and fluency drills. We raced them through grade 2, 3, and 4 books. They developed confidence and began to enjoy reading. Then we hit the limits of word recognition.

For a grade-5 text, we selected "The City of Ember" by Jeanne DuPrau. A proficient grade-5 reader immediately realizes that the city is a post-apocalypse underground survival bunker. But that is never stated directly - the book is written from the point of view of two 12-year-olds who have never imagined a larger world. Their strange little city is lavishly described from their point of view, the pleasure of the book is watching them discover Ember's true nature.

Skilled readers build a mental model of a story and bring their world knowledge, personal experiences, and knowledge of literary conventions to it. They challenge the author. They build a tentative hypothesis of the text's meaning, revising as new information arrives. They monitor their understanding and their own attention, re-reading where necessary. Afterwards, they reflect on what they have read.

Here's what poor comprehension looks like. Our students did not grasp that Ember was underground until they were half-way through the book - in spite of the barrage of comprehension questions that we posed to them after each chapter. They accurately decoded the words and captured the surface gloss - they knew from the first pages that Ember's sky was always dark, that Ember was running out of light bulbs, and that the flood-lamps frequently went out. They knew that the timekeeper changes the date sign and tells the light director when to turn the lights on and off, but that strange behavior did not fire any corresponding light bulbs in their heads.

Happily, there is guidance from research on how to teach reading comprehension. Palinczar & Brown (1984) is a good starting point.

Ben Rogers

4/9/2014 08:46:13 pm

My school has recently begun the Accelerated Reader programme which uses it's own reading levels. Tom has just written the response that I was planning to write (only more eloquantly).

The only extra comment I would make is that reading non-fiction has as an additional variable the learner's existing subject knowledge (very Hirsch). I am sure this inhibits secondary school (high school) teachers from using texts effectively. If there is any research on this, I would be very pleased to read it.

Gavin Haque link

4/10/2014 03:07:53 am

So, are teachers left to use their own judgment as paragraph two indicates? Ben and Tom, I'm starting to read more about Lexile Level scores. I find your comments to be incredibly insightful.

Tom Berend

4/10/2014 03:20:42 am

Prior subject knowledge is a necessary foundation, but a poor comprehender will struggle with learning even at his level of subject mastery. The most attractive solution is to teach comprehension skills integrated with subject matter.

Our intervention was limited to 'literacy' and we selected biography and history for our non-fiction, where the focus was on style and the subject matter was secondary. In retrospect I wish we had attempted comprehension skills training using a science textbook. The challenge of course is to set expectations properly, a reading intervention is not expected to teach physics, and not allocated the necessary time.

Science teachers have the opportunity to embed comprehension into their course, but it is a time-consuming project. If someone wants to try it, a sensible-sounding set of 14 guidelines are suggested in:

Brown, R. (2002). Straddling two worlds: Self-directed comprehension instruction for middle schoolers. In C.C. Block & M Pressley (Eds.), Comprehension Instruction (pp. 337-350). New York, NY: Guilford Press.

Nicole

4/14/2014 12:34:01 pm

Can you offer any practical solutions to this problem that can be used right now? I am asking because I frequently volunteer to read with children in my daughter's grade one classroom. The teacher has bins full of books labelled A-M, but some don't have labels and some are obviously labelled at the wrong level. I would like to help her out by reorganizing and relabelling all the books she has, so that when a child chooses a book, they are getting one they can handle. We are talking about roughly 500 books I would say. What is best practice in levelling these books? How do elementary teachers normally handle this?

Tom

4/14/2014 02:33:26 pm

Nicole: Consider using http://www.lexile.com They have a very large database of books measured against a single readability algorithm. .As this blog post points out, that metric may not be reliable.

Tom

4/14/2014 02:33:52 pm

Nicole: Consider using http://www.lexile.com They have a very large database of books measured against a single readability algorithm. .As this blog post points out, that metric may not be reliable.

Comments are closed.

Evaluating readability measures

Purpose

Archives

Categories