Daniel Willingham--Science & Education
Hypothesis non fingo

What the PISA problem-solving scores mean--or don't.

4/24/2014

 
This blog posting first appeared on RealClearEducation on April 8, 2014.

The 2012 results for the brand-new PISA problem-solving test were released last week. News in various countries predictably focused on how well local students fared, whether they were American, British, Israeli, or Malaysian. The topic that should have been of greater interest was what the test actually measures.

How do we know that a test measures what it purports to measure? There are a few ways to approach this problem.

One is when the content of the test seems, on the face of it, to represent what you’re trying to test. For example, math tests should require the solution of mathematical problems. History tests should require that test-takers display knowledge of history and the ability to use that knowledge as historians do. 

Things get trickier when you’re trying to measure a more abstract cognitive ability like intelligence. In contrast to math, where we can at least hope to specify what constitutes the body of knowledge and skills of the field, intelligence is not domain-specific. So we must devise other ways to validate the test. For example, we might say that people who score well on our test show their intelligence in other commonly accepted ways, like doing well in school and on the job.

Another strategy is to define what the construct means—“here’s my definition of intelligence”—and then make a case for why your test items measure that construct as you’ve defined it.

So what approach does PISA take to problem-solving? It uses a combined strategy that ought to prompt serious reflection in education policymakers.

There is no attempt to tie performance on the test to everyday measures of problem-solving. (At least, none has been offered so far; more detail on the construction of the test is to come in an as-yet-unpublished technical report.)

From the scores report, it appears that the problem-solving test was motivated by a combination of the other two methods.

First, the OECD describes a conception of problem solving—what they think the mental processes look like. That includes the following processes:

• Exploring and understanding
• Representing and formulating
• Planning and executing
• Monitoring and reflecting

So we are to trust that the test measures problem solving ability because these are the constituent processes of problem solving, and we are to take it that the test authors could devise test items that tap these cognitive processes.

Now, this candidate taxonomy of processes that go into problem-solving seems reasonable at a glance, but I wouldn’t say that scientists are certain it’s right, or even that it’s the consensus best guess. Other researchers have suggested that different dimensions of problem solving are important—for example, well-defined problems vs. ill-defined problems. So pinning the validity of the PISA test on this particular taxonomy reflects a particular view of problem-solving.

But the OECD uses a second argument as well. They take an abstract cognitive process—problem solving—and vastly restrict its sweep by essentially saying “sure, it’s broad, but there are only a limited number of ways we really care about how it’s implemented, so we just test those.”

That’s the strategy adopted by the National Assessment of Adult Literacy (NAAL). Reading comprehension, like problem solving, is a cognitive process, and, like problem solving, it is intimately intertwined with domain knowledge. We’re better at reading about topics we already know something about. Likewise, we’re better at solving problems in domains we know something about. So in addition to requiring (as best they could) very little background knowledge for the test items, the designers of the NAAL wrote questions that they could argue reflect the kind of reading people must do for basic citizenship: reading a government-issued pamphlet about how to vote, reading a bus schedule, reading the instructions on prescription medicine.

The PISA problem-solving test does something similar. The authors sought to present problems that students might really encounter, like figuring out how to work a new MP3 player, finding the quickest route on a map, or figuring out how to buy a subway ticket from an automated kiosk.

So with this justification, we don’t need to make a strong case that we really understand problem-solving at a psychological level at all. We just say “this is the kind of problem solving that people do, so we measured how well students do it.” 

This justification makes me nervous because the universe of possible activities we might agree represent “problem solving” seems so broad, much broader than what we would call activities for “citizenship reading.” A “problem” is usually defined as a situation in which you have a goal and you lack a ready process in memory that you’ve used before to solve the problem or one similar to it. That covers a lot of territory. So how do we know that the test fairly represents this territory?

The taxonomy is supposed to help with that problem. “Here’s the type of stuff that goes into problem solving, and look, we’ve got some problems for each type of stuff.” But I’ve already said that psychologists don’t have a firm enough grasp of problem-solving to advance a taxonomy with much confidence.

So the 2012 PISA problem-solving test is surely measuring something, and what it’s measuring is probably close to something I’d comfortably call “problem-solving.” But beyond that, I’m not sure what to say about it.

I probably shouldn’t get overwrought just yet—as I’ve mentioned, there is a technical report yet to come that will, I hope, leave all of us with a better idea of just what a score on this test means. Gaining that better idea will entail some hard work for education policymakers. The authors of the test have adopted a particular view of problem solving—that’s the taxonomy—and they have adopted a particular type of assessment—novel problems couched in everyday experiences. Education policymakers in each country must determine whether that view of problem solving syncs with theirs, and whether the type of assessment is suitable for their educational goals.

The way that people conceive of the other PISA subjects (math, science, and reading) is almost surely more uniform than the way they conceive of problem-solving. Likewise, the goals for assessing those subjects are more uniform. Thus, the problem of interpreting the problem-solving PISA scores is formidable compared to interpreting the other scores. So no one should despair or rejoice over their country’s performance just yet.

Reading to kids, and the use of scientific findings in education

4/16/2014

 
This post first appeared at RealClearEducation on April 1, 2014.

Our scientific understanding is always evolving, changing. Thus, one of the ongoing puzzles in education research is how confident one must be in a set of findings before one concludes it ought to be the basis of educational practice. If the data show that X is true, but X seems really peculiar, do we assume X is probably true, or do we assume that we just don't understand things very well yet? A new study provides something of an object lesson in this problem; in this case "X" was "parents teaching reading at home doesn't help much after kindergarten."

Here's the background on that counterintuitive finding. The work was inspired by the home literacy model (Senechal & LeFevre, 2002). It posits two dimensions of home literacy experience: formal experiences are those in which the parent focuses the child's attention on print, for example by teaching letters of the alphabet, or pointing out that two words look the same, or that we read from left to right.

Informal experiences are those for which print is present, but is not the focus of attention; reading aloud to one's child would be an example. Children usually look at pictures, not print, during a read-aloud.

Previous research from this research team, and others, has shown that formal and informal experiences have different effects. Formal experiences are associated with early literacy skills like knowing letters, and later, with word reading. Informal experiences, in contrast, are associated with growth in vocabulary and general knowledge.

But data supporting the home literacy model have usually been concurrent, not predictive, and have been limited to preschool, kindergarten, and early 1st grade. That is, the research shows an association between the relevant factors measured at the same point in time, rather than showing that home factors at, say, kindergarten predict growth in reading outcomes in 1st grade and beyond. That's peculiar.
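To make the concurrent-versus-predictive distinction concrete, here is a minimal sketch with entirely made-up data (the variable names and effect sizes are mine, not the study's). It contrasts a concurrent correlation with a regression that asks whether kindergarten home literacy predicts 1st-grade reading once kindergarten reading is controlled for; in this toy dataset the home factor has no unique effect on growth, mirroring the puzzling prior pattern.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Entirely made-up data, for illustration only (not the study's data).
home_formal_k = rng.normal(size=n)                            # formal home literacy, kindergarten
reading_k = 0.5 * home_formal_k + rng.normal(size=n)          # reading skill, kindergarten
reading_g1 = 0.8 * reading_k + rng.normal(scale=0.5, size=n)  # reading skill, 1st grade

# Concurrent association: home experiences and reading measured at the same time.
concurrent_r = np.corrcoef(home_formal_k, reading_k)[0, 1]

# Predictive-of-growth association: does kindergarten home literacy predict
# 1st-grade reading once kindergarten reading is controlled for?
X = np.column_stack([np.ones(n), reading_k, home_formal_k])
coefs, *_ = np.linalg.lstsq(X, reading_g1, rcond=None)

print(f"concurrent correlation in kindergarten: {concurrent_r:.2f}")
print(f"unique effect of home literacy on 1st-grade reading, "
      f"controlling for kindergarten reading: {coefs[2]:.2f}")
```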

There are at least two possible reasons. One is that the home literacy environment does have an impact on literacy growth, but researchers have been looking for the effect in 1st grade - just when reading instruction at school is heaviest. So perhaps the impact of the home literacy environment on literacy growth is overwhelmed by the effect of school instruction. A second possible reason is that the home literacy environment may change as a consequence of how parents perceive their child to be doing in school.

A new study (Senechal & LeFevre, 2014) used a clever design to examine both possibilities. Subjects were 84 children in Quebec who spoke English at home, but for whom the language of instruction at school was French. So researchers could test progress in English, and thereby examine the impact of the home literacy environment independent of schooling. The researchers measured various aspects of children's literacy -- reading and oral language -- from kindergarten until spring of second grade. In addition, they used a number of measures to characterize the children's formal and informal literacy experiences at home.

The results provided strong support for the Home Literacy Model. Formal literacy activities at home were linked not only to performance in reading English but also, in contrast to prior work, to growth in reading English from kindergarten to 1st grade. Thus, there is some support for the idea that previous studies failed to observe the relationship because the experiences at school overwhelmed any effect that home experiences might have had.

But that can't be the whole story, because the relationship was no longer observed in 2nd grade. This is where parental responsiveness comes in. English instruction, one hour daily, began in 2nd grade, and so parents began to get feedback from schools about their child's English reading at that time.

Researchers found that the degree to which parents taught their children English at home was positively associated with student outcomes in kindergarten and 1st grade. But there was a negative association in 2nd grade. A straightforward interpretation is that many parents engaged in some English teaching at home during kindergarten and 1st grade, and the more of it they did, the better for their kids. Then in 2nd grade, parents get feedback from the school about their child's reading in English. If their child is doing well, parents ease off on the teaching at home. If their child is doing poorly, they increase it. Indeed, researchers found that most parents -- 76 percent -- changed their formal literacy practices in response to their child's reading performance in 2nd grade. So you end up with a negative correlation between parental instruction and child performance in 2nd grade: the kids who are doing the worst in reading are the ones whose parents are teaching them the most.
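A toy simulation (all numbers invented, purely to illustrate the logic of that interpretation) shows how responsive parents can flip the sign of the correlation even though the teaching itself helps.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300

# Kindergarten/1st grade: how much parents teach varies for reasons unrelated
# to the child's skill, and the teaching helps a bit, so the correlation is positive.
teaching_g1 = rng.normal(size=n)
skill_g1 = 0.4 * teaching_g1 + rng.normal(size=n)

# 2nd grade: parents now get school feedback and respond to it -- the weaker
# the child's reading, the more the parents teach at home.
skill_g2 = rng.normal(size=n)
teaching_g2 = -0.6 * skill_g2 + rng.normal(scale=0.5, size=n)

print(f"1st grade: r = {np.corrcoef(teaching_g1, skill_g1)[0, 1]:+.2f}  (no feedback yet)")
print(f"2nd grade: r = {np.corrcoef(teaching_g2, skill_g2)[0, 1]:+.2f}  (parents respond to feedback)")
```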

The impact of informal literacy activities like read-alouds did not change; they were consistently linked to growth in vocabulary and other measures of oral language from kindergarten through second grade.

It should be noted that the parents in this study had greater-than-average education - more than half had a university degree. It's a good bet, then, that the baseline home literacy environment was atypically high and that these parents may have been more responsive to their child's literacy outcomes than others would have been. We should not generalize these findings broadly.

Still, in this case, "X" turned out to be explicable and sensible. It appeared that parents' teaching literacy at home did not help children's literacy only because another variable - parents' responsiveness to school feedback - had gone uncontrolled. This study doesn't solve the broader problem - we never know if our understanding of an issue is incomplete to the point of inaccuracy - but that's one issue on which we are at least closer to the truth.

References:

Senechal, M., & LeFevre, J. (2002). Parental involvement in the development of children’s reading skill: A 5-year longitudinal study. Child Development, 73, 445–460.

Senechal, M., & LeFevre, J. (in press). Continuity and change in the home literacy environment as predictors of growth in vocabulary and reading. Child Development.


Evaluating readability measures

4/9/2014

 
This piece first appeared on RealClearEducation.com on March 26.

How do you know whether a book is at the right level of difficulty for a particular child? Or when thinking about learning standards for a state or district, how do we make a judgment about the text difficulty that, say, a sixth-grader ought to be able to handle?

It would seem obvious that an experienced teacher would use her judgment to make such decisions. But naturally such judgments will vary from individual to individual. Hence the apparent need for something more objective. Readability formulas are intended as just such a solution. You plug some characteristics of a text into a formula and it combines them into a number, a point on a reading difficulty scale. Sounds like an easy way to set grade-level standards and to pick appropriate texts for kids.
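For concreteness, here is a minimal sketch of one classic formula of this kind, the Flesch-Kincaid grade level, which combines words per sentence and syllables per word. The syllable counter is a deliberately crude heuristic of my own, not any published implementation.

```python
import re

def count_syllables(word: str) -> int:
    """Very crude syllable estimate: count groups of consecutive vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level: 0.39*(words/sentence) + 11.8*(syllables/word) - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)

simple = "The cat sat on the mat. It was warm. The cat slept in the sun."
harder = ("Readability formulas combine surface features of a text, "
          "such as sentence length and word frequency, into a single estimate.")

# The simple text yields a low (even negative) grade estimate;
# the denser sentence yields a much higher one.
print(round(flesch_kincaid_grade(simple), 1))
print(round(flesch_kincaid_grade(harder), 1))
```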

Of course, we’d like to know that the numbers generated are meaningful, that they really reflect “difficulty.”

Educators are often uneasy with readability formulas; the text characteristics are things like “words per sentence” and “word frequency” (i.e., how common or rare the words in the text are). These seem far removed from the comprehension processes that would actually make a text more appropriate for third grade rather than fourth.

To put it another way, there’s more to reading than simple properties of words and sentences. There’s building meaning across sentences, and connecting meaning of whole paragraphs into arguments, and into themes. Readability formulas represent a gamble. The gamble is that the word- and sentence-level metrics will be highly correlated with the other, more important characteristics.

It’s not a crazy gamble, but a new study (Begeny & Greene, 2014) offers discouraging data to those who have been banking on it.

The authors evaluated 9 metrics, summarized in this table:

[Table listing the nine readability metrics not reproduced here.]
The dependent measure was student oral reading fluency, which boils down to the number of words correctly read per minute. Oral fluency is sometimes used as a convenient proxy for overall reading skill. Although it obviously depends heavily on decoding fluency, there is also a contribution from higher-level meaning processing: if you are understanding what you’re reading, that primes expectations as you read, which makes reading more fluent.
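In case the metric is unfamiliar, the fluency score is simple arithmetic on a timed reading; the numbers below are hypothetical.

```python
def words_correct_per_minute(words_attempted: int, errors: int, seconds: float) -> float:
    """Oral reading fluency: words read correctly, scaled to a per-minute rate."""
    return (words_attempted - errors) / (seconds / 60.0)

# Hypothetical reading: 53 words attempted with 4 errors in 45 seconds.
print(round(words_correct_per_minute(53, 4, 45.0), 1))  # 65.3 words correct per minute
```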

In this experiment, second, third, fourth, and fifth graders each read six passages taken from the DIBELS test: two each from below, at, and above their grade level.

Previous research has shown that the various readability formulas actually disagree about grade levels (e.g., Ardoin et al., 2005). In this experiment, oral reading fluency was to referee the disagreement. Suppose that according to PSK, passage A is appropriate for second graders and passage B is appropriate for third graders, while Spache says both are third-grade passages. If oral reading fluency is better for passage A than for passage B, that supports PSK. (“Better” was not judged in absolute terms alone; differences were evaluated against the standard error of the mean.)
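Here is a rough sketch of that kind of pairwise comparison, with invented fluency numbers; the paper's actual decision rule may be more involved, so treat the simple standard-error check below as an illustrative simplification.

```python
def difference_observed(mean_a: float, mean_b: float, sem: float) -> bool:
    """Count a fluency difference as 'observed' only if it exceeds the standard error."""
    return abs(mean_a - mean_b) > sem

# Invented mean oral reading fluency (words correct per minute) on two passages.
fluency_a, fluency_b, sem = 92.0, 81.0, 6.5

# PSK assigns the passages to different grades, so it predicts a difference;
# Spache assigns both to the same grade, so it predicts no difference.
predictions = {"PSK": True, "Spache": False}
observed = difference_observed(fluency_a, fluency_b, sem)

for formula, predicted_difference in predictions.items():
    verdict = "correct" if predicted_difference == observed else "incorrect"
    print(f"{formula}: prediction {verdict}")
```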

The researchers used an analytic scheme to evaluate how well each metric predicted the patterns of student oral reading fluency. Each prediction was binary: the grade-level assignments predicted either a difference or no difference in oral reading fluency, and the question was whether the predicted pattern was observed. Chance, therefore, would be 50 percent. The data are summarized in the table below.
[Table of prediction accuracy by readability formula not reproduced here.]
All of the readability formulas were more accurate for higher-ability than for lower-ability students. But only one—the Dale-Chall—was consistently above chance.

So (excepting the Dale-Chall), this study offers no evidence that standard readability formulas provide reliable information for teachers as they select appropriate texts for their students. As always, one study is not definitive, least of all for a broad and complex issue. This work ought to be replicated with other students, and with outcome measures other than fluency. Still, it contributes to what is, overall, a discouraging picture.

References

Ardoin, S. P., Suldo, S. M., Witt, J., Aldrich, S., & McDonald, E. (2005). Accuracy of readability estimates’ predictions of CBM performance. School Psychology Quarterly, 20, 1–22.

Begeny, J. C., & Greene, D. J. (2014). Can readability formulas be used to successfully gauge difficulty of reading materials? Psychology in the Schools, 51(2), 198-215.
