Daniel Willingham--Science & Education
Hypothesis non fingo

How to make edu-blogging less boring

7/30/2013

 
I read a lot of blogs. I only comment when I think I have something to add (which is rare, even on my own blog) but I read a lot of them.

Today, I offer a plea and a suggestion for making education blogs less boring, specifically on the subject of standardized testing.

I begin with two Propositions about human behavior:
  • Proposition 1: If you provide incentives for X, people are more likely to do what they think will help them get X. They may even attempt to get X through means that are counterproductive.
  • Proposition 2: If we use procedure Z to change Y in order to make it more like Y’, we need to measure Y in order to know whether procedure Z is working. We have to be able to differentiate Y and Y’.

A lot of blog posts on the subject of testing are boring because authors pretend that one of these propositions is false or irrelevant.

On Proposition 1: Standardized tests typically gain validity by showing that scores are associated with some outcome you care about. You seldom care about the items on the test specifically. You care about what they signify. Sometimes tests have face validity, meaning test items look like they test what they are meant to test—a purported history test asks questions about history, for example. Often they don’t, but the test is still valid. A well-constructed vocabulary test can give you a pretty good idea of someone’s IQ, for example.

Just as body temperature is a reliable, partial indicator of certain types of disease, a test score is a reliable, partial indicator of certain types of school outcomes. But in most circumstances your primary goal is not a normal body temperature; it’s that the body is healthy, in which case body temperature will be normal as a natural consequence of the healthy state.
[Image: Bloggers ignoring basic propositions about human behavior? What's up with that?]
If you attach stakes to the outcome, you can't be surprised if some people treat the test score as the goal itself rather than as an indicator. They focus on getting body temperature to 98.6, whatever the health of the patient. That's Proposition 1 at work. If a school board lets an administrator know that test scores had better go up or she can start looking for another job . . . well, what would you do in those circumstances? So you get test-prep frenzy. These are the social consequences of tests as typically used.

On Proposition 2: Some form of assessment is necessary. Without it, you have no idea how things are going. You won’t find many defenders of No Child Left Behind, but one thing we should remember is that the required testing did expose a number of schools—mostly ones serving disadvantaged children—where students were performing very poorly. And assessments have to be meaningful, i.e., reliable and valid. Portfolio assessments, for example, sound nice, but there are terrible problems with reliability and validity. It’s very difficult to get them to do what they are meant to do.

So here's my plea: admit that both Proposition 1 and Proposition 2 are true, and that both apply to testing children in schools.

People who are angry about the unintended social consequences of standardized testing have a legitimate point. They are not all apologists for lazy teachers or advocates of the status quo. Calling for high-stakes testing while taking no account of these social consequences, offering no solution to the problem . . . that's boring.

People who insist on standardized assessments have a legitimate point. They are not all corporate stooges and teacher-haters. Deriding “bubble sheet” testing while offering no viable alternative method of assessment . . . that's boring.

Naturally, the real goal is not to entertain me with more interesting blog posts. The goal is to move the conversation forward. The landscape will likely change consequentially in the next two years. This is the time to have substantive conversations.

Fighting stereotype threat in African American and female students

7/22/2013

 
Part of the fun and ongoing fascination of science is "the effect that ought not to work, yet does."

The impact of values affirmation on academic performance is such an effect.

Values affirmation "undoes" the effect of stereotype threat (also called identity threat). Stereotype threat occurs when a person is concerned about confirming a negative stereotype about his or her group. In other words, a boy is so consumed with thinking "Everyone expects me to do poorly on this test because I'm African American" that his performance actually is compromised (see Walton & Spencer, 2009, for a review).

One way to combat stereotype threat is to give the student better resources to deal with the threat--make the student feel more confident, more able to control the things that matter in his or her life.

That's where values affirmation comes in.

In this procedure, students are provided a list of values (e.g., relationships with family members, being good at art) and are asked to pick three that are most important to them and to write about why they are so important. In the control condition, students pick three values they imagine might be important to someone else.

Randomized controlled trials show that this brief intervention boosts school grades (e.g., Cohen et al., 2006).

Why?

One theory is that values affirmation gives students a greater sense of belonging, of being more connected to other people.

(The importance of social connection is an emerging theme in  other research areas. For example, you may have heard about the studies showing that people are less anxious when anticipating a painful electric shock if they are holding the hand of a friend or loved one.)

A new study (Shnabel et al., 2013) directly tested the idea that writing about social belonging might be a vital element in making values affirmation work.

In Experiment 1 they tested 169 Black and 186 White seventh graders in a correlational study. Students did the values-affirmation writing exercise, as described above. The dependent measure was change in GPA (pre- vs. post-intervention). The experimenters found that writing about social belonging in the assignment was associated with a greater increase in GPA for Black students (but not for White students, consistent with the effect being due to a reduction in stereotype threat).

In Experiment 2, they used an experimental design, testing 62 male and 55 female college undergraduates on a standardized math test. Some were specifically told to write about social belonging and others were given standard affirmation writing instructions. Female students in the former group outscored those in the latter group. (And there was no effect for male students.)

The brevity of the intervention relative to the apparent duration of the effect still surprises me. But this new study gives some insight into why it works in the first place.

References:

Cohen, G. L., Garcia, J., Apfel, N., & Master, A. (2006). Reducing the racial achievement gap: A social-psychological intervention. Science, 313, 1307-1310.

Shnabel, N., Purdie-Vaughns, V., Cook, J. E., Garcia, J., & Cohen, G. L. (2013). Demystifying values-affirmation interventions: Writing about social belonging is a key to buffering against identity threat. Personality and Social Psychology Bulletin.

Walton, G. M., & Spencer, S. J. (2009). Latent ability: Grades and test scores systematically underestimate the intellectual ability of negatively stereotyped students. Psychological Science, 20, 1132-1139.

Out of Control: Fundamental Flaw in Claims about Brain-Training

7/15/2013

 
One of the great intellectual pleasures is to hear an idea that not only seems right, but that strikes you as so terribly obvious (now that you've heard it) you're in disbelief that no one has ever made the point before.

I tasted that pleasure this week, courtesy of a paper by Walter Boot and colleagues (2013).

The paper concerned the adequacy of control groups in intervention studies--interventions like (but not limited to) "brain games" meant to improve cognition, and the playing of video games, thought to improve certain aspects of perception and attention.
[Image: Control group]
To appreciate the point made in this paper, consider what a control group is supposed to be and do. It is supposed to be a group of subjects as similar to the experimental group as possible, except for the critical variable under study.

The performance of the control group is to be compared to the performance of the experimental group, which should allow an assessment of the impact of the critical variable on the outcome measure.

Now consider video gaming or brain training. Subjects in an experiment might very well guess the suspected relationship between the critical variable and the outcome. They have an expectation as to what is likely to happen. If they do, then there might be a placebo effect: people perform better on the outcome test simply because they expect that the training will help, just as some people feel less pain when given a placebo that they believe is an analgesic.

[Image: Active control group]
The standard way to deal with that problem is to use an "active control." That means the control group doesn't do nothing--they do something, but something that the experimenter does not believe will affect the outcome variable. So in some experiments testing the impact of action video games on attention and perception, the active control plays a slow-paced video game like Tetris or The Sims.

The active control is supposed to make expectations equivalent in the two groups. Boot et al.'s simple and valid point is that it probably doesn't do that. People don't believe playing The Sims will improve attention.

The experimenters gathered some data on this point. They had subjects watch a brief video demonstrating what an action video game was like or what the active control game was like. Then they showed them videos of the measures of attention and perception that are often used in these experiments. And they asked subjects "if you played the video game a lot, do you think it would influence how well you would do on those other tasks?"

[Image: Out of control group]
And sure enough, people think that action video games will help on measures of attention and perception. Importantly, they don't think that they would have an impact on a measure like story recall. And subjects who saw the game Tetris were less likely to think it would help the perception measures, but were more likely to say it would help with mental rotation.

In other words, subjects see the underlying similarities between games and the outcome measures, and they figure that higher similarity between them means a greater likelihood of transfer.

As the authors note, this problem is not limited to the video gaming literature; the need for an active control that deals with subject expectations also applies to the brain training literature.

More broadly, it applies to studies of classroom interventions. Many of these studies don't use active controls at all. The control is business-as-usual.

In that case, I suspect you have double the problem. You not only have the placebo effect affecting students, you also have one set of teachers asked to do something new, and another set teaching as they typically do. It seems at least plausible that the former will be extra reflective on their practice--they would almost have to be--and that alone might lead to improved student performance.

It's hard to say how big these placebo effects might be, but this is something to watch for when you read research in the future.

Reference

Boot, W. R., Simons, D. J., Stothart, C., & Stutts, C. (2013). The pervasive problem with placebos in psychology: Why active control groups are not sufficient to rule out placebo effects. Perspectives on Psychological Science, 8, 445-454.

Better studying = less studying. Wait, what?

7/8/2013

 
Readers of this blog probably know about "the testing effect," later rechristened "retrieval practice." It refers to the fact that trying to remember something can actually help cement things in memory more effectively than further study.

A prototypical experiment looks like this (rows = subject groups; columns = phases of the experiment).
[Figure: prototypical testing-effect design]

Group 1: Phase 1 = study; Phase 2 = study again; Phase 3 = final test
Group 2: Phase 1 = study; Phase 2 = practice test; Phase 3 = final test
The critical comparison is the test in Phase 3; those who take a test during Phase 2 do better on it than those who study more. There are lots of experiments replicating the effect and ruling out alternative explanations (e.g., motivation; see Agarwal, Bain, & Chamberlain, 2012, for a review).

A consistent finding is that the benefit to memory is larger if the test is harder. But of course if the test is harder, then people might be more likely to make mistakes on the test in Phase 2. And if you make mistakes, perhaps you will later remember those incorrect responses.

But data show that even if you get the answer wrong during Phase 2, you'll still see a testing benefit so long as you get corrective feedback (Kornell, Hays, & Bjork, 2009).

A tentative interpretation is that you get the benefit because the right answer is lurking in the background of your memory and is somewhat strengthened, even though you didn't produce it.

So that implies the testing effect won't work if you simply don't know the answer at all. Suppose, for example, that I present you with an English vocabulary word you don't know and either (1) provide a definition that you read, (2) ask you to make up a definition, or (3) ask you to choose from among a couple of candidate definitions. In conditions 2 and 3 you obviously must simply guess. (And if you get it wrong, I'll give you corrective feedback.) Will we see a testing effect?

That's what Rosalind Potts & David Shanks set out to find, and across four experiments the evidence is quite consistent. Yes, there is a testing effect. Subjects better remember the new definitions of English words when they first guess at what the meaning is--no matter how wild the guess.

Picking from among meanings provided by the experimenter, however, confers no advantage over simply reading the definition. So there is something about generation in particular that seems crucial.
[Figure: Results of four experiments in Potts & Shanks; performance on the final test. Error bars = standard errors.]
What's behind this effect? Potts & Shanks think it might be attention. They suggest that you might pay more attention to the definition the experimenter provides when you've generated your own guess because you're more invested in the problem. Selecting one of the experimenter-provided definitions is too easy to provide this feeling of investment.

This account is speculation, obviously, and the authors don't pretend it's anything else. I wish that they were equally circumspect in their guess at the prospects for applying this finding in the classroom. Sure, it's an important piece of the overall puzzle, but I can't agree that "this line of research is relevant to any real world situation where novel information is to be learned, for example when learning concepts in science, economics, politics, philosophy, literary theory, or art."

The authors in fact cite two other studies that found no advantage for generating over reading, but Potts & Shanks think they have an account for what made those studies not very realistic (relative to classrooms) and what makes their conditions more realistic. They may yet be proven right, but college students in a lab studying word definitions is still a far cry from "any real world situation where novel information is to be learned."

The today-the-classroom-tomorrow-the-world rhetoric is over the top, but it's an interesting finding that may, indeed, prove applicable in the future.

References:

Agarwal, P. K., Bain, P. M., & Chamberlain, R. W. (2012). The value of applied research: Retrieval practice improves classroom learning and recommendations from a teacher, a principal, and a scientist. Educational Psychology Review, 24, 437-448.

Kornell, N., Hays, M. J., & Bjork, R. A. (2009). Unsuccessful retrieval attempts enhance subsequent learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 35, 989-998.

Potts, R., & Shanks, D. R. (2013, July 1). The Benefit of Generating Errors During Learning. Journal of Experimental Psychology: General. Advance online publication. doi:10.1037/a0033194

Book Reviews: Theory and practice

7/1/2013

 
James Paul Gee, a professor at Arizona State, is known as a pioneer in thinking about the educational uses of gaming. His book, What Video Games Have to Teach Us About Learning and Literacy, is considered a landmark in the field.

Thus his new book, The Anti-education Era: Creating Smarter Students through Digital Learning, is bound to attract interest.

Unfortunately, the book ultimately disappoints. Chief among the problems is that--despite the subtitle--there is very little solid advice here about how to change education.

In fact, the first 150 pages scarcely mention education at all. They are a laundry list—16 chapters in all—of the weaknesses of human cognition. This is territory that has been well covered in other popular books by Chabris & Simons, Ariely, Schacter, Kahneman, and others.

I can’t really fault Gee for not doing as creditable a job in describing human cognition as these authors. It is, after all, their bread and butter, not Gee’s. But the presentation is slow-paced and there are some errors. For example, Gee gets the definition of grit wrong (p. 202). He flatly states that we think well only when we care about what we are doing (p. 12), but the relationship between motivation and performance depends on the complexity of the task and the expertise of the performer.

It’s only the last 60 pages of the book that address ways digital technologies might come to our aid in addressing the frailties of human cognition. Here Gee is on his home turf, but the ground is too well-trod: getting people to work together, ensuring that people feel safe, and so on.

The problem is not that people need to be persuaded that these are good ideas. The problem is that we have evidence in hand that they don’t always work. That means that we need a more nuanced understanding about the conditions under which these ideas work. Gee half recognizes this need, and on occasion warns that solutions will not be simple. But he never takes the next step and outlines the complexities for us.

For example Gee retells (via Jonah Lehrer) the story of a building at MIT that housed professors from a wide variety of disciplines, with a concomitant flowering of intellectual cross-fertilization. Gee quotes (with approval, I guess) Lehrer: “The lesson of Building 20 is that when the composition of the group is right—enough people with different perspectives running into one another in unpredictable ways—the group dynamic will take care of itself.”

As an academic who has been doing interdisciplinary work for 20 years, I would counter: “Like hell it does.”

Virtually every school of education is housed in a building with people trained in different disciplines, and interdisciplinary work remains rare. For reasons I won’t get into here (and much to the despair of university administrators), interdisciplinary work is hard.

So despite the title, educators will find little of interest here.

Common sense strikes back

Gee had better hope he does not meet up with Tom Bennett in a dark alley. Bennett is a British teacher who has been in the classroom since 2003, and has written for the Times Educational Supplement since 2009. (If you’re a reader outside the UK, you may not know that this is a very widely read weekly.)

Bennett’s fourth book, just out, is titled Teacher Proof: Why Research in Education Doesn’t Always Mean What it Claims, and What You Can Do about It.

The book comprises three sections: in the first, Bennett provides an overview of education research; in the second, he evaluates some education theories; and in the third, he suggests a better way forward.

As I read Teacher Proof, I kept thinking “This is one pissed-off teacher.” The language is not at all bitter—in fact, it’s frequently quite funny, and Bennett is a marvelous writer—but you can tell that he feels cheated.

Cheated of his time, sitting in professional development sessions that advise an experienced teacher to change his practice based on an evidence-free theory.

Cheated of the respect he is due, as researchers with no classroom experience presume to tell him his job, and blame him (or his students) if their magic beans don’t grow a beanstalk.

Cheated of the opportunity to devote all of his attention to his students, given that researchers are not simply failing to help him do his job, but are actively getting in his way, to the extent that their cockamamie ideas infect districts and schools.


So what does this angry teacher have to say?

The first third of the book contrasts science and social science. The upshot, as Bennett describes it, is that the social sciences aspire to the precision of the “hard” sciences but can’t get there. They are nevertheless full of pretensions, “walking around in mother’s heels and pearls,” as Bennett says, pretending to be a more mature version of themselves.

There’s not much nuance in this view. As Bennett describes it, education research is not just badly done science, it is pretty much impossible-to-do-well science, given the nature of the subject matter.

This section of the book struck me as odd, both because it didn’t match my impression of the author’s view, based on his other writings, and in fact conflicts with the second section of the book.

The second section offers a merciless, overdue, and often funny skewering of speculative ideas in education: multiple intelligences, Brain Gym, group work, emotional intelligence, 21st-century skills, technology in education, learning styles, learning through games. Bennett has an unerring eye for the two key problems in these fads: in some cases, the proposed “solutions” are pure theory, sprouting from bad (or absent) science (e.g., learning styles, Brain Gym); others are perfectly sensible ideas transmogrified into terrible practice when people become too dogmatic about their application (group learning, technology).

Bennett ends each chapter with a calm, pragmatic take, e.g., “Yes, I use technology a lot. Here’s where I find it useful.” As he says early on, “Experience trumps theory every time.”

But here’s where I think the second section of the book conflicts with the first. Bennett’s consistent criticism of these ideas is that there is no evidence to back them up. To me, this indicates that Bennett doesn’t think that social science research is impossible--he’s just fed up with social science research that’s done badly. In the third section of the book, Bennett tells us what different actors in the education world ought to do. It is the briefest section by far--less than ten pages--and the brevity matches the tone of the advice: “Look, a lot of this really isn’t that complicated, gang.”

Namely:

  • Researchers need to take a good long look in the mirror.
  • Media outlets need to be less gullible.
  • And teachers should appear to comply with the district’s latest lunacy, but once the door closes stick to the basics, and Bennett lays out his version of the basics in 8 spare points.

To the “what people should do” list, I’d add another directive: schools of education should raise their standards for what constitutes education research. Bennett is right—too much of it is second-rate.

There is an ugly system of self-interest that has produced the terrible research (and in turn, the need for Bennett’s book). Professors want to publish in peer-reviewed journals because that brings prestige. So publishers create “peer-reviewed” journals that have very low standards because journals bring them money. Institutional libraries buy these terrible journals (keeping them in business) because faculty say that they are needed so that faculty and students can keep up with the latest research. And universities are reluctant to blow the whistle on the whole charade because schools of education—second-rate or not--bring tuition dollars.

Teacher Proof is a worthy read. There have been scattered criticisms of the theories that Bennett takes on, but seldom collected in one place in such readable prose, and seldom (if ever) with a teacher's eye for the details of practice.

Teacher Proof is also a timely read. In the UK, impatience with the influence that shoddy science has had on teaching practice is mounting. Teachers are sick of being told what to do, with phantom “research” used as the excuse. Would that the same would happen in the US! Teacher Proof may help.
