What aspects of background, personality, or achievement predict success in college--at least, "success" as measured by GPA? A recent meta-analysis (Richardson, Abraham, & Bond, 2012) gathered articles published between 1997 and 2010, the products of 241 data sets. These articles had investigated these categories of predictors:
- three demographic factors (age, sex, socio-economic status)
- five traditional measures of cognitive ability or prior academic achievement (intelligence measures, high school GPA, SAT, ACT, A-level points)
- No fewer than forty-two non-intellectual measures of personality, motivation, or the like, summarized into the categories shown in the figure below.
Make this fun. Try to predict which of the factors correlate with college GPA.
Let's start with simple correlations.
41 out of the 50 variables examined showed statistically significant correlations. But statistical significance is a product of the magnitude of the effect AND the size of the sample--and the samples are so big that relatively puny effects end up being statistically significant. So in what follows I'll mention correlations of .20 or greater.
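To make the sample-size point concrete, here is a minimal Python sketch of the usual significance test for a correlation, t = r * sqrt((n - 2) / (1 - r^2)). For simplicity it approximates the t distribution with a normal curve, which is reasonable only for large samples; the sample sizes below are illustrative, not taken from the meta-analysis.

```python
import math


def correlation_p_value(r: float, n: int) -> float:
    """Approximate two-sided p-value for a Pearson correlation r from n pairs.

    Uses t = r * sqrt((n - 2) / (1 - r**2)) and, as a simplification,
    treats t as standard normal (fine for large n, too liberal for small n).
    """
    t = r * math.sqrt((n - 2) / (1 - r ** 2))
    # Two-sided tail area of a standard normal beyond |t|.
    return math.erfc(abs(t) / math.sqrt(2))


# The same weak correlation (r = .05) at two illustrative sample sizes.
for n in (100, 10_000):
    print(n, correlation_p_value(0.05, n))
```

With r = .05 the correlation is nowhere near significant with 100 people but comfortably significant with 10,000, even though it accounts for only a quarter of one percent of the variance in GPA.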
Among the demographic factors, none of the three were strong predictors. It seems odd that socio-economic status would not be important, but bear in mind that we are talking about college students, so this is a pretty select group, and SES likely played a significant role in that selection. Most low-income kids didn't make it, and those who did likely have a lot of other strengths.
By far the best predictors are the traditional correlates, with correlations ranging from r = .20 (intelligence measures) up to r = .40 (high school GPA; ACT scores were also correlated r = .40).
Personality traits were mostly a bust, with the exception of conscientiousness (r = .19), need for cognition (r = .19), and tendency to procrastinate (r = -.22). (Procrastination has a pretty tight inverse relationship to conscientiousness, so it strikes me as a little odd to include it.)
Motivation measures were also mostly a bust, but there were strong correlations with academic self-efficacy (r = .31) and performance self-efficacy (r = .59). You should note, however, that the former is pretty much like asking students "are you good at school?" and the latter is like asking "what kind of grades do you usually get?" Somewhat more interesting is "grade goal" (r = .35), which measures whether the student is in the habit of setting a specific goal for test scores and course grades, based on prior feedback.
Self-regulatory learning strategies likewise yielded only a few reliable predictors, including time/study management (r = .22) and effort regulation (r = .32), a measure of persistence in the face of academic challenges.
Not much happened in the "approaches to learning" category or in the "psychosocial contextual influences" category.
We would, of course, expect that many of these variables would themselves be correlated, and that's the case, as shown in this matrix.
So the really interesting analyses are regressions that try to sort out which matter more.
The researchers first conducted five hierarchical linear regressions, in each case beginning with SAT/ACT, then adding high school GPA, and then testing whether one of five non-intellective predictors would add predictive power. The variables were conscientiousness, effort regulation, test anxiety, academic self-efficacy, and grade goal, and each did, indeed, add power in predicting college GPA after "the usual suspects" (SAT or ACT, and high school GPA) were included.
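The logic of a hierarchical regression can be sketched from correlations alone: with standardized variables, the variance explained by a set of predictors is R^2 = r' R^-1 r, where R holds the predictors' intercorrelations and r their correlations with GPA. The Python sketch below uses the GPA correlations reported above for ACT (.40), high school GPA (.40), and effort regulation (.32), but the intercorrelations among those predictors are hypothetical values chosen for illustration, not figures from the meta-analysis.

```python
PREDICTORS = ["ACT", "HS_GPA", "effort_regulation"]

# Correlations with college GPA, as reported in the text.
R_XY = [0.40, 0.40, 0.32]

# Predictor intercorrelations: HYPOTHETICAL values for illustration only.
R_XX = [
    [1.00, 0.50, 0.10],
    [0.50, 1.00, 0.20],
    [0.10, 0.20, 1.00],
]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def r_squared(entered):
    """Share of GPA variance explained by the predictors named in `entered`."""
    idx = [PREDICTORS.index(name) for name in entered]
    sub_xx = [[R_XX[i][j] for j in idx] for i in idx]
    sub_xy = [R_XY[i] for i in idx]
    betas = solve(sub_xx, sub_xy)  # standardized regression weights
    return sum(b * r for b, r in zip(betas, sub_xy))


entered, previous = [], 0.0
for name in PREDICTORS:  # enter one predictor per step, in a fixed order
    entered.append(name)
    total = r_squared(entered)
    print(f"+ {name:<18} R^2 = {total:.3f}   delta R^2 = {total - previous:.3f}")
    previous = total
```

Under these made-up intercorrelations, R^2 climbs from .160 to about .213 when high school GPA is added, and effort regulation then adds about .06 more. The "delta R^2" at each step is what "adding predictive power" means.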
But what happens when you include all the non-intellective factors in the model?
The order in which they are entered matters, of course, and the researchers offer a reasonable rationale for their choice; they start with the most global characteristic (conscientiousness) and work towards the more proximal contributors to grades (effort regulation, then test anxiety, then academic self-efficacy, then grade goal).
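Why order matters can be seen with just two correlated predictors. The predictor entered first is credited with all the variance it explains on its own; the second is credited only with what is left over. This Python sketch uses the GPA correlations reported above for conscientiousness (.19) and effort regulation (.32); the .50 correlation between the two traits is a hypothetical value.

```python
def r2_two(r_y1, r_y2, r_12):
    """R^2 for two standardized predictors, computed from their correlations."""
    return (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)


R_CONSC = 0.19    # conscientiousness with GPA (reported in the text)
R_EFFORT = 0.32   # effort regulation with GPA (reported in the text)
R_BETWEEN = 0.50  # HYPOTHETICAL correlation between the two predictors

total = r2_two(R_CONSC, R_EFFORT, R_BETWEEN)

# The predictor entered first is credited with its full r^2;
# the one entered second gets only the increment that remains.
consc_first = {"conscientiousness": R_CONSC**2, "effort": total - R_CONSC**2}
effort_first = {"effort": R_EFFORT**2, "conscientiousness": total - R_EFFORT**2}

print(f"total R^2 = {total:.4f}")
print("conscientiousness first:", {k: round(v, 4) for k, v in consc_first.items()})
print("effort regulation first:", {k: round(v, 4) for k, v in effort_first.items()})
```

The total stays at .1036 either way, but conscientiousness is credited with .0361 when it goes first and only .0012 when it follows effort regulation.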
As they ran the model, SAT and high school GPA continued to be important predictors. So were effort regulation and grade goal.
You can usually quibble about the order in which variables were entered and the rationale for that ordering, and that's the case here. As they put the data together, the most important predictors of college grade point average are: your grades in high school, your score on the SAT or ACT, the extent to which you plan for and target specific grades, and your ability to persist in challenging academic situations.
There is not much support here for the idea that demographic or psychosocial contextual variables matter much. Broad personality traits, most motivation factors, and learning strategies matter less than I would have guessed.
No single analysis of this sort will be definitive. But aside from that caveat, it's important to note that most admissions officers would not want to use this study as a one-to-one guide for admissions decisions. Colleges are motivated to admit students who can do the work, certainly. But beyond that they have goals for the student body on other dimensions: diversity of skill in non-academic pursuits, or creativity, for example.
When I was a graduate student at Harvard, an admissions officer mentioned in passing that, if Harvard wanted to, the college could fill the freshman class with students who had perfect scores on the SAT. Every single freshman-- 800, 800. But that, he said, was not the sort of freshman class Harvard wanted.
I nodded as though I knew exactly what he meant. I wish I had pressed him for more information.
References: Richardson, M., Abraham, C., & Bond, R. (2012). Psychological correlates of university students' academic performance: A systematic review and meta-analysis. Psychological Bulletin, 138, 353-387.
The PIRLS results are better than you may realize.
Last week, the results of the 2011 Progress in International Reading Literacy Study (PIRLS) were published. The test compares the reading ability of fourth-grade children across participating countries.
US fourth-graders ranked 6th among 45 participating countries. Even better, US kids scored significantly better than they had the last time the test was administered, in 2006.
There's a small but decisive factor that is often forgotten in these discussions: differences in orthography across languages.
Lots of factors go into learning to read. The most obvious is learning to decode--learning the relationship between letters and (in most languages) sounds. Decode is an apt term. The correspondence of letters and sound is a code that must be cracked.
In some languages the correspondence is relatively straightforward, meaning that a given letter or combination of letters reliably corresponds to a given sound. Such languages are said to have a shallow orthography. Examples include Finnish, Italian, and Spanish.
In other languages, the correspondence is less consistent. English is one such language. Consider the letter sequence "ough." How should that be pronounced? It depends on whether it's part of the word "cough," "through," "although," or "plough." In these languages, there are more multi-letter sound units, more context-dependent rules, and more outright quirks.
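One rough way to put a number on "consistency" is to ask how often a spelling takes its most common pronunciation. The Python sketch below is a toy measure of my own framing, not the method Seymour et al. used; the word lists are tiny hand-picked examples, and the respellings of "ough" are approximate and vary by dialect.

```python
from collections import Counter


def consistency(pronunciations):
    """Share of words in which a spelling takes its most common pronunciation.

    1.0 means the spelling always sounds the same (a shallow mapping);
    lower values mean readers must learn word-specific exceptions.
    """
    counts = Counter(pronunciations.values())
    return counts.most_common(1)[0][1] / len(pronunciations)


# English "ough" in the four words from the text (approximate respellings).
ough = {"cough": "off", "through": "oo", "although": "oh", "plough": "ow"}
# Italian "ch" is pronounced /k/ in each of these words.
italian_ch = {"che": "k", "chiesa": "k", "chilo": "k", "anche": "k"}

print(consistency(ough), consistency(italian_ch))
```

Every one of the four English words uses a different sound, so "ough" scores 0.25, whereas Italian "ch" scores 1.0.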
Another factor is syllabic structure. Syllables in languages with simple structures typically (or exclusively) have the form CV (i.e., a consonant, then a vowel, as in "ba") or VC (as in "ab"). Slightly more complex forms include CVC ("bat") and CCV ("pla"). As the number of permissible combinations of vowels and consonants that may form a single syllable increases, so does the complexity. In English, it's not uncommon to see forms like CCCVCC (e.g., "splint").
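The C/V notation is easy to reproduce for the examples above. This Python sketch works on letters rather than sounds, treating a, e, i, o, u as vowels and everything else as a consonant; that simplification happens to hold for these examples but fails for many English words (a phonetic analysis would be needed for those).

```python
VOWELS = set("aeiou")


def cv_pattern(word):
    """Letter-based C/V skeleton of a word, e.g. "bat" -> "CVC".

    Simplification: a, e, i, o, u count as vowels and every other letter
    as a consonant, so this tracks spelling, not pronunciation.
    """
    return "".join("V" if ch in VOWELS else "C" for ch in word.lower())


for w in ["ba", "ab", "bat", "pla", "splint"]:
    print(w, cv_pattern(w))
```

For a word like "through," the letters give CCCVVCC even though the spoken syllable is much simpler, which is one more sign of how deep English orthography is.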
Here's a figure (Seymour et al., 2003) showing the relative orthographic depth of 13 languages, as well as the complexity of their syllabic structure.
From Seymour et al. (2003).
Orthographic depth correlates with incidence of dyslexia (e.g., Wolf et al., 1994) and with word and nonword reading in typically developing children (Seymour et al., 2003). Syllabic complexity correlates with word decoding (Seymour et al., 2003).
To my mind, this highlights two points.
First, when people trumpet the fact that Finland doesn't begin reading instruction until age 7, we should bear in mind that the task confronting Finnish children is easier than that confronting English-speaking children. The late start might be just fine for Finnish children; it's not obvious it would work well for English-speakers.
Of course, a shallow orthography doesn't guarantee excellent reading performance, at least as measured by the PIRLS. Children in Greece, Italy, and Spain had mediocre scores, on average. Good instruction is obviously still important.
But good instruction is more difficult in languages with deep orthography, and that's the second point. The conclusion from the PIRLS should not just be "Early elementary teachers in the US are doing a good job with reading." It should be "Early elementary teachers in the US are doing a good job with reading despite teaching reading in a language that is difficult to learn."
References
Seymour, P. H. K., Aro, M., & Erskine, J. M. (2003). Foundation literacy acquisition in European orthographies. British Journal of Psychology, 94, 143-174.
Wolf, M., Pfeil, C., Lotz, R., & Biddle, K. (1994). Towards a more universal understanding of the developmental dyslexias: The contribution of orthographic factors. In Berninger, V. W. (Ed.), The varieties of orthographic knowledge, 1: Theoretical and developmental issues. Neuropsychology and cognition, Vol. 8 (pp. 137-171). New York, NY, US: Kluwer.
Michael Gove, Secretary of State for Education in Great Britain, delivered a speech on education policy last week called "In Praise of Tests," in which he argued for "regular, demanding, rigorous examinations." The reasons he offered included arguments invoking scientific evidence, and he cited my work as an example of such evidence. That invites the question "Does Willingham think that the scientific evidence supports testing, as Gove suggested?"
This question really has two parts. Did Gove get the science right? And did he apply it in a way that is likely to work as he expects?
The answer to the first question is straightforward: yes, he got the science right. The answer to the second question is that I agree that testing is necessary, but I have a different take on the scientific backing for this claim than Gove offered.
First, the science. Gove made three scientific claims. The first is that people enjoy mental activity that is successful: it's fun to solve challenging problems. Much of the first chapter of Why Don't Students Like School? is devoted to this idea, but it's a commonplace observation; that's why people enjoy problem-solving hobbies like doing crossword puzzles or reading mystery novels.
Second, Gove claimed that background knowledge is critical for higher thought, a topic I've written about in several places. The only quibble I have with Gove on this topic is when he says "Memorisation is a necessary precondition of understanding." I'd have preferred "knowledge" to "memorisation," because the latter makes it sound as though one must sit down and willfully commit information to memory. This is a poor way to learn new information; it's much more desirable that the to-be-learned material be embedded in some interesting activity, so that the student will be likely to remember it as a matter of course.
It's plain that Gove agrees with me on this point, because he emphasized that exam preparation should not mean a dull drilling of facts, but rather should happen through "entertaining narratives in history, striking practical work in science and unveiling hidden patterns in maths." I think the word "memorisation" may be what led the Guardian to use a headline suggesting Gove was advocating rote learning.
Third, Gove argued that people (teachers and others) are biased in their evaluations of students, based on the student's race, ethnicity, gender, or other features that have nothing to do with the student's actual performance. A number of studies from the last forty years show that this danger is real.
So on the science, I think Gove is on firm ground. What of the policy he's advocating? I lack expertise in policy matters, and I've argued on this blog that the world of education might be less chaotic if each of us stuck a little closer to the home territory of what we know. Worse yet, I know little about the British education system or about Gove's larger policy plans. With those caveats in place, I'll tread on Gove's territory and offer these thoughts on policy.
It's true that successful thought brings pleasure. The sort of effort I (and others) meant was the solving of a cognitive problem. Gove offers the example of a singer finishing an aria or a craftsman finishing an artefact. These works of creative productivity likely would bring the sort of pleasure I discussed. It's less certain that passing an examination would be "successful thought" in this sense. Why? Because exams seldom call for the creative deployment of knowledge. Instead, they call for the straightforward recall of knowledge. That's because it's very difficult to write exams that call for creative responses yet are psychometrically reliable and valid.
There is a second way in which achievement can bring pleasure; I haven't written about it, but I think it's the one Gove may have in mind.
It's the pleasure of overcoming a formidable obstacle that you were not sure you could surmount. I agree that passing a difficult test could be a profound experience. Some children really don't see themselves as students. They have self-confidence, but it comes from knowing that they are effective in other activities. Passing a challenging exam might prompt a child who never really thought of himself as "a student" to recognize that he's every bit as able as other children, and that might redirect the remainder of his school experience, even his life.
But there are some obvious difficulties in reaching this goal. How do we motivate the student to work hard enough to actually pass the difficult test? The challenge of the exam is unlikely to do it; the child is much more likely to conclude that he can't possibly pass, so there is no point in trying. The clear solution is to engage creative teachers who have the skill to work with students who begin school poorly prepared and who may come from homes where education is not a priority. But motivation was the problem we began with, the one we hoped to address. It seems to me that the motivational boost we get from kids passing a tough exam might be a good outcome of successfully motivating kids. It's not clear to me that the exam itself will motivate them.
My second concern with Gove's vision of testing is what teachers will believe is the best way to prepare kids for a difficult exam that demands a lot of factual recall. Gove is exactly right when he argues that teachers ought not to construe this as a call for rote learning of lists of facts, but rather should ensure that rich factual content is embedded in rich learning activities. My concern is that some British teachers (in particular, the ones whose performance Gove hopes to boost) won't listen to him. I say that because of the experience in the US with the No Child Left Behind Act.
In the face of mandatory testing for students, some teachers kept doing what they had been doing, which is exactly what Gove suggests: rich content interwoven with a demand for critical thinking, delivered in a way that motivates kids. These teachers were unfazed by the test, certain that their students would pass. Other teachers changed lesson plans to emphasize factual knowledge, and focused activities on test prep. I've never met a teacher who was happy about this change. Teachers emphasize facts at the expense of all else and engage in disheartening test prep because they think it's necessary.
Teachers believed it was necessary because (1) they were uncertain that their old lesson plans would leave kids with the factual knowledge base to pass the test; or (2) they thought that their students entered the class so far behind that extreme measures were necessary to get them to the point of passing; or (3) they thought that the test was narrow or poorly designed and would not capture the learning that their old set of lesson plans brought to kids; or (4) some combination of these factors. So pointing out that exam prep and memorization of facts are bad practice will probably not be enough.
Despite these difficulties, I think some plan of testing is necessary. Gove puts it this way: "Exams help those who need support to better know what support they need." A cognitive psychologist would say "learning is not possible without feedback." That learning might be an individual student mastering a subject, OR a teacher evaluating whether his students learned more from a new set of lesson plans he devised compared to last year, OR whether students at a school are learning more with block scheduling compared to their old schedule. In each case, you want to be confident that the feedback is valid, reliable, and unbiased.
And if social psychology has taught us anything in the last fifty years, it's that people will believe their informal judgments are valid, reliable, and unbiased, whether they are or not.
There's more to the speech, and I encourage you to read all of it. Here I've commented only on some of the centerpiece scientific claims in it. Again, I emphasize that I don't know British education and I don't know Gove's plans in their entirety, so what I've written here may be inaccurate because it lacks broader context. I can confidently say this: hard as it is, good science is easier than good policy.
The insidious thing about tests is that they seem so straightforward. I write a bunch of questions. My students try to answer them. And so I find out who knows more and who knows less. But if you have even a minimal knowledge of the field of psychometrics, you know that things are not so simple. And if you lack that minimal knowledge, Howard Wainer would like a word with you.
Wainer is a psychometrician who spent many years at the Educational Testing Service and now works at the National Board of Medical Examiners. He describes himself as the kind of guy who shouts back at the television when he sees something to do with standardized testing that he regards as foolish. These one-way shouting matches occur with some regularity, and Wainer decided to record his thoughts more formally. The result is an accessible book, Uneducated Guesses, explaining the source of his ire on 10 current topics in testing. It makes for an interesting read for anyone with even minimal interest in the topic.
For example, consider making a standardized test such as the SAT or ACT optional for college applicants, a practice that seems egalitarian and surely harmless. Officials at Bowdoin College have made the SAT optional since 1969. Wainer points out the drawback: useful information about the likelihood that students will succeed at Bowdoin is omitted.
Here's the analysis. Students who didn't submit SAT scores with their application nevertheless took the test; they just didn't submit their scores. Wainer finds that, not surprisingly, students who chose not to submit their scores did worse than those who did, by about 120 points.
Figure taken from Wainer's blog.
Wainer also finds that those who didn't submit their scores had worse GPAs in their freshman year, and by about the amount that one would predict, based on the lower scores.
So although one might reject the use of standardized admissions tests out of some conviction, if the job of admissions officers at Bowdoin is to predict how students will fare there, they are leaving useful information on the table.
The practice does bring a different sort of advantage to Bowdoin, however. The apparent average SAT score of their students increases, and average SAT score is one factor in the quality rankings offered by US News and World Report.
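The ranking advantage is a simple selection effect: if the students who withhold scores are mostly the lower scorers, the average of the scores that are reported rises even though the class is unchanged. A minimal Python sketch with hypothetical scores and a hypothetical 1250-point cutoff for who chooses to submit:

```python
from statistics import mean

# HYPOTHETICAL SAT scores for one admitted class; every student took the test.
scores = [1050, 1120, 1200, 1260, 1310, 1380, 1420, 1500]

# HYPOTHETICAL assumption: students scoring below 1250 choose not to submit.
submitted = [s for s in scores if s >= 1250]

print("true class average:    ", mean(scores))
print("reported class average:", mean(submitted))
```

Nothing about the students changes; only the subset that gets averaged does.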
In another fascinating chapter, Wainer offers a for-dummies guide to equating tests. In a nutshell, the problem is that one sometimes wants to compare scores on tests that use different items—for example, different versions of the SAT. As Wainer points out, if the tests have some identical items, you can use performance on those items as “anchors” for the comparison. Even so, the solution is not straightforward, and Wainer deftly takes the reader through some of the issues.
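As an illustration of how anchor items can be used, here is a minimal Python sketch of chained linear equating, one standard anchor-test method (not necessarily the one Wainer walks through). Group 1 takes Form X, group 2 takes Form Y, and both answer the same anchor items V. A Form X score is first mapped onto the anchor scale using group 1's means and standard deviations, then from the anchor onto Form Y using group 2's. All the summary statistics are hypothetical.

```python
def linear_link(score, mean_from, sd_from, mean_to, sd_to):
    """Map a score onto another scale by matching means and standard deviations."""
    return mean_to + (sd_to / sd_from) * (score - mean_from)


# HYPOTHETICAL summary statistics for the two groups.
G1 = {"X_mean": 30, "X_sd": 6, "V_mean": 12, "V_sd": 3}  # took Form X + anchor
G2 = {"Y_mean": 33, "Y_sd": 6, "V_mean": 13, "V_sd": 3}  # took Form Y + anchor


def equate_x_to_y(x):
    """Chained linear equating: Form X -> anchor (group 1) -> Form Y (group 2)."""
    v = linear_link(x, G1["X_mean"], G1["X_sd"], G1["V_mean"], G1["V_sd"])
    return linear_link(v, G2["V_mean"], G2["V_sd"], G2["Y_mean"], G2["Y_sd"])


print(equate_x_to_y(30))
```

Group 2 averages 3 points more on Form Y, but it also scores a point higher on the shared anchor, so part of that advantage reflects a stronger group rather than an easier form. In this made-up example, a 30 on Form X corresponds to a 31 on Form Y, not a 33.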
But what if there is very little overlap on the tests?
Wainer offers this analogy. In 1998, the Princeton High School football team was undefeated. In the same year, the Philadelphia Eagles won just three games. If we imagine each as a test-taker, the high school team got a perfect score, whereas the Eagles got just three items right. But the “tests” each faced contained very different questions and so they are not comparable. If the two teams competed, there's not much doubt as to who would win.
The problem seems obvious when spelled out, yet one often hears calls for uses of tests that would entail such comparisons—for example, comparing how much kids learn in college, given that some major in music, some in civil engineering, and some in French.
And yes, the problem is the same when one contemplates comparing student learning in a high school science class and a high school English class as a way of evaluating their teachers. Wainer devotes a chapter to value-added measures. I won't go through his argument, but will merely telegraph it: he's not a fan.
In all, Uneducated Guesses is a fun read for policy wonks. The issues Wainer takes on are technical and controversial—they represent the intersection of an abstruse field of study and public policy. For that reason, the book can't be read as a definitive guide. But as a thoughtful starting point, the book is rare in its clarity and wisdom.
I want to highlight two incredibly valuable papers, although they are increasingly dated. One paper reports on an enormous project in which observers went into a large sample of US first grade classrooms (827 of them in 295 districts) and simply recorded what was happening. The other paper reports on a comparable project for third grade classrooms (780 students in 250 districts).
Both papers are a treasure trove of information, but I want to highlight one striking datum: the percentage of time spent on science. In first grade classrooms it was 4%. In third grade classrooms it was 5%.
There are a few oddities that might make you wonder about these figures. In the first grade study, the observations typically took place in the morning, so perhaps teachers tend to focus on ELA in the morning and save science for the afternoon. But the third grade project sampled throughout the day. And although there's always some chance that there's something odd about the method, the estimates accord with estimates using other measures, such as teachers' estimates. (See data from an NSF project.)
And before you blame NCLB for crowding science out of the classroom, note that the data for these studies were collected before NCLB (1st grade, mostly '97-'98; 3rd grade, mostly '00-'01). I don't think there's much reason to suspect that the time spent on science instruction has increased, and smaller scale studies indicate it hasn't.
The fact that so little time is spent on science is, to me, shocking. It's even more surprising when paired with the observation that US kids fare pretty well in international comparisons of science achievement. In 2003, when more or less the same cohort of kids took the TIMSS, US kids ranked 6th in science. (They ranked 5th in 2008.)
How are US kids doing fairly well in science in the absence of science instruction? Possibly US schools are terribly efficient in science instruction and get a lot done in minimum time. Possibly other countries are doing even less.
Possibly US culture offers good support for informal opportunities to learn science. It remains a puzzle.
There is a lot of talk about STEM instruction these days. In most districts, science doesn't get serious until middle school. US schools could be doing a whole lot more with more time devoted to science instruction. I'll have more to say about time in elementary classrooms next week.
References
NICHD Early Child Care Research Network (2002). The relation of global first-grade classroom environment to structural classroom features and teacher and student behaviors. The Elementary School Journal, 102, 367-387.
NICHD Early Child Care Research Network (2005). A day in third grade: A large-scale study of classroom quality and teacher and student behavior. The Elementary School Journal, 105, 305-323.
Am I stupid if I can't turn on my stove? The picture below (or one very similar) appears in most textbooks on human factors psychology.
The arrangement of controls is spatially incompatible with the arrangement of stove elements, so if I want to turn on the back left element, I may very well turn on the front left one. What's notable is that this stove likely came with an instruction book describing which knob goes with which burner. But something about that feels wrong. It feels like the designer of the stove should have known how my mind works, and taken that into account, rather than shrugging and saying "well, it's in the manual. It's not my fault if you don't read the manual."
The stove reminds me of value-added measures of teacher effectiveness.
Even the staunchest boosters of value-added measures agree that they should not be the whole story, that there should be multiple measures of teacher effectiveness. But I'm afraid that asking people to remember that fact is a little like asking people to remember which knob goes with which burner on their stove. It's not that people can't do it, but you are swimming against the current of the mind's biases.
To be clear, I don't think that there are data to prove this contention, but let me describe why I'm guessing it's true.
We're talking about a case of missing information: you tell people, "Teacher Smith's value-added score is X. By the way, value-added scores are incomplete as a measure of teacher effectiveness." How do people interpret information that they know to be incomplete? It varies with the situation. Sometimes they assume the missing information is positive. ("I haven't heard that the roads are closed, so I guess all's well.") Sometimes they assume missing information is negative. ("He left 'prior experience' blank, so I guess he doesn't have any.") And sometimes missing information is forgotten or discounted. My guess (and I emphasize that it's a guess) is that the last of these will be the case here. I make this guess in part by analogy to the evaluation of college applicants.
A student's high school record has lots of "soft" components, the values of which are tricky to evaluate: participation in sports and clubs, leadership positions, recommendations from teachers... Even a student's grade point average must be evaluated in light of the difficulty of the courses taken and the competitiveness of the high school. But then there's the SAT. It has the gloss of being numeric, and it is easy to make comparisons across students. Make no mistake, I believe that the SAT does what it's supposed to do: predict success in the freshman year of college. But it's often interpreted to be much more meaningful than that.
That's the problem. I'm afraid that value-added measures will have the same problem. They are produced via a fancy formula, they make it simple to make comparisons, and they are numeric, which can lead one to conclude that they are more precise than they really are. And at this point, we don't even have any of the other "soft" measures to round out the picture of teacher effectiveness.
I don't think value-added measures are meaningless. But handing people value-added measures with the bland warning "these are incomplete" is like giving me a stove with a bad mapping plus an instruction booklet. The solution to the stove problem is straightforward. The solution to teacher evaluation is not straightforward, and I won't attempt to resolve it in a blog post.
My purpose here is simply to highlight the problem in publishing value-added data for individual teachers, with the caveat "these measures are incomplete." I predict that caveat will go unnoticed or be forgotten.
An article in yesterday’s New York Times covered some recent research on the increasing education achievement gap between rich and poor. It’s worth a read, but it misses a couple of important points.
Regarding reasons for the gap, the article dwells on one hypothesis, commonly called the investment theory: richer families have more money to invest in their kids. (The article might have mentioned that richer families not only have more financial capital, but more human capital and social capital.) The article does not mention at all another major theory of the economics of educational achievement: stress theory. Kids (and parents) who live in poverty live under systemic stress. A great deal of research in the last ten years has shown that this stress has direct cognitive consequences for kids, and also affects how parents treat their kids. (Any parent knows that you’re not at your best when you’re stressed.) An open-access review article on this research can be found here.
Another important point the article misses concerns what might be done. It ends with a gloomy quote from an expert: “No one has the slightest idea what will work. The cupboard is bare.” I think there is more reason for optimism, because other countries are doing a better job with this problem than we are.
The OECD analyzes the PISA results by reported family SES. In virtually every country, high SES kids outperform low SES kids. But in some countries the gap is smaller, and it’s not just countries that have smaller income gaps. Economic inequality within a country is often measured with a statistic called the Gini coefficient, which varies from 0 (everyone has the same net worth) to 1 (one person has all the money, and everyone else has nothing). Rich children score better than poor children in countries with large Gini coefficients (like the US), and the rich outscore the poor in countries with lower Gini coefficients (like Norway).
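The Gini coefficient can be computed directly from its definition: the mean absolute difference between every pair of people, divided by twice the mean. This Python sketch uses made-up holdings and the simple all-pairs formula, which is fine for illustration but slow for large populations. With a finite group, "one person has everything" gives (n - 1)/n rather than exactly 1; the value approaches 1 as the group grows.

```python
def gini(values):
    """Gini coefficient: mean absolute difference over all pairs, divided by twice the mean.

    0 means perfect equality; if one person holds everything, the result is
    (n - 1) / n, which approaches 1 as the population grows.
    """
    n = len(values)
    avg = sum(values) / n
    total_diff = sum(abs(a - b) for a in values for b in values)
    return total_diff / (2 * n * n * avg)


print(gini([5, 5, 5, 5]))    # everyone has the same amount -> 0.0
print(gini([0, 0, 0, 100]))  # one person has everything -> 0.75
```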
Being poor predicts lower scores everywhere, but the disparity of wealth means more in the US than it does in other countries. What’s significant is that the relationship between income and test performance is stronger in the US than it is in most countries. (The US has the 3rd strongest relationship between income and student performance in science and the 10th strongest for math, in the 2006 PISA results.) Some countries (e.g., Hong Kong), despite an enormous disparity between rich and poor, manage to level the playing field when the kids are at school. The US does a particularly poor job at this task; wealthy kids enjoy a huge advantage over poor kids. People generally argue that the US is different from Hong Kong: we’re a large, heterogeneous country, and so forth. All true, but the defeatist attitude won’t get us anywhere. We need more systematic study of how those countries solve the problem.