Please notice that this video doesn't even address questions like whether a standardized test can adequately capture everything we care about in schooling, or whether using tests this way will encourage teachers to teach to the test. To me, these questions are open to debate and should be discussed after we are sure that we can use a test to evaluate teachers' efficacy at all. That's the first question to ask, and I think the answer is "no," or at least "not yet."

For more discussion and criticism of using value-added measures to evaluate individual teachers, see:

A blog entry I wrote earlier in the year

A blog entry by Eduwonkette

Missing Data:

The basic problem is that there is a lot of non-randomness in these data sets. The problem mentioned in the video is that the data that happen to be missing are not missing at random: data might be missing because kids miss the test in the fall or spring, or because they move to another district or state, and, as mentioned, these missing data points come disproportionately from underprivileged kids. Another significant concern is that students are not assigned randomly to teachers. Principals put kids in classrooms based on a number of factors that doubtless vary from principal to principal; few, if any, assign students randomly. And some parents appeal to principals to have their child placed in a particular classroom; again, this is not random. That's a problem because it means that different teachers might well not be on an equal footing at the start of the year.
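To make the missing-data worry concrete, here is a minimal Python sketch with invented numbers; nothing in it comes from a real data set. It assumes, purely for illustration, that disadvantaged students gain less over the year and are also the ones whose scores go missing. Dropping those students makes the class's average gain look larger than it really is, so a teacher whose low-gaining students happen to drop out of the data looks more effective than a colleague whose students all stay.

```python
from statistics import mean

# Hypothetical class of ten students: (fall score, spring score, disadvantaged?)
# All numbers are invented for illustration only.
students = [
    (50, 58, False), (55, 63, False), (60, 68, False), (52, 60, False),
    (58, 66, False), (45, 49, True), (40, 44, True), (48, 52, True),
    (42, 46, True), (47, 51, True),
]


def gain(student):
    """Spring score minus fall score."""
    fall, spring, _ = student
    return spring - fall


# Average gain if every student is tested in both fall and spring.
true_mean_gain = mean(gain(s) for s in students)

# Hypothetical assumption: three of the five disadvantaged students miss a
# test or move away, so their gains are never observed.
missing = {5, 6, 7}
observed = [s for i, s in enumerate(students) if i not in missing]
observed_mean_gain = mean(gain(s) for s in observed)

if __name__ == "__main__":
    print("mean gain, all students:   ", true_mean_gain)
    print("mean gain, observed only:  ", observed_mean_gain)
```

Nothing about the teacher changed between the two averages; only which students ended up in the data did.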

Here's a good place to start for a discussion of missing data.  

Reduced Reliability:

I'm assuming that the tests administered in the fall and in the spring are, on their own, reliable. Here we're talking about what happens when you use both tests together to draw inferences about what happened during the year. The reliability problem is difficult to wrap one's mind around. In fact, it is not well known among many psychologists, although it is old stuff for psychometricians. Here's one way to think about it. Suppose fall and spring scores for an entire school were perfectly correlated: say, everyone gained exactly five points during the year. You wouldn't be able to say anything about who is a good teacher and who isn't by comparing fall and spring scores, because every student gained five points. Naturally, real data wouldn't look like that, because there is always some random noise in a test. But the higher the correlation between fall and spring, the less opportunity there is for any other factor (including which teacher a student has) to make a difference. The higher the correlation, the more all the difference scores look the same, and the more any variation among them is just random noise.
If you are familiar with statistics, you may have thought of looking at this issue in a regression framework: the problem is the same, as described here.    
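For readers who want the psychometric version, classical test theory gives a standard formula for the reliability of a difference score D = spring - fall, assuming the two tests have equal variances: ((r_fall + r_spring) / 2 - r_fall_spring) / (1 - r_fall_spring), where r_fall and r_spring are each test's reliability and r_fall_spring is the correlation between them. The sketch below simply evaluates that formula; the reliability of 0.9 for each test is an assumed value, not a figure from any actual test.

```python
def difference_score_reliability(r_fall, r_spring, r_fall_spring):
    """Reliability of (spring - fall) under classical test theory.

    Assumes the fall and spring tests have equal variances. The result can
    go negative when the fall-spring correlation exceeds the tests' average
    reliability.
    """
    if r_fall_spring >= 1:
        raise ValueError("perfectly correlated scores leave no reliable difference")
    return ((r_fall + r_spring) / 2 - r_fall_spring) / (1 - r_fall_spring)


if __name__ == "__main__":
    # Assumed: each test on its own has reliability 0.9.
    for r in (0.5, 0.7, 0.8, 0.85):
        rel = difference_score_reliability(0.9, 0.9, r)
        print(f"fall-spring correlation {r:.2f} -> gain-score reliability {rel:.2f}")
```

With each test at 0.9, a fall-spring correlation of 0.5 gives a gain-score reliability of 0.8, but a correlation of 0.8 drops it to 0.5. That is the argument above in numbers: the better fall scores predict spring scores, the less reliable the gains.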
A crucial prediction is that, if value-added scores are not reliable, the apparent effectiveness of teachers will not be stable across years. If they were reliable, we would expect that teachers considered "very good" one year would generally be "very good" in future years. That appears not to be the case: see here and here.
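A toy simulation shows why instability across years is what you'd expect if value-added scores are unreliable. It assumes, hypothetically, that each teacher has a fixed true effect, that a class's average gain is that effect plus the average of independent student-level noise, and nothing else: no sorting of students, no peer or school effects. The parameter values are invented. Under these assumptions the year-to-year correlation of teachers' estimated effects should land near teacher_var / (teacher_var + noise_var / class_size), which is 0.2 with the defaults below, even though every teacher's true effect is identical in both years.

```python
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def expected_stability(teacher_sd=1.0, student_noise_sd=10.0, class_size=25):
    """Share of variance in a class's mean gain that is due to the teacher."""
    noise_var = student_noise_sd ** 2 / class_size
    return teacher_sd ** 2 / (teacher_sd ** 2 + noise_var)


def simulate_year_to_year(n_teachers=2000, class_size=25, teacher_sd=1.0,
                          student_noise_sd=10.0, seed=0):
    """Correlate estimated teacher effects from two simulated years.

    Hypothetical model: each teacher's true effect is the same in both
    years; only the student-level noise is redrawn.
    """
    rng = random.Random(seed)
    true_effects = [rng.gauss(0, teacher_sd) for _ in range(n_teachers)]

    def estimated_effect(effect):
        # One class's mean gain: the teacher's true effect plus the
        # average of independent student-level noise.
        noise = sum(rng.gauss(0, student_noise_sd) for _ in range(class_size))
        return effect + noise / class_size

    year1 = [estimated_effect(e) for e in true_effects]
    year2 = [estimated_effect(e) for e in true_effects]
    return pearson(year1, year2)


if __name__ == "__main__":
    print("expected correlation: ", expected_stability())
    print("simulated correlation:", simulate_year_to_year())
```

In this made-up setting, a teacher ranked near the top in one year has a good chance of landing near the middle the next, purely because of noise.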

The same score gains might not be equivalent:

There are really two issues here. The first is that intervals on the test might not actually be equivalent. A test (e.g., the SAT) can be constructed so that intervals have the same meaning across the whole scale: the difference between 70 and 75 means the same thing as the difference between 80 and 85. Not every test is constructed that way, however.
The second idea is that, even if intervals on the test mean the same thing across the scale, some intervals may be harder to teach than others. In general, the more you know, the easier it is to learn new things, as described here. So you would guess that more knowledgeable kids would be easier to teach. But you could imagine this generalization not holding for particular tests. For example, a test may be devised to match state standards, which call for particular factual knowledge. Knowing the facts might get a student 75% on the test, which might be calibrated to the passing score for a school or district. But to score higher, students would need deeper conceptual knowledge, which is more difficult to learn and to teach. So getting from 70% to 75% would be easier than getting from 75% to 80%.

Other people affect what teachers do:

There is no doubt that how much a student learns is strongly affected by his or her teacher. But the school the child attends also matters over and above the teachers in the school. So does the neighborhood in which the school is located.

Should teachers worry only about short-term gains?

It's known that reading pays enormous cognitive dividends, as described here. But this sort of benefit would not be testable at the end of the year; it's a matter of getting kids to read widely and in great variety. I'm not so cynical as to think that most teachers will discard lesson plans that don't contribute to the teacher's pocketbook. But why add that as a factor? Would it be crazy for a teacher to think "Should I spend three weeks on this unit if it's not on the year-end test? Maybe we should just review fractions, which I know will be on the test. . . ."

Effects of peers:

It would seem obvious that having even one disruptive student added to a class might affect the learning of all the other students. But there are also data showing that having high-achieving peers in the class helps students.