
Value-Added in the Classroom

In our previous post, we examined two value-added methods, and explained how a gain-score method can give teachers the credit they deserve while not blaming them for external factors.  Thus we have a solution at the district level for keeping superintendents accountable for performance while accounting for the makeup of the student body.


Today, we're going to explore some fundamental limitations of value-added methods, see how we run smack into these limitations at the classroom level, and consider what we can do about it.

Handling Assumptions

You know what they say about metrics.  There are lies, damned lies, and statistics.  Sometimes that's true because the metric's assumptions were violated.  Assumptions are conditions that must be true for the statistic to be accurate.


What assumptions do value-added metrics make?  Here are two from this study by Rose, Henry, and Lauen, a recap of two from SAS, plus two of my own.


Stable Unit Treatment Value Assumption (SUTVA):

The "...effect of any teacher on any student does not vary according to the composition of that teacher's classroom".  This is not necessarily true.  Any class has dynamics that change the class experience for all.  We must ensure adequate special services at the district level to support students with special needs.


Ignorability:

Each "student's assignment to a specific teacher (A) -- is independent of their potential outcome under that teacher."  "The most widely accepted form of ignorable assignment is randomization."  Honestly, most districts carefully place students into classrooms, and rightfully so.  Therefore, we should select a model (such as random effects) that is not sensitive to nonrandom assignment.


Stretch:

"There is sufficient stretch in the scales to ensure that progress can be measured for both low-achieving students as well as high-achieving students." We've discussed this in the Performance Index and Growth. We ultimately need curriculum stretch too, in order for high achievers to reach the highest possible growth.
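As a toy illustration of why stretch matters (the scale ceiling and scores here are invented for this example), a capped test scale hides growth for students who start near the top:

```python
# Toy illustration of insufficient "stretch": on a scale that tops out
# at a hypothetical ceiling of 800, a student near the top can't show a
# full year of growth, while a mid-range student can.
CEILING = 800

def measured_growth(pretest, true_gain, ceiling=CEILING):
    """Observed gain when the posttest score is capped at the scale ceiling."""
    posttest = min(pretest + true_gain, ceiling)
    return posttest - pretest

print(measured_growth(500, 50))  # mid-range student: all 50 points appear
print(measured_growth(780, 50))  # high achiever: only 20 points appear
```

Both students grew the same amount, but the capped scale reports less than half of the high achiever's gain.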


Test Consistency:

"The scales are sufficiently reliable from one year to the next."  This has not been true of Ohio tests.  Meeting minutes from the Ohio Technical Committee state that paper and online test items were different and that the test items were more difficult than in past years.


Same Content: 

Students learn the same content.  If classroom A covers more material than classroom B, and the value-added calculation does not account for this, teacher A's score will be inflated.  Thus, we need to adjust our growth expectation in tracked classes (possible with the random effects model).


Result vs. Error: 

For every statistic, it's important to compare the size of the result to the size of the error.  This is also called the signal-to-noise ratio.  Intuitively, any value-added growth will be small.  Students won't learn 5 years of material in 1 year.  It's more like 1.1 years in 1 year, for a value-added growth of 0.1 years.  Thus our results could easily be swamped by measurement error.  We're not measuring an elephant here.
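To make that concrete, here is a small simulation (the effect size and error spread are invented for illustration) of how a 0.1-year signal fares against per-student measurement error:

```python
import random

random.seed(0)

# Hypothetical numbers: a true value-added effect of 0.1 "years of
# learning", with per-student measurement error of SD 0.3 years.
TRUE_EFFECT = 0.1
TEST_ERROR_SD = 0.3

def observed_effect(n_students):
    """Average measured gain for one classroom of n_students."""
    gains = [TRUE_EFFECT + random.gauss(0, TEST_ERROR_SD)
             for _ in range(n_students)]
    return sum(gains) / n_students

# For a class of 25, the standard error (0.3 / sqrt(25) = 0.06) is more
# than half the size of the signal, so single-year estimates bounce around.
for _ in range(5):
    print(round(observed_effect(25), 3))
```

Run it a few times without the seed and you'll see single-classroom estimates that look like anything from "no effect" to double the true effect.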


What can we do?  The first reaction is: use more model parameters!  If only we could measure what Johnny ate for dinner, his sleep quality, and which seat he sat in on the bus, surely we would have perfect results...  Pretty soon we all have video cameras strapped to our heads 24/7 and we STILL don't get good results, because of one terrible problem, the bane of statisticians everywhere.

 

Overfitting.


You know dot-to-dot puzzles?  Where you connect all the dots with one crazy line?  Yeah, that happens in statistics too, and when it does, you lose the ability to handle new observations.  An overfitted model can't explain anything even a little bit different from its previous data.  Sorry, camera-toting metric enthusiasts, you're not helping; you're actually hurting us.
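Here's a toy sketch of the problem (all data made up): a model that memorizes every training point, dot-to-dot style, gets zero training error but loses to a plain straight line on new data.

```python
import random

random.seed(1)

# Made-up data: y is roughly x plus noise.
train = [(x, x + random.gauss(0, 0.5)) for x in range(100)]
test = [(x + 0.5, x + 0.5 + random.gauss(0, 0.5)) for x in range(100)]

def dot_to_dot(x, data):
    """Overfit: predict the y of the nearest memorized training point."""
    return min(data, key=lambda pt: abs(pt[0] - x))[1]

def straight_line(x, data):
    """A simple least-squares line through the data."""
    n = len(data)
    mx = sum(p[0] for p in data) / n
    my = sum(p[1] for p in data) / n
    slope = (sum((p[0] - mx) * (p[1] - my) for p in data)
             / sum((p[0] - mx) ** 2 for p in data))
    return my + slope * (x - mx)

def mse(model, points, fit_data):
    """Mean squared prediction error of a model over a set of points."""
    return sum((model(x, fit_data) - y) ** 2 for x, y in points) / len(points)

print("memorizer, train error:", mse(dot_to_dot, train, train))  # exactly 0.0
print("memorizer, test error: ", round(mse(dot_to_dot, test, train), 2))
print("line, test error:      ", round(mse(straight_line, test, train), 2))
```

The memorizer "explains" the training data perfectly and still predicts new observations worse than a two-parameter line, which is the overfitting trap in miniature.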

Recommendations

1. Avoid model overfitting.  More variables sometimes make models worse.

2. Measure teacher growth.

3. Give teachers goals derived from highly specific curriculum and classroom management feedback.

Can Teachers Grow?

Instead of plunging down the overfitting rabbit hole, let's step back a minute and think. 


What do we want to DO with this data?


The strategy depends on one question.  Can teachers grow?


Extreme position 1: Never

In this perspective, teacher effectiveness is a fixed quantity and nothing changes it.  This leads to a rank-and-yank management strategy laser-focused on eliminating metric-identified ineffective personnel.  What really happens here is, instead of losing low performers, all the high performers get fed up with the oppressive regime and leave because they can.


Extreme position 2: Always

Here, teacher effectiveness is completely a property of the teacher's environment.  People who get married because "he'll change" or "she'll change" subscribe to this philosophy.  We know how that turns out.  Hundreds of chances later and still no accountability.


The truth?  It's somewhere in between "never" and "always", as demonstrated by Dr. Sanders's own research.  He found that value-added scores increased for teachers moving from higher-poverty schools to lower-poverty ones.  It was a statistically significant increase, although not a large one.  


Measuring Teacher Growth

As you can see, what we really want to measure is teacher growth.  How can we do that?


Let's break it down into two areas, curriculum and classroom management, and provide specifics.


Great news!  We already have curriculum improvement information, because we know what test questions students missed.  For example, which feedback would you prefer?

A) Your value-added score is -0.15.

B) The majority of your students missed 5 questions on long division with remainders.


Did you pick B?  All we need to do is generate a detailed report on the missed questions.  The teacher can then propose some sort of curriculum supplement or adjustment to try next year.  If students throughout the district are missing the same questions, perhaps a new textbook is needed.
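As a sketch of what such a report could look like (the item tags and student data below are made up), here's a minimal aggregation over missed questions:

```python
from collections import Counter

# Hypothetical item-level results: for each student, the tags of the
# questions they answered incorrectly.  Tags and data are invented.
missed = {
    "student_01": ["long_division_remainder", "fractions_compare"],
    "student_02": ["long_division_remainder"],
    "student_03": ["long_division_remainder", "place_value"],
    "student_04": ["fractions_compare"],
}

def missed_question_report(missed_by_student, threshold=0.5):
    """List topics missed by at least `threshold` of the class, worst first."""
    n = len(missed_by_student)
    counts = Counter(tag for tags in missed_by_student.values() for tag in tags)
    return [(tag, c / n) for tag, c in counts.most_common() if c / n >= threshold]

print(missed_question_report(missed))
# → [('long_division_remainder', 0.75), ('fractions_compare', 0.5)]
```

A district-wide version would just merge the per-classroom dictionaries before counting, which is how you'd spot the textbook-level gaps mentioned above.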


Classroom management is a bit broader, covering student social-emotional readiness, special needs support, and the day-to-day logistics of handling a classroom full of students.  How can we get specific feedback here? 

 

Student evaluations are one possibility, such as this example (see page 12) of the Tripod survey, done in conjunction with the Measures of Effective Teaching project and developed in consultation with the Shaker Heights school district.  The project found that "...students seem to know effective teaching when they experience it..." and "Most important are students' perception of a teacher's ability to control a classroom and to challenge students with rigorous work."  It's still a bit high-level, but it's detailed enough to develop some specific goals.


Supervisor evaluations are another possibility.  As Dr. Sanders states, for the Tennessee school system, "There was very strong correlation between teacher effects as determined by the data and subjective evaluations by supervisors."


It's flexible!  The takeaway is to give teachers specific feedback, so they can develop small-scope, measurable growth goals that are achievable in a year.

