The Myth of Objectivity in Mathematics Assessment

DISCUSSION

In educational assessment—the myriad processes
by which humans try to determine what other
humans “know”—objectivity is a term that simply
does not apply. Instead, we can strive for
“agreed-on subjectivity.” The following two specific
suggestions can help improve the consistency and
usefulness of assessment information gathered by
teachers.

First, design classroom assessment tasks that
are likely to elicit the information that you seek.
Ask yourself the following questions: What is the
mathematics that I am trying to assess here? What
tasks will tap this mathematics most directly? A
teacher of second-year algebra might want to know
what students understand about quadratic equations
and the techniques for solving them. The
question in the previously discussed example—consisting
of the one-word imperative “solve”—does not
directly ask students to supply much information
about their understanding. The set of tasks in figure
7, for example, does so more specifically. Greer
et al. (1999) offer guidelines for creating and adapting
tasks for classroom assessment.

a) Use one of the symbolic methods that were
developed in class to find solutions to the
equation

x² + x – 6 = 0.

b) Explain the method that you used in part
(a).

c) Use the graph of a function to illustrate the
solutions that you found in part (a).

d) A quadratic equation can have no real
solutions. Explain how this
result could happen. Give an example that
illustrates your explanation.

Fig. 7
A revised quadratic-equation task
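
A teacher preparing a scoring guide may want to check the expected answers to parts (a) and (d) in advance. The following is a minimal sketch, assuming the quadratic formula as the symbolic method; the function name `solve_quadratic` and the example x² + 1 = 0 are illustrative choices, not part of the original task.

```python
import math


def solve_quadratic(a, b, c):
    """Return the real solutions of a*x**2 + b*x + c = 0 as a sorted list.

    An empty list means the discriminant is negative, so the graph of
    y = a*x**2 + b*x + c never crosses the x-axis (the case in part (d)).
    """
    if a == 0:
        raise ValueError("a must be nonzero for a quadratic equation")
    discriminant = b * b - 4 * a * c
    if discriminant < 0:
        return []
    root = math.sqrt(discriminant)
    # A set collapses the repeated root when the discriminant is zero.
    return sorted({(-b - root) / (2 * a), (-b + root) / (2 * a)})


# Part (a): x^2 + x - 6 = 0
print(solve_quadratic(1, 1, -6))  # [-3.0, 2.0]

# Part (d): a hypothetical example with no real solutions, x^2 + 1 = 0
print(solve_quadratic(1, 0, 1))   # []
```

The solutions -3 and 2 are also the x-intercepts that a correct graph for part (c) should show.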

Second, before assigning any task to students,
devise—and share with the students—guidelines
for scoring their work. See Thompson and Senk
(1998) and Greer et al. (1999). Ask yourself what
types of responses you are likely to get from students
to these tasks and what you will accept as
evidence of adequate understanding. Thinking
these questions through before giving the tasks to
students helps clarify the tasks themselves. It also
helps align the tasks with the in-class instruction.
Sharing these guidelines with students communicates
expectations and makes meeting them more
likely. One set of guidelines for scoring student work
on the tasks in figure 7 is proposed in figure 8.

5 – All the characteristics of 4, plus either a
valid example with a clear explanation for
part (d) or exceptional responses to parts
(a) through (c) along with a response to
part (d) that might have some minor
flaws.

4 – Correct responses to parts (a) through (c):
correct equation solutions, along with a
valid explanation of the method; sketch of
graph with all important features correct
and labeled.

3 – Substantial evidence of understanding of
quadratic equations: some minor errors
(not central to understanding quadratic
equations) are the only information that is
missing from the characteristics of a 4.

- - - - - - - - - - - - - - - - - - - - - - - - - - - -

2 – Some evidence of understanding of quadratic
equations is present: either a symbolic
solution or a graphical illustration,
perhaps with some minor errors.

1 – Little understanding of quadratic equations
is shown: major errors in all parts of
the problem.

0 – No attempt made.

Fig. 8
A scoring rubric for the revised task

CONCLUSION

Such false dichotomies as “objective versus subjective”
and “traditional versus alternative” derail
meaningful discussions of the important issues in
mathematics assessment. The labels “traditional”
and “alternative” are meaningless; a five-question
classroom quiz can give detailed information about
what students know, or it can furnish very little
information, depending on how it is designed,
scored, and used. No “objective” assessment occurs;
subjective—that is, human—knowledge, beliefs,
judgments, and decisions are unavoidable parts of
any assessment scheme. Teachers should consider
ways to make assessment of students’ mathematical
understanding, as well as the information gathered
through that assessment, more consistent and
useful.

APPENDIX

Computing the measurement error and confidence
interval around test scores depends on the concept of
reliability. The reliability of a test is an answer to the
question, How accurately does this test measure what
it intends to measure? In other words, if the test is
administered many times to the same student, how
close will the results be? If we gave the test to two students
who possess the same amount of the knowledge
or ability being measured, how close would the scores
be? As another example, if we were discussing the
reliability of a thermometer, we would ask how close
thermometer readings are to the actual temperature
and how consistently the thermometer produces these
readings.

One way to determine the reliability of a test is to
correlate students’ scores on repeated administrations
of that test. A perfectly reliable test—one that reports
students’ true scores with no error—would have a
“test-retest” reliability of $r_{xx} = 1$. However, no test is
perfectly reliable. If you could repeatedly administer a
test, the set of scores for a particular student would be
distributed around the student’s true score. See figure
A-1. The more reliable the test, the higher the
test-retest correlation and the tighter the distribution
of scores. For the very reliable SAT-I Mathematics
test, the reliability is $r_{xx} = 0.91$.
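
The test-retest correlation described above can be sketched in a few lines. This is an illustrative sketch only: it assumes the Pearson correlation coefficient as the measure of agreement, and the two score lists are invented for the example, not real test data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length, at least 2 scores")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores for six students on two administrations of one test.
first_administration = [480, 520, 610, 550, 700, 430]
second_administration = [500, 510, 590, 570, 680, 450]

print(round(pearson_r(first_administration, second_administration), 3))
```

A value near 1 means that students' scores keep nearly the same relative positions from one administration to the next.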

[Graph omitted; horizontal axis: Score]

Fig. A-1
The distribution of actual scores around a
student’s true score
(Source: Crocker and Algina 1986)

The standard deviation of this distribution of scores
is the standard error of measurement, $SE_m$. It can be calculated
using the standard deviation of the test
scores, $SD_x$, and the test’s reliability, $r_{xx}$, using

$$SE_m = SD_x \sqrt{1 - r_{xx}}.$$

For the SAT-I Mathematics test, $SD_x = 100$ and
$r_{xx} = 0.91$, so

$$SE_m = 100\sqrt{1 - 0.91} = 30.$$

Therefore, for this student, 68 percent of the scores
that she would earn if she were to take the test
repeatedly would be within thirty points on either side
of her true score. Similarly, 95 percent of her scores
would be within sixty points on either side of her true
score.
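
The arithmetic behind these bands can be checked with a short sketch. It assumes the usual relation between the standard error of measurement, the standard deviation of the test scores, and the reliability; the standard deviation of 100 is inferred from the thirty-point figure in the text, and the true score of 500 is a hypothetical value chosen only for illustration.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """Standard error of measurement: sd * sqrt(1 - reliability)."""
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1 - reliability)


def score_band(true_score, sem, k):
    """Interval reaching k standard errors on either side of a true score."""
    return (true_score - k * sem, true_score + k * sem)


sem = standard_error_of_measurement(100, 0.91)
print(round(sem, 1))  # 30.0

true_score = 500  # hypothetical true score
low68, high68 = score_band(true_score, sem, 1)  # about 68 percent of scores
low95, high95 = score_band(true_score, sem, 2)  # about 95 percent of scores
print(round(low68), round(high68))  # 470 530
print(round(low95), round(high95))  # 440 560
```

The 68 and 95 percent figures correspond to one and two standard errors under the assumption that the scores are normally distributed around the true score.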

Imagine that the test could be administered repeatedly
to two different students. If the difference
between these two students’ scores is computed every
time that the test is administered, these difference
values would also lie on a distribution, this time
around the true difference score for these students.
Because the distributions of the two scores are independent,
the variance of this difference distribution is
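equal to the sum of the two individual variances:

$$SE_{\text{diff}}^2 = SE_m^2 + SE_m^2 = 2\,SE_m^2,$$

where $SE_m$ is the standard error of measurement. For the SAT-I Mathematics test,

$$SE_{\text{diff}}^2 = 30^2 + 30^2 = 1800.$$

Therefore, the “standard error of the difference”
between two SAT-I Mathematics test scores is

$$SE_{\text{diff}} = \sqrt{1800} \approx 42.$$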
To be 95 percent sure that two actual scores represent
different true scores, the actual scores would have to
differ by at least eighty-four points.
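
The comparison of two scores can be sketched the same way. This is a minimal sketch, assuming a standard error of measurement of 30 points for each score and a two-standard-error threshold for 95 percent confidence; the function names and the sample scores are hypothetical. Without intermediate rounding the threshold comes to about 85 points; the eighty-four points in the text results from rounding the standard error of the difference to 42 before doubling.

```python
import math


def standard_error_of_difference(sem_a, sem_b):
    """Standard error of the difference between two independent scores."""
    return math.sqrt(sem_a ** 2 + sem_b ** 2)


def differ_reliably(score_a, score_b, sem, k=2):
    """True if two observed scores differ by at least k standard errors
    of the difference (k=2 corresponds to roughly 95 percent confidence)."""
    threshold = k * standard_error_of_difference(sem, sem)
    return abs(score_a - score_b) >= threshold


se_diff = standard_error_of_difference(30, 30)
print(round(se_diff, 1))      # 42.4
print(round(2 * se_diff, 1))  # 84.9

# Hypothetical pairs of observed scores.
print(differ_reliably(500, 560, 30))  # False: a 60-point gap is too small
print(differ_reliably(500, 590, 30))  # True: a 90-point gap exceeds the threshold
```

On this reading, a sixty-point gap between two students' scores does not justify the conclusion that their true scores differ.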