There are (at least) two possible objectives for standardized tests; as a general rule, a test should be constructed for one goal or the other.
Curriculum-based tests are designed to measure a student’s mastery of a specific body of material. These are sometimes called “board exams” or “Regents exams”. For example, a high-school chemistry test might be designed to measure mastery of the high-school chemistry curriculum.
Aptitude tests are designed to measure a student’s ability to master concepts they have not yet studied. These are similar to IQ tests, but generally aren’t trying to measure general intelligence.
A major difference is how “teaching to the test” is handled.
For a curriculum-based test, this can be a good thing. While knowing a handful of questions and unduly focusing on memorizing canned answers is problematic, teaching to a battery of 1200 well-constructed questions is a perfectly viable pedagogical approach. If you squint, it is simply “assigning homework problems from a textbook”.
Teaching to the test for an aptitude test both wastes time and invalidates the test results. Aptitude test question batteries are generally not released publicly, precisely to prevent this from happening.
Case Study: Missouri Science Exams
You can find practice exams online that should reflect the content of the actual exams. There are 5th and 8th grade exams available.
The 8th grade exam is clearly a curriculum exam. The questions require knowledge of science topics, such as:
What are the amplitude and frequency of a wave?
What is kinetic energy?
What are some organelles found in plant cells?
What are natural selection and evolution?
How do ecological systems work?
These are all reasonable questions to expect an 8th grader to be able to understand and answer.
I am genuinely unsure whether the 5th grade exam is supposed to be a curriculum exam or an aptitude exam. Many of the questions attempt to test an abstract knowledge of the scientific process, yet they assume so little knowledge on the part of the student that the questions are vague and the answers debatable even by experts. Before commenting on the test itself, I will present some questions that I would consider reasonable.
What two elements make up most of the mass of the sun?
What animal is around the size of a horse and has striped black-and-white skin?
Of plastic, glass, metal, and wood: which are safe to put into a microwave?
What organ of the human body filters waste products out of the bloodstream and converts them into urine?
Why is it easier to nail a nail into a piece of wood using a hammer than using your fist?
Those questions are not trivial: Google search struggles to tell me when it is safe to put wood or plastic in a microwave.
Instead, however, we get questions like:
Analyze a toy model of the water cycle where ice cubes represent the atmosphere.
“Use evidence” from an over-simplified diagram to demonstrate that older rocks are often below newer ones.
What is the examination’s technical definition of the term “external structure”?
Should bluegrass be planted with buffalo grass in an under-specified field? (Both yes and no are acceptable answers, according to the answer guide.)
It’s not entirely clear what is being tested. Some of these questions are reading-comprehension tests: can you read a several-paragraph passage (containing words from the curriculum) and answer simple questions about it? Others seem to be assessing scientific knowledge, but they are not testing the basics; rather, they are testing specific trivia that happens to be in the state curriculum. A third possibility is that they are testing critical thinking, yet the scenarios are too simplistic to say anything useful.
My estimate is that the average adult would do better on the 8th grade exam than on the 5th grade exam, which leads to the question: what are we measuring?
Algebra Tests
To avoid issues related to cultural knowledge for now, let us go back to considering Algebra education. There are several different standardized tests that could be used here:
An Algebra Aptitude Test would be given to 6th graders for educational-tracking purposes. I’m not sure a high-quality exam of this kind exists; most of the tests I see discussed online simply test whether students already know the material.
An Algebra Mastery Test should be given at the conclusion of an Algebra I course. The State of Missouri has a practice test; it is not a particularly ambitious Algebra I curriculum.
An Algebra Excellence Test covers more difficult problems that many students will not be able to solve, even after taking Algebra I. An example of such a test is the AMC-10 (see the 2021 questions). While “mathematical skill” and “mathematical test-taking skill” diverge in the tails at a certain point, at this level the correlation is more than good enough.
It is easier to measure the abstract concept of “mathematical ability” after students have studied algebra. After all, nobody would ever put a reading passage defining the logarithm on a standardized test. For science topics, though, that approach of defining concepts in a passage is more common; the ACT, for example, does exactly this.
College Admission Examinations
It is a matter of extreme controversy whether the ACT and SAT should be assessing students’ knowledge of a high-school curriculum, or assessing students’ ability to succeed in college in some other way. Let us first simply look at what the ACT does regarding science:
The science section measures the interpretation, analysis, evaluation, reasoning, and problem-solving skills required in the natural sciences. The section presents several authentic scientific scenarios, each followed by a number of multiple-choice questions.
The content includes biology, chemistry, Earth/space sciences (e.g., geology, astronomy, and meteorology), and physics. Advanced knowledge in these areas is not required, but background knowledge acquired in general, introductory science courses may be needed to correctly answer some of the questions. (act.org)
“Problem solving skills” are notoriously difficult to teach, and are certainly useful for prospective college students to have. And presumably there is quite a bit of research supporting the ACT’s testing methodology. At a glance, the questions are reasonable.
And “are the questions reasonable” is the key question. Some people don’t like tests because they don’t like the facts the results reveal; I am extremely willing to ignore those people. Other people don’t like tests because the tests don’t do what they claim to do; in certain situations that seems to be a valid criticism.
And as a public-policy concern: it is a truth universally acknowledged that one cannot legislate competence. So how can the State determine that its standardized tests are actually effective? While in the abstract that is a Hard Problem, as a tactical measure we can simply use tests that are known to be effective and avoid tests that are not. Unfortunately, that advice is easier to give than to follow.