STANYS 2004 Presentation

An Alternative Approach to State Examinations in Science Education

ACASE Presentation at the 2004 annual meeting of the Science Teachers’ Association of New York State (STANYS). Presenters: Paul Zachos and William E. J. Doane

This is a version of our presentation at the 2004 meeting of STANYS. We have included links to graphs, slides, and papers mentioned during the presentation. If you have any questions or comments about this material, please feel free to contact us.

We are at a crucial turning point for educational assessment and testing. For the sake of simplicity, I’m going to use the words assessment, testing, and examinations synonymously. There is value in distinguishing them from one another, but for today’s discussion it’s not necessary.

The current testing paradigm (whether it takes the form of commercial tests, state designed tests, or teacher-designed tests) has a number of features that are gradually becoming recognized as non-productive. Those who bear the major burden for this problem are the children and the teachers whose lives are increasingly influenced by these tests. We will argue here today that a whole new way of testing is needed.

The primary weak feature of the current system is that it produces scores, grades and ranks for students rather than information that could be used to plan and improve teaching and learning.

What we will present to you today is a new approach to testing. In some ways it will look similar to things that you are familiar with. Actually, it leads to completely different outcomes. It comes from a completely different outlook on schooling – one that is primarily oriented to serving and deepening the relationship of teacher and student. At the same time it will be shown to have powerful data analytic implications that can improve planning, resource allocation, and accountability.

We’ll focus on one feature of the current testing paradigm today: This is that it produces scores, grades, and ranks for students, rather than information that can be used to plan and improve teaching and learning.

To begin with, I must point out that there is something missing in the current testing paradigm, something so simple and seemingly so obvious that it is hard to believe that it has been overlooked: the importance of having learning goals that are designed for teacher use in planning and evaluating lessons and units.

This morning I invite you to revisit our hallowed notions of what tests, and particularly State examinations, could and should be. If we approach test development anew, it is quite possible, and in fact practical, to create tests that are very different from the ones we now face. Such tests would have the following desirable characteristics:

  1. They will be designed primarily to help teachers and students.
  2. They will measure the higher levels of thinking that are currently considered the most difficult thing to measure.
  3. The information that these tests provide will be timely – it will turn around quickly enough so that the teacher can use it to plan instruction.
  4. They will not take an excessive amount of time away from instruction.
  5. These assessments of course must be valid and reliable.
  6. These assessments will be non-threatening; in fact they will be fun.
  7. They will be educational, that is, students will learn when they take them.
  8. They will actually motivate students to learn more about subject matter.

We hear a lot about using assessments to make teachers, students, and schools accountable. But really it is the assessments themselves that need to be made accountable to teachers and students first. This is what we are about.

But we don’t want to just talk about this new approach and its requirements. Rather, we will give you a firsthand experience of what such tests might be like. I will do this just as I would do it with a group of middle or high school students. I will play teacher and you will play students. OK?

Using the following materials, a task demonstrating what such a test might be like was administered during the presentation:

Additional materials related to task development and the Cubes & Liquids task in particular are available, too:

What you have just experienced is, for all intents and purposes, a standardized test. Teachers who use these assessments are given rigorous instructions for task administration and precise indications for how to make reliable ratings of the student performance that results. In spite of the fact that you have experienced me telling a story and trading quips with you as students in my class, I never deviated from a rigorous protocol designed to reliably capture targeted pieces of information. I am careful not to say anything or to allow discussion that could divulge information that should be secure to the test.

Our experience has shown that activities of the kind you have just experienced are engaging for middle school and high school students at all levels, irrespective of academic ability or previous level of knowledge, and this includes disenchanted, alienated learners. This is in part because we are working directly with a natural phenomenon, and not with a simulation. Nature becomes the foundation and she has an inherent appeal to the human being.

A middle school student is able to engage with and complete each activity in Cubes & Liquids and yet it remains a challenging task for the graduating high school senior. This again gives an indication of the power of working with natural phenomena.

The tasks demand that students apply the concepts and skills they’ve been learning in school to a practical problem. Simple recall and restatement of concepts will not take the student very far. Furthermore the students are not told what concepts and skills they will need to apply. They must figure this out for themselves. A student taking part in a similar task once complained, “I haven’t had to think so much all year long.”

Students typically leave this experience full of curiosity. What are the substances involved? What are the cubes made of? What are the liquids? Why did they behave as they did? The students are now primed for the instruction that will follow. Did I say follow? Yes, the test was given before instruction. A scarcely appreciated fact about formative assessment is that it must necessarily take place before instruction!

But what exactly does this test do? What information does it provide? How is it used?

This assessment activity is designed to give teachers information about the degree to which students have attained important learning goals.

Learning goals represent the knowledge, skills and dispositions that we want our students to attain as a result of instruction.

Goals are valued ends. Over the last decade, value-driven social consensus processes at the national and state levels have led to the formulation of broad learning goals that have come to be called educational standards. Examples in Science Education include the AAAS’s Benchmarks for Science Literacy (1993), NRC’s National Science Education Standards (1996), and the New York State Education Department’s Learning Standards for Mathematics, Science and Technology.

These have become guidelines for directing the course of science education programs at all levels.

There are, however, a number of practical problems associated with using these broad value statements:

One problem in particular is that standards are not operational, that is, one typically cannot move directly from the standards to classroom practice. They are generally much too broad to be practical for instructional planning and improvement at the lesson or unit level. These standards have to be converted into goals that are written at a level of specificity that will allow teachers to use them to plan and carry out lessons and groups of lessons. When they are written at this level of specificity we call them Operational Learning Goals.

The activity we just experienced is designed to assess higher order cognitive abilities that are typically missed by conventional tests. This assessment activity is built on the foundation of such operational learning goals. These are aligned with New York State Education Department’s Learning Standards for Mathematics, Science and Technology. They are listed below along with scales that indicate the students’ level of performance with respect to that goal:

Distinguishes Observation from Inference

1. Records observations; makes no unnecessary inferences

0. Makes inferences where only observations are called for

Technical Description

2. Provides an unambiguous technical description of every event

1. Provides an unambiguous technical description of all but one event

0. Does NOT consistently provide unambiguous technical descriptions

Density of Solid Objects — Coordinates Mass and Volume of Solid Objects

2. Correctly coordinates mass and volume of solid objects

1. Attempts to coordinate mass and volume of solid objects

0. Does not coordinate mass and volume of solid objects

Density of Liquids — Coordinates Mass and Volume of Liquid

2. Correctly coordinates mass and volume of liquid

1. Attempts to coordinate mass and volume of liquid

0. Does NOT coordinate mass and volume of liquid

Uses a 2×2 Classification Scheme to Organize Relevant Factors

2. Forms a COMPLETE classification scheme including all levels of both factors

1. Forms an INCOMPLETE classification scheme including all levels of 1 factor

0. Does NOT form a scheme to classify objects

Proportional Reasoning — Coordinating Solid and Liquid Densities

2. Correctly coordinates 2 ratios

1. Attempts to coordinate ratios

0. Does NOT attempt to coordinate ratios
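
For readers who keep ratings electronically, the scales above could be recorded as a simple lookup table. What follows is only a sketch under our own assumptions: the goal titles and level numbers are taken from the list above, but the Python names `SCALES` and `is_valid_rating` are hypothetical and are not part of the assessment system described here.

```python
# Illustrative sketch only. Goal titles and level numbers come from the
# scales listed above; the names SCALES and is_valid_rating are hypothetical.
SCALES = {
    "Distinguishes Observation from Inference": (0, 1),
    "Technical Description": (0, 1, 2),
    "Density of Solid Objects": (0, 1, 2),
    "Density of Liquids": (0, 1, 2),
    "Uses a 2x2 Classification Scheme": (0, 1, 2),
    "Proportional Reasoning": (0, 1, 2),
}


def is_valid_rating(goal: str, level: int) -> bool:
    """Return True if `level` is one of the levels defined for `goal`."""
    return level in SCALES.get(goal, ())
```

A check of this kind would, for example, reject a rating of 2 on the observation and inference goal, which has only two levels.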

The assessment system provides definitions and examples for each of these levels in sufficient detail to permit reliable rating. For now what I’d like you to notice is that goals specified at this level can be used by the teacher to plan lessons or portions of lessons. Moreover, the teacher’s work of planning instruction can be supported by information about the levels of performance of students in the class. Assessment results can be displayed in a way that facilitates this kind of planning, as in the following graphs:


  1. Considering a single learning goal for a single class.
  2. Considering a single learning goal at various levels of the educational enterprise over time.
  3. Considering a single learning goal for a single class over time with detail for individual students.
  4. Considering multiple learning goals for an entire school over time.

These images provide a readily graspable summary of student performance on the learning goals in both graphical and numerical formats.

This assessment information system is designed so that teachers can have quick turn around time for the information on student performance. In fact they can enter their ratings of student performance and have immediate analysis of the kind depicted in the graph for student, class, or any group’s performance over time.
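
As a rough illustration of the kind of summary involved, the sketch below counts how many students in a class were rated at each level of one learning goal, before and after instruction. It is not the code of our information system: the function name `level_distribution` and the ratings are invented for the example.

```python
from collections import Counter


def level_distribution(ratings, levels=(0, 1, 2)):
    """Count how many students were rated at each level of one goal."""
    counts = Counter(ratings)
    return {level: counts.get(level, 0) for level in levels}


# Hypothetical ratings for one class on one goal (one number per student).
pre_instruction = [0, 0, 1, 0, 2, 1, 0]
post_instruction = [1, 2, 2, 1, 2, 2, 0]

print(level_distribution(pre_instruction))   # {0: 4, 1: 2, 2: 1}
print(level_distribution(post_instruction))  # {0: 1, 1: 2, 2: 4}
```

Note that the summary stays in the units of the scale itself (students per level) rather than being converted into a score or rank.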

There are a number of benefits to Operational Learning Goals. They provide:

  1. a practical basis for teachers to plan and evaluate instruction
  2. a meaningful unit for reporting results to students, parents, colleagues, school boards, etc.
  3. a unit that if properly aggregated can also assist school and school district planning, and state level planning, resource allocation, and accountability.

Formative Assessment Requires Operational Learning Goals

One hears much talk about formative assessment these days. It refers to the use of assessment information to plan and improve instruction. But if assessment results are to be useful for improving instruction, they must not only be available in time to plan instruction but also be keyed to how students are performing on learning goals. In fact, in order to be practically useful for planning instruction, assessment results must inform the teacher about the degree of student attainment of learning goals at the level of instruction for lessons or units, that is, operational learning goals. Formative Assessment requires operational learning goals!

Assessing Higher Level Thinking

Another major problem in realizing standards has been in the severe limitations in the types of learning goals that are adequately assessed. Traditionally assessments have been constrained in a way that has not permitted higher level thinking to be assessed.

By requiring that students not just give us answers to questions but rather that they apply what they have learned in practical situations, we can find out if students are using higher level thinking and have attained higher order capabilities. Recall Bloom’s taxonomy: it moves from Knowledge and Comprehension at the lower levels, through Application, to Analysis and Synthesis at the higher levels (Bloom et al., 1956). Note that the Cubes & Liquids assessment requires that students not merely recall concepts and skills but that they apply the concepts and skills that they have learned to solve a practical problem that they are faced with. Thus the floor of this assessment has been set at the level of Application. This means that students performing successfully on this task are operating at the level of Application or higher. You can see that the reasoning concerning predictions and the thought experiments demands that students operate at the Analysis and Synthesis levels as well in order to perform adequately on these tasks.

Testing (whether state, commercial, or teacher designed) in New York State has tended to focus on New York State Math, Science, and Technology Standard 4 in large part because it is the easiest to measure with conventional tests. This has driven many science programs to focus on this standard and to neglect the others. Note that the assessment activity that you have experienced here today is not bound by this constraint. Assessments like Cubes & Liquids solve a number of the problems that have traditionally plagued performance assessment and assessment of higher level thinking. For example:

  1. Broad standards have been converted to practical learning goals.
  2. The assessment can be administered in a conventional classroom.
  3. Student performance can be scored easily and reliably (with adequate professional development).
  4. Information can be ready in time to plan instruction.
  5. Analysis is immediate and produces summaries in usable reporting formats.
  6. The resulting information is useful for planning, evaluation, resource allocation, and accountability.

We have made the case here today that this type of assessment provides more useful information at all levels of planning, resource allocation, and accountability than conventional tests that generate scores, ranks, percentages correct, etc.

We propose that the State adopt the concepts and techniques that we’ve presented here today, concepts such as:

  • the Operational Learning Goal (OLG),
  • pre-instruction testing,
  • quick turn around time for reports,
  • maintaining the integrity of information units (that is, not masking information about level of performance by averaging and aggregating across disparate learning goals),
  • and reports that are useful for planning instruction.
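
To make the point about the integrity of information units concrete, consider the small sketch below. The two student profiles are invented for illustration, and the helper names `average` and `goals_needing_attention` are hypothetical. Both students have the same average rating, yet they need instruction on different goals, which is exactly the information that averaging across disparate goals hides.

```python
# Hypothetical ratings (0-2) for two students on three learning goals.
student_a = {
    "Technical Description": 2,
    "Density of Liquids": 0,
    "Proportional Reasoning": 1,
}
student_b = {
    "Technical Description": 0,
    "Density of Liquids": 2,
    "Proportional Reasoning": 1,
}


def average(profile):
    """Collapse a profile into a single number (the practice argued against)."""
    return sum(profile.values()) / len(profile)


def goals_needing_attention(profile, target=2):
    """List the goals on which the student is rated below the target level."""
    return sorted(goal for goal, level in profile.items() if level < target)


print(average(student_a), average(student_b))  # 1.0 1.0
print(goals_needing_attention(student_a))
# ['Density of Liquids', 'Proportional Reasoning']
print(goals_needing_attention(student_b))
# ['Proportional Reasoning', 'Technical Description']
```

A teacher who sees only the average learns nothing about what to teach next; a teacher who sees the goal-by-goal ratings does.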

We have presented some foundations for building a new generation of State Science Examinations, Regents Exams, and classroom tests, exams that will help teachers improve learning and let them know how students are doing all through the year, not just when it’s too late to do anything about it. The State could provide pre- and interim-tests of targeted learning goals so that teachers could keep track of student attainment over the course of the school year. School personnel and students would know which goals needed to be focused on for which students and they would not need to spend the last 3 months of the school year preparing students for exams. One of the benefits of working with higher level skills is that security becomes much less of an issue. They are harder to copy or fake.

We propose that the state focus on core capabilities for these learning goals, as we have done. In other words, focus on concepts, skills, and dispositions that are foundational and applicable across the secondary school science curriculum. These can become vital signs for progress in educational programs. Indeed identifying and developing the core capabilities that are associated with success in math, science, and technology should be a high priority for state education agencies.

We’ve discussed how this approach could solve many of the problems associated with current Regents Examinations in our paper: A New Direction for Regents Examinations in Science Education. Three other papers are also available:

Zachos, Paul, Thomas L. Hick, William E. J. Doane, & Cynthia Sargent. 2000. “Setting theoretical and empirical foundations for assessing scientific inquiry and discovery in educational programs.” The Journal of Research in Science Teaching, 37(9), 938-962.
To obtain a printed copy of the article, please email Paul Zachos at

Zachos, Paul. 2004. “Pendulum phenomena and the assessment of scientific inquiry capabilities.” THE PENDULUM: Scientific, Historical, Philosophical & Educational Perspectives, Part II, 13 (7 & 8).
To obtain a printed copy of the article, please email Paul Zachos at

Zachos, Paul. 2004. “Discovering the True Nature of Educational Assessment.” Research Bulletin, 9(2), 7-12. The Research Institute for Waldorf Education.

Let’s take the remaining time we have available in this session to address technical questions, i.e., questions that require either clarification of or more information about anything that was presented. Then we can use the panel discussion next hour to go more deeply into the ideas and any issues that may have emerged for you.

For more information please email us at

Graphs used in the presentation [PDF]

Our online information system

Slides from the second half of the presentation [PDF]


Bloom, B.S. (Ed.), Engelhart, M.D., Furst, E.J., Hill, W.H., & Krathwohl, D.R. (1956). Taxonomy of Educational Objectives – The Classification of Educational Goals. New York: David McKay.