Evaluating Assessment Activities


An assessment is valid if it faithfully measures attainment of valued learning goals. These goals are drawn from multiple sources including:

  • State standards documents
  • National standards documents
  • Teacher goals
  • Student goals

Learning goals should:

  • Be highly valued
  • Target core capabilities
  • Target higher level capabilities (e.g., the higher levels of Bloom’s taxonomy)
  • Be useful to teachers
  • Be usable by teachers


Reliability is critical to educational planning, assessment, evaluation, and action research. It answers the question, “How dependable are our measures of student performance?” In our assessments, reliability means that those who have received professional preparation will make essentially the same judgements about the extent to which students have attained learning goals. Reliability is a critical foundation for professional communication, public presentation, and sound decision-making.
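As an illustration only, the idea that trained raters should make "essentially the same judgements" can be checked numerically. The sketch below assumes two raters have scored the same students on one learning goal using a hypothetical 0 to 2 rating scale, and it computes simple percent agreement and Cohen's kappa (agreement corrected for chance). The source does not name a particular statistic, so the choice of these two measures, the function names, and the sample ratings are all assumptions made for the example.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Fraction of students on whom the two raters gave the same rating."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must rate the same non-empty set of students")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters, corrected for agreement expected by chance."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: probability both raters pick the same category
    # if each rated independently at their own observed category rates.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical ratings (0 = not attained, 1 = partially, 2 = attained)
# given by two trained raters to the same eight students.
rater_a = [2, 1, 0, 2, 1, 1, 0, 2]
rater_b = [2, 1, 0, 1, 1, 1, 0, 2]

print(f"Percent agreement: {percent_agreement(rater_a, rater_b):.3f}")  # 0.875
print(f"Cohen's kappa:     {cohens_kappa(rater_a, rater_b):.2f}")       # 0.81
```

Percent agreement is easy to explain but can look high simply because one rating is very common; kappa adjusts for that, which is why both are shown here.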

High reliability is often easier to obtain when measuring more superficial concepts and skills. We chose instead to develop rigorous processes for assessing higher order abilities and to engage in the struggle to make those judgements reliable.

In our assessments, reliability is built and maintained in two ways:

  1. by obtaining multiple measures of attributes (for example, several pieces of student performance are used to rate most of our learning goals, and repeated assessments over time are recommended)
  2. through support systems, including professional development and detailed instructions for making judgements and ratings, which include definitions, rules for making judgements, and actual examples of student work.


The issue of fairness arises when test results are used for non-educational purposes, that is, for grading, ranking, screening, promoting, and retaining students (see Zachos, 2004, for a discussion). We recommend that the portions of our assessments that deal with educational attainments (i.e., the extent to which students have attained learning goals) not be used for such purposes.

The assessments are designed for use by teachers to plan and evaluate instruction. When this is the case, the test is used strictly to obtain information to better understand and help each student, and so the question of fairness need not arise.

Our assessments may include grading features that we believe are not educationally destructive. For example, the teacher may rate students on the extent to which they:

  • Complete all portions of the assessment
  • Write in complete sentences when requested to
  • Write clearly and neatly

and other desirable features of student performance that do not represent or interfere with the attainment of learning goals.

Schools, LEAs, and state agencies can also use assessment results for planning, evaluation, and resource allocation without damaging effects on students, so these uses do not raise the issue of fairness either.


In the domain of assessment, efficiency means obtaining the best possible information for the time and effort invested. An assessment that is educational (i.e., students learn from experiencing it) also provides instructional efficiency, because the assessment activity itself promotes student learning and engagement.


Zachos, Paul. (2004). Discovering the True Nature of Educational Assessment. Research Bulletin, 9(2). The Research Institute for Waldorf Education.