The Reliability Coefficient
The reliability of a test can be quantified in the form of a reliability coefficient. Like validity coefficients, reliability coefficients allow us to compare the reliability of different tests. The ideal reliability coefficient is 1: a test with this coefficient would give precisely the same results for a particular set of candidates regardless of when it happened to be administered. In practice, the coefficient we should look for depends on other considerations as well, above all the importance of the decisions that are to be taken on the basis of the test. The more important the decisions, the greater the reliability we must demand: if we are to refuse someone the opportunity to study overseas because of their score on a language test, then we have to be confident that their score would not have been much different if they had taken the test a day or two earlier or later.
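One common way to obtain such a coefficient is to give the same test twice to the same candidates and correlate the two sets of scores. The sketch below assumes that test-retest approach and uses a plain Pearson correlation; the function name and the sample scores are invented for illustration and do not come from any real test.

```python
from math import sqrt


def pearson_r(first, second):
    """Pearson correlation between two equally long lists of scores."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two score lists of the same length (at least 2)")
    n = len(first)
    mean_first = sum(first) / n
    mean_second = sum(second) / n
    # Sum of cross-products and sums of squared deviations from the means.
    cross = sum((a - mean_first) * (b - mean_second) for a, b in zip(first, second))
    ss_first = sum((a - mean_first) ** 2 for a in first)
    ss_second = sum((b - mean_second) ** 2 for b in second)
    if ss_first == 0 or ss_second == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return cross / sqrt(ss_first * ss_second)


# Hypothetical scores for five candidates on two administrations of one test.
first_sitting = [56, 61, 70, 48, 83]
second_sitting = [58, 60, 73, 45, 85]
print(round(pearson_r(first_sitting, second_sitting), 3))
```

A value close to 1 would indicate that candidates were ranked and spaced almost identically on both occasions; lower values indicate less consistent results.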
The Standard Error of Measurement and the True Score
A reliability coefficient lets us compare tests, but it does not tell us directly how far an individual candidate's score can be trusted. With a little further calculation, however, it is possible to estimate how close a person’s actual score is to what is called their true score. We are able to make statements about the probability that a candidate’s true score (the one which best represents their ability on the test) is within a certain number of points of the score they actually obtained on the test. In order to do this, we first have to calculate the standard error of measurement of the particular test. Because the standard error of measurement can inform the decisions that we take on the basis of test scores, all published tests should provide users with not only the reliability coefficient but also the standard error of measurement.
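The text does not give the calculation itself. A minimal sketch, assuming the usual classical test theory formula (standard error of measurement = standard deviation of the scores × √(1 − reliability coefficient)) and normally distributed measurement error, might look like this. The function names and figures are illustrative only.

```python
from math import sqrt


def standard_error_of_measurement(score_sd, reliability):
    """SEM under the classical formula: SD * sqrt(1 - reliability)."""
    if score_sd < 0:
        raise ValueError("standard deviation cannot be negative")
    if not 0 <= reliability <= 1:
        raise ValueError("reliability coefficient must lie between 0 and 1")
    return score_sd * sqrt(1 - reliability)


def true_score_band(observed_score, sem, z=1.96):
    """Range expected to contain the true score.

    z=1.0 gives roughly a 68% band and z=1.96 roughly a 95% band,
    assuming measurement error is normally distributed.
    """
    return (observed_score - z * sem, observed_score + z * sem)


# Hypothetical test: score standard deviation 10, reliability coefficient 0.75.
sem = standard_error_of_measurement(10, 0.75)
low, high = true_score_band(60, sem)
print(f"SEM = {sem:.1f}; true score probably between {low:.1f} and {high:.1f}")
```

If a pass mark falls inside the band around a candidate's score, the decision for that candidate rests on a difference the test cannot measure dependably, which is the practical reason for publishing the standard error of measurement.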
How to Make Tests More Reliable
There are two components of test reliability: the consistency of candidates' performance from occasion to occasion, and the reliability of the scoring. The suggestions below deal first with achieving consistent performances from candidates and then with scorer reliability.
To achieve consistent performances from candidates:
(1) take enough samples of behavior,
(2) do not allow candidates too much freedom,
(3) write unambiguous items,
(4) provide clear and explicit instructions,
(5) ensure that tests are well laid out and perfectly legible,
(6) make sure candidates are familiar with the format and testing techniques,
(7) provide uniform and non-distracting conditions of administration.
To make scoring more reliable:
(8) use items that permit scoring which is as objective as possible,
(9) make comparisons between candidates as direct as possible,
(10) provide a detailed scoring key,
(11) train scorers,
(12) agree on acceptable responses and appropriate scores at the outset of scoring,
(13) identify candidates by number, not name,
(14) employ multiple, independent scoring.
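The effect of item (1), taking enough samples of behavior, can be estimated numerically. The sketch below uses the Spearman-Brown prophecy formula, which the text does not mention and which is brought in here as an assumption; it also assumes that any added items are comparable in quality to the existing ones. The function name and values are illustrative.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when a test is lengthened by length_factor.

    length_factor=2 means twice as many comparable items; a value below 1
    models a shortened test.
    """
    if not 0 <= reliability <= 1:
        raise ValueError("reliability coefficient must lie between 0 and 1")
    if length_factor <= 0:
        raise ValueError("length factor must be positive")
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


# Hypothetical test with reliability 0.5, lengthened to 2x and 3x its size.
for factor in (1, 2, 3):
    print(factor, round(spearman_brown(0.5, factor), 3))
```

The predicted gains shrink as the test grows, so lengthening a test is most worthwhile when the starting reliability is low.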
Reliability and Validity
To be valid a test must provide consistently accurate measurements. It must therefore be reliable. A reliable test, however, may not be valid at all. In our efforts to make tests reliable, we must be wary of reducing their validity.