Reliability and Item Analysis Introductory Overview - General Introduction

In many areas of research, the precise measurement of hypothesized processes or variables (theoretical constructs) is a challenge in its own right. For example, in psychology, the precise measurement of personality variables or attitudes is usually a necessary first step before any theories of personality or attitudes can be considered. More generally, throughout the social sciences, unreliable measurements of people's beliefs or intentions will hamper efforts to predict their behavior. The issue of measurement precision also comes up in applied research whenever variables are difficult to observe. For example, reliable measurement of employee performance is usually difficult, yet it is a necessary precursor to any performance-based compensation system.

In all of these cases, Reliability & Item Analysis can be used to construct reliable measurement scales, to improve existing scales, and to evaluate the reliability of scales already in use. Specifically, Reliability & Item Analysis will aid in the design and evaluation of sum scales, that is, scales that are made up of multiple individual measurements (e.g., different items, repeated measurements, different measurement devices, etc.). The program will compute numerous statistics that allow the user to build and evaluate scales following the so-called classical testing theory model.

The assessment of scale reliability is based on the correlations between the individual items or measurements that make up the scale, relative to the variances of the items. If you are not familiar with the correlation coefficient or the variance statistic, we recommend that you review the respective discussions provided in the Introductory Overview of Basic Statistics.
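To illustrate how item variances and inter-item correlations combine into a single reliability estimate, here is a minimal sketch of one common statistic of this kind, Cronbach's alpha. It is not taken from the program itself. The function name `cronbach_alpha` and the example data are hypothetical, and the sketch assumes complete data (no missing responses) with every respondent answering the same k items.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Estimate Cronbach's alpha for a sum scale.

    rows: list of respondents, each a list of k item scores
          (hypothetical input layout, assumed complete).
    """
    k = len(rows[0])
    if k < 2:
        raise ValueError("a sum scale needs at least two items")

    # Transpose so that each tuple holds all responses to one item.
    items = list(zip(*rows))

    # Sample variance of each individual item.
    item_variances = [variance(col) for col in items]

    # Sample variance of the sum scale (each respondent's total score).
    total_variance = variance([sum(r) for r in rows])

    # alpha = k/(k-1) * (1 - sum of item variances / variance of the sum)
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: 5 respondents, 3 items.
scores = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [1, 2, 2],
    [3, 3, 3],
]
alpha = cronbach_alpha(scores)
```

When the items are strongly positively correlated, the variance of the sum is much larger than the sum of the item variances, and the estimate approaches 1. When the items are uncorrelated, the two quantities are equal and the estimate is 0.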

The classical testing theory model of scale construction has a long history, and there are many textbooks available on the subject. For additional detailed discussions, you may refer to, for example, Carmines and Zeller (1980), De Gruijter and Van Der Kamp (1976), Kline (1979, 1986), or Thorndike and Hagen (1977). A widely acclaimed "classic" in this area, with an emphasis on psychological and educational testing, is Nunnally (1970).

Testing hypotheses about relationships between items and tests. Note that STATISTICA includes the general linear structural equation modeling procedure SEPATH. In that module, you can test specific hypotheses about the relationships between sets of items or between different tests (e.g., test whether two sets of items measure the same construct, analyze multitrait-multimethod matrices, etc.). For additional details and illustrations, refer to the Examples section of SEPATH.