Once a marketing researcher has a clear idea of what he or she wishes to learn about a group of target respondents, the concepts of scaling and measurement should be considered.
These concepts are very important in developing questionnaires or instruments of measurement that will fulfill the research objectives in the most accurate manner.
The term scaling refers to procedures for attempting to determine quantitative measures of subjective and sometimes abstract concepts.
Accurate measurement of constructs is essential in making decisions, and this article addresses the importance of measuring customers’ attitudes and behaviors, and other marketplace phenomena.
The measurement process is described, along with the central decision rules needed for developing scale measurements.
The focus is on basic measurement issues, construct development, and scale measurements.
Note that construct development is not the same as construct measurement. For true construct measurement to take place, the researcher needs to understand scale measurements as well as the interrelationships between what is being measured and how to measure it.
This article discusses four scales of measurement (nominal, ordinal, interval and ratio) and describes both comparative and non-comparative scaling techniques in detail.
The considerations involved in implementing scaling techniques when researching international markets are discussed.
Often, a researcher is interested in testing whether a construct developed in one country holds in another country, or examining similarities and differences in constructs across countries or geographic areas.
Typically, an operational measure of the construct has already been developed in the base country and the task facing the researcher is to see whether the construct can be meaningfully measured in the same way elsewhere.
Procedures for developing a scale to measure an underlying construct in a single country are relatively straightforward.
Developing a scale in a multi-country environment is considerably more complex and challenging, and presents the researcher with two intertwined issues.
The fundamental question is whether the same construct exists in different countries. A particular construct identified in one country may not exist in another country or may not be expressed in the same terms.
The measurement process
Most questions in marketing research surveys are designed to measure attitudes. What management really wants to understand – and ultimately influence – is behaviour.
For several reasons, however, researchers are likely to use attitude measures instead of behaviour measures.
First, there is a widely held belief that attitudes are precursors of behaviour. If consumers like a brand, they are more likely to choose that brand over one they like less.
Second, it is generally more feasible to ask attitude questions than to observe and interpret actual behaviour.
Third, attitude measures have an advantage over behaviour measures in their capacity for diagnosis or explanation.
Measurement is defined as the process of assigning numbers or other symbols to certain characteristics of the object of interest, according to some prespecified rules. How this is done is strongly influenced by the sort of information that is being sought.
Usually, numbers are assigned because of the ease of application in mathematical and statistical analyses.
Certain rules should be followed when assigning numbers for measurement. There should be a one-to-one correspondence between the number and the characteristic, and this assignment should be constant over time.
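The one-to-one rule can be illustrated with a minimal sketch. The category labels and code numbers below are hypothetical and serve only to show the check; they are not taken from any particular study.

```python
# Hypothetical coding scheme: each category of a characteristic gets one number.
CODES = {"brand_a": 1, "brand_b": 2, "brand_c": 3}


def is_one_to_one(codes):
    """Return True when no two categories share the same number."""
    return len(set(codes.values())) == len(codes)


def encode(responses, codes):
    """Replace each recorded category with its assigned number."""
    return [codes[response] for response in responses]


encoded = encode(["brand_b", "brand_a", "brand_b"], CODES)
print(is_one_to_one(CODES), encoded)
```

Keeping the scheme in a single fixed table is one simple way to keep the assignment constant over time, since every wave of the survey is then coded from the same mapping.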
Scaling is the process of creating a continuum on which objects are located according to the amount of the measured characteristics they possess.
Critical to collecting primary data is the development of well-constructed measurement procedures.
It is important to realize that measurement consists of two different development processes, which can be labelled construct development and scale measurement.
To achieve the overall goal of obtaining high-quality data, researchers must understand what they are attempting to measure before developing the appropriate scale measurement.
The goal of construct development is to identify and define what is to be measured, including any dimensionality traits.
In turn, the goal of scale measurement is to determine how to measure each construct precisely.
Precise definition of marketing constructs begins with defining the purpose of the study and providing clear expressions of the research problem.
Without a clear initial understanding of the research problem before the study begins, the researcher can end up collecting irrelevant or low-quality data.
Construct development can be viewed as an integrative process in which researchers focus their efforts on identifying the subjective properties for which data should be collected for solving the defined research problem.
Identification of which properties should be investigated requires knowledge and understanding of constructs and their dimensionality, validity and operationalization.
At the heart of construct development is the need to determine exactly what is to be measured.
After the decision-maker and researcher establish what objects are relevant to the defined research problem, the next step is to identify the pertinent objective and subjective properties of each object of concern.
In cases where data are needed for insight into the composition of an object, the research focus is limited to measuring the object’s objective properties.
In contrast, when data are needed to help understand an object’s subjective properties, then the researcher must identify sets of measurable subcomponents that can be used to clarify the abstractness associated with the object’s subjective properties.
In determining what is to be measured, researchers must keep in mind the need to acquire relevant, high-quality data, data structures, and information to support management’s decisions.
Market researchers can use a variety of qualitative data collection methods among a few customers to develop preliminary insights into the set of identifiable and measurable components associated with an abstract construct.
Construct operationalization is a process whereby the researcher explains a construct’s meaning in measurement terms by specifying the activities or operations necessary to measure it.
The process focuses on the design and use of questions and scale measurements to gather the data structures needed.
Since many constructs, such as customer satisfaction, preferences, emotions, quality images, and brand loyalty, cannot be directly observed or measured, the researcher attempts to indirectly measure them through operationalization of their components.
Basic scales of measurement
An illustration of a scale that is often used in research is the dichotomous scale for sex. The object with male (or female) characteristics is assigned the number 1 and the object with the opposite characteristics is assigned the number 0.
This scale meets the requirements of the measurement process in that the assignment is one to one and it does not vary with respect to time and object. There are four basic scales: nominal, ordinal, interval and ratio.
Marketing researchers use nominal scales to identify characteristics of their test subjects. These can be gender, social class, race, religion, habits, traits, and physical location.
The categories created by nominal scales must include every test subject or product in only one category for a particular characteristic.
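The requirement that every subject falls into exactly one category can be checked mechanically. The following is a minimal sketch; the respondent identifiers and category names are invented for illustration.

```python
def is_valid_classification(subjects, categories):
    """Return True when every subject appears in exactly one category."""
    counts = {subject: 0 for subject in subjects}
    for members in categories.values():
        for subject in members:
            if subject not in counts:
                return False  # a category lists someone who is not a test subject
            counts[subject] += 1
    return all(count == 1 for count in counts.values())


# Hypothetical respondents classified by physical location.
subjects = ["r1", "r2", "r3", "r4"]
by_location = {"urban": ["r1", "r3"], "rural": ["r2", "r4"]}
print(is_valid_classification(subjects, by_location))
```

A subject who is missing from all categories, or who is listed in two, makes the check fail, which corresponds to categories that are not exhaustive or not mutually exclusive.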
In international marketing research, nominal measures are the simplest type of measures and pose the least burden on the respondent.
They are appropriate for illiterate respondents or those with a low level of education. The respondent simply has to decide whether or not the characteristic or category applies.
Such measures do, however, require that the definition of a category is unambiguous and familiar to the respondent.
An ordinal scale is obtained by ranking objects or by arranging them in order with regard to some common variable.
The question is simply whether each object has more or less of this variable than some other object. The scale provides no information as to how much difference there is between the objects.
Because the amount of difference between objects is not known, the permissible arithmetic operations are limited to statistics such as the median or mode (but not the mean).
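As a small illustration of the statistics that remain permissible for ordinal data, the sketch below summarizes a set of ranks with the median and the mode. The rank values are hypothetical.

```python
from statistics import median, mode

# Hypothetical preference ranks (1 = most preferred) that seven respondents
# gave to the same brand.
ranks = [1, 2, 2, 3, 2, 4, 1]

# Median and mode only rely on the order and frequency of the ranks,
# not on the unknown distances between them.
print("median rank:", median(ranks))
print("modal rank:", mode(ranks))
```

A mean of these ranks could be computed just as easily, but it would treat the gap between ranks 1 and 2 as equal to the gap between ranks 3 and 4, which an ordinal scale does not guarantee.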
Marketers use ordinal scales to gather a variety of information, such as consumer taste preferences and comparisons involving pricing, packaging, promotion, quality, and performance rankings.
In international marketing research, the most direct way to collect ordinal data is to ask respondents to order objects in relation to some attribute.
For well-educated respondents this is a relatively simple and straightforward task, but when the research is conducted among less literate populations, physical stimuli may be needed.
In an interval scale the numbers used to rank the objects also represent equal increments of the attribute being measured. This means that differences can be compared.
Interval scales have the same characteristics as ordinal scales except that they can show relative differences in rankings.
For example, ordinal scales do not assume that the distance between one and two is equal to the distance between three and four.
With interval scales, these distances are assumed to be the same. Furthermore, in interval scales the distance between one and three is assumed to be equal to the distance between two and four.
Temperature scales, such as those used on thermostats and thermometers, are interval scales. Although interval scaling uses equal intervals between successive ranks, there is either no fixed zero point or the zero point is arbitrary (on the Celsius scale, 0 °C is where water freezes and 100 °C is where it boils).
Interval scale distances are sensitive to scale transformations, and ratios of interval-scale values are not meaningful. For example, 10 °C on a centigrade scale (or 10 °F on a Fahrenheit scale) is not twice as hot as 5 °C (or 5 °F).
In other words, the equation 10° = 2 × 5° does not make any sense. Even transforming from one temperature scale to another causes problems: the distance between 5 °C and 10 °C is 5 °C. However, once 5 °C and 10 °C are converted to Fahrenheit (41 °F and 50 °F), the difference is 9 °F.
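The temperature example can be reproduced in a few lines. This is only a sketch using the standard Celsius-to-Fahrenheit conversion; the variable names are illustrative.

```python
def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


low_c, high_c = 5, 10
low_f, high_f = c_to_f(low_c), c_to_f(high_c)

# The ratio of the two readings changes with the scale, so "twice as hot"
# has no meaning on an interval scale.
ratio_c = high_c / low_c
ratio_f = high_f / low_f

# The difference also changes numerically (5 degrees versus 9 degrees),
# although equal Celsius differences still map to equal Fahrenheit differences.
diff_c = high_c - low_c
diff_f = high_f - low_f

print(low_f, high_f, ratio_c, round(ratio_f, 2), diff_c, diff_f)
```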
A ratio scale is a special kind of interval scale that has a meaningful zero point. With such a scale of weight, market share, or euros in savings accounts for example, it is possible to say how many times greater or smaller one object is than another.
This is the only type of scale that permits comparisons of absolute magnitude. For example, if a company’s sales were €1 million in 2002 and €2 million in 2003, we can conclude that sales doubled in one year.
Without an absolute zero point, we cannot draw this conclusion. Ratio scales are the most commonly used scales in business.
The Thurstone scale was the first formal technique for measuring an attitude. It was developed by L. L. Thurstone in 1928, as a means of measuring attitudes towards religion.
It is made up of statements about a particular issue, and each statement has a numerical value indicating how favourable or unfavourable it is judged to be.
People mark each of the statements with which they agree, and a mean score is computed from the values of the marked statements, indicating their attitude.
Researchers use two categories of rating scales to measure people’s attitudes: comparative and noncomparative.
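The scoring step can be sketched as follows. The statement identifiers and their scale values are hypothetical, and the sketch assumes that higher values stand for more favourable statements.

```python
from statistics import mean

# Hypothetical scale values previously assigned to five statements
# (assumed here: higher = more favourable).
scale_values = {"s1": 1.5, "s2": 4.0, "s3": 6.5, "s4": 9.0, "s5": 10.5}


def thurstone_score(endorsed, values):
    """Mean scale value of the statements a respondent agreed with."""
    return mean(values[statement] for statement in endorsed)


# A respondent who marks statements s3, s4 and s5.
score = thurstone_score(["s3", "s4", "s5"], scale_values)
print(round(score, 2))
```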
In comparative rating scales, respondents compare one characteristic or attribute against a specified standard, according to some predetermined criterion.
Since the standard of comparison is specified, researchers have a reference point. For example, a respondent might be required to compare one brand of cereal against the other brands that they consider when making a purchase in a supermarket.
Results have to be interpreted in relative terms and have ordinal or rank-order properties. The scores obtained indicate that one brand is preferred to another, but not by how much.
The main benefit of comparative scaling is that small differences between stimulus objects can be detected.
As they compare the stimulus objects, respondents are forced to choose between them. In addition, respondents approach the task from the same known reference points.
Consequently, comparative scales are easily understood and can be applied easily.
In non-comparative scales, also referred to as monadic or metric scales, each object is scaled independently of the others in the stimulus set.
In continuous ratings (or graphic ratings) respondents indicate their responses on a continuum.
Between the continuum’s extreme points are responses that represent a gradual progression toward the extremes.
Respondents place a mark at a location on the continuum that reflects their response to the question.
If a blank line is used, the researcher applies a numerical scale after respondents complete the survey.
Either way, the researcher segments all responses into a usable number of groups and then analyzes the information as interval data.
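One way to segment marks on a line into groups is sketched below. It assumes that each mark is recorded as a distance in millimetres from the left end of a 100 mm line and that five equal-width groups are wanted; both figures are illustrative choices, not part of the technique itself.

```python
def segment(mark_mm, line_mm=100.0, groups=5):
    """Map a mark's distance from the left end of the line to a group 1..groups."""
    if not 0 <= mark_mm <= line_mm:
        raise ValueError("mark lies outside the line")
    width = line_mm / groups
    # A mark exactly on the right end belongs to the last group.
    return min(int(mark_mm // width) + 1, groups)


# Hypothetical marks from five respondents.
marks = [3.0, 27.5, 50.0, 88.0, 100.0]
scores = [segment(mark) for mark in marks]
print(scores)
```

The group numbers can then be treated as the interval data that the analysis requires.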
Although continuous rating scales are easy to apply, marketing researchers seldom use them because they are not very reliable. This is because there are usually no standard responses.
These scales also cause problems for international researchers. Less-educated respondents have difficulty conceptualizing a continuous scale with equally divided intervals. Hence, the researcher must allow considerable time to explain the scale.
Itemized rating scales resemble graphic ratings, except that respondents select from a finite number of choices rather than from the theoretically infinite number on a continuum.
Each choice has a number or descriptor associated with it. For instance, respondents may be asked to respond to the statement “When I visit Tesco, it is a pleasant experience” by selecting from the following choices: “strongly agree,” “agree,” “neutral,” “disagree,” and “strongly disagree.”
The strengths of these scales are that respondents can complete each question in a relatively short time and researchers can easily analyze the responses, because quantitative scores can be assigned to each response.
Named after Rensis Likert, the Likert scale is a widely used rating scale that requires respondents to indicate a degree of agreement or disagreement with each of a series of statements about the stimulus objects (Likert, 1932).
Typically, each scale item has five response categories, ranging from “strongly disagree” to “strongly agree”.
Responses may be analyzed either individually or on a total (“summated”) basis by adding across items.
If the summated basis is used, the scoring must remain consistent throughout the survey. For instance, all favourable responses would be represented by high scores, and all unfavourable responses would be represented by low scores.
When negative statements are included, the scores must be adjusted to maintain the pattern of high scores representing favourable and low scores unfavourable responses.
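Summated scoring with reversal of negatively worded statements can be sketched as follows. The sketch assumes a five-point scale coded 1 to 5, and the sample answers are hypothetical.

```python
LABELS = {
    "strongly disagree": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "strongly agree": 5,
}


def item_score(response, negative=False, points=5):
    """Score one item; reverse the score for negatively worded statements."""
    score = LABELS[response]
    return points + 1 - score if negative else score


def summated_score(answers):
    """Sum item scores; answers is a list of (response, is_negative) pairs."""
    return sum(item_score(response, negative) for response, negative in answers)


# Hypothetical respondent: the second statement is negatively worded.
answers = [("agree", False), ("strongly disagree", True), ("neutral", False)]
total = summated_score(answers)
print(total)
```

After reversal, a high total always indicates a favourable attitude, whichever way the individual statements were worded.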
For instance, if the first statement read “My bank does not provide excellent customer service,” then the scale would be reversed so that a “strongly disagree” score was assigned “+5.”
“Strongly disagree” is a positive response in this case, so reversing the scale maintains the pattern of +5 representing the most favourable response.
The plus signs used in the responses show the direction (positive versus negative). Minus signs could have just as effectively been used, where “-2” indicated “strongly disagree,” “0” indicated “neutrality,” and “+2” indicated “strongly agree.”
A balanced scale has the same number of positive and negative categories; a nonbalanced scale is weighted towards one end or the other.
If the researcher expects a wide range of opinions, then a balanced scale probably is in order.
If research has determined that most opinions are positive, then the scale should contain more positive gradients than negative.
This would enable the researcher to ascertain the degree of positiveness toward the concept being researched.
The semantic differential scale
The semantic differential scale is a specialized scaled-response question format that sprang from the problem of translating a person’s qualitative judgements into quantitative estimates.
Like the modified Likert scale, this one has been borrowed from another area of research, namely the work of Charles Osgood in semantics.
This scale contains a series of bipolar adjectives for the various properties of the object under study, and respondents indicate their impressions of each property by indicating locations along its continuum.
The focus of the semantic differential is on the measurement of the meaning of an object, concept, or person.
Because many marketing stimuli have meaning, mental associations, or connotations, this type of scale works very well when the marketing researcher is attempting to determine brand, store or other images.
The construction of a semantic differential scale begins with the determination of a concept or object to be rated.
The researcher then selects bipolar pairs of words or phrases that could be used to describe the object’s salient properties.
Depending on the object, some examples might be “friendly-unfriendly,” “hot-cold,” “convenient-inconvenient,” “high quality-low quality” and “dependable-undependable.”
Suppose a soft drinks maker wanted to find out what consumers thought about its products. Figure 6.4 shows a sample of pairs that respondents could use to evaluate a drink.
The opposites are positioned at the endpoints of a continuum of intensity, and it is customary, although not mandatory, to use seven separators between the two endpoints.
The respondent then indicates their evaluation of the performance of the object, such as a brand, by marking the appropriate line.
The closer the respondent marks to an endpoint on a line, the more intense is his or her evaluation of the object being measured.
The Stapel scale mirrors the semantic differential scale, but instead of using two dichotomous descriptive words or phrases as choices, only one word or phrase is used. This makes the scale easier for the developer to construct and for the respondent to use.
Furthermore, although points are not assigned numbers in a semantic differential scale, they are assigned numbers in a Stapel scale, typically using a ten-point scale.
Categories may be assigned a range of +5 to -5 (see Figure 6.5). The downside of the Stapel scale is the potential biasing of the respondent by the word choice of the categories.
Comparative rating scales
Comparative rating scales allow respondents to make comparisons according to some predetermined criterion, such as importance of or preference for something. Four common comparative scales are paired comparisons, rank order, constant-sum and Q-sort.
In paired comparisons, the respondent is presented with two objects at a time and is required to indicate a preference for one of the two according to some stated criterion.
The method yields ordinal scaled data, for example, brand A is better than brand B, or, brand A is cleaner than brand B.
It is often applied in cases where the objects are physical products. One important point about data obtained through paired comparisons is that the ordinal data can be readily converted into interval-scaled data.
The number of paired comparisons required is n(n - 1)/2, where n indicates the number of individual items being compared. This requisite number demonstrates a shortcoming of the paired-comparison technique.
When several comparisons are required, the technique becomes less effective and less accurate due to respondent fatigue.
For example, with seven brands, twenty-one comparisons are necessary. Another concern among researchers is that when comparisons are made, the order of the items or questions can bias the outcome.
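The growth in the number of comparisons, n(n - 1)/2 for n items, can be checked with a short sketch. The brand labels are placeholders, and the win-counting helper is just one simple way to turn paired choices into a rank order.

```python
from collections import Counter
from itertools import combinations


def n_comparisons(n):
    """Number of distinct pairs among n items."""
    return n * (n - 1) // 2


def rank_by_wins(items, winners):
    """Order items by how often each was chosen across all pairs."""
    wins = Counter({item: 0 for item in items})
    wins.update(winners)
    return sorted(items, key=lambda item: -wins[item])


brands = list("ABCDEFG")  # seven placeholder brands
pairs = list(combinations(brands, 2))
print(len(pairs), n_comparisons(len(brands)))

# Hypothetical choices for three brands: B beats A, A beats C, B beats C.
print(rank_by_wins(["A", "B", "C"], ["B", "A", "B"]))
```

One common way to reduce order bias, not described in the text above, is to shuffle the list of pairs separately for each respondent.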
Rank-order scales require respondents to arrange a set of objects with regard to a common criterion: advertisements in terms of interest, product features in terms of importance, or new-product concepts with regard to willingness to buy in the future.
The result is an ordinal scale with the inherent limitation of weak scale properties. Ranking is widely used in surveys, however, because it corresponds to the choice process occurring in a shopping environment where a buyer makes direct comparisons among competing products (brands, flavours, product variations, and so on).
Rank-order scales are not without problems. Ranking scales are more difficult than rating scales because they involve comparisons, and hence require more attention and mental effort.
The ranking technique may force respondents to make choices they might not otherwise make, which raises the issue of whether the researcher is measuring a real relationship or one that is artificially contrived.
Due to the difficulties of ranking, respondents usually cannot meaningfully rank more than five or six objects.
The problem is not with the ranking of the first and last objects but with those in the undifferentiated middle.
When there are several objects, one solution is to break the ranking task into two stages. With nine objects, for example, the first stage would be to rank the objects into classes: top three, middle three, and bottom three. The next stage would be to rank the three objects within each class.
In constant-sum scaling, respondents are asked to allocate a number of points, say, 100, among objects according to some criterion, for example, preference or importance.
They are instructed to allocate the points such that if they like brand A twice as much as brand B, they should assign twice as many points to brand A.
The advantages of this scaling technique are that it does not require a large number of individual comparisons, as paired comparisons can, and the point system indicates strengths of preferences.
Nevertheless, as with previous techniques, the options should be limited to a manageable number.
The total points must add up to 100 (or any other predetermined amount), so too many choices may cause problems, because some respondents will invariably assign points that add up to more or less than 100.
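A simple validity check for constant-sum answers is sketched below. The brand names and the point allocations are hypothetical.

```python
def check_allocation(points, total=100):
    """Return True when the allocated points are non-negative and sum to total."""
    if any(value < 0 for value in points.values()):
        return False
    return sum(points.values()) == total


def relative_preference(points, a, b):
    """How many times more points object a received than object b."""
    return points[a] / points[b]


# Hypothetical respondent who likes brand A twice as much as brand B.
allocation = {"brand_a": 50, "brand_b": 25, "brand_c": 25}
print(check_allocation(allocation))
print(relative_preference(allocation, "brand_a", "brand_b"))
```

Answers that fail the check can be flagged for follow-up instead of being silently rescaled, since rescaling would assume that the respondent's proportions were intended.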
Another problem is the requirement that respondents allocate points in a way that indicates their relative preference among items.
This requires that respondents understand proportion, which is not always a realistic expectation.
A final shortcoming is that it has not been definitively established that the data produced are interval scaled.
This form of scaling has not been thoroughly tested, so most marketers use it with caution.
When the number of objects or characteristics that are to be rated is very large, it becomes tedious for the respondent to rank order or do a pairwise comparison, and problems and biases creep into the study.
To deal with such a situation, the Q-sort scaling process is used. With this technique, respondents are asked to sort the various characteristics or objects that are being compared into groups, such that the distribution of the number of objects or characteristics in each group follows a normal distribution.
A relatively large number of groups or piles should be used to increase the reliability or precision of the results.
For instance, respondents are given one hundred attitude statements on individual cards and asked to place them into eleven piles, ranging from “most highly agree with” to “least highly agree with.” The number of objects to be sorted should be neither fewer than sixty nor more than one hundred and forty; a reasonable range is sixty to ninety objects. The number of objects to be placed in each pile is pre-specified, often resulting in a roughly normal distribution of objects over the whole set.
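Pre-specified pile sizes with a roughly normal shape could be derived as in the sketch below. This is one possible approach and not a prescribed procedure: the bell-curve weights, the `spread` parameter, and the rounding rule are all assumptions made for illustration.

```python
import math


def pile_quotas(n_items, n_piles, spread=2.5):
    """Split n_items over n_piles in roughly bell-shaped proportions."""
    # Evenly spaced z-values from -spread to +spread, one per pile.
    zs = [-spread + 2 * spread * k / (n_piles - 1) for k in range(n_piles)]
    weights = [math.exp(-z * z / 2) for z in zs]
    weight_sum = sum(weights)
    raw = [n_items * weight / weight_sum for weight in weights]
    quotas = [math.floor(value) for value in raw]
    # Hand the leftover items to the piles with the largest fractional parts.
    leftover = n_items - sum(quotas)
    by_remainder = sorted(
        range(n_piles), key=lambda i: raw[i] - quotas[i], reverse=True
    )
    for i in by_remainder[:leftover]:
        quotas[i] += 1
    return quotas


# One hundred statements sorted into eleven piles, as in the example above.
quotas = pile_quotas(100, 11)
print(quotas, sum(quotas))
```

The middle piles receive the most cards and the extreme piles the fewest, so respondents are pushed to reserve the end piles for the statements they feel most strongly about.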
What is the difference between ordinal and interval scales? How are they similar?
What is the difference between interval and ratio scales? How are they similar?
How is the constant-sum scale different from other comparative rating scales?
What factors do researchers consider when deciding whether to use an even or an odd number of choices on a measurement scale?
What are the arguments for and against the inclusion of a neutral response position in a symmetric scale?
Can random error be avoided? If so, how? If not, why not?
Keywords: marketing researcher, nominal, ordinal, interval, ratio, Measurement, data, Scaling, Marketers, ratings, Likert scale, Itemized rating, continuous ratings, graphic ratings, semantic differential scale, Stapel scale, ranking, Constant-sum scales, Q-sort scaling