I discussed here some weird things that SPSS does with regard to weighting. Here’s another weird thing, this time in Stata:
The variable Q1 has a minimum of 0 and a maximum of 99,999. For this particular survey question, 99,999 is not a believable response; so, instead of letting 99,999 and other unbelievable responses influence the results, I truncated Q1 at 100, so that all responses above 100 equaled 100. There are other ways of handling unbelievable responses, but this can work as a first pass to assess whether the unbelievable responses influenced results.
The command replace Q1trunc = 100 if Q1 > 100 tells Stata to replace all responses over 100 with a response of 100. But notice that this replacement increased the number of nonmissing observations from 2008 to 2065. That's because Stata treats a missing value as larger than any number, in effect as positive infinity, so the 57 missing values satisfied Q1 > 100 and were replaced with 100.
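The behavior is easy to reproduce on a toy dataset. The sketch below uses five made-up values, not the survey data, and assumes Q1trunc starts out as a copy of Q1:

```stata
* Toy data (made-up values): two ordinary responses, two responses
* above 100, and one missing value
clear
input Q1
5
80
150
99999
.
end

* Assumption: Q1trunc begins as a copy of Q1
generate Q1trunc = Q1

* Unguarded truncation: the missing row also satisfies Q1 > 100
replace Q1trunc = 100 if Q1 > 100

* The row where Q1 is missing now holds 100 in Q1trunc
list Q1 Q1trunc
count if missing(Q1)
count if missing(Q1trunc)
```

In this toy run the replace command changes three rows, not two: the two large responses and the missing value.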
Here’s a line from Stata’s help missing documentation:
all nonmissing numbers < . < .a < .b < … < .z
Stata has a reason for treating missing values as positive infinity, as explained here. But unless users are told of this, it is not obvious that Stata treats missing values as positive infinity, so this is a potential source of error for any code that combines a > sign with missing values.
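Here’s how to rewrite the command so that missing values remain missing: replace Q1trunc = 100 if Q1 > 100 & Q1 < . The if qualifier appears only once, with the two conditions joined by &. The condition Q1 < . is true only for nonmissing values, so !missing(Q1) would do the same job.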
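A minimal sketch of the guarded truncation, again on made-up toy values and again assuming Q1trunc starts as a copy of Q1. It uses missing() for the guard and adds two assert checks that stop the do-file if a missing value slipped through:

```stata
* Toy data (made-up values), including one missing response
clear
input Q1
5
80
150
99999
.
end

* Assumption: Q1trunc begins as a copy of Q1
generate Q1trunc = Q1

* Guarded truncation: only nonmissing values above 100 are replaced
replace Q1trunc = 100 if Q1 > 100 & !missing(Q1)

* Checks: missing stays missing, and no nonmissing value exceeds 100
assert missing(Q1trunc) if missing(Q1)
assert Q1trunc <= 100 if !missing(Q1trunc)

* The nonmissing count should match the count for Q1
count if !missing(Q1)
count if !missing(Q1trunc)
```

One caution on a tempting shortcut: Stata's min() function ignores missing arguments, so min(Q1, 100) returns 100 when Q1 is missing and would reintroduce the same problem.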