Stata treats missing values as positive infinity

I discussed here some weird things that SPSS does with regard to weighting. Here’s another weird thing, this time in Stata:

StataQ1truncThe variable Q1 has a minimum of 0 and a maximum of 99,999. For this particular survey question, 99,999 is not a believable response; so, instead of letting 99,999 and other unbelievable responses influence the results, I truncated Q1 at 100, so that all responses above 100 equaled 100. There are other ways of handling unbelievable responses, but this can work as a first pass to assess whether the unbelievable responses influenced results.

The command replace Q1trunc = 100 if Q1 > 100 tells Stata to replace all responses over 100 with a response of 100; but notice that this replacement increased the number of observations from 2008 to 2065; that’s because Stata  treated the 57 missing values as positive infinity and replaced these 57 missing values with 100.

Here’s a line from Stata’s help missing documentation:

all nonmissing numbers < . < .a < .b < … < .z

 

Stata has a reason for treating missing values as positive infinity, as explained here. But — unless users are told of this — it is not obvious that Stata treats missing values as positive infinity, so this appears to be a source of potential error for code with a > sign and missing values.

Here’s how to recode the command so that missing values remains missing: replace Q1trunc = 100 if Q1 > 100 & if Q1 < .

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s