PING
0.9
Statistical data handling and processing in production environment
Create dummy variables in a dataset, i.e. variables with labels used to describe membership in a category with binary coding.
- `idsn`: a dataset reference;
- `var`: list of variables to be "dummied";
- `prefix`: (option) prefix(es) used to create the names of dummy variables; you can give one or more strings, in an order corresponding to the `var` variables; note that `prefix=_VARNAME_`, which will use the name of the corresponding variable followed by an underscore, or `prefix=_BLANK_`, which will make the prefix a null string (similar to specifying a null string in the macro argument), are also accepted; default: `prefix=D_`;
- `name`: (option) if `name=_VAL_`, the dummy variables are named by appending the value of the `var` variables to the prefix; otherwise, the dummy variables are named by appending numbers, 1, 2, ... to the prefix; note that the resulting name must be 8 characters or less; default: `name=_VAL_`;
- `base`: (option) indicates the level of the baseline category, which is given values of 0 on all the dummy variables; you can give one or more strings, in an order corresponding to the `var` variables; parameters `base=_FIRST_` or `base=_LOW_` specify that the lowest value of the `var` variable is the baseline group; `base=_LAST_` or `base=_HIGH_` specify the highest value of the variable; otherwise, you can specify `base=<value>` to make a different value the baseline group; for a character variable, you must enclose the value in quotes, e.g., `base='M'`; default: `base=_LAST_`;
- `format`: (option) user formats may be used for two purposes: [...] `var` list;
- `fullrank`: (option) boolean flag (`yes/no`), set to `yes` to indicate that the indicator for the `base` category is eliminated; default: `fullrank=yes`;
- `ilib`: (option) name of the input library; by default: empty, i.e. `WORK` is used;
- `odsn`: (option) name of the output dataset; if not specified, the new variables are appended to the input dataset `idsn`;
- `olib`: (option) name of the output library; by default: empty, i.e. `WORK` is also used.

With the input data set:

y | group | sex |
---|---|---|
10 | A | M |
12 | A | F |
13 | A | M |
18 | B | M |
19 | B | M |
16 | C | F |
21 | C | M |
19 | C | F |

a call to `%var_dummy` on the variable `group` produces two new variables, `D_A` and `D_B`, in the table `test`:

y | group | sex | D_A | D_B |
---|---|---|---|---|
10 | A | M | 1 | 0 |
12 | A | F | 1 | 0 |
13 | A | M | 1 | 0 |
18 | B | M | 0 | 1 |
19 | B | M | 0 | 1 |
16 | C | F | 0 | 0 |
21 | C | M | 0 | 0 |
19 | C | F | 0 | 0 |

since group `C` is the baseline category (corresponding to `base=_LAST_`).

With the same input dataset, a call on the variables `sex` and `group` produces a dummy for `sex` named `FEMALE`, and two dummies for `group`:

y | group | sex | FEMALE | GROUP_A | GROUP_B |
---|---|---|---|---|---|
10 | A | M | 0 | 1 | 0 |
12 | A | F | 1 | 1 | 0 |
13 | A | M | 0 | 1 | 0 |
18 | B | M | 0 | 0 | 1 |
19 | B | M | 0 | 0 | 1 |
16 | C | F | 1 | 0 | 0 |
21 | C | M | 0 | 0 | 0 |
19 | C | F | 1 | 0 | 0 |
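
The naming rules given for the `prefix` and `name` options can be sketched in a few lines of Python. This is only an illustration of the documented rules, not the macro's implementation: the function `dummy_names` and its signature are hypothetical, and the empty string passed for `name` simply stands for "any value other than `_VAL_`". SAS variable names are case-insensitive, so `group_A` below corresponds to `GROUP_A` in the table above.

```python
from typing import List, Sequence


def dummy_names(
    var: str,
    levels: Sequence[str],
    prefix: str = "D_",
    name: str = "_VAL_",
) -> List[str]:
    """Build dummy variable names for the non-baseline levels of one variable.

    Hypothetical helper: it mirrors the documented rules, not the macro's code.
    """
    # Resolve the two special prefix keywords described in the argument list.
    if prefix == "_VARNAME_":
        stem = f"{var}_"
    elif prefix == "_BLANK_":
        stem = ""
    else:
        stem = prefix

    # name=_VAL_ appends the level value; any other setting appends 1, 2, ...
    if name == "_VAL_":
        names = [f"{stem}{level}" for level in levels]
    else:
        names = [f"{stem}{i}" for i in range(1, len(levels) + 1)]

    # The documentation limits the resulting names to 8 characters.
    too_long = [n for n in names if len(n) > 8]
    if too_long:
        raise ValueError(f"names longer than 8 characters: {too_long}")
    return names


default_names = dummy_names("group", ["A", "B"])
varname_names = dummy_names("group", ["A", "B"], prefix="_VARNAME_")
blank_names = dummy_names("sex", ["F"], prefix="_BLANK_")
numbered_names = dummy_names("group", ["A", "B"], name="")
```

Here `levels` holds only the levels that receive a dummy, i.e. the baseline level has already been left out by the caller.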
`%var_dummy` is a wrapper to M. Friendly's original `%dummy` macro. Original source code (no license, no disclaimer) is available at http://www.medicine.mcgill.ca/epidemiology/joseph/pbelisle/multitranspose.html. See resources available at DataVis.ca.

Given a character or discrete numerical variable, the `%var_dummy`
macro creates dummy (0/1) variables to represent the levels of the original variable. If the original variable has `c` levels, then `(c-1)` new variables are produced (or `c` variables, if `fullrank=no`).
When the original variable is missing, all dummy variables will be missing (V7+ only).
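
To make the coding scheme concrete, here is a minimal Python sketch of the behaviour described above, applied to the `group` column of the example dataset. It is an illustration under stated assumptions, not the macro's SAS code: the function `dummy_code` and its parameters are hypothetical, `None` stands in for a SAS missing value, and levels are ordered with Python's default sort.

```python
from typing import Dict, Hashable, List, Optional, Sequence


def dummy_code(
    values: Sequence[Optional[Hashable]],
    base: Hashable = "_LAST_",
    fullrank: bool = True,
    prefix: str = "D_",
) -> Dict[str, List[Optional[int]]]:
    """Illustrative 0/1 coding of one categorical variable (hypothetical helper)."""
    # Distinct non-missing levels, in sorted order.
    levels = sorted({v for v in values if v is not None})
    if not levels:
        return {}

    # Pick the baseline level from the keyword or from an explicit value.
    if base in ("_FIRST_", "_LOW_"):
        baseline = levels[0]
    elif base in ("_LAST_", "_HIGH_"):
        baseline = levels[-1]
    else:
        baseline = base  # an explicit level, e.g. "M"

    # With fullrank, the baseline gets no indicator of its own: c-1 columns.
    kept = [lv for lv in levels if not (fullrank and lv == baseline)]

    # A missing input value yields a missing value in every dummy column.
    return {
        f"{prefix}{lv}": [None if v is None else int(v == lv) for v in values]
        for lv in kept
    }


group = ["A", "A", "A", "B", "B", "C", "C", "C"]
full = dummy_code(group)                        # D_A and D_B; C is the baseline
all_levels = dummy_code(group, fullrank=False)  # D_A, D_B and D_C
with_missing = dummy_code(["A", None, "C"])     # middle row is None in D_A
```

With three levels, the full-rank call returns two columns and the non-full-rank call returns three, matching the `(c-1)` versus `c` rule stated above.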
http://www.math.yorku.ca/SCS/sasmac/dummy.html