DATA Step Options¶

DATA step options provide a set of tools to interact with input and output datasets. They are declared in parentheses after the filenames in data or set statements:

data library.out_filename ( /* data step options */ );
    set library.in_filename ( /* data step options */ );
        /* data step statements */
run;

If a data set option follows the set statement, the options apply to the data being read in. If it follows the data statement, SAS applies the action to the output dataset. The general order of operations is:

SAS applies data step options specified in the set statement to the input data
SAS evaluates the programming statements in the DATA step
SAS applies data step options specified in the data statement to the output data

Last thing to keep in mind is that SAS always evaluates keep/drop options before rename options. This means you need to use the original variable names in the keep/drop options when using a rename option in the same statement.

Important `DATA` Step Options¶

Below we summarize some of the most important data step options and give examples of their use. You can find a comprehensive list here.

`KEEP` and `DROP` Options¶

The keep data step option allows us to only keep certain variables from the input or output dataset. Similarly, the drop data step option allows us to drop variables from the input or output dataset.

To use the keep/drop options, write keep = or drop = followed by the list of variables you would like to keep or drop.

Example¶

Below we drop the kid_married_2015 variable from the in.fake_micro dataset using the drop option. Then we create par_inc_2000 in the data step and only keep pik, mom_pik, dad_pik, and par_inc_2000 in the out.par_income output dataset using the keep option.

data out.par_income(keep = pik mom_pik dad_pik par_inc_2000);
    set in.fake_micro(drop = kid_married_2015); 
        par_inc_2000 = sum(mom_inc_2000, dad_inc_2000); 
run;

Here is the in.fake_micro data:

	pik	mom_pik	dad_pik	kid_married_2015	mom_inc_2000	dad_inc_2000
0	141	173	NaN	1.0	16474.893	NaN
1	138	149	NaN	0.0	21689.520	37348.6450
2	177	NaN	003	0.0	11420.902	14666.4970
3	146	013	004	0.0	NaN	34485.8130
4	104	187	005	1.0	52476.680	6003.8027
5	144	NaN	006	1.0	13744.854	20464.3630
6	083	025	007	0.0	22038.420	17941.8010
7	115	NaN	008	1.0	16084.973	6875.2935
8	170	157	009	1.0	17766.367	31663.4510
9	118	NaN	010	1.0	29356.543	NaN

And the out.par_income dataset:

	pik	mom_pik	dad_pik	par_inc_2000
0	141	173	NaN	16474.8930
1	138	149	NaN	59038.1650
2	177	NaN	003	26087.3990
3	146	013	004	34485.8130
4	104	187	005	58480.4827
5	144	NaN	006	34209.2170
6	083	025	007	39980.2210
7	115	NaN	008	22960.2665
8	170	157	009	49429.8180
9	118	NaN	010	29356.5430

`RENAME` Option¶

The rename data step option allows us to change variable names from the input or output datasets. If you use rename in the data statement, the variable name is changed in the output data set. If you use rename in the set statement, the variable name is changed in the input data set and the new name is used for all the subsequent DATA step commands.

To use the rename option to rename old_name to new_name we would write rename = (old_name = new_name).

Don’t forget that the rename option is evaluated after keep/drop options!

Example¶

In the example below, we load in the pik variable from in.fake_micro and immediately rename it to kid_pik — this is the name that should be used in subsequent data step commands. Then we restrict the data set to rows where the last digit of pik is 4 using substr and an if statement (more on these later). Finally, we only keep the kid_pik variable and rename kid_pik to pik. Note that we reference kid_pik in the keep statement, this is because keep is evaluated before we rename kid_pik to pik.

data out.kid_piks(keep = kid_pik rename = (kid_pik = pik));
    set in.fake_micro(keep = pik rename = (pik = kid_pik));
    
        * Keep PIKs that end in a 4;
        if substr(kid_pik, 3, 1) = "4"; 
        
run;

Here is the input data in.fake_micro:

	pik	mom_pik	dad_pik	kid_married_2015	mom_inc_2000	dad_inc_2000
0	141	173	NaN	1.0	16474.893	NaN
1	138	149	NaN	0.0	21689.520	37348.6450
2	177	NaN	003	0.0	11420.902	14666.4970
3	146	013	004	0.0	NaN	34485.8130
4	104	187	005	1.0	52476.680	6003.8027
5	144	NaN	006	1.0	13744.854	20464.3630
6	083	025	007	0.0	22038.420	17941.8010
7	115	NaN	008	1.0	16084.973	6875.2935
8	170	157	009	1.0	17766.367	31663.4510
9	118	NaN	010	1.0	29356.543	NaN

And the output data out.kid_piks:

	pik
0	104
1	144
2	054
3	154
4	064
5	194
6	074
7	084
8	014
9	124

`WHERE` Option¶

The where option lets us select observations from the input or output data sets by applying conditions to the data. This is similar to using a use if in Stata:

use file_name if x > 100, clear

To apply a condition using a where option we would write where = (condition).

Example¶

Recall that in the example above we used an if statement to restrict to rows where kid_pik ended in a 4. We make the same restriction below using a where option in the set statement.

Note

The where option is evaluated after the rename statement, so we use the name kid_pik in the where option.

data out.kid_piks(keep = kid_pik rename = (kid_pik = pik));
    set in.fake_micro(keep = pik rename = (pik = kid_pik) where = (substr(kid_pik, 3, 1) = "4"));
run;

Here is the input data in.fake_micro:

	pik	mom_pik	dad_pik	kid_married_2015	mom_inc_2000	dad_inc_2000
0	141	173	NaN	1.0	16474.893	NaN
1	138	149	NaN	0.0	21689.520	37348.6450
2	177	NaN	003	0.0	11420.902	14666.4970
3	146	013	004	0.0	NaN	34485.8130
4	104	187	005	1.0	52476.680	6003.8027
5	144	NaN	006	1.0	13744.854	20464.3630
6	083	025	007	0.0	22038.420	17941.8010
7	115	NaN	008	1.0	16084.973	6875.2935
8	170	157	009	1.0	17766.367	31663.4510
9	118	NaN	010	1.0	29356.543	NaN

Here is the output data out.kid_piks:

	pik
0	104
1	144
2	054
3	154
4	064
5	194
6	074
7	084
8	014
9	124

Introduction to SAS for Incoming Predocs

DATA Step Options¶

Important DATA Step Options¶

KEEP and DROP Options¶

Example¶

RENAME Option¶

Example¶

WHERE Option¶

Example¶

Important `DATA` Step Options¶

`KEEP` and `DROP` Options¶

`RENAME` Option¶

`WHERE` Option¶