PING
0.9
Statistical data handling and processing in production environment
!!! TODO !!!
If a manual task can be automated, it should be automated. That applies to creating standard SAS macros and/or R functions, and to using common, conventional standards to save yourself work, just as much as it applies to creating test/example/check utilities to save your colleagues work.
We provide here some basic guidelines for developing your own code to be integrated into PING. We want a consistent look and feel for the SAS code developed and integrated into the PING platform, and we need faster program development cycles. To that end, adopting common coding guidelines ensures the consistency of the programs developed here.
In their current state, the guidelines are based on common sense, often derived from the existing literature, or on practical usage that may seem arbitrary. Improvements are welcome.
Some generic principles need to be taken into account in our developments. For that purpose, we provide here some practical recommendations, also following guidelines found in the literature (see for instance Guidelines for coding of SAS programs):
• write code that can be re-used, with different parameters. Keyword parameters are preferable to positional parameters, because they are less likely to be specified incorrectly.
• naming:
• parameterising: ensure that all macro variables, other than %global ones or macro call arguments, are explicitly declared %local and initialised; in general, avoid using %global macro variables.
First of all, a macro should be tested in the context it was meant for. Appropriate values should be specified for its arguments, preferably as default parameter values. These can be dataset names, libnames, variable names, format names, (macro-specific) option keywords indicating the macro's actions, etc. Note that PING provides specific functionalities for this purpose. In SAS, for instance, the following macros are available: %macro_isblank, %ds_check, %var_check.
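As a minimal sketch of why internal macro variables should be declared %local, consider a global macro variable I and a macro that uses I as a loop index. The macro %loop_demo and the variable names are illustrative only, not part of PING.

```sas
/* A global macro variable that happens to share its name with a loop index */
%let i = outer;

%macro loop_demo(n=3);
    /* %LOCAL creates I (initialised to null) in the local symbol table; */
    /* without it, the %DO loop below would overwrite the global I.      */
    %local i;
    %do i = 1 %to &n;
        %put NOTE: iteration &i of &n;
    %end;
%mend loop_demo;

%loop_demo(n=3);
%put NOTE: global I is still &i;
```

With the %local statement in place, the last %PUT writes "global I is still outer" to the log; without it, the macro would silently change the global value to 4 (the loop index value after the last iteration).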
Separate blocks of code using indentation and white space:
♦ Insert a blank line between SAS program steps, that is, before each DATA or PROC step.
♦ Be consistent with your indentation increments.
♦ Indent all statements in a logical grouping by the same amount.
♦ Left-justify all OPTIONS, DATA, PROC, and RUN statements, and indent all the statements within a DATA or PROC step.
♦ Indent conditional blocks and DO groups, and do it consistently; the logic will be easier to follow.
♦ Align each END statement with its corresponding DO statement; this makes it easier to verify that they match.
♦ Preface major blocks of code with explanatory comments.
♦ Insert parentheses in meaningful places to clarify the order in which mathematical or logical operations are performed.
♦ Break really complicated statements into a number of simpler statements.
The underlying principle: to "make the code easy to read", "make the physical structure of the code reflect the logical structure".
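The layout rules above can be illustrated with a short program. It is a sketch that assumes the SASHELP.CLASS sample dataset shipped with SAS; the derived variables BMI, FLAG and NOTE are illustrative only.

```sas
/* Flag students of SASHELP.CLASS with a high body mass index (illustrative) */
options nocenter;

data work.class_flag;
    set sashelp.class;
    /* BMI in kg/m2 from weight in pounds and height in inches;  */
    /* parentheses make the order of the operations explicit     */
    bmi = (weight * 703) / (height ** 2);
    if bmi >= 25 then do;
        flag = 1;
        note = "check";
    end;
    else do;
        flag = 0;
        note = "ok";
    end;
run;

proc print data=work.class_flag;
    var name age bmi flag;
run;
```

OPTIONS, DATA, PROC and RUN start in the first column, a blank line separates the steps, and each END lines up with the IF statement that opens its DO group.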
Code inside macro definitions is indented one level further.
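For instance, in the following sketch (the macro %print_young and its parameters are hypothetical) the macro body is indented one level, and the statements inside each step one level further; it again assumes SASHELP.CLASS.

```sas
%macro print_young(idsn=sashelp.class, maxage=12);
    /* Statements of the macro body are indented one level ... */
    data work.young;
        set &idsn;
        /* ... and statements inside each step one level further */
        if age <= &maxage;
    run;

    proc print data=work.young;
    run;
%mend print_young;

%print_young();
```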
Complete systems of macros can be created, in which an interrelated set of macros call each other without relying on global macro variables to store the information created in the outer-level macros and used by the inner ones. The top-level macro sets up the processing environment subsequently used by the lower-level macros, and values can generally be passed to the inner macros via parameters in the macro calls.
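A minimal sketch of this pattern, with two hypothetical macros that are not part of PING: the outer macro declares a %local variable and passes its name to the inner macro, which fills it in. Because the inner %LET finds that variable in the caller's local symbol table, no %global variable is ever created.

```sas
%macro nobs_of(idsn=, outvar=);
    /* Inner macro: store the number of observations of IDSN in the macro */
    /* variable whose name is given by OUTVAR (expected to be %local in   */
    /* the calling macro, and not named DSID or RC).                      */
    /* No error handling if IDSN cannot be opened in this sketch.         */
    %local dsid rc;
    %let dsid = %sysfunc(open(&idsn));
    %let &outvar = %sysfunc(attrn(&dsid, nlobs));
    %let rc = %sysfunc(close(&dsid));
%mend nobs_of;

%macro report_size(idsn=sashelp.class);
    /* Outer macro: N receives the result computed by the inner macro */
    %local n;
    %let n=;
    %nobs_of(idsn=&idsn, outvar=n);
    %put NOTE: &idsn has &n observations.;
%mend report_size;

%report_size(idsn=sashelp.class);
```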
Every macro should have a comment header block for built-in documentation (see the page on documentation).
Template macro, macro setup: in this section of the macro, we perform all the pre-execution checks and validation, display standard debugging information, and define the local macro variables used.
Our standard is to convert all enumerated (keyword) parameter values to upper case. This makes checking values easier throughout the rest of the macro code, and upper-case values stand out better.
We then declare and describe the local macro variables. While SAS does not require local variables to be declared (with the %LOCAL statement), declaring them prevents you from accidentally referencing or changing a global variable of the same name.
♦ Use comments to explain subsetting or other conditional logic.
♦ Use comments throughout the code to document the program.
The setup section then checks for values that must always be specified, provides the default value if a parameter is set to null, checks the values of enumerated parameters, and aborts the macro if anything serious is detected. Doing this before the macro starts its actual processing can save significant amounts of machine and user time when an invalid parameter value was entered.
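The setup steps described above (local variables, required parameters, defaults, upper-cased enumerated values, early exit) can be sketched as follows. The macro %sort_ds and its parameters are hypothetical; it leaves the macro with %RETURN rather than terminating the whole SAS session, and uses %SUPERQ so that parameter values containing special characters do not break the checks.

```sas
%macro sort_ds(idsn=, odsn=, by=, order=);
    /* Local macro variables, declared and initialised */
    %local desc;
    %let desc=;

    /* Required parameters: stop if either is missing */
    %if %length(%superq(idsn)) = 0 or %length(%superq(by)) = 0 %then %do;
        %put ERROR: parameters IDSN and BY are required.;
        %return;
    %end;

    /* Defaults for parameters passed as null */
    %if %length(%superq(odsn)) = 0 %then %let odsn = &idsn;
    %if %length(%superq(order)) = 0 %then %let order = ASC;

    /* Enumerated parameter: upper case, then check the allowed values */
    %let order = %upcase(&order);
    %if &order ne ASC and &order ne DESC %then %do;
        %put ERROR: ORDER must be ASC or DESC, not &order..;
        %return;
    %end;
    %if &order = DESC %then %let desc = descending;

    /* Actual processing (BY is assumed to hold a single variable) */
    proc sort data=&idsn out=&odsn;
        by &desc &by;
    run;
%mend sort_ds;

%sort_ds(idsn=sashelp.class, odsn=work.class_sorted, by=age, order=desc);
```

A call such as %sort_ds(idsn=sashelp.class, by=age, order=up) writes an error to the log and returns before PROC SORT is ever submitted.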
In summary, to reduce the number of times the data are read:
♦ Minimize the number of passes through the data.
♦ Minimize the number of DATA steps.
♦ Read and store only the data that are needed.
♦ Sort the data only when it is absolutely necessary.
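For example, instead of two DATA steps that each read SASHELP.CLASS to extract the girls and the boys, a single DATA step can write both output datasets in one pass (a sketch; the dataset names are illustrative).

```sas
/* One pass through SASHELP.CLASS writes both output datasets */
data work.girls work.boys;
    set sashelp.class;
    if sex = "F" then output work.girls;
    else output work.boys;
run;
```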
Here are a few more efficiency-related guidelines:
• When you read in an external file, use pointer controls, informats, or column specifications in the INPUT statement to read only the fields you actually need.
• Store only the variables you need by using DROP or KEEP statements or DROP= and KEEP= dataset options (eliminate from the output dataset the variables that are needed only during DATA step execution, and not afterwards).
• When only one condition can be true for a given observation, use IF ... THEN ... ELSE ... statements (or a SELECT group) instead of a series of IF ... THEN ... statements without ELSE statements: in a sequence of IF-THEN statements without ELSE, SAS checks every condition for every observation.
• When using a series of IF ... THEN ... ELSE ... statements, list the conditions in descending order of probability; this saves CPU time.
• Use the LENGTH statement to reduce the storage space for variables in SAS datasets.
• Minimize workspace usage by using the DELETE statement in a PROC DATASETS step to eliminate temporary datasets that are no longer needed by the program.
• Use the IN operator instead of a series of logical OR operators.
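Several of these guidelines are combined in the sketch below, again on SASHELP.CLASS; the age groups and the dataset name are purely illustrative. The conditions are ordered by decreasing frequency in this dataset (12 students aged 12 to 14, 5 aged 15 or 16, 2 younger).

```sas
/* Read only the needed variables (KEEP=), drop AGE once it has been used (DROP=) */
data work.tmp_group(drop=age);
    length group $ 6;
    set sashelp.class(keep=name age);
    /* Mutually exclusive conditions: ELSE stops testing once one is true; */
    /* the IN operator replaces a chain of OR comparisons                  */
    if age in (12, 13, 14) then group = "middle";
    else if age in (15, 16) then group = "older";
    else group = "young";
run;

proc freq data=work.tmp_group;
    tables group;
run;

/* Free the WORK space once the temporary dataset is no longer needed */
proc datasets library=work nolist;
    delete tmp_group;
quit;
```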
Here is my humble opinion: if a parameter will almost always default to a specific value, make it a keyword (named) parameter; otherwise make it positional. That way one can take advantage of the ability to assign a default value to that parameter when designing the macro. Keyword parameters allow for a true default value for a parameter used by the macro, can be specified in any order, and make it possible to trace back which value was entered for which parameter (think debugging).
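The sketch below follows this advice with a hypothetical macro %summarise: the input dataset and the analysis variable change at almost every call, so they are positional; the statistic and the output dataset nearly always keep their defaults, so they are keyword parameters. SAS requires positional parameters to precede keyword parameters in the %MACRO statement.

```sas
%macro summarise(idsn, var, stat=mean, odsn=work.stats);
    /* STAT= is a PROC MEANS statistic keyword; with no name after it, */
    /* the output variable keeps the name of the analysis variable     */
    proc means data=&idsn noprint;
        var &var;
        output out=&odsn &stat=;
    run;
%mend summarise;

/* Defaults used for STAT= and ODSN= */
%summarise(sashelp.class, height);

/* Defaults overridden by name */
%summarise(sashelp.class, height, stat=median, odsn=work.median_height);
```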
TODO?
Then, after the macro’s MEND statement, we have some code to help us develop and debug the macro. This code should be deleted when the final macro is saved to the production macros fileref.
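A sketch of such development code, placed after the %MEND statement of a hypothetical macro %greet: it switches on the macro debugging options, runs a few test calls, and switches the options off again. Everything after %MEND is meant to be removed before the macro is saved for production.

```sas
%macro greet(name=World);
    %put NOTE: Hello, &name!;
%mend greet;

/* ---- Development and debugging code: delete before production ---- */
options mprint mlogic symbolgen;
%greet();
%greet(name=PING);
options nomprint nomlogic nosymbolgen;
```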
/* http://www.mwsug.org/proceedings/2010/advanced/MWSUG-2010-50.pdf */