Advanced Techniques Track

Principles of Automation

Robert Ellsworth

This paper discusses the basic principles for automating Base SAS programs. It describes how automation is achieved through eliminating manual code changes, programmatically populating output, scheduling, process run control, input validation, result validation, email notification, restartability, and standard code.

View paper.

Saving and Restoring Startup (Initialized) SAS® System Options

Kirk Paul Lafler

Processing requirements sometimes require the saving (and restoration) of SAS® System options at strategic points during a program’s execution cycle. This paper and presentation illustrate the process of using the OPTIONS, OPTSAVE, and OPTLOAD procedures to perform the following operations:

- Display portable and host-specific SAS System options and their settings;
- Display restricted SAS System options;
- Display SAS System options that can be restricted;
- Display information about SAS System option groups;
- Display a list of SAS System options that belong to a specific group;
- Display a list of SAS System options that can be saved;
- Save startup SAS System options;
- Restore startup SAS System options, when needed.
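
The save-and-restore steps at the end of this list can be sketched roughly as follows. This is a minimal illustration, not code taken from the paper: the data set name WORK.STARTUP_OPTS and the choice of NONOTES as the option to change are assumptions made for the example.

```sas
/* Save the current (startup) option settings to a data set.      */
/* WORK.STARTUP_OPTS is a hypothetical name chosen for this demo. */
proc optsave out=work.startup_opts;
run;

/* Change an option mid-program, here suppressing notes in the log. */
options nonotes;

/* ... processing that needs the altered setting goes here ... */

/* Restore the saved options to their startup values. */
proc optload data=work.startup_opts;
run;

/* Write the current NOTES setting to the log to confirm the restore. */
proc options option=notes;
run;
```

Saving to a WORK data set keeps the snapshot local to the session; a permanent library could be used instead if the settings need to outlive it.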

View paper.

People Keep ASCIIng Me About These Characters ** BEST CONTRIBUTED PAPER **

Dan Konkler, Rebecca Bowermaster and Stephanie Sanchez

Advanced Techniques Best Paper Winner

Every programmer has run into an issue at least once that was caused by the presence of a control character or extended ASCII character in their dataset. You get extra records in your dataset because there’s a line feed character within a CSV field. Your Excel output has “unreadable content” because your data contain “<, >, or &”. Something went wrong in your RTF file when your trusty escape character wound up in your dataset. Your fancy validation program that reads in the production output for programmatic comparison fails because it can’t process the extended ASCII or control characters. These issues can cost hours of time spent searching for the problematic character. In this paper, we present two macros that can quickly identify potentially problematic characters in your dataset library and point you to the specific records and fields that need your attention.

View paper.

Fun With The Fun King: The SAS Solution to the Magic Square

Frank Ferriola

My father introduced me to the "Magic Square": a 3x3 or other odd-sized square in which you place the integers from 1 to (n*n) so that every row, every column, and both diagonals sum to the same number.

In 2001, I included the concept of the Magic Square in a presentation at WUSS, and how my father taught me a simple pattern that solved it in seconds. In 2005, I presented the Manual Solution to the Magic Square.

In this paper, I present the SAS® solution to the problem, which will take you through all the rules that get applied to solve for any square that has an odd number of rows and columns.
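
For readers who want to experiment, here is a minimal sketch of one well-known construction for odd-order squares, the Siamese (de la Loubère) method. The abstract does not say which pattern the paper uses, so treating it as this method is an assumption; the macro variable `n` and the array name `sq` are hypothetical.

```sas
%let n = 3;  /* hypothetical macro variable: any odd size */

data _null_;
  /* Temporary arrays start out with every cell missing. */
  array sq{&n, &n} _temporary_;

  /* Start in the middle column of the top row. */
  r = 1;
  c = (&n + 1) / 2;

  do k = 1 to &n * &n;
    sq{r, c} = k;

    /* Candidate next cell: up one row and right one column, wrapping. */
    nr = mod(r - 2 + &n, &n) + 1;
    nc = mod(c, &n) + 1;

    /* If that cell is already filled, move down one row instead. */
    if sq{nr, nc} ne . then do;
      nr = mod(r, &n) + 1;
      nc = c;
    end;

    r = nr;
    c = nc;
  end;

  /* Write the square to the log, one row per line. */
  do r = 1 to &n;
    do c = 1 to &n;
      v = sq{r, c};
      put v 4. @;
    end;
    put;
  end;
run;
```

With `n` set to 3, the log shows the rows 8 1 6, 3 5 7, and 4 9 2, each summing to 15, as do the columns and both diagonals.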

View paper.

Tips for Correctly and Efficiently Comparing Two Files in SAS®

Aaron Brown

This paper gives some tips for correctly and efficiently comparing two data files via SAS® programs, not just to determine whether discrepancies exist but to locate where they exist. This can be helpful if you need to compare two different versions of a file. The tips include thoughts on different methods of reading data into SAS, followed by code examples that use the COMPARE and SQL procedures to compare the datasets.
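
As a rough illustration of the two approaches named above (not the paper's own code), the sketch below compares two small versions of a file. The data sets WORK.OLD and WORK.NEW and their variables are hypothetical and exist only for this example.

```sas
/* Two small hypothetical versions of the same file. */
data work.old;
  input id amount;
  datalines;
1 100
2 200
3 300
;
run;

data work.new;
  input id amount;
  datalines;
1 100
2 250
4 400
;
run;

/* PROC COMPARE: match rows on ID and report value differences.       */
/* Both data sets are already in ID order, as the ID statement needs. */
/* LISTALL also lists rows and variables found in only one data set.  */
proc compare base=work.old compare=work.new listall;
  id id;
run;

/* PROC SQL: rows present in one version but not in the other. */
proc sql;
  title 'In OLD but not in NEW';
  select * from work.old
  except
  select * from work.new;

  title 'In NEW but not in OLD';
  select * from work.new
  except
  select * from work.old;
quit;
title;
```

PROC COMPARE pinpoints which variable differs for a matched key, while the EXCEPT queries give a quick list of whole rows that have no exact counterpart in the other version.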

View paper.

Working with Sparse Matrices in SAS®

Andrew Kuligowski and Lisa Mendez

For the past couple of years, it seems that “Big Data” has been a buzzword in the industry. We have more and more data coming in from more and more places, and it is our job to figure out how best to handle it. One way to attempt to organize data is with arrays – but what do you do when the array you are attempting to populate is so large that it cannot be handled in memory? Further, how do you handle a large array when most of the elements are missing?

This presentation deals with the concept of a Sparse Matrix – that is, a large array with relatively few actual elements. We will address methods by which such a construct can be handled while keeping memory, CPU, clock, and programmer time to their respective minimums.

View paper.

User-Defined Multithreading with the SAS® DS2 Procedure: Performance Testing DS2 Against Functionally Equivalent DATA Steps

Troy Martin Hughes

The Data Step 2 (DS2) procedure affords the first opportunity for developers to build custom, multithreaded processes in Base SAS®. Multithreaded processing debuted in SAS 9, when built-in procedures such as SORT, SQL, and MEANS were threaded to reduce runtime. Despite this advancement, and in contrast with languages such as Java and Python, SAS 9 still did not provide developers the ability to create custom, multithreaded processes. This limitation was overcome in SAS 9.4 with the introduction of the DS2 procedure, a threaded, object-oriented version of the DATA step. However, because DS2 relies on methods and packages (neither of which had previously been available in Base SAS), both DS2 instruction and literature have predominantly fixated on these object-oriented aspects rather than DS2 multithreading. This text instead focuses squarely on DS2 multithreading and compares the performance and efficiency of multithreaded processes with functionally equivalent single-threaded DATA steps and with asynchronous multiprocessing.

View paper.

Prevent a Failure of Communication: Things you should know about HASH and the PDV.

Elizabeth Axelrod

Using hash tables to build tables in memory enables you to solve interesting problems, using code that's quite straightforward. But not having a clear understanding of how your hash variables communicate with the Program Data Vector (PDV) can lead to unexpected results or unnoticed errors. Using a few simple examples, this paper will follow some data as it makes its way through a DATA step, the PDV, and hash tables. Don't be a victim of a failure to communicate!

View paper.

Proc Dtree VS Proc Hpsplit

YuTing Tian

Regression and classification trees are methods for analyzing how a dependent variable is correlated with independent variables. Proc Hpsplit and Proc Dtree can both create decision trees that look similar. Both begin with a single node followed by an increasing number of leaves. However, they serve different purposes. This paper is a preliminary introduction to the differences between Proc Hpsplit and Proc Dtree.

View paper.

Fifteen Functions to Supercharge Your SAS® Code

Josh Horstman

The number of functions included in SAS® software has exploded in recent versions, but many of the most amazing and useful functions remain relatively unknown. This paper will discuss such functions and provide examples of their use. Both new and experienced SAS programmers should find something new to add to their toolboxes.

View paper.