September 7-9, 2016      
WESTERN USERS OF SAS SOFTWARE                                                                       
 
ACADEMIC SECTIONS

This section offers more advanced presentations on a wide variety of topics and products geared for an audience of intermediate and advanced programmers.

SAS defines analytics as “data-driven insight for better decisions.” Popular topics in this section include aspects of data modeling and forecasting, as well as demonstrations of analytic procedures, such as analysis of variance, Bayesian statistics, logistic regression, mixed models, factor analysis, and time series analysis. Presentations offered include practical applications and results of statistical methods used to analyze data from a variety of fields.

This section is an extension of the SAS Essentials Workshop for contributed presenters to share their useful techniques. Although geared towards novice users, this material can be of interest to all users.

Presentations in this section explain methods of using SAS to handle data too large for traditional processing, otherwise known as "big data".

This section covers all aspects of data manipulation. Topics include comparing datasets, transferring data, creating data-driven content, and modifying variables.

Topics include graphics, data visualization, publishing, and reporting. Popular topics in this section include the use of PROC REPORT, SAS/Graph, SAS Styles, Templates and ODS, as well as a variety of techniques used to produce SAS results in Microsoft Excel, PowerPoint, and other Office applications.

Data science is regarded as an extension of statistics, data mining, and predictive analytics. This section focuses on how "the Sexiest Job of the 21st Century" is done in SAS. Areas of interest include text analytics and social media data.

Presenters prepare a digital display that will be available to be viewed by all attendees throughout the conference, rather than conducting a lecture-style presentation. The section often displays high-resolution graphics and/or thought-provoking concepts or ideas that allow some independent study by conference participants.

Presentations are centered around visualizing data, including PROC GPLOT, animated graphs, and other customizations.

Hands-on Workshops provide attendees ‘hands-on-the-keyboard’ interaction with SAS Software during each presentation. Presenters guide attendees through examples of SAS Software techniques and capabilities while offering the opportunity to ask questions and to learn through practice. All HOW presentations are given by experienced SAS users who are invited to present.

This section has presentations on data integration, analysis, and reporting, but with industry-specific content. Examples of content-driven topics are: 

  • Health Outcomes and Healthcare research methods
  • Data Standards and Quality Control for Submission of Clinical Trial data to FDA
  • Banking, Credit Card, Insurance and Risk Management
  • Insurance modeling and analytics

This section helps SAS users understand how to immerse themselves in the rich world of resources that are devoted to achieving high quality SAS education/training, publication, social networking, consulting, certification, technical support, and opportunities for professional affiliation and growth.

This section allows novice SAS users and others to attend a series of presentations that will guide them through the fundamental concepts of the creation of Base SAS DATA step and PROC syntax, followed by two Hands-on Workshops. All SAS Essentials presentations are conducted by experienced SAS users who are invited to present.


A Preview of Presentations, Posters & Workshops
Presentations are lecture style and will be 10, 20, or 50 minutes in length.  Most workshop sections are 50 or 75 minutes in length and may include the opportunity for hands-on PC instruction.

Invited presentations are listed in bold at the beginning of each section.

Note: This is a preliminary listing. Presentations may be added, deleted, or modified. Attendees will receive a final schedule with their registration materials.
  • Advanced Techniques
  • Analytics & Statistics
  • Beginner's Techniques
  • Big Data
  • Data Management
  • Data Presentation & Reporting
  • Data Science
  • ePosters
  • Graphics
  • Hands-on Workshops
  • Industry Solutions
  • SAS Essentials Workshop
Advanced Techniques
Advanced Techniques
Where Am I? What Have I Done?
Jack Hamilton
If you have a program that runs a long time, or will be run multiple times, you might want to keep track of how long each part of the program takes to execute. This can help you find the slow parts of your program and predict how long a future run will take. This paper presents a tool to help with those problems.  The Write_Program_Status macro provides a way to create a status file, easily read by humans or machines.
Beyond IF THEN ELSE: Techniques for Conditional Execution of SAS Code
Josh Horstman
Nearly every SAS® program includes logic that causes certain code to be executed only when specific conditions are met.  This is commonly done using the IF...THEN...ELSE syntax.  In this paper, we will explore various ways to construct conditional SAS logic, including some that may provide advantages over the IF statement.      

Topics will include the SELECT statement, the IFC and IFN functions, the CHOOSE and WHICH families of functions, and the COALESCE function.  We’ll also make sure we understand the difference between a regular IF and the %IF macro statement.
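For a flavor of the alternatives discussed, here is a minimal sketch of the SELECT statement and the IFN function (data set and variable names are hypothetical):

    data grades;
       set scores;
       /* SELECT as an alternative to a chain of IF-THEN-ELSE statements */
       select;
          when (score >= 90) grade = 'A';
          when (score >= 80) grade = 'B';
          otherwise          grade = 'C';
       end;
       /* IFN returns one of two numeric values depending on a condition */
       bonus = ifn(score >= 90, 100, 0);
    run;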
A Waze App for Base SAS®: Automatically Routing around Locked Data Sets, Bottleneck Processes, and Other Traffic Congestion on the Data Superhighway
Troy Hughes
The Waze application, purchased by Google in 2013, alerts millions of users about traffic congestion, collisions, construction, and other complexities of the road that can stymie motorists' attempts to get from A to B. From jackknifed rigs to jackalope carcasses, roads can be gnarled by gridlock or littered with obstacles that impede traffic flow and efficiency. Waze algorithms automatically reroute users to more efficient routes based on user-reported events as well as historical norms that demonstrate typical road conditions. Extract, transform, load (ETL) infrastructures often represent serialized process flows that can mimic highways, and which can become similarly snarled by locked data sets, slow processes, and other factors that introduce inefficiency.  The LOCKITDOWN SAS® macro, introduced at WUSS in 2014, detects and prevents data access collisions that occur when two or more SAS processes or users simultaneously attempt to access the same SAS data set. Moreover, the LOCKANDTRACK macro, introduced at WUSS in 2015, provides real-time tracking of and historical performance metrics for locked data sets through a unified control table, enabling developers to hone processes to optimize efficiency and data throughput. This text demonstrates the implementation of LOCKSMART and its lock performance metrics to create data-driven, fuzzy logic algorithms that preemptively reroute program flow around inaccessible data sets. Thus, rather than needlessly waiting for a data set to become available or a process to complete, the software actually anticipates the wait time based on historical norms, performs other (independent) functions, and returns to the original process when it becomes available.
Ushering SAS Emergency Medicine into the 21st Century: Toward Exception Handling Objectives, Actions, Outcomes, and Comms
Troy Hughes
Emergency medicine comprises a continuum of care that often commences with first aid, basic life support (BLS), or advanced life support (ALS). First responders, including firefighters, emergency medical technicians (EMTs), and paramedics, often are the first to triage the sick, injured, and ailing, rapidly assessing the situation, providing curative and palliative care, and transporting patients to medical facilities. Emergency medical services (EMS) treatment protocols and standard operating procedures (SOPs) ensure that, despite the singular nature of every patient as well as potential complications, trained personnel have an array of tools and techniques to provide varying degrees of care in a standardized, repeatable, and responsible manner. Just as EMS providers must assess patients to prescribe an effective course of action, software too should identify and assess process deviation or failure, and similarly prescribe its commensurate course of action. Exception handling describes both the identification and resolution of adverse, unexpected, or untimely events that can occur during software execution, and should be implemented in SAS® software that demands reliability and robustness. The goal of exception handling always is to reroute process control back to the "happy trail" or "happy path"—i.e., the originally intended process path that delivers full business value. But, when insurmountable events do occur, exception handling routines should instruct the process, program, or session to gracefully terminate to avoid damage or other untoward effects. Between the opposing outcomes of a fully recovered program and graceful program termination, however, lie several other exception resolution paths that can deliver full or partial business value, sometimes only with a slight delay. To that end, this text demonstrates these paths and discusses various internal and external modalities for communicating exceptions to SAS users, developers, and other stakeholders.
Talk to Me!
Elizabeth Axelrod
Wouldn’t it be nice if your long-running program could tap you on the shoulder and say ‘Okay, I’m all done now’.  It can!  This quick tip will show you how easy it is to have your SAS® program send you (or anyone else) an email during program execution.  Once you’ve got the simple basics down, you’ll come up with all sorts of uses for this great feature, and you’ll wonder how you ever lived without it.
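A minimal sketch of the idea, assuming your site has SMTP email configured for SAS (the address and message are placeholders):

    filename notify email
       to="you@example.com"
       subject="SAS job finished";

    data _null_;
       file notify;
       put "The nightly job completed at %sysfunc(datetime(), datetime20.).";
    run;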
Finding All Differences in two SAS libraries using Proc Compare
Bharat Kumar Janapala
In the clinical industry, validating datasets by parallel programming and comparing the derived datasets with PROC COMPARE is a routine practice, but due to constant updates in the raw data it becomes hard to figure out the differences between two libraries. The program presented here points out all the differences between libraries in an optimized way with the help of PROC COMPARE and the SAS help directories.

First, the program figures out the datasets present in both libraries and lists the uncommon datasets. Second, the program looks at the total number of observations and variables present in both libraries by dataset and lists both the uncommon variables and the datasets with differences in observation counts. Third, assuming both libraries are identical, the program runs PROC COMPARE on the datasets with like names and captures the differences, which can be controlled by assigning a maximum number of differences per variable for optimization. Finally, the program reads all the differences and provides a consolidated report followed by a description by dataset.
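As a rough illustration of the first step, the data sets present in one library but not the other can be listed from the dictionary tables, and a capped PROC COMPARE can then be run on a like-named pair (library and member names below are hypothetical):

    proc sql;
       /* data sets in OLD but not in NEW; swap the operands for the reverse check */
       create table only_in_old as
          select memname from dictionary.tables
          where libname='OLD' and memtype='DATA'
          except
          select memname from dictionary.tables
          where libname='NEW' and memtype='DATA';
    quit;

    /* compare one like-named pair, capping the number of differences printed */
    proc compare base=old.ae compare=new.ae maxprint=(50,100) listall;
    run;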
Let The Environment Variable Help You: Moving Files Across Studies and Creating SAS Library On-The-Go
David Liang
In clinical trials, datasets and SAS programs are stored under different studies under different products in Unix. SAS programmers need to access those locations frequently, to read in data for programming or to copy files for reuse in new analyses. Typing the lengthy directory paths is very time-consuming and nerve-racking. This paper describes an efficient way to store the various directory paths in advance through environment variables. Those predefined environment variables can be used for Unix file operations (copying, deleting, searching for files, etc.). Information carried by those variables can also be passed into SAS to construct libraries wherever you go.
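A minimal sketch of the approach, assuming a Unix environment variable named STUDY_DIR (hypothetical) has been exported before SAS starts:

    /* pull the environment variable into a macro variable */
    %let studydir = %sysget(STUDY_DIR);

    /* build a library on the fly from the stored path */
    libname adam "&studydir./adam" access=readonly;

    /* the same value can drive Unix file operations through an X command */
    x "ls &studydir./adam";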
Check Please: An Automated Approach to Log Checking
Richann Watson
In the pharmaceutical industry, we find ourselves having to re-run our programs repeatedly for each deliverable.  These programs can be run individually in an interactive SAS® session which will allow us to review the logs as we execute the programs.  We could run the individual program in batch and open each individual log to review for unwanted log messages, such as ERROR, WARNING, "uninitialized", "have been converted to", etc.  Both of these approaches are fine if there are only a handful of programs to execute.  But what do you do if you have hundreds of programs that need to be re-run?  Do you want to open every single one of the programs and search for unwanted messages?  This manual approach could take hours and is prone to accidental oversight.   This paper will discuss a macro that will search through a specified directory and check either all the logs in the directory, only logs with a specific naming convention, or only the files listed.  The macro then produces a report that lists all the files checked and indicates whether or not issues were found.
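The paper presents a full macro; as a rough sketch of the underlying idea, a single log file can be scanned for unwanted messages like this (the path is a placeholder):

    data log_issues;
       infile "/studies/abc123/logs/ae_table.log" truncover;
       input line $400.;
       /* keep only lines containing messages we care about */
       if index(upcase(line), 'ERROR')   or
          index(upcase(line), 'WARNING') or
          index(line, 'uninitialized')   or
          index(line, 'converted to');
    run;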
Let SAS® Do Your DIRty Work
Richann Watson
Making sure you have saved all the necessary information to replicate a deliverable can be a cumbersome task.  You want to make sure that all the raw data sets are saved, that all the derived data sets, whether they are SDTM or ADaM data sets, are saved, and that the date/time stamps are preserved.  Not only do you need the data sets, you also need to keep a copy of all programs that were used to produce the deliverable as well as the corresponding logs from when the programs were executed.  Any other information that was needed to produce the necessary outputs also needs to be saved.  All of this needs to be done for each deliverable, and it can be easy to overlook a step or some key information.  Most people do this process manually and it can be a time-consuming process, so why not let SAS do the work for you?
.LST Files with Proc Compare Result
Manvitha Yennam and Srinivas Vanam
The most widely used method to validate programs is double programming, which involves two programmers working on a single program and finally comparing their outputs by using a procedure like PROC COMPARE. The PROC COMPARE results are generally produced in .LST files. Most companies do a manual review, checking each and every .LST file to ensure that the outputs are similar. This manual process is time-consuming as well as error prone. The purpose of this paper is to use a SAS macro instead of following the manual review process. The SAS macro reads all the .LST files in a given path and creates a summary of the list files, indicating whether each has an issue and, if so, the type of issue.
Analytics and Statistics
Creating meaningful factors from dietary questionnaires
Jenna Carlson
In studying diet, the hypothesis is often focused on overall dietary patterns rather than individual questions about specific food/beverage consumption. Factor analysis is a useful tool in such situations to create meaningful variables from high dimensional data. However, traditional factor analysis is not appropriate for the commonly-used food-frequency questionnaires, as responses are categorical. Instead, factor analysis should be performed using polychoric correlation. I will demonstrate how to do factor analysis of high-dimensional food and beverage frequency data using polychoric correlation in SAS. In particular, beginning with SAS/STAT version 9.4, it is possible to compute polychoric correlation without the macro %polychor, increasing the ease with which polychoric correlation can be used in factor analysis. Data from a longitudinal study of Appalachian mother-baby pairs is used to illustrate this technique. The results highlight the relationship of poor diet and oral health among women with good oral hygiene.
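As a rough sketch of the workflow, and assuming a SAS/STAT release in which PROC CORR supports the POLYCHORIC option and an OUTPLC= output data set (otherwise the %polychor macro is needed), the correlation matrix can be passed to PROC FACTOR (item names are hypothetical):

    proc corr data=ffq polychoric outplc=work.plcorr;
       var item1-item30;      /* ordinal food-frequency items */
    run;

    /* the OUTPLC= data set is a correlation-type data set PROC FACTOR can read */
    proc factor data=work.plcorr method=principal rotate=varimax nfactors=4;
    run;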
How we know an intervention works: Program evaluation by example
AnnMaria De Mars
Read any publication, from national media to your local news website. Educational achievement, particularly in STEM fields, is a grave concern and billions of dollars are spent addressing this issue. How can SAS be applied to analyze the outcome of an intervention, and, equally important, convey the results of that analysis to a non-technical audience? Using real data from evaluations of educational games, this presentation walks through the steps of an evaluation, from needs assessment to validation of measurement to pre-post test comparison. Techniques applied include PROC FREQ with options for correlated data, PROC FACTOR for factor analysis, PROC TTEST and PROC GLM for repeated measures ANOVA. Happy use is made throughout of ODS Statistical Graphics. Using standard SAS/STAT procedures, these analyses can be run on any operating system with SAS, including SAS Studio on an iPad.
Constructing Confidence Intervals for Differences of Binomial Proportions in SAS®
Will Garner
Given two binomial proportions, we wish to construct a confidence interval for the difference. The most widely known method is the Wald method (i.e., normal approximation), but it can produce undesirable results in extreme cases (e.g., when the proportions are near 0 or 1). Numerous other methods exist, including asymptotic methods, approximate methods, and exact methods. This paper presents 9 different methods for constructing such confidence intervals, 8 of which are available in SAS® 9.3 procedures. The methods are compared and thoughts are given on which method to use.
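Several of these intervals are available directly from PROC FREQ; a minimal sketch, assuming a release that supports the RISKDIFF CL= suboption (data set and variable names are hypothetical):

    proc freq data=trial;
       tables treatment*response /
              riskdiff(cl=(wald ac ha newcombe));  /* Wald, Agresti-Caffo,
                                                      Hauck-Anderson, Newcombe */
    run;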
An Animated Guide: Incremental Response Modeling in Enterprise Miner
Russ Lavery
Some people can be expected to buy a product without any marketing contact. If all potential customers are contacted, a company cannot determine the true effect of a marketing manipulation.  This talk uses the INCREMENTAL RESPONSE node in SAS® Enterprise Miner™ to solve a basic marketing problem.  Marketers typically target, and spend money contacting, all potential customers.  This is wasteful, since some of these people would become customers on their own.  This node uses a set of data to separate customers into groups: 1) those likely to buy, 2) those likely to buy if they are the subject of marketing campaigns, and 3) those expected to be resistant to marketing efforts.
Employing Latent Analyses in Longitudinal Studies: An Exploration of Independently Developed SAS® Procedures
Deanna Schreiber-Gregory
This paper looks at several ways to investigate latent variables in longitudinal surveys by utilizing three independently created SAS® procedures. Three different analyses for latent variable discovery will be reviewed and explored: latent class analysis, latent transition analysis, and latent trajectory analysis. The latent analysis procedures explored in this paper (all of which were developed outside of SAS® Institute) are PROC LCA, PROC LTA and PROC TRAJ. The specifics behind these procedures and how to add them to one’s procedure library will be explored and then applied to an explorative case study question. The effect of the latent variables on the fit and use of the regression model compared to a similar model using observed data may also be briefly reviewed. The data used for this study was obtained via the National Longitudinal Study of Adolescent Health, a study distributed and collected by Add Health.  Data was analyzed using SAS 9.4. This paper is intended for moderate to advanced level SAS® users. This paper is also written to an audience with a background in behavioral science and/or statistics.
MIghty PROC MI to the rescue!
Cynthia Alvarez
Missing data is a feature of many data sets, as participants may withdraw from studies, not provide self-reported measures, and at times, technical issues may interfere with data collection. If we only use completed observations, we are left with larger standard errors, wider confidence intervals, and larger p-values. Missing data methods such as complete case analysis or imputation may be used but the missing data mechanisms and patterns must be understood first. This paper will give an overview of missing data sources, patterns and mechanisms. A complete dataset will be used to obtain true regression analysis results. Two datasets with missing values will be created, one with data missing completely at random and one with data missing not at random. The missing data methods of complete case, single and multiple imputation will be applied. Proc MI and MIANALYZE will be used in SAS® 9.4 for the analysis. The results of the missing data methods will be compared to each other and to the true results.
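The standard three-step pattern the paper builds on looks roughly like this (variable names and the number of imputations are illustrative):

    /* 1) impute */
    proc mi data=study nimpute=20 seed=20160907 out=mi_out;
       var age income score outcome;
    run;

    /* 2) analyze each completed data set */
    proc reg data=mi_out outest=est covout noprint;
       model outcome = age income score;
       by _imputation_;
    run;

    /* 3) pool the results */
    proc mianalyze data=est;
       modeleffects Intercept age income score;
    run;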
Equivalence Tests
John Amrhein and Fei Wang
Motivated by the frequent need for equivalence tests in clinical trials, this paper provides insights into tests for equivalence. We summarize and compare equivalence tests for different study designs, including designs for the one-sample problem, designs for the two-sample problem (paired observations, and two unrelated samples), and designs with multiple treatment arms. Power and sample size estimation are discussed. We also give examples of how to implement the methods using the FREQ, TTEST, MIXED, and POWER procedures in SAS/STAT® software.
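For example, a two one-sided tests (TOST) equivalence analysis on paired data can be requested directly in PROC TTEST; the bounds below are purely illustrative:

    proc ttest data=pk tost(-0.2, 0.2);
       paired test_drug*reference_drug;   /* paired-observations design */
    run;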
Distance Correlation for Vectors: A SAS® Macro
Thomas Billings
The Pearson correlation coefficient is well-known and widely-used. However, it suffers from certain constraints: it is a measure of linear dependence (only) and does not provide a test of statistical independence, and it is restricted to univariate random variables. Since its inception, related and alternative measures have been proposed to overcome these constraints. Several new measures to replace or supplement Pearson correlation have been proposed in the statistical literature in recent years. Székely et al. (2007) describes a new measure - distance correlation - that overcomes the shortcomings of Pearson correlation.  Distance correlation is defined for 2 random variables X, Y (which can be vectors) as a weight or distance function applied to the difference between the joint characteristic function for (X,Y) and the product of the individual characteristic functions for X, Y. In practice it is estimated by computing the individual distance matrices for X, Y, and distance correlation is a similarity measure for the 2 matrices. For the bivariate normal case, distance correlation is a function of Pearson correlation. Distance correlation also supports a related test of statistical independence. Distance correlation has performed well in simulation studies comparing it with other alternatives to Pearson correlation.  Here we present a Base SAS® macro to compute distance correlation for arbitrary real vectors.
Determining the Functionality of Water Pumps in Tanzania Using SAS® EM and VA
India Kiran Chowdaravarpu, Vivek Manikandan Damodaran and Ram Prasad Poudel
Accessibility to clean and hygienic drinking water is a basic necessity every human being deserves. In Tanzania, there are 23 million people who do not have access to safe water and are forced to walk miles in order to fetch water for daily needs. The prevailing problem is largely a result of poor maintenance and inefficient functioning of existing infrastructure such as hand pumps. To solve the current water crisis and ensure accessibility to safe water, there is a need to locate non-functional pumps and functional pumps that need repair so that they can be repaired or replaced. It is highly cost-ineffective and impractical to inspect the functionality of over 74,251 water points manually in a country like Tanzania where resources are very limited. The objective of this study is to build a model to predict which pumps are functional, which need some repair, and which don’t work at all by using data from the Tanzania Ministry of Water. We also find the important variables that predict a pump’s working condition. The data is managed by the Taarifa waterpoints dashboard. After pre-processing, the final data consists of 39 variables and 74,251 observations. We used SAS Bridge for ESRI and SAS VA to illustrate spatial variation of functional water points at the regional level of Tanzania along with other socioeconomic variables. Among decision tree, neural network, logistic regression, and HP random forest models, the HP random forest model was found to be the best. The misclassification rate, sensitivity, and specificity of the model are 24.91%, 62.7%, and 91.7%, respectively. The classification of water pumps using the champion model will expedite maintenance operations of water points, ensuring clean and accessible water across Tanzania at low cost and in a short period of time.
Fitting Threshold Models using the SAS® Procedures NLIN and NLMIXED
Todd Coffey
Threshold regression models have applications in many diverse fields, including dose-response, cell biology, ecology, infectious disease, epidemiology, finance, and econometrics.  The name of these models is derived from a location in the data in which the initial statistical model changes abruptly and a new pattern emerges.  Often, the initial part of the model is a baseline or background effect and the subsequent part describes a new shape once the threshold is reached.  When the threshold, or common join point between the distinct models on either side of it, is unknown, the statistical model is nonlinear in the parameters, regardless of the functional form of the piecewise models.  Nonlinear models can be challenging to fit due to the lack of a closed-form estimate for parameters and threshold models can provide additional difficulty due to the irregular features of the threshold parameter.  SAS® has two procedures that are commonly used to fit threshold models. PROC NLIN is used when all observations are independent.  When observations come from the same subject, the relatively newer PROC NLMIXED can be used to model this intra-subject correlation using random coefficients.  In this paper, we discuss the similarities and differences between these two SAS/STAT® procedures. Then, using two different datasets, one from an experiment on the toxicological effects of pesticides and another from a veterinary surgical study, we demonstrate how to fit threshold models, with or without random coefficients. We compare results from the four available algorithms in PROC NLIN and PROC NLMIXED and contrast parameter estimates obtained from the two SAS® procedures.
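For orientation, a simple segmented ("hockey-stick") threshold model with an unknown join point can be written in PROC NLIN roughly as follows (data set, variables, and starting values are hypothetical):

    proc nlin data=tox;
       parms b0=1 b1=0.5 tau=10;                     /* tau is the unknown threshold */
       model response = b0 + b1*max(dose - tau, 0);  /* flat baseline, then a new slope */
    run;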
Knowledge is Power: The Basics of PROC POWER
Elaina Gates
In many statistics applications it is extremely important to understand how the power function of a specific distribution behaves. Coding this from scratch can be an arduous process. This talk will outline how to use PROC POWER to dramatically simplify the coding process and obtain other relevant information. I will also go over the basics of PROC POWER and how to interpret the output for various probability distributions.
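A minimal sketch of the kind of call discussed, solving for the per-group sample size in a two-sample t-test setting (all numbers are illustrative):

    proc power;
       twosamplemeans test=diff
          meandiff  = 5
          stddev    = 10
          power     = 0.8
          npergroup = .;      /* the missing value is what PROC POWER solves for */
    run;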
Hierarchical Generalized Linear Models for Behavioral Health Risk-Standardized 30-Day and 90-Day Readmission Rates
Allen Hom
The Achievements in Clinical Excellence (ACE) program encourages excellence across all behavioral health network facilities by promoting those that provide the highest quality of care.  Two key benchmarks of outcome effectiveness in the ACE program are the risk-adjusted 30-day readmission and risk-adjusted 90-day readmission rates. Risk adjustment was performed with hierarchical generalized linear models (HGLM) to account for differences across hospitals in patient demographic and clinical characteristics. One year of administrative admission data (June 30, 2013 to July 1, 2014) from patients for 30-day (N=78,761, N Hospitals=2,233) and 90-day (N=74,540, N Hospitals=2,205) time frames were the data sources. HGLM simultaneously models two levels: 1) the patient level, which models the log-odds of hospital readmission using age, sex, selected clinical covariates, and a hospital-specific intercept, and 2) the hospital level, a random hospital intercept that accounts for within-hospital correlation of the observed outcomes. PROC GLIMMIX was used to implement an HGLM with hospital as a random (hierarchical) variable separately for substance use disorder (SUD) admissions and mental health (MH) admissions, and the results were pooled to obtain a hospital-wide risk-adjusted readmission rate. The HGLM methodology was derived from Centers for Medicare & Medicaid Services (CMS) documentation for the 2013 Hospital-Wide All-Cause Risk-Standardized Readmission Measure SAS package. This methodology was performed separately on 30-day and 90-day readmission data. The final metrics were a hospital-wide risk-adjusted 30-day readmission rate percent and a hospital-wide risk-adjusted 90-day readmission rate percent. HGLM models were cross-validated on new production data that overlapped with the development sample. Revised HGLM models were tested in April 2015, and the outcome statistics were extremely similar. In short, the test of the revised models cross-validated the original HGLM models, because the revised models were based on different samples.
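The core of such a model, a patient-level logistic regression with a random hospital intercept, looks roughly like this in PROC GLIMMIX (data set and covariate names are hypothetical):

    proc glimmix data=admits method=laplace;
       class hospital sex;
       model readmit30(event='1') = age sex clinical_covariate
             / dist=binary link=logit solution;
       random intercept / subject=hospital;   /* hospital-specific random intercept */
    run;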
Demystifying the CONTRAST and ESTIMATE Statement
Hector Lemus
Many analysts are mystified about how to use CONTRAST and ESTIMATE statements in SAS to test a variety of general linear hypotheses (GLH). GLHs can be used to parsimoniously test key comparisons and complex hypotheses. However, setting up a simple GLH tends to intimidate some SAS users.  Examples from various sources seem to magically come up with the correct answer. The key is to understand how the procedure parameterizes the model and then use that parameterization to construct the GLH. CONTRAST and/or ESTIMATE statements can be found in many of the modeling procedures in SAS.  However, not all procedures use the same syntax for these statements. This presentation will demystify the use of the CONTRAST and ESTIMATE statements using examples in PROCs GLM, LOGISTIC, MIXED, GLIMMIX, and GENMOD.
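As a small illustration of the kind of GLH involved, a three-level treatment effect in PROC GLM might be tested as follows (assuming the CLASS levels sort as A, B, C; names are hypothetical):

    proc glm data=study;
       class trt;
       model y = trt;
       /* compare treatment A with the average of B and C */
       contrast 'A vs avg(B,C)' trt 2 -1 -1;
       estimate 'A vs avg(B,C)' trt 2 -1 -1 / divisor=2;
    run;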
Short Introduction to Reliability Engineering and PROC RELIABILITY to Non-Engineers
Danny Rithy
Reliability engineering studies how often a product or system fails under stated conditions over time. In the modern world, it is important that a product or system last a long time. Even though technology is well developed these days, some systems will eventually fail. Mathematical and statistical methods are useful for quantifying and analyzing reliability data. However, the most important priority of reliability engineering is to apply engineering knowledge to reduce the likelihood of failures. This paper introduces the idea of reliability engineering to non-engineers, as well as PROC RELIABILITY, and demonstrates some applications to reliability data.
Simulating Queuing Models in SAS
Danny Rithy
This paper introduces users to simulating queuing models using a set of SAS macros: %MM1, %MG1, and %MMC. The SAS macros simulate a queuing system in which entities (like customers, patients, cars, or email messages) arrive, get served either at a single station or at several stations in turn, might have to wait in one or more queues for service, and then may leave. After the simulation, SAS will give graphical output as well as a statistical analysis of the desired queuing model.
Selection Bias: How Can Propensity Score Utilization Help Control For It?
Deanna Schreiber-Gregory
An important strength of observational studies is the ability to estimate a key behavior or treatment’s effect on a specific health outcome. This is a crucial strength as most health outcomes research studies are unable to use experimental designs due to ethical and other constraints. Keeping this in mind, one drawback of observational studies (that experimental studies naturally control for) is that they lack the ability to randomize their participants into treatment groups. This can result in the unwanted inclusion of a selection bias. One way to adjust for a selection bias is through the utilization of a propensity score analysis. In this paper we explore an example of how to utilize these types of analyses. In order to demonstrate this technique, we will seek to explore whether recent substance abuse has an effect on an adolescent’s identification of suicidal thoughts. In order to conduct this analysis, a selection bias was identified and adjustment was sought through three common forms of propensity scoring: stratification, matching, and regression adjustment. Each form is separately conducted, reviewed, and assessed as to its effectiveness in improving the model. Data for this study was gathered through the Youth Risk Behavior Surveillance System, an ongoing nationwide project of the Centers for Disease Control and Prevention. This presentation is designed for any level of statistician, SAS® programmer, or data analyst with an interest in controlling for selection bias.
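The first step common to all three forms is estimating the propensity score itself, sketched here with PROC LOGISTIC (data set and covariate names are hypothetical):

    proc logistic data=yrbs;
       class sex race / param=ref;
       model substance_use(event='1') = sex race grade depression;
       output out=ps_out p=pscore;   /* pscore feeds stratification, matching,
                                        or regression adjustment */
    run;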
Using SAS to analyze Countywide Survey Data: A look at Adverse Childhood Experiences and their Impact on Long-term Health
Roshni Shah
The adverse childhood experiences (ACEs) scale measures childhood exposure to abuse and household dysfunction. Research suggests ACEs are associated with higher risks of engaging in risky behaviors, poor quality of life, morbidity, and mortality later in life. In Santa Clara County, a large diverse county where 88% of residents have household internet access, we conducted a county-wide Behavioral Risk Factor Survey of adults with a unique web-based follow-up. We conducted a random-digit-dial telephone survey (N=4,186) and follow-up online survey using the CDC BRFSS ACE module. Of those eligible for the web-based survey, the response rate was 33%. The online ACE module comprised 11 questions to form 8 categories on abuse and household dysfunction.  PROC SURVEYFREQ and SURVEYLOGISTIC were used in SAS 9.4 to analyze survey data and provide county-wide estimates for Santa Clara County as a whole. Most respondents (74%) reported having experienced 1+ ACEs. Emotional abuse was the most common (44%), followed by household substance abuse (28%), and household mental illness (25%). The prevalence of emotional abuse, household substance abuse, physical abuse, and household mental illness was highest among individuals with high (3+) and low (1-2) ACEs. Indicators of perceived poor health showed a strong association among individuals with ACEs. The odds of 1+ poor mental health days in the past month were higher among individuals with low ACEs (OR=2.86), high ACEs (OR=6.74), and among women (OR=2.27). A web-based survey offers a reliable means to assess a population about sensitive subjects like ACE at lower cost than a telephone survey in smaller jurisdictions. Results suggest ACEs are common among adults in the county, and may be under-reported in telephone interviews. PROC SURVEYFREQ and SURVEYLOGISTIC in SAS are powerful tools that can be used to analyze survey data, especially for small area estimates on the health of county residents.
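A minimal sketch of the survey procedures mentioned, with hypothetical design variables standing in for the actual survey design:

    proc surveyfreq data=ace_survey;
       strata region;
       weight samplewt;
       tables ace_category;
    run;

    proc surveylogistic data=ace_survey;
       strata region;
       weight samplewt;
       class ace_category(ref='None') sex / param=ref;
       model poor_mh(event='1') = ace_category sex;
    run;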
How D-I-D you do that? Basic Difference-in-Differences Models in SAS
Margaret Warton
Long a mainstay in econometrics research, difference-in-differences (DID) models have only recently become more commonly used in health services and epidemiologic research. DID study designs are quasi-experimental, can be used with retrospective observational data, and do not require exposure randomization. This study design estimates the difference in pre-post changes in an outcome comparing an exposed group to an unexposed (reference) group. The outcome change in the unexposed group estimates the expected change in the exposed group had the group been, counterfactually, unexposed. By subtracting this change from the change in the exposed group (the “difference in differences”), the effects of background secular trends are removed.  In the basic DID model, each subject serves as his or her own control, removing confounding by known and unknown individual factors associated with the outcome of interest. Thus, the DID generates a causal estimate of the change in an outcome associated with the initiation of the exposure of interest while controlling for biases due to secular trends and confounding. A basic repeated-measures generalized linear model provides estimates of population-average slopes between two time points for the exposed and unexposed groups and tests whether the slopes differ by including an interaction term between the time and exposure variables.  In this paper, we illustrate the concepts behind the basic DID model and present SAS code for running these models.  We include a brief discussion of more advanced DID methods and present an example of a real-world analysis using data from a study on the impact of introducing a value-based insurance design (VBID) medication plan at Kaiser Permanente Northern California on change in medication adherence.
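The basic model described above can be sketched as a repeated-measures GLM in which the exposure-by-time interaction is the DID estimate (data set and variable names are hypothetical; exposed and post are 0/1 indicators):

    proc genmod data=adherence;
       class id;
       model adherence = exposed post exposed*post / dist=normal link=identity;
       repeated subject=id / type=exch;   /* within-person correlation across the
                                             pre and post measurements */
    run;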
Using PROC PHREG to Assess Hazard Ratio in Longitudinal Environmental Health Study
Xiaoyi Zhou
Air pollution, especially combustion products, can activate metabolic disorders through inflammatory pathways, potentially leading to obesity. The effect of air pollution on BMI growth was shown by a previous study (Jerrett et al., 2014).  Recognizing the role of air pollution in the development of obesity in children can help guide possible interventions reducing obesity formation. The objective of this paper is to analyze the obesity incidence of children participating in the Children’s Hospital Study (CHS) who were non-obese at baseline, identify the time interval for the onset of obesity, and identify the effects of various risk factors, especially air pollutants. PROC PHREG was used within a macro to create a model that included community random effects, was stratified by sex, and adjusted for baseline characteristics.
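Stripped of the community random effects and the surrounding macro, the core Cox model call might look roughly like this (data set and variable names are hypothetical):

    proc phreg data=chs;
       model years_to_obesity*obese(0) = pollutant_exposure baseline_bmi baseline_age;
       strata sex;   /* separate baseline hazards by sex */
    run;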
Using PROC LOGISTIC for Conditional Logistic Regression to Evaluate Vehicle Safety Performance
Xiaoyi Zhou
The LOGISTIC procedure has several capabilities beyond standard logistic regression on binary outcome variables. For a conditional logit model, PROC LOGISTIC can handle several types of matching: 1:1, 1:M, and even M:N matching. This paper shows an example of using PROC LOGISTIC for conditional logit models to evaluate vehicle safety performance in fatal accidents using the Fatality Analysis Reporting System (FARS) 2004-2011 database. Conditional logistic regression models were fit with an additional stratum parameter to model the relationship between driver fatality and the vehicle’s continent of origin.
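A minimal sketch of a conditional logit model in PROC LOGISTIC, with one stratum per matched set (data set and variable names are hypothetical):

    proc logistic data=fars;
       class origin(ref='North America') / param=ref;
       strata accident_id;                    /* one stratum per accident */
       model driver_died(event='1') = origin driver_age vehicle_age;
    run;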
Beginner's Techniques
Identifying Duplicates Made Easy
Elizabeth Angel and Yunin Ludena
Have you ever had trouble removing or finding the exact type of duplicate you want? SAS offers several different ways to identify, extract, and/or remove duplicates, depending on exactly what you want. We will start by demonstrating perhaps the most commonly used method, PROC SORT, and the types of duplicates it can identify and how to remove, flag, or store them. Then, we will present the other less commonly used methods which might give information that PROC SORT cannot offer, including the data step (FIRST./LAST.), PROC SQL, PROC FREQ, and PROC SUMMARY. The programming is demonstrated at a beginner’s level.
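Two of the simpler patterns look roughly like this (data set and BY variables are hypothetical):

    /* keep one row per key; send the removed duplicates to a separate data set */
    proc sort data=visits out=visits_unique dupout=visits_dups nodupkey;
       by patient_id visit_date;
    run;

    /* DATA step alternative: FIRST./LAST. flags on sorted data */
    data first_visits;
       set visits_unique;
       by patient_id;
       if first.patient_id;
    run;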
Don't Forget About Small Data
Lisa Eckler
Beginning in the world of data analytics and eventually flowing into mainstream media, we are seeing a lot about Big Data and how it can influence our work and our lives.  Through examples, this paper will explore how Small Data – which is everything Big Data is not – can and should influence our programming efforts.  The ease with which we can read and manipulate data from different formats into usable tables in SAS® makes using data to manage data very simple and supports healthy and efficient practices.  This paper will explore how using small or summarized data can help to organize and track program development, simplify coding and optimize code.
Let the CAT Out of the Bag: String Concatenation in SAS® 9
Josh Horstman
Are you still using TRIM, LEFT, and vertical bar operators to concatenate strings? It's time to modernize and streamline that clumsy code by using the string concatenation functions introduced in SAS® 9. This paper is an overview of the CAT, CATS, CATT, and CATX functions introduced in SAS® 9, and the new CATQ function added in SAS® 9.2. In addition to making your code more compact and readable, this family of functions also offers some new tricks for accomplishing previously cumbersome tasks.
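A quick taste of the difference (variable names are hypothetical):

    data roster;
       set people;
       length full_name $60 id_code $20;
       full_name = catx(' ', title, first_name, last_name); /* trims and inserts delimiters */
       id_code   = cats(site, '-', subject);                /* trims, no delimiter */
    run;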
SAS® Abbreviations: a Shortcut for Remembering Complicated Syntax
Yaorui Liu
One of many difficulties for a SAS® programmer is remembering how to accurately use SAS syntax, especially the syntax that includes many parameters. Not mastering the basic syntax parameters by heart will definitely make one’s coding inefficient, because one would have to check the SAS reference manual constantly to ensure that one’s syntax was implemented properly. One of the more useful tools in SAS, but seldom known by novice programmers, is the use of SAS Abbreviations. It allows users to store text strings, such as the syntax of a DATA step function, a SAS procedure, or a complete DATA step, with a user-defined and easy-to-remember abbreviated term. Once this abbreviated term is typed within the enhanced editor, SAS will automatically bring up the corresponding stored syntax. Knowing how to use SAS Abbreviations will ultimately be beneficial to programmers with varying levels of SAS expertise. In this paper, various examples of utilizing SAS Abbreviations will be demonstrated.
Implementation of Good Programming Practices in Clinical SAS
Srinivas Vanam and Manvitha Yennam
This paper presents tips, techniques, and conventions that can be implemented in your day-to-day programming activities to increase your efficiency at work. Since neither SAS® Institute nor the FDA specifies standards for the development of programs, SAS® users have the flexibility to write programs in their own style. This might sound cool, but it results in inconsistent programs across the project/study. Hence, following certain principles and conventions while programming will make the programs an asset to the company. The target audience of this paper is both the programmer (who creates the building blocks of the project) and the manager (who oversees the entire project and keeps all the building blocks together).
Big Data
Introduction to Data Quality using DataFlux® Data Management Studio
Bob Janka
Data quality is an essential part of effective data management, which in turn leads to improved analytics. Using data quality tools, users can plan, act on, and monitor their organization’s data. This paper introduces SAS users to DataFlux® Data Management Studio, a client available with the SAS® Data Management 9.4 package. Learn how to profile, standardize, enrich, entity-resolve, household, monitor, and evaluate data. Although this paper shows small sets of sample data, these techniques scale well for large data volumes, such as those found in enterprise data management, where data volumes can exceed 100 million records across tens of different input sources, or those found in “Big Data,” where consistency in source data is less common. Readers with an intermediate knowledge of Base SAS, the SAS macro language, and SQL will understand these concepts and learn to utilize the techniques presented in this paper.
Top Ten SAS® Performance Tuning Techniques
Kirk Paul Lafler
SAS® Base software provides users with many choices for accessing, manipulating, analyzing, and processing data and results. Partly due to the power offered by the SAS software and the size of data sources, many application developers and end-users are in need of guidelines for more efficient use. This presentation highlights my personal top ten list of performance tuning techniques for SAS users to apply in their applications. Attendees learn DATA and PROC step language statements and options that can help conserve CPU, I/O, data storage, and memory resources while accomplishing tasks involving processing, sorting, grouping, joining (merging), and summarizing data.
Sorting a Bajillion Records: Conquering Scalability in a Big Data World
Troy Hughes
"Big data" is often distinguished as encompassing high volume, velocity, or variability of data. While big data can signal big business intelligence and big business value, it also can wreak havoc on systems and software ill-prepared for its profundity. Scalability describes the ability of a system or software to adequately meet the needs of additional users or its ability to utilize additional processors or resources to fulfill those added requirements. Scalability also describes the adequate and efficient response of a system to increased data throughput. Because sorting data is one of the most common as well as resource-intensive operations in any software language, inefficiencies or failures caused by big data often are first observed during sorting routines. Much SAS® literature has been dedicated to optimizing big data sorts for efficiency, including minimizing execution time and, to a lesser extent, minimizing resource usage (i.e., memory and storage consumption.) Less attention has been paid, however, to implementing big data sorting that is reliable and robust even when confronted with resource limitations. To that end, this text introduces the SAFESORT macro that facilitates a priori exception handling routines (which detect environmental and data set attributes that could cause process failure) and post hoc exception handling routines (which detect actual failed sorting routines.) If exception handling is triggered, SAFESORT automatically reroutes program flow from the default sort routine to a less resource-intensive routine, thus sacrificing execution speed for reliability. However, because SAFESORT does not exhaust system resources like default SAS sorting routines, in some cases it performs more than 200 times faster than default SAS sorting methods. Macro modularity moreover allows developers to select their favorite sorting routine and, for data-driven disciples, to build fuzzy logic routines that dynamically select a sort algorithm based on environmental and data set attributes.
SAS integration with NoSQL database
Kevin Lee
We are living in a world of abundant data, so-called “big data”. The term “big data” is closely associated with data of any structure – unstructured, structured, and semi-structured. Data are called “unstructured” and “semi-structured” when they do not fit neatly into a traditional row-column relational database. A NoSQL (Not only SQL or non-relational SQL) database is a type of database that can handle data of any structure. For example, a NoSQL database can store data such as XML (Extensible Markup Language), JSON (JavaScript Object Notation) or RDF (Resource Description Framework) files. If an enterprise is able to extract such data from NoSQL databases and transfer it to the SAS environment for analysis, it will produce tremendous value, especially from a big data solutions standpoint. This paper will show how data of any structure is stored in NoSQL databases and ways to transfer it to the SAS environment for analysis.  First, the paper will introduce the NoSQL database and the kinds of files it can store, such as XML, JSON or RDF. Second, the paper will show how the SAS system connects to NoSQL databases using a REST (Representational State Transfer) API (Application Programming Interface). For example, SAS programmers can use PROC HTTP to extract XML or JSON files through a REST API from the NoSQL database. Finally, the paper will show how SAS programmers can convert XML and JSON files to SAS datasets for analysis. For example, SAS programmers can create XMLMap files and use the XMLV2 LIBNAME engine to convert the extracted XML files to SAS datasets.
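A rough sketch of the round trip for an XML feed, with a hypothetical URL and map file:

    filename resp "/tmp/records.xml";

    proc http url="https://example.org/api/records.xml"
              method="GET"
              out=resp;
    run;

    /* an XMLMap tells the XMLV2 engine how to flatten the XML into tables */
    libname nosql xmlv2 "/tmp/records.xml" xmlmap="/tmp/records.map";

    data work.records;
       set nosql.record;   /* table name is defined by the map */
    run;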
DS2 Versus Data Step: Efficiency Considerations
Andra Northup
There is recognition that, in large, complex systems, the object-oriented concepts available in DS2 (modularity, code reuse, and ease of debugging) can provide increased efficiency.  Object-oriented programming also allows multiple teams of developers to work on the same project easily. DS2 was designed for data manipulation and data modeling applications that can achieve increased efficiency by running code in threads, splitting the data across multiple processors and disks. Of course, performance is also dependent on hardware architecture and the amount of effort you put into the tuning of your architecture and code. Join our panel for a discussion of architecture, tuning, and data size considerations in determining whether DS2 is the more efficient alternative.
Using Shared Accounts in Kerberized Hadoop Clusters with SAS®: How Can I Do That?
Michael Shealy
Using shared accounts to access third-party data servers is a common architecture in SAS® environments.  SAS software can support seamless user access to shared accounts in databases such as Oracle, via group definitions and outbound authentication domains in Metadata.  However, the configurations necessary to leverage shared accounts in Hadoop clusters with Kerberos authentication are more complicated.  Not only must Kerberos tickets be generated and maintained in order to simply access the Hadoop environment, but those tickets must allow access as the shared account instead of the individual users’ accounts.  Methods for implementing this arrangement in SAS environments can be non-intuitive.  This paper starts by outlining several general architectures of shared accounts in Kerberized Hadoop environments.  It then presents possible methods of managing such shared account access in SAS environments, including specific implementation details, code samples and security implications.  Finally, troubleshooting methods are presented for when issues arise.  Example code and configurations for this paper were developed on a SAS 9.4 system running over Red Hat Enterprise Linux 6.
Data Management
What just happened? A visual tool for highlighting differences between two data sets.
Steve Cavill
Base SAS includes a great utility for comparing two data sets - PROC COMPARE. The output, though, can be hard to read, as the differences between values are listed separately for each variable.  It's hard to see the differences across all variables for the same observation. This talk presents a macro to compare two SAS data sets and display the differences in Excel.  The PROC COMPARE OUT= option creates an output data set with all the differences.  This data set is then processed with PROC REPORT using ODS EXCEL and colour highlighting to show the differences in an Excel workbook, making them easy to see.
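The skeleton of the approach, before any highlighting logic is added, looks roughly like this (data set names are hypothetical):

    proc compare base=old.demog compare=new.demog
                 out=work.diffs outnoequal outbase outcomp noprint;
       id subject_id;
    run;

    ods excel file="compare_report.xlsx";
    proc report data=work.diffs;
       column _type_ _obs_ subject_id age weight;
       /* COMPUTE blocks with CALL DEFINE can apply colour to the changed cells */
    run;
    ods excel close;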
Tips and Tricks for Producing Time-Series Cohort Data
Nate Derby
Time-Series cohort data are needed for many applications where you're tracking performance over time. We show some simple coding techniques for effectively and efficiently producing time-series cohort data with SAS®.
Avoid Change Control by Using Control Tables
Frank Ferriola
Developers working on a production process need to think carefully about ways to avoid future changes that require change control, so it's always important to make the code dynamic rather than hardcoding items into the code. Even if you are a seasoned programmer, the hardcoded items might not always be apparent. This paper assists in identifying the harder-to-reach hardcoded items and addresses ways to effectively use control tables within the SAS® software tools to deal with sticky areas of coding such as formats, parameters, grouping/hierarchies, and standardization. The paper presents examples of several ways to use the control tables and demonstrates why this usage prevents the need for coding changes. Practical applications are used to illustrate these examples.
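One common control-table pattern is building a format from a lookup table rather than hardcoding the values; a minimal sketch with hypothetical names:

    data cntl;
       set lookup.site_codes(rename=(site_code=start site_name=label));
       retain fmtname 'sitefmt' type 'C';
    run;

    proc format cntlin=cntl;   /* adding a site now means updating the table, not the code */
    run;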
The Power of the Function Compiler: PROC FCMP
Christina Garcia
PROC FCMP, the user-defined function procedure, allows SAS users of all levels to get creative with SAS and expand their scope of functionality. PROC FCMP is the superhero of all SAS functions in its vast capabilities to create and store uniquely defined functions that can later be used in data steps. This paper outlines the basics as well as tips and tricks for the user to get the most out of this procedure.
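A minimal sketch of defining and then calling a user-written function (function and variable names are hypothetical):

    proc fcmp outlib=work.funcs.demo;
       function bmi(weight_kg, height_m);
          return (weight_kg / height_m**2);
       endsub;
    run;

    options cmplib=work.funcs;   /* tell SAS where to find the compiled function */

    data derived;
       set raw;
       bmi_calc = bmi(weight, height);
    run;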
Creating Viable SAS® Data Sets From Survey Monkey® Transport Files
John R Gerlach
Survey Monkey is an application that provides a means for creating online surveys.  Unfortunately, the transport (Excel) file from this application requires a complete overhaul in order to do any serious data analysis. Besides having a peculiar structure and containing extraneous data points, the file has column headers that become very problematic when importing it into SAS.  In fact, the initial SAS data set is virtually unusable.  This paper explains a systematic approach for creating a viable SAS data set for doing serious analysis.
Document and Enhance Your SAS® Code, Data Sets, and Catalogs with SAS Functions, Macros, and SAS Metadata
Roberta Glass and Louise Hadden
Discover how to document your SAS® programs, data sets, and catalogs with a few lines of code that include SAS functions, macro code, and SAS metadata. Do you start every project with the best of intentions to document all of your work, and then fall short of that aspiration when deadlines loom? Learn how your programs can automatically update your processing log. If you have ever wondered who ran a program that overwrote your data, SAS has the answer! And if you don’t want to be tracing back through a year’s worth of code to produce a codebook for your client at the end of a contract, SAS has the answer!
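Two of the simpler ingredients, automatic macro variables and the dictionary tables, can be combined roughly like this (the library name is hypothetical):

    /* who ran this, when, and with which SAS release */
    %put NOTE: Run by &sysuserid on &sysdate9 at &systime (SAS &sysvlong);

    /* a quick codebook of every column in a library */
    proc sql;
       create table work.codebook as
          select memname, name, type, length, label
          from dictionary.columns
          where libname='STUDY';
    quit;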
Don’t Get Blindsided by PROC COMPARE
Roger Muller and Josh Horstman
"NOTE: No unequal values were found. All values compared are exactly equal." Do your eyes automatically drop to the end of your PROC COMPARE output in search of these words? Do you then conclude that your data sets match? Be careful here! Major discrepancies may still lurk in the shadows, and you’ll never know about them if you make this common mistake. This paper describes several of PROC COMPARE’s blind spots and how to steer clear of them. Watch in horror as PROC COMPARE glosses over important differences while boldly proclaiming that all is well. See the gruesome truth about what PROC COMPARE does, and what it doesn’t do! Learn simple techniques that allow you to peer into these blind spots and avoid getting blindsided by PROC COMPARE!
A Macro That Can Fix Data Length Inconsistency and Detect Data Type Inconsistency
Ting Sa
Common tasks that we need to perform are merging or appending SAS® data sets. During this process, we sometimes get error or warning messages saying that the same fields in different SAS data sets have different lengths or different types. If the problems involve a lot of fields and data sets, we need to spend a lot of time identifying those fields and writing extra SAS code to solve the issues. However, the macro in this paper can help you identify the fields that have inconsistent data type or length issues. It also solves the length issues automatically by finding the maximum field length among the current data sets and assigning that length to the field. An HTML report is generated after running the macro that includes information about which fields’ lengths have been changed and which fields have inconsistent data type issues.
Macro Replacer
Rituraj Saxena
For a statistical programmer in the pharmaceutical industry, each work day is new. A project you have been working on for a few months can be changed at a moment’s notice, and you need to implement changes quickly and accurately. If the task entails doing a find-and-replace across all the SAS programs in a directory (or multiple directories), a macro called “Replacer” could come to the rescue and ensure that the desired changes are done quickly and, most especially, accurately.  Process flow: First, it reads all the SAS programs in a directory one by one and converts every SAS program to a SAS dataset using grepline.  After this, it reads all datasets, one by one, replacing an existing string with the newly desired string using if-then conditional logic. Finally, it outputs each updated SAS dataset as a new SAS program at a specified location. This macro has multiple parameters which you can specify: the input directory, the output directory, and the from and to strings, which gives the programmer more control over the process. A quick example of the practical use of the Replacer macro: when making the transition from a Windows to a UNIX server, we needed to make sure we changed the path of our init.sas and changed all backslashes (\) to forward slashes (/). Let’s assume we have 100 programs and we decide to do this manually. It can be a cumbersome task and, given time constraints, accuracy is not guaranteed. The programmer may end up spending a couple of hours completing the necessary changes to each program before re-running all the programs to make sure the appropriate changes have taken place.  Replacer can accomplish this same task in less than 2 minutes.
Ditch the Data Memo: Using Macro Variables and Outer Union Corresponding in PROC SQL to Create Data Set Summary Tables
Andrea Shane
Data set documentation is essential to good programming and for sharing data set information with colleagues who are not SAS programmers. However, most SAS programmers dislike writing memos that must be updated each time a data set is manipulated. Utilizing two tools, macro variables and the outer union corresponding set operator in PROC SQL, we can write concise code that exports a single summary table containing important data set information, serving in lieu of data memos. These summary tables can contain the following data set information and much more: 1) the change in the number of records in a data set due to dropping records, collapsing across IDs, or removing duplicate records; 2) summary statistics of key variables; and 3) trends across time. This presentation requires some basic understanding of macros and SQL queries.
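A hedged sketch of the set operator in action (not the author’s code); WORK.RAW and WORK.DEDUPED are hypothetical data sets representing two processing steps.

proc sql;
   create table work.dataset_summary as
   select '1) Records read from the raw file     ' as step,
          count(*) as n_records
      from work.raw
   outer union corr
   select '2) After removing duplicate records' as step,
          count(*) as n_records
      from work.deduped;
quit;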
File Management Using Pipes and X Commands in SAS®
Emily Sisson
SAS for Windows can be an extremely powerful piece of software, not only for analyzing data, but also for organizing and maintaining output and permanent datasets.  By employing pipes and operating system (‘X’) commands within a SAS session, you can easily and effectively manage files of all types stored on your local network.
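A hedged sketch of both techniques on Windows (paths are assumptions): a pipe reads a directory listing into a data set, and an X command acts on the same folder.

filename dirlist pipe 'dir "C:\project\output\*.rtf" /b';

data work.rtf_files;
   infile dirlist truncover;
   input fname $256.;
run;

/* an operating system command issued from the SAS session */
x 'copy "C:\project\output\*.rtf" "C:\project\archive\"';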
Handling longitudinal data from multiple sources: experience with analyzing kidney disease patients
Elani Streja and Melissa Soohoo
Analyses in health studies using multiple data sources often come with a myriad of complex issues such as missing data, merging multiple data sources, and date matching. Combining multiple data sources is not straightforward, as oftentimes there is discordance or missing information such as dates of birth, dates of death, and even demographic information such as sex, race, ethnicity, and pre-existing comorbidities. It therefore becomes essential to document the data source from which the variable information was retrieved. Analysts often rely on one resource as the dominant variable to use in analyses and ignore information from other sources. Sometimes, even the database thought to be the “gold standard” is in fact discordant with other data sources. In order to increase sensitivity and information capture, we have created a source variable, which shows the combination of sources from which the data were concordant and derived. In our example, we will show how to resolve and combine information on date of birth, date of death, date of transplant, sex, and race from 3 data sources with information on kidney disease patients. These 3 sources are: the United States Renal Data System, the Scientific Registry of Transplant Recipients, and data from a large dialysis organization. This paper focuses on approaches to handling multiple large databases in preparation for analyses. In addition, we will show how to summarize and prepare longitudinal lab measurements (from multiple sources) for use in analyses.
An Array of Fun: Macro Variable Arrays
Kalina Wong and Sarah Short
When encountering repetitive programming tasks, arrays can be quite useful for reducing unnecessarily long programs.  There are many different styles of arrays and their flavors can range from being eloquently simple to efficiently complex.  The most popular or best-known uses of arrays are in DATA steps where a couple of lines of code can individually process each variable in an array list.  Although valuable, a programmer may be faced with wanting to use that list of variables outside of a DATA step (e.g. in a macro), in which case using macro variable arrays is more appropriate.  With macro variable arrays, an array list is not limited to a single DATA step, but can be applied in different SAS procedures and data processing.  There are many strengths and purposes of using macro variable arrays, but there are also some limitations. We show some examples of macro variable applications, particularly how to use them in visit windowing around consent date when wishing to produce an enrollment-level data set, and using them when processing medication data.
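As a hedged sketch of the idea (not the authors’ application), the code below loads variable names into a numbered series of macro variables and then loops over them outside of a DATA step; WORK.LABS and the LB prefix are assumptions.

proc sql noprint;
   select name
      into :labvar1 - :labvar999
      from dictionary.columns
      where libname = 'WORK' and memname = 'LABS'
        and upcase(name) like 'LB%';
quit;
%let nlabvars = &sqlobs;

%macro summarize_labs;
   %do i = 1 %to &nlabvars;
      proc means data=work.labs n mean std;
         var &&labvar&i;
      run;
   %end;
%mend summarize_labs;
%summarize_labs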
Data Presentation and Reporting
Color, Rank, Count, Name; Controlling it all in PROC REPORT
Art Carpenter
Managing and coordinating various aspects of a report can be challenging.  This is especially true when the structure and composition of the report is data driven.  For complex reports the inclusion of attributes such as color, labeling, and the ordering of items complicates the coding process. Fortunately we have some powerful reporting tools in SAS® that allow the process to be automated to a great extent. In the example presented in this paper we are tasked with generating an EXCEL® spreadsheet that ranks types of injuries within age groups.  A given injury type is to receive a constant color regardless of its rank and the labeling is to include not only the injury label, but the actual count as well.  Of course the user needs to be able to control such things as the age groups, color selection and order, and number of desired ranks.
The Fantastic Four: Running Your Report Using the TABULATE, TRANSPOSE, REPORT, or SQL Procedure
Josh Horstman
Like all skilled tradespeople, SAS® programmers have many tools at their disposal. Part of their expertise lies in knowing when to use each tool. In this paper, we use a simple example to compare several common approaches to generating the requested report: the TABULATE, TRANSPOSE, REPORT, and SQL procedures. We investigate the advantages and disadvantages of each method and consider when applying it might make sense. A variety of factors are examined, including the simplicity, reusability, and extensibility of the code in addition to the opportunities that each method provides for customizing and styling the output. The intended audience is beginning to intermediate SAS programmers.
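For readers who want a concrete starting point, here is a hedged sketch of the same simple summary, mean height by sex from SASHELP.CLASS, produced by two of the four procedures; the paper’s own example may differ.

proc tabulate data=sashelp.class;
   class sex;
   var height;
   table sex all, height*(n mean);
run;

proc report data=sashelp.class nowd;
   column sex height;
   define sex    / group 'Sex';
   define height / analysis mean format=6.1 'Mean Height';
run;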
Something Old, Something New... Flexible Reporting with DATA Step-based Tools
Pete Lund
The report looks simple enough: a bar chart and a table, like something created with the GCHART and REPORT procedures. But there are some twists to the reporting requirements that make those procedures not quite flexible enough. The solution was to mix "old" and "new" DATA step-based techniques to solve the problem. Annotate data sets are used to create the bar chart and the Report Writing Interface (RWI) to create the table. Without a whole lot of additional code, a great deal of flexibility is gained.

The goal of this paper is to use a specific example to illustrate a couple of general principles of programming (at least in SAS®):

  1. The tools you choose are not always the most obvious ones – So often, whether from habit or comfort level, we get zeroed in on specific tools for reporting tasks. Have you ever heard anyone say, “I use TABULATE for everything” or “Isn’t PROC REPORT wonderful, it can do anything”? While these tools are great (I’ve written papers on their use), it’s very easy to get into a rut, squeezing out results that might have been produced more easily, flexibly, or effectively with something else.

  2. It’s often easier to make your data fit your reporting than to make your reporting fit your data – It always takes data to create a report, and it’s very common to let the data drive the report development. We struggle and fight to get the reporting procedures to work with our data. There are numerous examples of complicated REPORT or TABULATE code that works around the structure of the data. However, the data manipulation tools in SAS (DATA step, SQL, procedure output) can often be used to preprocess the data to make the report code significantly simpler and easier to maintain and modify.
Proc Document, The Powerful Utility for ODS Output
Roger Muller
The DOCUMENT procedure is a little-known procedure that can save you vast amounts of time and effort when managing the output of your SAS® programming efforts. This procedure is deeply associated with the mechanism by which SAS controls output in the Output Delivery System (ODS). Have you ever wished you didn’t have to modify and rerun the report-generating program every time there was some tweak in the desired report? PROC DOCUMENT enables you to store one version of the report as an ODS document object and then call it out in many different output forms, such as PDF, HTML, listing, RTF, and so on, without rerunning the code. Have you ever wished you could extract those pages of the output that apply to certain “BY variables” such as State, StudentName, or CarModel? With PROC DOCUMENT, you have WHERE capabilities to extract these. Do you want to customize the table of contents that assorted SAS procedures produce when you make frames for the table of contents with HTML, or use the facilities available for PDF? PROC DOCUMENT enables you to get to the inner workings of ODS and manipulate them. This paper addresses PROC DOCUMENT from the viewpoint of end results, rather than providing a complete technical review of how to do the task at hand. The emphasis is on the benefits of using the procedure, not on detailed mechanics. A number of practical applications are presented for everyday, real-life challenges that arise in manipulating output in HTML, PDF, and RTF formats.
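A minimal sketch of the store-once, replay-many idea (not the paper’s examples): the output is captured in an ODS document item store and replayed to two destinations without rerunning PROC FREQ; file names are assumptions.

ods document name=work.freqdoc(write);
proc freq data=sashelp.class;
   tables sex*age;
run;
ods document close;

ods pdf file="freqs.pdf";
ods rtf file="freqs.rtf";
proc document name=work.freqdoc;
   replay;
run;
quit;
ods pdf close;
ods rtf close;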
A SAS macro for quick descriptive statistics
Jenna Carlson
Arguably, the most frequently required table in publications is the sample description table, fondly referred to among statisticians as “Table 1”. This table displays means and standard errors, medians and IQRs, and counts and percentages for the variables in the sample, often stratified by some variable of interest (e.g., disease status, recruitment site, sex, etc.). While this table is extremely useful, constructing it can be time consuming and, frankly, rather boring. I will present two SAS macros that facilitate the creation of Table 1. The first is a “quick and dirty” macro that will output the results for Table 1 for nearly every situation. The second is a “pretty” macro that will output a well-formatted Table 1 for a specific situation.
Controlling Colors by Name; Selecting, Ordering, and Using Colors for Your Viewing Pleasure
Art Carpenter
Within SAS® literally millions of colors are available for use in our charts, graphs, and reports.  We can name these colors using techniques which include color wheels, RGB (Red, Green, Blue) HEX codes, and HLS (Hue, Lightness, Saturation) HEX codes.        But sometimes I just want to use a color by name.  When I want purple, I want to be able to ask for purple not CX703070 or H03C5066. But am I limiting myself to just one purple?  What about light purple or pinkish purple.  Do those colors have names or must I use the codes?  It turns out that they do have names.  Names that we can use.  Names that we can select, names that we can order, names that we can use to build our graphs and reports. This paper will show you how to gather color names and manipulate them so that you can take advantage of your favorite purple; be it ‘purple’, ‘grayish purple’, ‘vivid purple’, or ‘pale purplish blue’.    Much of the control will be obtained through the use of user defined formats.  Learn how to build these formats based on a data set containing a list of these colors.
Tweaking your tables: Suppressing superfluous subtotals in PROC TABULATE
Steve Cavill
PROC TABULATE is a great tool for generating cross-tab style reports. It's very flexible but has a few annoying limitations. One is suppressing superfluous subtotals. The ALL keyword creates a total or subtotal for the categories in one dimension. However, if there is only one category in the dimension, the subtotal is still shown, which is really just repeating the detail line again. This can look a bit strange. This talk demonstrates a method to suppress those superfluous totals by saving the output from PROC TABULATE using the OUT= option. That data set is then reprocessed to remove the undesirable totals using the _TYPE_ variable, which identifies the total rows. PROC TABULATE is then run again against the reprocessed data set to create the final table.
Indenting with Style
Bill Coar
Within the pharmaceutical industry, many SAS programmers rely heavily on PROC REPORT. While it is used extensively for summary tables and listings, it is more typical that all processing is done prior to the final report procedure rather than using some of its internal functionality. In many typical summary tables, some indenting is required. This may be needed to combine information into a single column in order to gain more printable space (as is the case with many treatment group columns), or simply to make the output more aesthetically pleasing. A standard approach is to pad a character string with spaces to give the appearance of indenting. This requires pre-processing of the data as well as the use of the ASIS=ON option in the column style. While this may be sufficient in many cases, it fails for longer text strings that require wrapping within a cell. Alternative approaches that conditionally utilize the INDENT and LEFTMARGIN options of a column style are presented. This Quick-tip presentation will describe such options for indenting. Example outputs will be provided to demonstrate the pros and cons of each. The use of PROC REPORT and ODS is required in this application using SAS 9.4 in a Windows environment.
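A hedged sketch of the style-based alternative (not the author’s code): a CALL DEFINE in a PROC REPORT compute block applies a LEFTMARGIN override only to the rows that should appear indented. WORK.AESUMM and its columns are hypothetical.

proc report data=work.aesumm nowd;
   column level rowlabel count;
   define level    / order noprint;      /* 1 = group heading, 2 = indented detail row */
   define rowlabel / display 'Adverse Event';
   define count    / display 'n';
   compute rowlabel;
      if level = 2 then
         call define(_col_, 'style', 'style=[leftmargin=0.25in]');
   endcomp;
run;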
SAS® Office Analytics: An Application In Practice Data Monitoring and Reporting Using Stored Process
Mansi Singh, Kamal Chugh, Chaitanya Chowdagam and Smitha Krishnamurthy
Time becomes a big factor when it comes to ad-hoc reporting and real-time monitoring of data while project work is in full swing. There are always numerous urgent requests from various cross-functional groups regarding study progress. Typically a programmer has to work on these requests along with the study work, which can become stressful. To address this growing need for real-time monitoring of data and to tailor the requirements to create portable reports, SAS® has introduced a powerful tool called SAS Office Analytics. SAS Office Analytics with the Microsoft® Add-In provides excellent real-time data monitoring and report-generating capabilities with which a SAS programmer can take ad-hoc requests and data monitoring to the next level. Using this powerful tool, a programmer can build interactive customized reports as well as give access to study data, and anyone with knowledge of Microsoft Office can then view, customize, and/or comment on these reports within Microsoft Office with the power of SAS running in the background. This paper will be a step-by-step guide demonstrating how to create these customized reports in SAS and access study data using the Microsoft Office Add-In feature.
Getting it done with PROC TABULATE
Michael Williams
The task of displaying statistical summaries of different types of variables in a single table is quite familiar to many SAS users. There are many ways to go about this. PROC FREQ tends to be a favorite for counts and percentages of categorical variables, while PROC MEANS/SUMMARY and PROC UNIVARIATE tend to be preferred for summary statistics of continuous variables. In addition, PROC REPORT has many desirable customizations for displaying data sets. PROC TABULATE combines the computational functionality of FREQ, MEANS/SUMMARY, UNIVARIATE with the customization abilities of PROC REPORT. In this presentation/paper, we will give an overview of PROC TABULATE syntax, and then discuss stylistic customizations, calculating percentages, dealing with missing values, creating and processing PROC TABULATE output data sets.
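As a hedged warm-up example (not from the paper), the step below combines counts, column percentages, and summary statistics in one table using SASHELP.HEART.

proc tabulate data=sashelp.heart missing;
   class sex status;
   var cholesterol;
   table sex all,
         (status all)*(n colpctn) cholesterol*(mean std) / misstext='0';
run;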
Data Science
Categorization of Fitbit’s Customer Complaints on Twitter
Jacky Arora and Sid Grover
All companies are trying to be more customer centric and are implementing new measures to enhance the consumer experience. One such measure, recently implemented by many companies, is social media customer service. According to J.D. Power, 67% of consumers have used a company’s social media support page for a customer service issue. It has been reported that consumers can expect a reply within a couple of minutes from the support team regarding their issue. One such company is Fitbit, which has grown quite popular recently and has encouraged its customers to generate buzz on social media by expressing their reviews, discussing new product launches, and describing the utility of Fitbit in their day-to-day lives. However, on the flip side, Fitbit’s Twitter support page is flooded with issues that consumers are facing while using the products. There is at least one tweet (@FitbitSupport) every minute by a user or by the support team responding to a user’s complaint. The primary objective of this research is to categorize these complaints and identify the major issues, such as whether they are related to activity tracking, design, tech specs, application interactivity, and so on. Because the tweets are model specific, we will compare whether the issues are resolved between two generations of the product.
Text Analysis of American Airlines Reviews.
Saurabh Choudhary and Rajesh Tolety
According to a TripAdvisor survey report, about 43% of airline passengers rely on online reviews of different airlines before booking a ticket. Therefore the nature and the tone of the reviews are important metrics for airlines to track and manage. We plan to do text analysis of online reviews of American Airlines, which runs about 945 flights across 350 destinations. The analysis would help American Airlines understand what their passengers are talking about and perhaps take actions to improve their service. We have 38,363 reviews from American Airlines customers collected from Facebook and Twitter. The extracted dataset includes customers’ ratings (on a scale of 1-5), date of review, detailed comments, and location. We plan to do text analysis as well as supervised sentiment analysis on this dataset.
Reproducible Data Science Using SAS® in a Jupyter Notebook
Hunter Glanz
From state-of-the-art research to routine analytics, the Jupyter Notebook offers an unprecedented reporting medium. Historically tables, graphics, and other output had to be created separately and integrated into a report piece by piece amidst the drafting of the text. The Jupyter Notebook interface allows for the creation of code cells and markdown cells in any kind of arrangement. While the markdown cells admit all the typical sorts of formatting, the code cells can be used to run code within and throughout the document. In this way, report creation happens naturally and in a completely reproducible way. Handing a colleague a Jupyter Notebook file to be re-run or revised is much easier and simpler than passing along at least two files: the code and the text. With the new SAS ® kernel for Jupyter, all of this is possible and more!
Clinton vs. Trump 2016: Analyzing and Visualizing Sentiments towards Hillary Clinton and Donald Trump’s Policies
Sid Grover and Jacky Arora
The United States 2016 presidential election has seen unprecedented media coverage, numerous presidential candidates, and acrimonious debate over wide-ranging topics from candidates of both the Republican and the Democratic parties. Twitter is a dominant social medium for people to understand, express, relate to, and support the policies proposed by their favorite political leaders. In this paper, we aim to analyze the overall sentiment of the public towards some of the policies proposed by Donald Trump and Hillary Clinton using Twitter feeds. We have started to extract the live streaming data from Twitter. So far, we have extracted about 200,000 Twitter feeds by accessing the live stream API of Twitter, using mytwitterscraper, an open-source, real-time Twitter scraper written in Java. We will use SAS® Enterprise Miner and SAS® Sentiment Analysis Studio to describe and assess how people are reacting to each candidate’s stand on issues such as immigration, taxes, and so on. We will also track and identify patterns of sentiments shifting across time (from March to June) and geographic regions.
Donor Sentiment Analysis of Presidential Primary Candidates Using SAS
Aditya Jakkam and Swetha Nallamala
A big question for everyone who follows the election race is, “What makes a donor donate to a primary candidate?” Is it the candidate’s background, or the promises the candidate is making? Whatever the reason, donations greatly improve a primary candidate’s chances of becoming the nominee. This paper is a summary of findings on donor behaviors using descriptive analysis of numerical data combined with textual analysis of Twitter feeds. Donation data for all presidential candidates, Democratic and Republican, are available from the Federal Election Commission database. So far, we have 220,000 observations with numerical data that give the donation made by each individual, including their addresses. We will first do descriptive analysis to understand donation patterns both geographically and over time. In addition, we are writing Python scripts to collect Twitter data. These tweets will be cleaned using Python and then analyzed using SAS Sentiment Analysis Studio to identify the sentiments expressed.
Proc DS2: What's in it for you?
Viraj Kumbhakarna
In this paper, we explore the advantages of using the PROC DS2 procedure over DATA step programming in SAS®. DS2 is a new SAS proprietary programming language that is appropriate for advanced data manipulation. We explore the use of PROC DS2 to execute queries in databases using FedSQL from within the DS2 program. Several DS2 language elements accept embedded FedSQL syntax, and the run-time generated queries can exchange data interactively between DS2 and supported databases. This enables SQL preprocessing of input tables, which effectively allows processing data from multiple tables in different databases within the same query, thereby drastically reducing processing times and improving performance. We explore the use of DS2 for creating tables, bulk loading tables, manipulating tables, and querying data in an efficient manner. We explore the advantages of using PROC DS2 over DATA step programming, such as support for different data types, ANSI SQL types, programming structure elements, and the benefits of using new expressions or writing one’s own methods or packages available in the DS2 system. We also explore the high-performance version of the DS2 procedure, PROC HPDS2, and show how one can submit DS2 language statements for execution to either a single machine running multiple threads or to a distributed computing environment, including the SAS LASR Analytic Server, thereby massively reducing processing times and resulting in performance improvement. The DS2 procedure enables users to submit DS2 language statements from a Base SAS session. The procedure enables requests to be processed by the DS2 data access technology that supports a scalable, threaded, high-performance, and standards-based way to access, manage, and share relational data. In the end, we empirically measure the performance benefits of using PROC DS2 over PROC SQL for processing queries in-database by taking advantage of threaded processing in supported databases such as Oracle.
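A minimal, hedged taste of the DS2 syntax (not the paper’s benchmark code); it reads a SASHELP table and adds a computed column, with the declaration made at program scope so the new variable is written to the output.

proc ds2;
   data work.class_bmi / overwrite=yes;
      dcl double bmi;
      method run();
         set sashelp.class;
         bmi = (weight / (height * height)) * 703;
      end;
   enddata;
run;
quit;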
Social Media, Anonymity, and Fraud: HP Forest Node in SAS® Enterprise Miner™
Taylor Larkin and Denise McManus
With an ever-increasing flow of data from the Web, droves of data are constantly being introduced into the world. While the internet provides people with an easy avenue to purchase their favorite products or participate in social media, the anonymity of users can make the internet a breeding ground for fraud, the distribution of illegal material, or communication between terrorist cells (Sun, Yang, Wang, & Liu, 2010). In response, areas such as author identification through online writeprinting are increasing in popularity. Much like a human fingerprint, writeprinting quantifies the characteristics of a person’s writing style in ways such as the frequency of certain words, the lengths of sentences and their structures, and the punctuation patterns. The ability to translate someone’s post into quantifiable inputs makes it possible to implement advanced machine learning tools to make predictions as to the identity of potentially malicious users. To demonstrate prediction on this type of problem, the Amazon Commerce Reviews dataset from the UCI Machine Learning Repository is used. This dataset consists of 1,500 observations, representing 30 product reviews from 50 authors, and 10,000 numeric writeprint variables corresponding to each review. Given that the target variable has 50 classes, only select models are adequate for this prediction task. Fortunately, with the help of the HP Forest node in SAS® Enterprise Miner™ 13.1, we can apply the popular Random Forest algorithm very efficiently, even in a setting where the number of predictor variables is much larger than the observation count. Results show that the HP Forest node produces the best results compared to the Decision Tree, MBR, and HP Neural nodes. The intended audience for this content is anyone who is interested in the implementation of Random Forests in Enterprise Miner.
ePosters
Real-Time Data Center Power Consumption: SAS® Event Stream Processing™
Taylor Anderson and Denise McManus
The intended focus of this research project is to use power consumption metrics at the data center, row, rack, and server levels as a proxy for determining hardware and software failures, network failures, and cyber-attacks.  By using power data at each level of the data center, it should be possible to determine whether or not a system has failed.  Additionally, by using power consumption baselines, it should be possible to determine whether or not a service on a server has stopped running or whether a component is likely to fail.  Likewise, detecting increased and unusual power consumption on a network switch or a combination of switches and a server during a non-peak time could be a sign of a cyber-attack.  First, we will determine the most efficient way to obtain data center power data in real time.  The focus of this phase is to collect data from individual servers, entire racks of servers, entire rows of racks, and the entire data center.  The key success factors for this phase will include collecting and transmitting usable data for analysis.  The second phase of this study is focused on the development of the hardware and software requirements necessary to collect the power data as well as exploration of SAS® Event Stream Processing™ as a tool to perform real time analysis of the power data in an effort to show that real time data center power consumption can be used to detect network outages, cyber-attacks, hardware problems, or perhaps used to strengthen other detection methods.
The SAS® Ecosystem – A Programmer’s Perspective
Thomas Billings
You may encounter people who used SAS® long ago (perhaps in university) or through very limited use in a job. Some of these people with limited knowledge/experience think that the SAS system is “just a statistics package” or “just a GUI”, the latter usually a reference to SAS® Enterprise Guide® or, if a dated reference, to (legacy) SAS/AF® or SAS/FSP® applications. The reality is that the modern SAS system is a very large, complex ecosystem, with hundreds of software products and a diversity of tools for programmers and users. This poster provides a set of diagrams and tables that illustrate the complexity of the SAS system from the perspective of a programmer. Diagrams/illustrations provided here include: the different environments that program code can run in; cross-environment interactions and related tools; SAS Grid and parallel processing; running with files in memory (the SASFILE statement) and big data/Hadoop; and code that can run in-database. We end with a tabulation of the many programming languages and SQL dialects that are directly or indirectly supported within SAS. Hopefully the content of this poster will inform those who think that SAS is an old, dated statistics package or just a simple GUI.
Leadership: More than Just a Position Laws of Programming Leadership
Bill Coar and Amber Randall
Search the internet for statistical programming leadership or programming leadership and you will likely find the same results we did: none with respect to career growth as a SAS programmer. While leadership principles have been more prominent in the American Statistical Association (ASA), not much in terms of leadership has been presented as it relates to SAS programmers. It is clear that there is room for additional work promoting statisticians as holding leadership roles.
In 2007, John C. Maxwell published a book titled “21 Irrefutable Laws of Leadership”. Many of these laws exist as common themes across various books and magazines related to leadership and management. Many industry programmers are eager to grow, yet many are forced into leadership roles without having the opportunity to learn leadership principles. We approach this by considering existing leadership principles from business and economics and emphasize how they can directly apply to SAS programmers taking on leadership roles. The information presented in this poster will encourage much needed discussion on how to promote such principles of leadership and encourage growth of SAS programmers.  This exercise serves as an opportunity to identify core leadership principles from an industry specific (pharmaceutical) point of view.
Animated Graphs for a New Era
Mitchell Collins
For someone studying statistics in the data science era, more and more emphasis is placed on striking, informative graphics. Data is no longer displayed with a black-and-white boxplot. Using the SAS® macro facility and the Statistical Graphics procedures, you can animate graphs to turn an outdated two-variable graph into a graph in motion that shows not only a relation between factors but also a change over time. An even simpler approach for bubble graphs is to use a function in JMP to create colorful moving plots that would typically require many lines of code, with just a few clicks of the mouse.
Sentiment Analysis of Opinions about Self-driving cars
Swapneel Deshpande and Nachiket Kawitkar
Self-driving cars are no longer a futuristic dream. In the recent past, Google launched a prototype of the self-driving car, while Apple is also developing its own self-driving car. Companies like Tesla have just introduced an Autopilot feature in newer versions of their electric cars, which has created quite a buzz in the car market. This technology is said to enable aging or disabled people to drive around without being dependent on anyone, while also potentially affecting the accident rate due to human error. But many people are still skeptical about the idea of self-driving cars, and that’s our area of interest. In this project, we plan to do sentiment analysis on thoughts voiced by people on the Internet about self-driving cars. We have obtained the data from http://www.crowdflower.com/data-for-everyone, which contains these reviews about self-driving cars. Our dataset contains 7,156 observations and 9 variables. We plan to do descriptive analysis of the reviews to identify key topics and then use supervised sentiment analysis. We also plan to track and report how the topics and the sentiments change over time.
An Analysis of the Repetitiveness of Lyrics in Predicting a Song’s Popularity
Drew Doyle
In the interest of understanding whether there is a correlation between the repetitiveness of a song’s lyrics and its popularity, the top ten songs from the year-end Billboard Hot 100 Songs chart from 2002 to 2015 were collected. These songs then had their lyrics assessed to determine the counts of the top ten words used. These word counts were then used to predict the number of weeks the song was on the chart. The prediction model was analyzed to determine the quality of the model and whether word count is a significant predictor of a song’s popularity. To investigate whether song lyrics are becoming more simplistic over time, several tests were completed to see whether the average word counts have been changing over the years. All analysis was completed in SAS® using various PROCs.
Regression Analysis of the Levels of Chlorine in the Public Water Supply in Orange County, FL
Drew Doyle
This paper will analyze a particular set of water samples randomly collected from locations in Orange County, Florida. Thirty water samples were collected and had their chlorine level, temperature, and pH recorded. A linear regression analysis was performed on the data collected with several qualitative and quantitative variables. Water storage time, temperature, time of day, location, pH, and dissolved oxygen level were designated as the independent variables collected from each water sample. All data collected was analyzed through various Statistical Analysis System (SAS®) procedures. A partial residual plot was used for each variable to determine possible relationships between the chlorine level and the independent variables. Stepwise selection was used to eliminate possible insignificant predictors. From there, several possible models for the data were selected. F tests were conducted to determine which of the models appears to be the most useful. There was an analysis of the residual plot, jackknife residuals, leverage values, Cook’s D, PRESS statistic, and normal probability plot of the residuals. Possible outliers were investigated and the critical values for flagged observations were stated along with what problems the flagged values indicate.
A Student’s Declassified Guide to WUSS
Caiti Feeley
This conference provides a range of events that can benefit any and all SAS Users. However, sometimes the extensive schedule can be overwhelming at first glance. With so many things to do and people to see, I have compiled the advice I was given as a novice WUSS and lessons I’ve learned since. This presentation will provide a catalog of tips to make the most out of anyone’s conference experience. From volunteering, to the elementary advice of sitting at a table where you do not know anyone’s name, listeners will be excited to take on all that WUSS offers.
Patients with Morbid Obesity and Congestive Heart Failure Have Longer Operative Time and Room Time in Total Hip Arthroplasty
Yubo Gao
More and more patients undergoing total hip arthroplasty are obese, and previous studies have shown a positive correlation between obesity and increased operative time in total hip arthroplasty. But those studies shared the limitation of small sample sizes. Decreasing operative time and room time is essential to meeting the increased demand for total hip arthroplasty, and factors that influence these metrics should be quantified to allow for targeted reduction in time and adjusted reimbursement models. This study intends to use a multivariate approach to identify which factors increase operative time and room time in total hip arthroplasty. For the purposes of this study, the American College of Surgeons National Surgical Quality Improvement Program database was used to identify a cohort of over thirty thousand patients having total hip arthroplasty between 2006 and 2012. Patient demographics, comorbidities including body mass index, and anesthesia type were used to create generalized linear models identifying independent predictors of increased operative time and room time. The results showed that morbid obesity (body mass index >40) independently increased operative time by 13 minutes and room time by 18 minutes. Congestive heart failure led to the greatest increase in overall room time, resulting in a 20-minute increase. Anesthesia method further influenced room time, with general anesthesia resulting in an increased room time of 18 minutes compared with spinal or regional anesthesia. Obesity is the major driver of increased operative time in total hip arthroplasty. Congestive heart failure, general anesthesia, and morbid obesity each lead to substantial increases in overall room time, with congestive heart failure leading to the greatest increase in overall room time. All analyses were conducted with SAS (version 9.4, SAS Institute, Cary, NC).
Using SAS: Monte Carlo Simulations of Manufactured Goods - Should-Cost Models
Cameron Jagoe and Dr. Denise J. McManus
Should-cost modeling, or “cleansheeting”, of manufactured goods or services is a valuable tool for any procurement group. It provides category managers a foundation to negotiate, test, and drive value-added/value-engineering ideas. However, an entire negotiation can be derailed by a supplier arguing that certain assumptions or inputs are not reflective of what they are currently seeing in their plant. The most straightforward resolution to this issue is using a Monte Carlo simulation of the cleansheet. This enables the manager to prevent any derailing supplier tangents by providing them with information on how each input affects the model as a whole, and the resulting costs. In this ePoster, we will demonstrate a method for employing a Monte Carlo simulation on manufactured goods. This simulation will cover all of the direct costs associated with production (labor, machine, material) as well as the indirect costs, i.e., overhead, etc. Using SAS, this simulation model will encompass 60 variables from nine discrete manufacturing processes and will be set to automatically output the information most relevant to the category manager.
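A hedged, much-simplified sketch of the approach (the authors’ 60-variable model is far richer): each iteration draws the uncertain inputs from assumed distributions, totals the unit cost, and the percentiles summarize the range a supplier might reasonably claim.

data work.mc_sim;
   call streaminit(20160907);
   do rep = 1 to 10000;
      material  = rand('normal', 4.50, 0.30);     /* assumed $ per unit              */
      labor     = 1.00 + 0.50*rand('uniform');    /* assumed $1.00 to $1.50 per unit */
      overhead  = 0.15 * (material + labor);      /* assumed 15% indirect burden     */
      unit_cost = material + labor + overhead;
      output;
   end;
run;

proc means data=work.mc_sim mean p5 p95;
   var unit_cost;
run;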
Making Prompts Work for You: Using SAS Enterprise Guide Prompts with Categorization of Output
Edward Lan and Kai-Jen Cheng
In statistical and epidemiology units of public health departments, SAS codes are often re-used across a variety of different projects for data cleaning and generation of output datasets from the databases. Each SAS user will copy and paste common SAS codes into their own program and use it to generate datasets for analysis. In order to simplify this process, SAS Enterprise Guide (EG) prompts can be used to eliminate the need for the user to edit the SAS code or copy and paste. Instead, the user will be able to enter the desired directory, date ranges, and desired variables to be included in the dataset. In the event of large datasets, however, it is beneficial for these variables to be grouped into categories instead of having the user individually choose the desired variables or lumping all the variables into the final dataset. Using the SAS EG prompt for static lists where the SAS user selects multiple values, variable categories can be created for selection where groups of variables are selected into the dataset. In this paper for novice and intermediate SAS users, we will discuss how macros and SAS EG prompts, using EG 7.1, can be used to automate the process of generating an output dataset where the user selects a folder directory, date ranges, and categories of variables to be included in the final dataset. Additionally, the paper will explain how to overcome issues with integrating the categorization prompt with generating the output dataset.
Application of Data Mining Techniques for Determining Factors Associated with Overweight and Obesity Among California Adults
Elizabeth Ortega
This paper describes the application of supervised data mining methods using SAS Enterprise Miner 12.3 on data from the 2013-2014 California Health Interview Survey (CHIS), in order to better understand obesity and the indicators that may predict it. CHIS is the largest health survey ever conducted in any state, which samples California households through random-digit-dialing (RDD). EM was used to apply logistic regression, decision trees and neural network models to predict a binary variable, Overweight/Obese Status, which determines whether an individual has a Body Mass Index (BMI) greater than 25. These models were compared to assess which categories of information, such as demographic factors or insurance status, and individual factors like race, best predict whether an individual is overweight/obese or not.
The Orange Lifestyle
Sangar Rane and Mohit Singhi
For a freshman at a large university, life can be fun as well as stressful. The choices a freshman makes while in college may impact his or her overall health. In order to examine the overall health and different behaviors of students at Oklahoma State University, a survey was conducted among freshman students. The survey focused on capturing the psychological, environmental, diet, exercise, and alcohol and drug use characteristics of students. A total of 790 out of 1,036 freshman students completed the survey, which included around 270 questions or items covering the range of issues mentioned above. An exploratory factor analysis identified 34 possible factors. For example, two factors that relate to the behavior of students under stress are eating and relaxing. Analysis is currently continuing, and we hope the results will give us deep insights into the lives of students and thereby help improve the health and lifestyle of students at Oklahoma State University in future years.
You are the Concierge – a CDISC compliant project management tool on SDD
Chen Shi
Background: Time is precious during submission; so many things are happening in such a short period. There is a lot of workload on the study leads’ shoulders, and tons of time is spent tracking delivery status and checking completion against the study timeline. This poster demonstrates a high-quality tracking and CDISC-compliant reporting system for a sample Clinical Study Report (CSR) submission. Objective: The objective is to increase the efficiency and quality of the output delivery process while decreasing the overall time of submission. An Excel tracker serves as the information center, backed up by Java tools that grab document status, including latest modified time, upstream/downstream document status, traffic-lighting of OpenCDISC report status, and other features that SAS® Drug Development (SDD) allows administrator users to monitor, such as scheduled batch jobs. A time management sheet in this Excel workbook also tracks the programming team’s delivery hours and links to Microsoft Project as an estimate of the study timeline.
Graphics
Jumping Into SAS ODS Graphics Overnight
Roger Muller
If you are like many SAS users, you have worked with the classical "old" SAS graphics procedures for some time and are very comfortable with the code syntax, workflow, approach, etc., that make for reasonably simple creation of presentation graphics. Then all of a sudden, a job requires the capabilities of the procedures in SAS ODS Graphics. At first glance you may be thinking, "OK, a few more procedures to learn and a little syntax to learn." Then you realize that moving yourself into this arena is no small task. This presentation will overview the options and approaches that you might take to get up to speed fast. Included will be decision trees to be followed in deciding upon a course of action. This paper contains many examples of very simple ways to get very simple things accomplished. Over 20 different graphs are developed using only a few lines of code each, using data from the SASHELP data sets. The usage of the SGPLOT, SGPANEL, and SGSCATTER procedures is shown. In addition, the paper addresses those situations in which the user must alternatively use a combination of the TEMPLATE and SGRENDER procedures to accomplish the task at hand. Most importantly, the use of ODS Graphics Designer as a teaching tool and a generator of sample graphs and code is covered. A single slide in the presentation overviewing the ODS Designer shows everything needed to generate a very complex graph. The emphasis in this paper is the simplicity of the learning process. Users will be able to take the included code and run it immediately on their personal machines to achieve an instant sense of gratification. The paper also addresses the "ODS Sandwich" for creating output and the use of PROC DOCUMENT to manipulate it.
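As a hedged example of how little code a first ODS Graphics plot needs (not necessarily one of the paper’s 20 graphs):

title 'Weight by Height in SASHELP.CLASS';
proc sgplot data=sashelp.class;
   scatter x=height y=weight / group=sex;
   reg     x=height y=weight;
run;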
Exploring Multidimensional Data with Parallel Coordinate Plots
Brian Calimlim
Throughout the many phases of an analysis, it may be more intuitive to review data statistics and modeling results as visual graphics rather than numerical tables.  This is especially true when an objective of the analysis is to build a sense of the underlying structures within the data rather than describe the data statistics or model results with numerical precision.  Although scatterplots provide a means of evaluating relationships, its two-dimensional nature may be limiting when exploring data across multiple dimensions simultaneously.  One tool to explore multivariate data is parallel coordinate plots. I will present a method of producing parallel coordinate plots using PROC SGPLOT and will provide examples of when parallel coordinate plots may be very informative.  In particular, I will discuss its application on an analysis of longitudinal observational data and results from unsupervised classification techniques.
Making SAS the Easy Way Out: Harnessing the Power of PROC TEMPLATE to Create Reproducible, Complex Graphs
Debra Goldman
With high-pressure deadlines and mercurial collaborators, creating graphs in the most familiar way seems like the best option. Using post-processing programs like Photoshop or Microsoft PowerPoint to modify graphs is quicker and easier for the novice SAS user, or for one’s collaborators to do on their own. However, reproducibility is a huge issue in the scientific community. Any changes made outside statistical software need to be repeated when collaborator preferences change, the data change, the journal requires additional elements, and for a host of other reasons. The likelihood of making errors increases along with the time spent making the figure. Learning PROC TEMPLATE allows one to seamlessly create complex, automatically generated figures and eliminates the need for post-processing. This paper will demonstrate how to do complex graph manipulation procedures in SAS 9.3 or later to solve common problems, including lattice panel plots for different variables, split plots and broken axes, weighted panel plots, using select observations in each panel, waterfall plots, and graph annotation. The examples presented are healthcare based, but the methods are applicable to finance, business, and education. Attendees should have a basic understanding of the macro language, graphing in SAS using SGPLOT, and ODS Graphics.
Customizing plots to your heart’s content using PROC GPLOT and the annotate facility
Debbie Huang
PROC GPLOT is a powerful procedure in SAS/GRAPH® for creating two-dimensional graphs such as scatter plots, line plots, and bubble plots. While PROC GPLOT is easy to use, customization is not always straightforward. This paper will introduce PROC GPLOT and the simple customizations of the symbols, axes and legend, then focus on not-so-straightforward customizations such as adding breaks or hashes to the y-axis and labeling sample sizes below the x-axis for data plotted over time, using the annotate facility.
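A hedged, minimal illustration of the annotate idea (not the author’s examples): an annotate data set places a text label at a data coordinate on a SAS/GRAPH scatter plot; the coordinates and label are assumptions.

data work.anno;
   length function color text $ 40 xsys ysys position $ 1;
   xsys = '2'; ysys = '2';          /* data coordinates       */
   when = 'a';                      /* draw after the plot    */
   function = 'label';
   x = 60; y = 100;
   text = 'assumed reference note';
   color = 'red';
   position = '3';
   output;
run;

proc gplot data=sashelp.class;
   plot weight*height / annotate=work.anno;
run;
quit;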
SAS Graphing Done Right: Two Good Alternatives to PROC GPLOT/GCHART, Part 2
Isaiah Lankham
When visualizing data in SAS, many users rely on classic yet quirky SAS/GRAPH procedures like PROC GPLOT and PROC GCHART. Some users have good reasons for sticking with SAS/GRAPH, like creating choropleth maps with PROC GMAP, but most would benefit from switching to alternatives like PROC SGPLOT, which is part of the ODS Graphics functionality in Base SAS, or creating graphics with the R programming language using an interface available in SAS/IML. In this follow-up to an award-winning WUSS 2015 paper, we take a deep dive into PROC SGPLOT, including the combination of PROC TEMPLATE and a special Graph Template Language (GTL) needed to create highly customized plots/charts. We also provide side-by-side instructions for creating comparable visualizations using the highly acclaimed R package ggplot2, along with a discussion of the advantages and disadvantages of each approach. As we'll see, PROC SGPLOT provides an easier initial learning curve, and PROC TEMPLATE and GTL provide a much fuller set of available customization options; however, the combination of PROC IML and the R programming language might provide a shorter path to desired output for some users. This paper is aimed at all intermediate to advanced users of SAS 9.3 (or higher) and assumes basic familiarity with PROC SGPLOT. Also, while SAS University Edition users have access to PROC SGPLOT, its version of SAS/IML does not currently provide an R-language interface.
Customized Flow Charts Using SAS Annotation
Abhinav Srivastva
Data visualization is becoming a trend in all sectors where critical business decisions or assessments are made. In pharmaceuticals, flowcharts are used to provide a high-level view of clinical trials data, such as summaries of patient disposition, enrollment, adverse events, medical history, or dosing data. As opposed to the most common layouts that SAS can generate using the SG procedures or GTL, these flowcharts have non-typical layouts and go back to the concept of SAS annotation macros and functions. The paper presents a brief discussion of some SAS annotation functions/macros, paving the way for a SAS macro that allows users to customize the flowcharts to their needs. The layouts require the user to have a good understanding of the underlying data and of the summarization of interest on which the flowchart will be based.
Hands-On Workshops
Building and Using User Defined Formats
Art Carpenter

Formats are powerful tools within the SAS System.  They can be used to change how information is brought into SAS, how it is displayed, and can even be used to reshape the data itself.  The Base SAS product comes with a great many predefined formats and it is even possible for you to create your own specialized formats. This paper will very briefly review the use of formats in general and will then cover a number of aspects dealing with user generated formats.   Since formats themselves have a number of uses that are not at first apparent to the new user, we will also look at some of the broader application of formats.   

Topics include: building formats from data sets, using picture formats, transformations using formats, value translations, and using formats to perform table look-ups.
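A hedged pair of examples of the topics listed above (not the workshop’s own code): a VALUE format built in place, and a character format built from a data set with CNTLIN= and then used as a table look-up; WORK.ENROLL and SITEID are assumptions.

proc format;
   value agegrp
      low -< 13 = 'Child'
      13  -< 20 = 'Teenager'
      20 - high = 'Adult';
run;

data work.sitefmt;
   retain fmtname '$site' type 'C';
   input start $ label $ 4-20;
   datalines;
01 Coordinating Ctr
02 Regional Clinic
;
run;

proc format cntlin=work.sitefmt;
run;

proc freq data=work.enroll;
   tables siteid;
   format siteid $site.;
run;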

How to Speed Up Your Validation Process Without Really Trying
Alice Cheng
This paper introduces tips and techniques that can speed up the validation of two data sets. It begins with a brief introduction to PROC COMPARE, then introduces some techniques, without using automation, that can help speed up the validation process. These techniques are most useful when one validates a pair of data sets for the first time. For the automation part, %QCData is used to compare two data sets and %QCDir is used to compare data sets in the production directory against their corresponding data sets in the QC directory. Also introduced is &SYSINFO, a powerful and extremely useful automatic macro variable that holds a value summarizing the result of a comparison.
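A hedged sketch of the &SYSINFO check (the %QCData and %QCDir macros themselves are the author’s): the return code is captured immediately after PROC COMPARE and tested in a small macro; the PROD and QC librefs are assumptions.

proc compare base=prod.adsl compare=qc.adsl;
run;

%macro chkcomp;
   %if &sysinfo = 0 %then %put NOTE: Production and QC data sets match exactly.;
   %else %put WARNING: PROC COMPARE return code is &sysinfo - review the differences.;
%mend chkcomp;
%chkcomp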
Combining Reports into a Single File Deliverable
Bill Coar
In daily operations of a Biostatistics and Statistical Programming department, we are often tasked with generating reports in the form of tables, listings, and figures (TLFs). A common setting in the pharmaceutical industry is to develop SAS® code in which individual programs generate one or more TLFs in some standard formatted output such as RTF or PDF with a common look and feel. As trends move towards electronic review and distribution, there is an increasing demand for producing a single file as the final deliverable rather than sending each output individually.       

Various techniques have been presented over the years, but they typically require post-processing individual RTF or PDF files, require knowledge base beyond SAS, and may require additional software licenses. The use of item stores as an alternative has been presented more recently. Using item stores, SAS stores the data and instructions used for the creation of each report. Individual item stores are restructured and replayed at a later time within an ODS sandwich to obtain a single file deliverable.  This single file is well structured with either a hyperlinked Table of Contents in RTF or properly bookmarked PDF. All hyperlinks and bookmarks are defined in a meaningful way enabling the end user to easily navigate through the document. This Hands-on-Workshop will introduce the user to creating, replaying, and restructuring item stores to obtain a single file containing a set of tables, listings, and figures. The use of ODS is required in this application using SAS 9.4 in a Windows environment.
Getting your Hands on Contrast and Estimate Statements
Leanne Goldstein
Many SAS users are familiar with modeling with and without random effects through PROC GLM, PROC MIXED, PROC GLIMMIX, and PROC GENMOD. The parameter estimates are great for giving overall effects, but analysts need to use CONTRAST and ESTIMATE statements for digging deeper into the model to answer questions such as: “What is the predicted value of my outcome for a given combination of variables?” “What is the estimated difference between groups at a given time point?” or “What is the estimated difference between slopes for two of three groups?” This HOW provides a step-by-step introduction so that SAS users will get more comfortable programming ESTIMATE and CONTRAST statements and finding answers to these types of questions. The hands-on workshop will focus on statements that can be applied to either fixed effects models or mixed models.
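A hedged example of the kind of statement the workshop builds up to (coefficients depend entirely on the CLASS level ordering, so this is illustrative only): with a hypothetical data set WORK.STUDY, treatments A and B, and visits 1-3 ordered as shown, the interaction coefficients run A1 A2 A3 B1 B2 B3.

proc mixed data=work.study;
   class subject trt visit;
   model chg = trt visit trt*visit;
   repeated visit / subject=subject type=un;
   estimate 'A vs B at Visit 3'
            trt 1 -1
            trt*visit 0 0 1  0 0 -1 / cl;
run;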
Advanced Programming Techniques with PROC SQL
Kirk Paul Lafler
The SQL Procedure contains a number of powerful and elegant language features for SQL users. This hands-on workshop (HOW) emphasizes highly valuable and widely usable advanced programming techniques that will help users of Base-SAS harness the power of the SQL procedure. Topics include using PROC SQL to identify FIRST.row, LAST.row and Between.rows in BY-group processing; constructing and searching the contents of a value-list macro variable for a specific value; data validation operations using various integrity constraints; data summary operations to process down rows and across columns; and using the MSGLEVEL= system option and _METHOD SQL option to capture vital processing and the algorithm selected and used by the optimizer when processing a query.
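One hedged illustration of the value-list technique mentioned above (not necessarily the workshop’s example): the first query packs quoted names into a macro variable, and the second query reuses it in an IN list.

proc sql noprint;
   select quote(strip(name))
      into :teenlist separated by ','
      from sashelp.class
      where age > 13;
quit;
%put &=teenlist;

proc sql;
   select name, age, height
      from sashelp.class
      where name in (&teenlist);
quit;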
How to analyze correlated and longitudinal data?
Niloofar Ramezani
Longitudinal and correlated data are extensively used across disciplines to model changes over time or in clusters. When dealing with these types of data, more advanced models are required to account for correlation among observations. When modeling continuous longitudinal responses, many studies have been conducted using Generalized Linear Models (GLM); however, no extensive studies on discrete responses over time have been completed. These studies require more advanced models within conditional, transitional, and marginal models (Fitzmaurice et al., 2009). Examples of these models, which enable researchers to account for the autocorrelation among repeated observations, include the Generalized Linear Mixed Model (GLMM), Generalized Estimating Equations (GEE), Alternating Logistic Regression (ALR), and Fixed Effects with Conditional Logit Analysis. The purpose of the current study is to assess modeling options for the aforementioned models with varying types of responses. This study looks at several methods of modeling binary, categorical, and ordinal correlated response variables within regression models. Starting with the simplest case of binary outcomes, through ordinal outcomes, this study looks at different modeling options within SAS, including longitudinal cases for each model. At the end, some hierarchical models are also discussed which account for the possible clustering among observations. This presentation introduces different procedures for the aforementioned models and shows the audience how to run them in SAS 9.4. These procedures include PROC LOGISTIC, PROC GLIMMIX, PROC GENMOD, PROC NLMIXED, PROC GEE, and PROC PHREG. After exploring the different modeling procedures, their strengths and limitations are specified for applied researchers and practitioners, and recommendations are provided. The main emphasis of this presentation will be on categorical outcomes due to the lack of extensive studies of models for such response variables.
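As a hedged orientation to two of those procedures (not the presenter’s code), the sketches below fit a subject-specific model and a population-averaged model to a hypothetical data set WORK.LONG with a binary outcome Y, subject ID, VISIT, and TRT.

/* GLMM: random-intercept logistic model */
proc glimmix data=work.long method=laplace;
   class id visit trt;
   model y(event='1') = trt visit / dist=binary link=logit solution;
   random intercept / subject=id;
run;

/* GEE: marginal logistic model with an exchangeable working correlation */
proc genmod data=work.long descending;
   class id visit trt;
   model y = trt visit / dist=binomial link=logit;
   repeated subject=id / type=exch corrw;
run;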
Industry Solutions
Are you ready for Dec 17th, 2016 - CDISC-compliant data submission?
Kevin Lee
Are you ready for Dec 17th, 2016?  

According to the FDA Data Standards Catalog v4.4, all clinical trial studies starting after December 17th, 2016, with the exception of certain INDs, will be required to have CDISC compliant data. Organizations that are unclear on their compliance status will find FDA expectations clarified in this paper. The paper shows how programmers can interpret and understand the crucial elements of the FDA Data Standards Catalog, including the support begin date, support end date, requirement begin date, and requirement end date of specific standards for both eCTD and CDISC. First, the paper provides a brief introduction to the regulatory recommendations for electronic submission, including submission methods, the five modules in the CTD (especially m5), and technical deficiencies in submissions. The paper also discusses what programmers need to prepare for the submission according to FDA and CDISC guidelines: the CSR, Protocol, SAP, SDTM annotated eCRF, SDTM datasets, ADaM datasets, ADaM dataset SAS® programs, and Define.xml. Additionally, the paper discusses formatting logistics that programmers should be aware of in their preparation of documents, including the length, naming conventions, and file formats of electronic files. For example, SAS data sets should be submitted as SAS transport files and SAS programs should be submitted in text format rather than SAS format. Finally, based on information from the FDA CSS meeting and the FDA Study Data Technical Conformance Guide v3.0, the paper discusses the latest FDA concerns and issues regarding electronic submission, including the size of SAS data sets, missing Trial Design (TS) datasets and Define.xml files, and the importance of the Reviewer Guide.
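For instance, one common way to produce the required SAS Version 5 transport files is the XPORT engine with PROC COPY; the librefs and paths below are placeholders, not part of the paper:

   libname sdtm   'C:\study\sdtm';                       /* native SDTM datasets   */
   libname xptout xport 'C:\study\submission\dm.xpt';    /* transport file to send */

   proc copy in=sdtm out=xptout memtype=data;
      select dm;                                         /* one dataset per .xpt   */
   run;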
SDTM Bookmarking Automation: Leveraging SAS, Ghostscript and Form-Visit Study Data
Nasser Al Ali and Maria Paz
The United States Food and Drug Administration (FDA) requires an annotated Case Report Form (aCRF) to be submitted as part of the electronic data submission for every clinical trial.  The aCRF is a PDF document that maps the data captured in a clinical trial to the corresponding variable names in the Study Data Tabulation Model (SDTM) datasets. The SDTM Metadata Submission Guidelines recommend that the aCRF be bookmarked in a specific way.  A one-to-one relationship between bookmarks and aCRF forms is not typical; one form may have two or more bookmarks, so the number of bookmarks can easily reach thousands in any study!  Generating the bookmarks manually is a tedious, time-consuming job.  This paper presents an approach to automating the entire bookmark generation process using SAS® 9.2 or later, Ghostscript (a PDF editing tool), and the linkages between forms and their corresponding visits.  This approach could potentially save tremendous amounts of time, and the eyesight of programmers, while reducing the potential for human error.
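A hedged sketch of the general idea (not the authors' code) is shown below: a DATA step writes Ghostscript pdfmark definitions from an assumed BOOKMARKS dataset containing TITLE and PAGE variables, and Ghostscript then merges them into the aCRF:

   /* Write one pdfmark line per bookmark */
   data _null_;
      set bookmarks;                       /* assumed variables: title, page */
      file 'bookmarks.ps';
      put '[ /Title (' title +(-1) ') /Page ' page '/OUT pdfmark';
   run;

   /* Ghostscript (run from the command line or via SYSTASK) then merges them:
      gs -sDEVICE=pdfwrite -o acrf_bookmarked.pdf acrf.pdf bookmarks.ps */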
Did the Protocol Change Work? Interrupted Time Series Evaluation for Health Care Organizations.
Carol Conell and Alexander Flint
Background: Analysts are increasingly asked to evaluate the impact of policy and protocol changes in healthcare, as well as in education and other industries.  Often the request occurs after the change is implemented, and the objective is to provide an estimate of the effect as quickly as possible. This paper demonstrates how we used time series models to estimate the impact of a specific protocol change using data from the electronic health record (EHR).  Although the approach is well established in econometrics, it remains much less common in healthcare; the paper is designed to make this technique available to intermediate-level SAS programmers. Methods: This paper introduces the time series framework, terminology, and advantages to users with no previous time series experience.  It illustrates how SAS/ETS can be used to fit an interrupted time series model to evaluate the impact of a one-time protocol change, based on a real-world example from Kaiser Northern California.  Macros are provided for creating a time series database, fitting basic ARMA models with PROC ARIMA, and comparing models.  Once the simple time series structure is identified for this example, heterogeneity in the effect of the intervention is examined using data from subsets of patients defined by the severity of their presentation.  This shows how the aggregated approach allows exploration of effect heterogeneity. Conclusions: Aggregating data and applying time series methods provide a simple way to evaluate the impact of protocol changes and similar interventions. When the timing of these interventions is well defined, this approach avoids the need to collect substantial data on individual-level confounders and avoids problems associated with selection bias.  If the effect is immediate, the approach requires only a modest number of time points.
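As a minimal sketch under assumed names (a monthly series EVENTS and a protocol change on an arbitrary date), an interrupted time series fit with PROC ARIMA might look like this:

   /* Create a step-change indicator for the intervention */
   data series;
      set monthly_counts;                 /* assumed: date, events        */
      post = (date >= '01JAN2015'd);      /* assumed protocol change date */
   run;

   /* ARMA model with the intervention as an input (transfer function) */
   proc arima data=series;
      identify var=events crosscorr=(post) nlag=24;
      estimate p=1 q=0 input=(post) method=ml;
      run;
   quit;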
Finding Strategies for Credit Union Growth without Mergers or Acquisitions
Nate Derby
In this era of mergers and acquisitions, community banks and credit unions often believe that bigger is better, that they can't survive if they stay small. Using 20 years of industry data, we disprove that notion for credit unions, showing that even small ones can grow slowly but strongly on their own, without merging with larger ones. We first show how we find this strategy in the data. Then we segment credit unions by size and see how the strategy changes within each segment. Finally, we track the progress of these segments over time and develop a predictive model for any credit union. In the process, we introduce the concept of "High-Performance Credit Unions," which take actions that are proven to lead to credit union growth. Code snippets will be shown for any version of SAS® but will require SAS/STAT software.
A Case of Retreatment – Handling Retreated Patient Data
Sriramu Kundoor and Sumida Urval
In certain clinical trials, if the study protocol allows, there are scenarios where subjects are re-enrolled into the study for retreatment. As per CDISC guidelines, these subjects need to be handled in a manner different from non-retreated subjects. The CDISC SDTM Implementation Guide versions 3.1.2 (page 29) and 3.2 (Section 4, page 8) state: “The unique subject identifier (USUBJID) is required in all datasets containing subject-level data. USUBJID values must be unique for each trial participant (subject) across all trials in the submission. This means that no two (or more) subjects, across all trials in the submission, may have the same USUBJID. Additionally, the same person who participates in multiple clinical trials (when this is known) must be assigned the same USUBJID value in all trials.” Therefore, a retreated subject cannot have two USUBJIDs, even though the same person undergoes the trial more than once. This paper describes (with suitable examples) a method of handling retreated subject data in the SDTM datasets as per CDISC standards, while capturing it in such a way that it is easy for the programmer or statistician to analyze the data in ADaM datasets. This paper also discusses the conditions that need to be followed (and the logic behind them) while programming retreated patient data into the different SDTM domains.
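As a purely hypothetical sketch (not the paper's method), the idea of reusing a single USUBJID for a re-enrolled subject could be expressed in a DATA step like this, where XSUBJID is an invented variable holding the subject's original enrollment ID:

   data dm_retreat;
      set dm_raw;                                   /* assumed input dataset     */
      length usubjid $40;
      if not missing(xsubjid) then
         usubjid = catx('-', studyid, xsubjid);     /* keep first-enrollment ID  */
      else
         usubjid = catx('-', studyid, subjid);
   run;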
Why and What Standards for Oncology Studies (Solid Tumor, Lymphoma and Leukemia)?
Kevin Lee
Each therapeutic area has its own unique data collection and analysis requirements, and oncology in particular has specific standards for the collection and analysis of data.  Oncology studies are also separated into one of three sub-types according to response criteria guidelines. The first sub-type, the Solid Tumor study, usually follows RECIST (Response Evaluation Criteria in Solid Tumors).  The second sub-type, the Lymphoma study, usually follows Cheson. Lastly, Leukemia studies follow study-specific guidelines (IWCLL for Chronic Lymphocytic Leukemia, IWAML for Acute Myeloid Leukemia, NCCN Guidelines for Acute Lymphoblastic Leukemia, and ESMO clinical practice guidelines for Chronic Myeloid Leukemia). This paper demonstrates the notable level of sophistication implemented in CDISC standards, driven mainly by the differences across response criteria.  The paper specifically shows which SDTM domains are used to collect the different data points for each type.  For example, Solid Tumor studies collect tumor results in TR and TU and response in RS. Lymphoma studies collect not only tumor results and response, but also bone marrow assessments in LB and FA, and spleen and liver enlargement in PE.  Leukemia studies collect blood counts (i.e., lymphocytes, neutrophils, hemoglobin, and platelet count) in LB and genetic mutations, in addition to what is collected in Lymphoma studies.   The paper also introduces oncology terminology (e.g., CR, PR, SD, PD, NE) and the oncology-specific ADaM Time to Event (--TTE) data set.  Finally, the paper shows how standards (e.g., response criteria guidelines and CDISC) streamline the development of clinical trial artefacts in oncology studies and how end-to-end clinical trial artefact development can be accomplished through this standards-driven process.
Efficacy Endpoint Analysis Dataset Generation with Two-Layer ADaM Design Model
Michael Pannucci and Chengxin Li
In clinical trial data processing, the design and implementation of efficacy endpoint datasets are often the most challenging processes to standardize. This paper introduces a two-layer ADaM design method for generating an efficacy endpoint dataset and summarizes practices from past projects.  The two-layer ADaM design method improves not only implementation and review, but validation as well.  The method is illustrated with examples.
Strategic Considerations for CDISC Implementation
Amber Randall and Bill Coar
The Prescription Drug User Fee Act (PDUFA) V Guidance mandates eCTD format for all regulatory submissions by May 2017.  The implementation of CDISC data standards is not a one-size-fits-all process and can present both a substantial technical challenge and a potentially high cost to study teams.  Many factors should be considered when strategizing when and how to implement, including timelines, study team expertise, and final goals.  Different approaches may be more efficient for brand-new studies than for existing or completed studies.  Should CDISC standards be implemented right from the beginning, or does it make sense to convert data once it is known that the study product will indeed be submitted for approval?  Does a study team already have the technical expertise to implement data standards?  If not, is it more cost effective to invest in in-house training or to hire contractors?  How does a company identify reliable and knowledgeable contractors?  Are contractors skilled in SAS programming sufficient, or will they also need in-depth CDISC expertise?  How can the work of contractors be validated?  Our experience as a statistical CRO has allowed us to observe and participate in many approaches to this challenging process.  What has become clear is that a good, informed strategy planned from the beginning can greatly increase efficiency and cost effectiveness and reduce stress and unanticipated surprises.
SDD Project Management Tool, Real-Time and Hassle-Free: A One-Stop Shop for Study Validation and Completion Rate Estimation
Chen Shi

Do you sometimes feel like an octopus when working on multiple projects as a lead programmer, or find it hard to monitor what’s going on? Perhaps you know Murphy’s Law: anything that can go wrong will go wrong, and you will want to be the first to know when it does, what its impact is, and what the downstream process is. After uploading the study submission package to SDD, we developed a working process that collects status information for each program and output. A SAS program then reads in the status report of repository documents and updates the tracker with the items listed below; a simplified sketch of the timestamp comparison follows the feature list.

• Timestamp (last modified, last run) of:
o Source and validation programs
o Upstream documents (serving as input to the program, such as raw data or macros)
o Downstream documents

Features include:
• Pinnacle 21 traffic lighting
• Pulling time variables from SDD and building the logic (raw < SDTM < ADaM, Source < Validation)
• Log scanning in batch (with time-to-completion estimation)
• Metadata-level checking
• A workflow tying together all of the above
• A scheduled job that runs the above tasks in sequence
• A study completion report (and its algorithm)
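The sketch referenced above is given here; the dataset and variable names (PROGRAM_STATUS, INPUT_STATUS, LASTRUN, INPUT_LASTMODIFIED) are assumptions for illustration, not the SDD interface itself:

   /* Flag outputs whose inputs changed after the program last ran */
   data tracker_flags;
      merge program_status (in=a)     /* assumed: program, lastrun            */
            input_status   (in=b);    /* assumed: program, input_lastmodified */
      by program;
      if a and b;
      length status $12;
      if input_lastmodified > lastrun then status = 'OUT OF DATE';
      else status = 'CURRENT';
   run;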

Building Better ADaM Datasets Faster With If-Less Programming
Lei Zhang
One of the major tasks in building ADaM datasets is writing the SAS code that implements the ADaM variables defined in an ADaM specification. SAS programmers often find this task tedious, time-consuming, and prone to error. The main reason the task seems daunting is that a large number of variables have to be created with if-then-else statements in one or more DATA steps for each ADaM dataset. To address this common issue and simplify the process, this paper introduces a small set of DATA step inline macros that allow programmers to derive most ADaM variables without using if-then-else statements. With this if-less programming approach, a programmer can not only make a piece of ADaM implementation code easy to read and understand, but also make it easy to modify along with the evolving ADaM specification and straightforward to reuse in the development of other ADaM datasets or studies. What’s more, this approach can be applied to the derivation of ADaM datasets from both SDTM and non-SDTM datasets.
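The paper's own macros are not reproduced here; as one hedged illustration of the general idea, a spec-driven format plus a tiny inline macro (both invented for this sketch) can stand in for an if-then-else chain:

   /* Lookup table built from the specification instead of if-then-else logic */
   proc format;
      value $avisitf  'SCRN' = 'Screening'
                      'W2'   = 'Week 2'
                      'W4'   = 'Week 4'
                      other  = 'Unscheduled';
   run;

   /* Inline macro used inside the DATA step */
   %macro derive(newvar, sourcevar, fmt);
      &newvar = put(&sourcevar, &fmt);
   %mend derive;

   data adlb;
      set sdtm.lb;                       /* assumed SDTM input */
      length avisit $20;
      %derive(avisit, visit, $avisitf.)
   run;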
Professional Development
What’s Hot – Skills for SAS® Professionals
Kirk Paul Lafler
As a new generation of SAS® users emerges, current and prior generations of users have an extensive array of procedures, programming tools, approaches, and techniques to choose from. This presentation identifies and explores the areas that are hot in the world of the professional SAS user. Topics include SAS® Enterprise Guide, PROC SQL, PROC REPORT, the Output Delivery System (ODS), the macro language, DATA step programming techniques such as arrays and hash objects, SAS University Edition software, technical support at support.sas.com, wiki content on sasCommunity.org®, published “white papers” on LexJansen.com, and other venues.
Creating Dynamic Documents with SAS® in the Jupyter Notebook to Reinforce Soft Skills
Hunter Glanz
Experience with technology and strong computing skills continue to be among the qualifications most desired by employers. Programs in Statistics and other especially quantitative fields have bolstered the programming and software training they impart to graduates. But as these skills become more common, there remains an equally important desire for what are often called "soft skills": communication, telling a story, extracting meaning from data. Through the use of SAS® in the Jupyter Notebook, traditional programming assignments are easily transformed into exercises involving both analytics in SAS and writing a clear report. Traditional reports become dynamic documents that include both text and living SAS® code that is run during document creation. Students should never again be writing just SAS® code.
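As a hypothetical illustration, a single SAS code cell in such a notebook might sit between markdown cells that carry the written report; the analysis itself is ordinary SAS code:

   /* One SAS code cell in a Jupyter notebook running the SAS kernel */
   proc sgplot data=sashelp.class;
      scatter x=height y=weight / group=sex;
      reg x=height y=weight;
   run;

   proc means data=sashelp.class mean std maxdec=1;
      var height weight;
   run;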
Contributing to SAS® By Writing Your Very Own Package
Hunter Glanz
One of the biggest reasons for the explosive growth of the R statistical software in recent years is its massive collection of user-developed packages. Each package consists of a number of functions centered around a particular theme or task not previously addressed (well) within the software. While SAS® continues to advance on its own, SAS® users can now contribute packages to the broader SAS® community. Creating and contributing a package is simple and straightforward, empowering SAS® users to grow the software themselves. There is a lot of potential to increase the general applicability of SAS® to tasks beyond statistics and data management, and it's up to you!
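The abstract does not spell out the packaging mechanism itself; as one hedged illustration of the kind of reusable, shareable SAS code a package might bundle, a user-defined function stored with PROC FCMP looks like this:

   /* Store a reusable function in a function library */
   proc fcmp outlib=work.funcs.demo;
      function celsius(fahrenheit);
         return ((fahrenheit - 32) * 5 / 9);
      endsub;
   run;

   /* Make the library visible and call the function */
   options cmplib=work.funcs;
   data _null_;
      c = celsius(98.6);
      put c=;
   run;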
Collaborations in SAS Programming; or Playing Nicely with Others
Kristi Metzger and Melissa R. Pfeiffer
SAS programmers rarely work in isolation, but rather are usually part of a team that includes other SAS programmers such as data managers and data analysts, as well as non-programmers like project coordinators. Some members of the team -- including the SAS programmers -- may work in different locations. Given these complex collaborations, it is increasingly important to adopt approaches to work effectively and easily in teams. In this presentation, we discuss strategies and methods for working with colleagues in varied roles. We first address file organization -- putting things in places easily found by team members -- including the importance of numbering programs that are executed sequentially. While documentation is often a neglected activity, we next review the importance of documenting both within SAS and in other forms for the non-SAS users on your team. We also discuss strategies for sharing formats and writing friendly SAS code for seamless work with other SAS programmers. Additionally, data sets are often in flux, and we talk about approaches that add clarity to data sets and their production. Finally, we suggest tips for double-checking another programmer’s code and/or output, including the importance of confirming the logic behind variable construction and the use of PROC COMPARE in the confirmation process. Ultimately, adopting strategies that ease working jointly helps when you have to review work you did in the past and makes for a better playground experience with your teammates.
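For the double-checking step, a minimal sketch of independent validation with PROC COMPARE might look like this (the librefs and ID variable are assumptions):

   /* Compare the production dataset against an independently programmed copy */
   proc compare base=prod.adsl compare=qc.adsl criterion=1e-8 listall;
      id usubjid;
   run;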
A Brief Introduction to WordPress for SAS Programmers
Andra Northup
WordPress is a free, open-source platform based on PHP and MySQL used to build websites. It is easy to use with a point-and-click user interface. You can write custom HTML and CSS if you want, but you can also build beautiful webpages without knowing anything at all about HTML or CSS. Features include a plugin architecture and a template system. WordPress is used by more than 26.4% of the top 10 million websites as of April 2016. In fact, SAS® blogs (hosted at http://blogs.sas.com) use the WordPress platform. If you are considering starting a blog to share your love of SAS or to raise the profile of your business and are considering using WordPress, join us for a brief introduction to WordPress for SAS programmers.
How to Be a Successful and Healthy Home-Based SAS Programmer in Pharma/Biotech Industry
Daniel Tsui
With the advancement of technology, the tech industry accepts more and more flexible schedules and telecommuting opportunities. In recent years, more statistical SAS programming jobs in the Pharma/Biotech industry have shifted from office-based to home-based. There has been ongoing debate about how beneficial the shift is, and a lot of room remains for discussion about the pros and cons of the home-based model. This presentation investigates those pros and cons for home-based SAS programmers within the pharma/biotech industry. The overall benefits were proposed in a Microsoft whitepaper based on a survey, Work without Walls, which listed the top 10 benefits of working from home from the employee viewpoint, such as better work/home balance, avoiding traffic, greater productivity, and fewer distractions. However, to be a successful home-based SAS programmer in the pharma/biotech industry, some enemies have to be defeated, such as being on call 24 hours a day, performance issues, solitude, limited advancement opportunities, and dealing with family. This presentation will discuss some key highlights.
SAS Essentials Workshop
Susan Slaughter
SAS Studio: A New Way to Program in SAS
Lora Delwiche and Susan Slaughter
SAS Studio is an important new interface for SAS, designed for both traditional SAS programmers and for point-and-click users. For SAS programmers, SAS Studio offers many useful features not found in the traditional Display Manager. SAS Studio runs in a web browser. You write programs in SAS Studio, submit the programs to a SAS server, and the results are returned to your SAS Studio session. SAS Studio is included in the license for Base SAS, is the interface for SAS University Edition and is the default interface for SAS OnDemand for Academics. Both SAS University Edition and SAS OnDemand for Academics are free of charge for non-commercial use. With SAS Studio becoming so widely available, this is a good time to learn about it.
An Animated Guide: An introduction to SAS Macro quoting
Russ Lavery
This cartoon-like presentation expands on material from a previous paper (which explained how SAS processes macros) to show how SAS processes macro quoting. It is suggested that the "map of the SAS Supervisor" in this cartoon is a very useful paradigm for understanding SAS macro quoting.

Boxes on the map are either subroutines or storage areas, and the cartoon allows you to see "quoted" tokens flow through the components of the SAS supervisor as code executes. Basic concepts for this paper are: 1) the map of the SAS supervisor; 2) the idea that certain parts of the map monitor tokens as they pass through; 3) the idea of SAS tokens as rule triggers for actions to be taken by parts of the map; 4) the way macro masking prevents recognition of tokens and the triggering of rules; and 5) the places in the SAS system where unquoting happens.
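As a minimal sketch of these concepts (not taken from the presentation), the lines below mask tokens with %STR and %NRSTR so they are not acted on, and then remove the masking with %UNQUOTE so the rules fire again:

   %let cond = %str(a or b);             /* OR is masked, stored as plain text   */
   %let code = %nrstr(%put Hello;);      /* % and ; are masked; nothing runs yet */
   %put &=cond;
   %unquote(&code)                       /* unquoting lets the %PUT execute      */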