WUSS 2022 Paper Abstracts

WUSS 2022 will feature nearly 150 paper presentations, posters, and hands-on workshops. Papers are organized into 10 academic sections and cover a variety of topics and experience levels.

Note: This information is subject to change. Last updated 08-Sep-2022.

Advanced Techniques

No. Author(s) Paper Title
38 Kirk Paul Lafler
& Stephen Sloan
Application of Fuzzy Matching Techniques Using SAS Software – A Panel Discussion
40 Richann Watson
& Louise Hadden
Functions (and More!) on CALL!
46 Jennifer Rosson The Best Kept Secret of Custom Built Tasks Built in SAS® Studio!
48 Derek Morgan Four Different Page Layouts on the Fly with Three Reports on a Page using the Output Delivery System
69 Ronald Fehd Calculating Cardinality Ratio in Two Steps
79 Troy Hughes GIS Challenges of Cataloging Catastrophes: Serving up GeoWaffles with a Side of Hash Tables to Conquer Big Data Point-in-Polygon Determination and Supplant SAS® PROC GINSIDE
87 Richann Watson What’s Your Favorite Color? Controlling the Appearance of a Graph
88 Richann Watson Have a Date with ISO®? Using PROC FCMP to Convert Dates to ISO 8601
97 Ronald Fehd Introduction To SCL Functions For Macro Programmers
108 Josh Horstman Getting Started with DATA Step Hash Objects
111 Jedediah Teres Your Query’s No Good Here: PROC SQL Code That Doesn’t Work Outside of SAS
112 Becky Lien
& Sara Richter
& Lily Dunk
Let SAS do the work for you: Tips and tricks for turning SAS output into client-ready tables
118 Ekaterina Roudneva Automating Reports Using Macros and Macro Variables
121 Jake Reeser Using Dictionary Tables to Create Dynamic Programs
143 Tasha Chapman Creating a data dictionary using Base SAS®
144 Lisa Mendez An Introduction to SAS® Arrays
158 Charu Shankar Cooking in SAS Viya

Analytics & Statistics

No. Author(s) Paper Title
47 Alec Zhixiao Lin Attribute Reduction for Continuous Dependent Variables in SAS®
60 Natasha Oza
& Jesse Canchola
Customizable SAS Graphs for Bias Analysis
74 Kodjo Botchway
& Rupom Bhattacherjee
& Xitong Hu
& Jack Pashin
& Goutam Chakraborty
& Prem Bikkina
Evaluating CO2 storage potential of SECARB Offshore Reservoirs and Saline Formations by Employing Data-driven Models with SAS® Viya software
102 Jim Box What’s your model really doing? Understanding human biases in Machine Learning.
105 Jim Box Story telling with SAS Visual Statistics
155 Miriam McGaugh Watching Our Gardens Grow using Social Network Analysis

Beginner’s Techniques

No. Author(s) Paper Title
25 Kirk Paul Lafler Enhancing Your Skillset with SAS® OnDemand for Academics (ODA) Software
33 Kirk Paul Lafler
& Joshua Horstman
The Battle of the Titans (Part II): PROC TABULATE versus PROC REPORT
39 Kirk Paul Lafler
& Zheyuan Yu
& Shaonan Wang
& Nuoer Lu
& Daniel Qian
& Swallow Yan
Data Access Made Easy Using SAS® Studio
49 Derek Morgan The Essentials of SAS® Dates and Times
52 Derek Morgan PROC SORT (then and) NOW
59 Ron Cody A Survey of Some of the Most Useful SAS Functions
67 Ronald Fehd True is not False: Evaluating Logical Expressions
109 Josh Horstman Using the Output Delivery System to Create and Customize Excel Workbooks
139 Jane Eslinger Generating Simple Statistics with Base SAS Procedures
152 Chris Hemedinger Using Git with Your SAS Projects
163 Chevell Parker Getting Started with the Output Delivery System

Data Management & Administration

No. Author(s) Paper Title
20 Louise Hadden Putting the Meta into the Data: Managing Data Processing for a Large Scale CDC Surveillance Project with SAS®
54 Daniel Konkler
& Gilbert Ramos
Explore Your Data and Avoid Surprises
70 Lisa Eckler Did the load work?
71 Xiao Qing Wang
& Sarah Seelye
& Brenda McGrath
& Hallie Prescott
& Theodore Iwashyna
& Elizabeth Viglianti
Using SAS® Gplot Overlay to Effectively Visualize and Compare COVID-19-Sepsis versus Sepsis Post-hospital Discharge Locations Over Time
80 Troy Hughes Calling for Backup When Your One-Alarm Becomes a Two-Alarm Fire: Developing SAS® Data-Driven Concurrent Processing Models through Control Tables and Dynamic Fuzzy Logic
84 Troy Hughes
& Louise Hadden
Should I Wear Pants in the Portuguese Expanse? Automating Business Rules and Decision Rules Through Reusable Decision Table Data Structures that Leverage SAS Arrays

HOW

No. Author(s) Paper Title
37 Kirk Paul Lafler Macro Programming Essentials for New SAS Users
76 Jayanth Iyengar Understanding Administrative Healthcare Data sets using SAS programming tools.
92 Isaiah Lankham
& Matthew Slaughter
Commit early, commit often! A gentle introduction to the joy of Git and GitHub
99 Zeke Torres SASJS the coolest SAS code tool since Proc Sort!
103 Bill Coar ODS Document & Item Stores: A New Beginning
119 Joe Matise Simmering Data: Using Beautiful Soup and Python to scrape data from web pages
124 Josh Horstman Map It Out: Using SG Attribute Maps for Precise Control of PROC SGPLOT Output
137 Jane Eslinger Proc Report Step by Step with Styles

Open Source

No. Author(s) Paper Title
61 Ronald Fehd Using LaTeX document class sugconf to write your paper
83 Troy Hughes Data-Driven Robotics: Leveraging SAS® and Python to Virtually Build LEGO MINDSTORMS Gear Trains for the EV3 Brick
93 Matthew Slaughter
& Isaiah Lankham
Friends are better with Everything: Using PROC FCMP Python Objects in Base SAS
104 Jim Box
& Samiul Haque
SAS and Open Source Playing Nicely Together
117 Joe Matise Unlocking the Web With Python and SAS: Shortcuts to accessing data using Python and SAS
151 Gowtham Varma Bhupathiraju Data mining for the online retail industry: Customer segmentation and assessment of customers using RFM and k-means
153 Chris Hemedinger Using Visual Studio Code for SAS Programming
156 Miriam McGaugh Making survey systems talk with analytics software: Comparing connections to SAS and SAS Viya

Pharma and Healthcare

No. Author(s) Paper Title
43 Richann Watson
& Karl Miller
Standardized, Customized or Both? Defining and Implementing (MedDRA) Queries in ADaM Data Sets
44 Richann Watson
& Karl Miller
Standardised MedDRA Queries (SMQs): Beyond the Basics; Weighing Your Options
50 Derek Morgan Time Since Last Dose: Anatomy of a SQL Query
63 Lindsey Xie
& Richann Watson
& Jinlin Wang
& Lauren Xiao
Child Data Set: An Alternative Approach for Analysis of Occurrence and Occurrence of Special Interest
89 Oleg Korovyanko Study of cause and effect in medical research via SAS Statistical package
100 Bill Coar Cautionary Notes when Working Interim Data
101 Bill Coar Finding Your Latest Date
110 Yoganand Budumuru Complex heatmaps in Statistical analysis of Biomarkers and cancer genomics
120 Jim Baker
& David Polus
& John Kurtz
The Functional Service Provider Model: A Comprehensive and Collaborative Solution
134 Inka Leprince
& Elizabeth Li
& Carl Chesbrough
TrackCHG: A SAS Macro to Colorize and Track Changes Between Data Transfers in Subject-Level Safety Listings
161 Jim Box
& Matt Becker
Learn the Basics About the Pharmaceutical Industry in 20 Minutes
162 Jim Box
& Matt Becker
Quick Jumpstart into Pharmaceutical SAS Programming

Professional Development

No. Author(s) Paper Title
58 Carey Smoak So You Want to be a Successful Statistical Programmer?: The Importance of People Skills
107 Josh Horstman
& Richann Watson
Adventures in Independent Consulting: Perspectives from Two Veteran Consultants Living the Dream
122 Janette Garner A Statistical Programmer’s Growth Journey: It is More than Learning New Code
164 Missy Hannah Building a LinkedIn That Stands Out

Programming

No. Author(s) Paper Title
19 Louise Hadden Form(at) or Function? A Celebratory Exploration of Encoding and Symbology
21 Louise Hadden Designing and Implementing Reporting Meeting Federal Government Accessibility Standards with SAS®
22 Louise Hadden SAS® PROC GEOCODE and PROC SGMAP: The Perfect Pairing for COVID-19 Analyses
24 Kurt Bremser Talking to Your Host: Interacting with the Operating System and File System from SAS
27 Kirk Paul Lafler Essential Programming Techniques Every SAS® User Should Learn
28 Kirk Paul Lafler Ten Rules for Better Charts, Figures and Visuals
41 Richann Watson
& Louise Hadden
What Kind of WHICH Do You CHOOSE to be?
42 Richann Watson
& Louise Hadden
“Bored”-Room Buster Bingo – Create Bingo Cards Using SAS® ODS Graphics
45 Jayanth Iyengar If its not broke, don’t fix it; existing code and the programmers’ dilemma
53 Jayanth Iyengar Best Practices for Efficiency and Code Optimization in SAS programming
55 Piotr Krzystek SAS as a Tool in Data Curation: A Case Example with the Inter-university Consortium for Political and Social Research
64 Ronald Fehd A Configuration File Companion: testing and using environment variables and options; templates for startup-only options initstmt and termstmt
65 Ronald Fehd An Autoexec Companion, Allocating Location Names during Startup
115 Drew Metz SAS Log Parsing: SAS Logs without the slog
145 Mathieu Gaouette Hash Tables Like You’ve Never Seen Them Before
146 Tom Kari Using SAS Formats
157 Charu Shankar 5 secrets of the SQL Goddess
159 Mark Jordan A Brief Introduction to DS2
160 Mark Jordan Fun with FILENAMEs

e-Posters

No. Author(s) Paper Title
26 Kirk Paul Lafler Exploring the Skills Needed by the Data Scientist
77 Jayanth Iyengar From %let To %local; Methods, Use, And Scope Of Macro Variables In SAS Programming
81 Troy Hughes Failure To EXIST: Why Testing Data Set Existence with the EXIST Function Is Inadequate for Serious Software Development in Asynchronous, Multiuser, and Parallel Processing Environments
113 Harshita Budumuru Heatmaps for Hot Housing Markets: SAS an ultimate tool for analyzing real estate data
114 Chhavi Nijhawan Analysis of Chemicals in Beauty Products and its Impact on Consumers
116 Paul Silver Generate Complex SAS code from File


Abstracts

Advanced Techniques

38 : Application of Fuzzy Matching Techniques Using SAS Software – A Panel Discussion
Kirk Paul Lafler, sasNerd
Stephen Sloan, Accenture

Data comes in all forms, shapes, and sizes. When consistent and reliable identifiers (or keys) exist between two or more files, SAS® users are typically able to search and match observations without a problem. But when a unique identifier (or key) is not consistent or reliable, the matching process can become compromised. This panel discussion will explore what fuzzy matching is, common data issues, popular data cleaning and validation techniques, the application of the five CAT functions, the application of the SOUNDEX algorithm (for phonetic matching), the SPEDIS, COMPLEV, and COMPGED functions, and an assortment of SAS programming techniques to resolve key identifier issues and to successfully search, merge/join, and sort less-than-perfect or messy data.
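
For readers new to these functions, here is a minimal sketch of the kind of comparison the panel will discuss; the data set and variable names are hypothetical, and the functions shown are standard Base SAS.

/* Hypothetical sketch: scoring candidate name pairs with fuzzy-matching functions */
data work.match_scores;
   set work.candidate_pairs;                   /* assumed input: one row per pair of names */
   phonetic_match = (soundex(name_a) = soundex(name_b)); /* 1 when the names sound alike   */
   spedis_score   = spedis(name_a, name_b);    /* spelling distance (asymmetric)           */
   complev_score  = complev(name_a, name_b);   /* Levenshtein edit distance                */
   compged_score  = compged(name_a, name_b);   /* generalized edit distance                */
run;

Lower SPEDIS, COMPLEV, and COMPGED scores suggest closer matches; a typical workflow keeps pairs below a cutoff chosen for the data at hand.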

40 : Functions (and More!) on CALL!
Richann Watson, DataRich Consulting
Louise Hadden, Abt Associates Inc.

SAS® Functions have deservedly been the focus of many excellent SAS papers. SAS CALL Routines, which rely on and collaborate with SAS Functions, are less well known, although many SAS programmers use these routines frequently. This paper and presentation will look at numerous SAS Functions and CALL Routines, as well as explaining how both SAS Functions and CALL Routines work in practice. There are many areas that SAS CALL Routines cover including CAS (Cloud Analytic Services) specific functions, character functions, character string matching, combinatorial functions, date and time functions, external routines, macro functions, mathematical functions, sort functions, random number functions, special functions, variable control functions, and variable information functions. While there are myriad SAS CALL Routines and SAS Functions, we plan to drill down on character function SAS CALL Routines including string matching; macro, external and special routines; sort routines; random number generation routines; and variable control and information routines. We could go on and on about SAS CALL Routines, but we are going to limit the SAS CALL Routines discussed in this paper, excluding any environment specific SAS CALL Routines such as those designated for use with CAS and TSO, as well as other redundant examples. We hope to demystify SAS CALL Routines by demonstrating real world applications of specific SAS CALL Routines, bringing some amazing capabilities to light.

46 : The Best Kept Secret of Custom Built Tasks Built in SAS® Studio!
Jennifer Rosson, Western Alliance Bank

SAS® Studio is a web-based tool that allows users to run SAS code through the web browser. It also offers several predefined tasks, which allow for point-and-click user interfaces for several analytical procedures. A big surprise to many SAS programmers is that SAS Studio offers the capability to create custom tasks with very little effort. These custom tasks can become powerful tools of automation for many audited processes. Custom tasks can be created to provide a user interface requiring entry of a few parameters that can set off a series of SAS programs fully automated to run, review log files for errors, check data quality, send emails, and publish reports. Custom tasks to automate SAS programs can be created to meet the most stringent data control requirements, including SOX (Sarbanes-Oxley), MRM (Model Risk Management) and FRB (Federal Reserve Board) audit reviews imitating a production environment for publication of financial reports such as CECL (Current Expected Credit Losses), DFAST (Dodd-Frank Act Stress Test), and CCAR (Comprehensive Capital Analysis and Review). With the addition of some well-known Base SAS features to read directories, capture logs and send emails, a custom task can set off a fully functional and controlled production environment with less effort than most imagined.

48 : Four Different Page Layouts on the Fly with Three Reports on a Page using the Output Delivery System
Derek Morgan, Bristol Myers Squibb

This paper details the creation of a ready-to-publish document with multiple data sources and differing page layouts of the same type of information based on page fit. Each set of information is placed on one, two, or three pages depending on fit, and requires three calls to the REPORT procedure using three different data sets. This solution does not involve ODS DOCUMENT. Instead, we calculate how much horizontal and vertical space the report requires and use the macro language to execute code conditionally. This allows us to dynamically determine which page layout to apply, including setting or suppressing page breaks with ODS STARTPAGE=. This solution makes extensive use of inline formatting as well. The output was produced as an RTF file using SAS® University Edition running on a Microsoft Windows 10 machine.

69 : Calculating Cardinality Ratio in Two Steps
Ronald Fehd, Fragile-Free Software

The cardinality of a set is the number of elements in the set. The cardinality of a SAS software data set is the number of observations in the data set, n-obs. The cardinality of a variable in a data set is the number of distinct values (levels) of the variable, n-levels. The cardinality ratio of a variable is n-levels / n-obs; the range of this value is from zero to one. Previous algorithms combined output data sets from the frequency and contents procedures in a data step. This algorithm reduces multiple frequency procedure steps to a single call, and uses SCL functions to fetch contents information in the second data step. The output data set, a dimension table of the list of data set variable names, has a variable cr-type whose values are in (few, many, unique); this variable identifies the three main types of variables in a data set: few is discrete, many is continuous, and unique is a row-identifier. The purpose of this paper is to provide a general-purpose program, ml-namex.sas, which gives enhanced information about the variables in a data set. The author uses this list-processing program in Fast Data Review, Exploratory Data Analysis (EDA), and in Test-Driven Development (TDD), a discipline of Agile and Extreme Programming.
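
As a rough illustration of the two-step idea (a simplified sketch, not the author's ml-namex.sas program; the 0.1 cutoff separating "few" from "many" is an assumption for the example):

/* Step 1: a single PROC FREQ call captures the distinct-value count of every variable */
ods output nlevels=work.nlevels;
proc freq data=sashelp.class nlevels;
   tables _all_ / noprint;
run;

/* Step 2: SCL functions fetch n-obs, and the ratio classifies each variable */
data work.cardinality;
   set work.nlevels;
   length cr_type $6;
   dsid  = open('sashelp.class');
   n_obs = attrn(dsid, 'nlobs');
   rc    = close(dsid);
   cardinality_ratio = nlevels / n_obs;
   if nlevels = n_obs              then cr_type = 'unique';
   else if cardinality_ratio < 0.1 then cr_type = 'few';
   else                                 cr_type = 'many';
run;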

79 : GIS Challenges of Cataloging Catastrophes: Serving up GeoWaffles with a Side of Hash Tables to Conquer Big Data Point-in-Polygon Determination and Supplant SAS® PROC GINSIDE
Troy Hughes, Datmesis Analytics

The GINSIDE procedure represents the SAS® solution for point-in-polygon determination—that is, given some point on earth, does it fall inside or outside of one or more bounded regions? Natural disasters typify geospatial data—the coordinates of a lightning strike, the epicenter of an earthquake, or the jagged boundary of an encroaching wildfire—yet observing nature seldom yields more than latitude and longitude coordinates. Thus, when the United States Forestry Service needs to determine in what zip code a fire is burning, or when the United States Geological Survey (USGS) must ascertain the state, county, and city in which an earthquake was centered, a point-in-polygon analysis is inherently required. It determines within what boundaries (e.g., nation, state, county, federal park, tribal lands) the event occurred, and confers boundary attributes (e.g., boundary name, area, population) to that event. Geographic information systems (GIS) that process raw geospatial data can struggle with this time-consuming yet necessary analytic endeavor—the attribution of points to regions. This text demonstrates the tremendous inefficiency of the GINSIDE procedure, and promotes GeoWaffles as a far faster alternative that comprises a mesh of rectangles draped over polygon boundaries. This facilitates memoization by running point-in-polygon analysis only once, after which the results are saved to a hash object for later reuse. GeoWaffles debuted in the 2013 white paper Winning the War on Terror with Waffles: Maximizing GINSIDE Efficiency for Blue Force Tracking Big Data (Hughes, 2013), and this text represents an in-memory, hash-based refactoring. All examples showcase USGS tremor data as GeoWaffles tastefully blow GINSIDE off the breakfast buffet—processing coordinates more than 25 times faster than the out-of-the-box SAS solution!

87 : What’s Your Favorite Color? Controlling the Appearance of a Graph
Richann Watson, DataRich Consulting

The appearance of a graph produced by the Graph Template Language (GTL) is controlled by Output Delivery System (ODS) style elements. These elements include fonts and line and marker properties as well as colors. A number of procedures, including the Statistical Graphics (SG) procedures, produce graphics using a specific ODS style template. This paper provides a very basic background of the different style templates and the elements associated with them. Sometimes the default style associated with a particular destination does not produce the desired appearance; instead of using the default style, you can control which style is used by indicating the desired style on the ODS destination statement. However, sometimes not a single one of the 50-plus styles provided by SAS® achieves the desired look. Luckily, you can modify an ODS style template to meet your own needs. One such style modification is to control which colors are used in the graph. Different approaches to modifying a style template to specify the colors used are discussed in depth in this paper.
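
As a flavor of one such approach, here is a hedged sketch that derives a new style from a parent style and overrides the first two group colors; the parent style, colors, and plot are illustrative assumptions, not examples from the paper.

/* Hypothetical sketch: override graph colors in a derived ODS style */
proc template;
   define style styles.mygraphcolors;
      parent = styles.htmlblue;                       /* assumed parent style              */
      style GraphData1 from GraphData1 / color=teal   contrastcolor=teal;
      style GraphData2 from GraphData2 / color=maroon contrastcolor=maroon;
   end;
run;

ods listing style=styles.mygraphcolors;               /* point a destination at the style  */
proc sgplot data=sashelp.class;
   vbox height / category=sex group=sex;
run;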

88 : Have a Date with ISO®? Using PROC FCMP to Convert Dates to ISO 8601
Richann Watson, DataRich Consulting

We have all had to deal with dates and sometimes determining whether a date is in a day-month or month-day format can leave us confounded. Because of this confusion, CDISC has implemented the use of ISO® 8601 format for datetimes in SDTM domains. However, converting these datetimes from the raw data source to ISO 8601 format is no picnic. While SAS® has many different functions and CALL routines there is no magic function to take raw datetimes and convert them to ISO 8601. Fortunately, SAS allows us to create our own custom functions and subroutines. This paper illustrates a custom function with custom subroutines that takes raw datetimes in various states of completeness and converts them to the proper ISO 8601 format.
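
A minimal sketch of the shape such a function can take (this is not the paper's function; it assumes a complete month-day-year value, whereas the paper handles partial dates and datetimes):

/* Hypothetical sketch: a PROC FCMP function that returns an ISO 8601 date string */
proc fcmp outlib=work.funcs.dates;
   function iso8601_date(rawdate $) $ 10;
      return (put(input(rawdate, mmddyy10.), yymmdd10.));
   endsub;
run;

options cmplib=work.funcs;

data _null_;
   iso = iso8601_date('03/15/2021');
   put iso=;                                  /* prints iso=2021-03-15 */
run;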

97 : Introduction To SCL Functions For Macro Programmers
Ronald Fehd, Fragile-Free Software

Another dialect of SAS software is SAS Component Language (SCL). This paper shows SCL methods for common tasks in jobs and macro definitions, both as statements and as functions called with the macro function %SYSFUNC.
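
A small hedged example of the flavor of this technique; the macro name is made up for illustration, and the SCL functions shown (OPEN, ATTRN, CLOSE) are called through %SYSFUNC.

/* Hypothetical sketch: SCL functions inside a macro via %SYSFUNC */
%macro nobs(data);
   %local dsid n rc;
   %let dsid = %sysfunc(open(&data));          /* open the data set        */
   %let n    = %sysfunc(attrn(&dsid, nlobs));  /* fetch the row count      */
   %let rc   = %sysfunc(close(&dsid));         /* always close the handle  */
   &n
%mend nobs;

%put NOTE: sashelp.class has %nobs(sashelp.class) observations.;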

108 : Getting Started with DATA Step Hash Objects
Josh Horstman, Nested Loop Consulting

The hash object provides a powerful and efficient way to store and retrieve data from memory within the context of a DATA step. This presentation will introduce the hash object, cover its basic syntax and usage, and walk through several examples that demonstrate how it can offer new and innovative solutions to complex coding problems. This presentation is intended for SAS® users who are already proficient with basic DATA step programming.
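
For orientation, a minimal hedged sketch of the classic lookup pattern the presentation builds on; the data set and variable names are assumptions.

/* Hypothetical sketch: a hash-object lookup in place of a sort and merge */
data work.priced_orders;
   if _n_ = 1 then do;
      declare hash prices(dataset: 'work.prices');  /* assumed lookup table             */
      prices.defineKey('product_id');
      prices.defineData('unit_price');
      prices.defineDone();
      call missing(unit_price);                     /* create the data variable in PDV  */
   end;
   set work.orders;                                 /* assumed transaction data         */
   if prices.find() = 0 then amount = quantity * unit_price;
run;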

111 : Your Query’s No Good Here: PROC SQL Code That Doesn’t Work Outside of SAS
Jedediah Teres, MDRC

PROC SQL is based on the 1992 ANSI standard. As a result, SQL implementations based on more recent standards have functions not available in PROC SQL, like the ordered analytic family of functions (also known as “window” functions). Conversely, there are enhancements and extensions to SQL available in PROC SQL that are unique to SAS, such as the OUTER UNION CORRESPONDING operator. This paper describes scenarios in which valid PROC SQL will not work in other SQL implementations such as SQL Server and proposes alternative solutions.
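
Two frequently cited examples of SAS-only syntax, shown here as a hedged sketch with made-up table names:

/* Hypothetical sketch: PROC SQL extensions that other SQL engines reject */
proc sql;
   create table work.both_years as              /* stack tables by matching column names */
   select * from work.sales2021
   outer union corr
   select * from work.sales2022;

   select region,
          sum(amount)         as total,
          calculated total/12 as monthly_avg    /* CALCULATED reuses an alias (SAS-only) */
   from work.sales2022
   group by region;
quit;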

112 : Let SAS do the work for you: Tips and tricks for turning SAS output into client-ready tables
Becky Lien, Professional Data Analysts
Sara Richter, Professional Data Analysts
Lily Dunk, Professional Data Analysts

When putting together results tables for clients, how often have you thought, “There has to be a better way?” There is! Commonly used SAS procedures like PROC FREQ/TABULATE/MEANS/SURVEYFREQ provide a wide range of options to customize table output. Combine those options with ODS OUTPUT statements and the customization possibilities are virtually endless. Utilizing these possibilities, we can prepare Excel tables directly from SAS without needing to manually adjust the output once in Excel. In this brief session we will show examples of tailored SAS output using ODS Excel destination along with corresponding SAS syntax. Examples include using ODS OUTPUT statements within SAS procedures (FREQ/SURVEYFREQ/MEANS) and wrapping the procedure and data steps in a macro for ease of reuse. We also utilize style sheets and conditional formatting to improve the look and readability of the finalized tables. Examples in this session were developed in Base SAS.
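
As a compact hedged sketch of the pattern (the file path, data set, and variable are placeholders), PROC FREQ results can be routed to a workbook and captured as a data set at the same time:

/* Hypothetical sketch: ODS EXCEL plus ODS OUTPUT around a single PROC FREQ */
ods excel file="C:\temp\summary.xlsx" options(sheet_name='Smoking status');
ods output OneWayFreqs=work.freqs;            /* also keep the results as a data set */

proc freq data=sashelp.heart;
   tables smoking_status / nocum;
run;

ods excel close;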

118 : Automating Reports Using Macros and Macro Variables
Ekaterina Roudneva, UC Davis

Report generation can involve manually updating code when you get new information. This paper will go over some techniques that can help you automate your reports and remove the need to manually change code when a dataset is updated or when working with different data. Topics include using PROC SQL to create macro variables and data driven programs. This will be applied to a real-world example that automates the creation of a data dictionary and a frequency report of all variables for specified datasets. In the example, macro variables in combination with %DO loops and macro arrays are used to create dynamic code that links variables with defined formats and outputs a report of all frequencies for every single variable. Using macros and %IF %THEN branching logic can further make the code more flexible and validate the input. This paper is intended for an audience that has basic knowledge of the SAS macro language.
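
A small hedged sketch of the core idea, using a made-up macro name and SASHELP data; the paper's real-world example is considerably richer.

/* Hypothetical sketch: PROC SQL builds a macro variable list that drives a %DO loop */
proc sql noprint;
   select name into :varlist separated by ' '
   from dictionary.columns
   where libname = 'SASHELP' and memname = 'CLASS';
quit;

%macro freq_all;
   %local i var;
   %do i = 1 %to %sysfunc(countw(&varlist));
      %let var = %scan(&varlist, &i);
      proc freq data=sashelp.class;
         tables &var / missing;
      run;
   %end;
%mend freq_all;
%freq_all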

121 : Using Dictionary Tables to Create Dynamic Programs
Jake Reeser, NORC

This paper discusses the dictionary library and how to utilize some of the tables in this library to write dynamic programs. I will cover what the dictionary library is, how to access the library, useful tables in this library, and how to use these tables to write data-driven code. I will specifically look at the two tables dictionary.columns and dictionary.tables, and how to utilize them to pass parameters to macros. This technique allows users to easily call a repeated macro and leads to fewer user input errors.

143 : Creating a data dictionary using Base SAS®
Tasha Chapman, Oregon Department of Consumer and Business Services

Base SAS contains a number of procedures and options that make documentation of metadata easier than ever. This paper will discuss one method for creating a data dictionary using Base SAS, which will cover PROC CONTENTS, the Output Delivery System (including ODS EXCEL, ODS TRACE, ODS SELECT, and ODS Output), and PROC DATASETS (including Extended Attributes). (Extended Attributes in particular are very cool and not to be missed.)

144 : An Introduction to SAS® Arrays
Lisa Mendez, Emerge Solutions Group

So, you’ve heard about SAS® arrays, but are not sure when – or why – you would use them. This presentation will provide the attendee / reader with a background in SAS arrays, from an explanation as to what occurs during compile time through to their programmatic use, and will include a discussion regarding how DO-loops and macro variables can enhance array usability. Specific examples, including Fahrenheit to Celsius temperature conversion, salary adjustments, and data transposition / counting will assist the user with effective use of SAS arrays in their own work, and provide a few caveats as to their usage, as well.
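
For readers who have never written one, a minimal hedged sketch of the temperature-conversion example mentioned above; the input data set and variable names are assumptions.

/* Hypothetical sketch: convert twelve Fahrenheit readings per row with an array */
data work.celsius;
   set work.fahrenheit;                       /* assumed input with temp1-temp12 in F */
   array temp{12} temp1-temp12;
   do i = 1 to dim(temp);
      temp{i} = round((temp{i} - 32) * 5/9, 0.1);
   end;
   drop i;
run;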

158 : Cooking in SAS Viya
Charu Shankar, SAS Institute

What does cooking for 300 at a yoga retreat have to do with SAS Viya? Come to this session to learn from the yummy analogies of SAS instructor, yoga teacher, and chef Charu, who draws from her experience of cooking at a Bahamas yoga retreat for over 300 guests. Get the distinction between SAS and CAS, and satisfy your curiosity on questions like these: Can I use my existing code in SAS Viya? How do I get cloud savvy with CAS (Cloud Analytic Services)? If techie terminology gets you all wound up in circles, then this deliciously prepared session is just for you, the SAS 9 programmer. All levels welcome. Some basic knowledge of SAS programming will help you get more value out of this session.

Analytics & Statistics

47 : Attribute Reduction for Continuous Dependent Variables in SAS®
Alec Zhixiao Lin, Southern California Edison

Weight of Evidence (WOE) and Information Value (IV) are the foundation of algorithms for classification. However, their use encounters difficulty in analyzing continuous dependent variables because those variables cannot be coded in binary values of 0 and 1. Improving upon some earlier attempts to tackle this problem, this paper suggests normalizing continuous values to a probability-like range of 0 to 1. IV can then be computed at an aggregated level, the same as how a binary outcome is treated. By doing so we can expand the use of WOE and IV to continuous outcomes. A SAS process is provided which will efficiently evaluate the predictive power of as many independent variables as one might have, with minimal handling from the user. The SAS outputs give useful suggestions on segmentation, regressions and machine learning.
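
For reference, the standard binary-outcome definitions the paper builds on are, in the usual notation (this notation is assumed here, not taken from the paper), for each bin i of a candidate predictor with event share g_i and non-event share b_i:

\mathrm{WOE}_i = \ln\!\left(\frac{g_i}{b_i}\right), \qquad \mathrm{IV} = \sum_i (g_i - b_i)\,\mathrm{WOE}_i

The paper's contribution is the normalization step that lets a continuous dependent variable be aggregated into these same quantities.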

60 : Customizable SAS Graphs for Bias Analysis
Natasha Oza, Roche Diagnostics Solution
Jesse Canchola, Roche Diagnostics Solution

Graphical methods for bias analysis are well characterized (Bland and Altman, 1983). The papers by Fernandez and Fernandez (2009) and Johnson and Waller (2018) go into some detail on both the theory and SAS code for creating bias plots. However, the graphical coding is basic and not necessarily tailorable to the user requirements. We introduce a method that produces highly customizable camera-ready graphs that contain details of the bias plot.

74 : Evaluating CO2 storage potential of SECARB Offshore Reservoirs and Saline Formations by Employing Data-driven Models with SAS® Viya software
Kodjo Botchway, Oklahoma State University
Rupom Bhattacherjee, Oklahoma State University
Xitong Hu, Oklahoma State University/Sam’s Club
Jack Pashin, Oklahoma State University
Goutam Chakraborty, Oklahoma State University
Prem Bikkina, Oklahoma State University

The SECARB offshore partnership project seeks to screen deep saline aquifers, and hydrocarbon reservoirs in the central Gulf of Mexico (GOM) for CO2 sequestration and CO2-driven enhanced oil and gas recovery and estimate the corresponding CO2 storage resources for select reservoirs. To this end, three major objectives have been completed: managing geological data from different sources, building a reservoir screening platform for CO2 storage, and ranking the reservoirs based on the estimated storage potential. First, the major geological characteristics of both shelf and deep-water areas of the central GOM were examined and compared to define the appropriate reservoir screening criteria. Consequently, the CO2 storage resources of the screened reservoirs were calculated and reported at the BOEM field level to identify fields with the highest storage potential. In the project’s current phase, the assessment is being expanded to saline formations. Correlations are being identified, tested, and developed for a broad range of rock and fluid properties, including thickness, porosity, permeability, fluid saturation and fluid chemistry. These correlations are developed for interfacial tension and CO2 saturated brine viscosity to improve the storage estimates and consider the capillary trapping and solubility of CO2 in reservoir fluids. SAS® Viya software is used to make visualizations of these properties of the offshore reservoir and make comparisons and draw similarities between the experimental and estimated correlated measures. Using interactive plots, different conditions could be screened, making the analysis more attractive from a user’s point of view.

102 : What’s your model really doing? Understanding human biases in Machine Learning.
Jim Box, SAS Institute

Machine learning and artificial intelligence are having a tremendous impact on our day-to-day lives, covering areas like financial opportunities, health care, job selection and promotion, and even how we interact with vehicles on the street. It’s tempting to think that because a model came up with a suggestion, it must be based on science that has been fairly conducted, but that is far from the case – human biases have tremendous impacts on how machine learning algorithms come up with predictions. As programmers, we have a responsibility to understand the sources and impacts of these biases, and to look at ways we can mitigate the potential harm.

105 : Story telling with SAS Visual Statistics
Jim Box, SAS Institute

SAS Visual Statistics allows you to combine creative data visualizations with advanced modeling techniques to create compelling stories to solve data problems. In this presentation, we’ll combine dashboards, visualizations and statistical analyses to find the root causes for a medical procedure failure.

155 : Watching Our Gardens Grow using Social Network Analysis
Miriam McGaugh, Oklahoma State University

Hofanti Chokma (means “To Grow Well” in Chickasaw) is a systems-change project implemented within the Chickasaw Nation Tribal Communities. Chickasaw Nation encompasses 13 counties but resources are often centralized to select locations. However, they are making a concentrated effort to ensure all children within the Chickasaw Nation communities have the resources they need to grow into happy, healthy and productive citizens. This is accomplished through a comprehensive and systematic change within the early childhood serving communities. Hofanti Chokma provides education, outreach, and/or services to children from birth to 8 years old and their families even if they are not a tribal member but reside within the tribal counties. Evaluation efforts are taking place to ensure objectives are met but it is hard to measure systems across such a vast area. Like a viral social media post, the Hofanti Chokma personnel wanted to see if their efforts were making far reaching impacts within the early childhood communities. In order to ensure this was occurring, we enacted a community survey with a snowball recruitment method. This social network analysis will start with two people at the center of the network and examine how far the ripples go from that central point. SAS Viya will be used to examine the results for this social network analysis.

Beginner’s Techniques

25 : Enhancing Your Skillset with SAS® OnDemand for Academics (ODA) Software
Kirk Paul Lafler, sasNerd

The free cloud-based SAS OnDemand for Academics (ODA) software is an exciting development for SAS users and learners around the world! The software includes Base SAS, SAS Studio, SAS/STAT, SAS/GRAPH, SAS/ETS, SAS/OR, SAS/IML, SAS/QC, SAS/CONNECT, SAS Enterprise Miner, and SAS/ACCESS to PC Files. SAS ODA offers users extensive learning opportunities to enhance skills for career development and advancement using data access, data manipulation, data management, programming techniques, analytics, data visualization, and statistical analysis capabilities. Topics include an introduction and overview of SAS OnDemand for Academics (ODA) software, a demonstration of SAS Studio features, and programming examples to showcase this exciting software suite.

33 : The Battle of the Titans (Part II): PROC TABULATE versus PROC REPORT
Kirk Paul Lafler, sasNerd
Joshua Horstman, Nested Loop Consulting

Should I use PROC REPORT or PROC TABULATE to produce that report? Which one will give me the control and flexibility to produce the report exactly the way I want it to look? Which one is easier to use? Which one is more powerful? WHICH ONE IS BETTER? If you have these and other questions about the pros and cons of the REPORT and TABULATE procedures, this presentation is for you. We will discuss, using real-life report scenarios, the strengths (and even a few weaknesses) of the two most powerful reporting procedures in SAS® (as we see it). We will provide you with the wisdom you need to make that sometimes difficult decision about which procedure to use to get the report you really want and need.

39 : Data Access Made Easy Using SAS® Studio
Kirk Paul Lafler, sasNerd
Zheyuan Yu, MS Biostatistics
Shaonan Wang, Student
Nuoer Lu, Student
Daniel Qian, Student
Swallow Yan, US Education Without Borders

SAS® Studio is a comprehensive and customizable integrated development environment (IDE) for all SAS users. A number of techniques will be introduced to showcase SAS Studio’s ability to access a variety of data files: the application of point-and-click techniques using the Navigation Pane’s Tasks and Utilities; the importation of external delimited (or tab-delimited) text, comma-separated values (CSV), and Excel spreadsheet data files by accessing Import Data under Utilities; the reading of JSON data files; and the viewing of the results and SAS data sets that are produced. We also provide key takeaways to help users learn through the application of tips, techniques, and effective examples.

49 : The Essentials of SAS® Dates and Times
Derek Morgan, Bristol Myers Squibb

The first thing you need to know is that SAS® software stores dates and times as numbers. However, this is not the only thing that you need to know. This presentation gives you a solid base for working with dates and times in SAS. It introduces you to functions and features that enable you to manipulate your dates and times with surprising flexibility. This paper shows you some of the possible pitfalls with dates (and times and datetimes) in your SAS code and how to avoid them. We show you how SAS handles dates and times through examples, including the ISO 8601 formats and informats and how to use dates and times in TITLE and FOOTNOTE statements. The presentation closes with a brief discussion of Excel conversions.
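
A small hedged illustration of the underlying idea (the specific dates are arbitrary):

/* Hypothetical sketch: SAS dates and datetimes are just numbers under a format */
data _null_;
   d    = '14oct2022'd;                      /* date literal: days since 01JAN1960     */
   dt   = '14oct2022:09:30:00'dt;            /* datetime literal: seconds since then   */
   days = intck('day', d, '31dec2022'd);     /* elapsed days to year end               */
   put d= d= date9. / dt= dt= datetime20. / days=;
run;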

52 : PROC SORT (then and) NOW
Derek Morgan, Bristol Myers Squibb

The SORT procedure has been an integral part of SAS® since its creation. The sort-in-place paradigm made the most of the limited resources at the time, and almost every SAS program had at least one PROC SORT in it. The biggest options at the time were whether to use something other than the IBM procedure SYNCSORT as the sorting algorithm, or whether you were sorting ASCII data versus EBCDIC data. These days, PROC SORT has fallen out of favor; after all, PROC SQL enables merging without using PROC SORT first, while the performance advantages of HASH sorting cannot be overstated. This leads to the question: Is the SORT procedure still relevant to anyone other than the SAS novice or the terminally stubborn who refuse to HASH? The answer is a surprisingly clear “yes”. PROC SORT has been enhanced to accommodate twenty-first century needs, and this paper discusses those enhancements.

59 : A Survey of Some of the Most Useful SAS Functions
Ron Cody, SAS Instructor

SAS® Functions provide amazing power to your DATA step programming. Some of these functions are essential—some of them save you writing volumes of unnecessary code. This talk covers some of the most useful SAS functions. Some of these functions may be new to you, and they will change the way you program and approach common programming tasks. The majority of the functions described in this presentation work with character data. There are functions that search for strings, others that can find and replace strings or join strings together, and still others that can measure the spelling distance between two strings (useful for “fuzzy” matching). Some of the newest and most amazing functions are not functions at all, but CALL routines. Did you know that you can sort values within an observation? Did you know that not only can you identify the largest or smallest value in a list of variables, but you can also identify the second or third or nth largest or smallest value? I hope this abstract piques your interest.
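
As a teaser, a hedged sketch of two of the routines alluded to above (LARGEST and CALL SORTN); the values are arbitrary.

/* Hypothetical sketch: ranking and sorting values within one observation */
data _null_;
   array x{5} (7 3 9 1 5);
   second_largest = largest(2, of x{*});     /* the 2nd-largest value across x1-x5 */
   call sortn(of x{*});                      /* sort the values within the row     */
   put second_largest= x1= x2= x3= x4= x5=;
run;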

67 : True is not False: Evaluating Logical Expressions
Ronald Fehd, Fragile-Free Software

The SAS software language provides methods to evaluate logical expressions which then allow conditional execution of parts of programs. In cases where logical expressions contain combinations of intersection (and), negation (not), and union (or), later readers doing maintenance may question whether the expression is correct. The purpose of this paper is to provide a truth table of Boole’s rules, De Morgan’s laws, and SQL joins for anyone writing complex conditional statements in data steps creating subsets, merges with in=, macros, or procedures with a where clause. In my own work, taking the time to construct a truth table has helped me explain to my customers why their description of a subset is incomplete because of not reviewing the excluded observations.
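
As one concrete instance of the kind of equivalence such a truth table documents, De Morgan's law says that not (A and B) is the same as (not A) or (not B); the hedged sketch below shows two WHERE clauses that select identical rows.

/* Hypothetical sketch: De Morgan's law applied to a WHERE clause */
proc print data=sashelp.class;
   where not (sex = 'F' and age >= 13);
run;

proc print data=sashelp.class;
   where sex ne 'F' or age < 13;             /* selects the same observations */
run;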

109 : Using the Output Delivery System to Create and Customize Excel Workbooks
Josh Horstman, Nested Loop Consulting

In years past, SAS® output was limited to the text-based SAS listing. However, the Output Delivery System (ODS) greatly enhanced the capabilities of the SAS system by allowing users to create highly customizable output in a variety of document formats, including Microsoft Excel® workbooks. This paper provides a brief overview of how to use the ODS EXCEL destination to create Excel workbooks and how to customize the various visual attributes of the output such as fonts, colors, styles, and much more.
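
A short hedged sketch of the kind of customization covered (the file path, style, and options shown are illustrative assumptions):

/* Hypothetical sketch: a styled ODS EXCEL workbook with a few suboptions */
ods excel file="C:\temp\class.xlsx"
          style=HTMLBlue
          options(sheet_name='Students' frozen_headers='on' autofilter='all');

proc print data=sashelp.class noobs;
   var name sex age height weight;
run;

ods excel close;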

139 : Generating Simple Statistics with Base SAS Procedures
Jane Eslinger, SAS Institute

Multiple Base SAS procedures generate simple statistics, like mean, min, and max. As a programmer it is always good to know which procedures do what. This paper and presentation compares and contrasts generating simple statistics with the MEANS, UNIVARIATE, TABULATE, and REPORT procedures.

152 : Using Git with Your SAS Projects
Chris Hemedinger, SAS

Few technologies have done more to advance code collaboration and automation than Git. GitHub’s popularity has drawn the attention of all types of programmers, including SAS programmers. Many SAS products have direct integration with Git, extending to GitHub. In this session we will cover: what Git is and why you should care; using Git with SAS Enterprise Guide; using Git with SAS Studio; Git functions in Base SAS; and where to learn more.

163 : Getting Started with the Output Delivery System
Chevell Parker

This paper describes techniques for getting started with the Output Delivery System. It covers Output Delivery System basics, which provide the framework needed for a solid understanding of ODS. Other topics include an introduction to the ODS destinations and procedures and how to effectively get the most out of them. Also covered is the use of the styling components of ODS, which allow the automation of presentation-quality output directly from SAS or Viya. Finally, techniques are demonstrated for using ODS in conjunction with the FILENAME statement and various FILENAME access methods, such as EMAIL for emailing ODS output.

Data Management & Administration

20 : Putting the Meta into the Data: Managing Data Processing for a Large Scale CDC Surveillance Project with SAS®
Louise Hadden, Abt Associates Inc.

There are myriad epidemiological and surveillance studies ongoing due to the pervasive COVID-19 pandemic, often embodying the definition of “big data” with thousands of participants, variables, and lab samples. Data in a given study can come from many different streams, for example: REDCap software, electronic medical records (EMR), chart abstraction, laboratory records, etc. Different contractors can be managing different aspects of the same project, the data is changing minute to minute, and the deliveries are required at a fast and furious pace. Wrangling all the different data sources requires robust data management routines, and SAS® can help, with tools to obtain data via APIs and PROC HTTP, metadata resources, and programming techniques. This paper and presentation will outline best practices for managing multiple aspects of large-scale CDC surveillance projects, using SAS.

54 : Explore Your Data and Avoid Surprises
Daniel Konkler, Independent Contractor
Gilbert Ramos, Valleywise Health

At some point or another every SAS programmer is given data to analyze where they had no involvement with its collection. It might come from a formal data management system or from a test file or spreadsheet on a PC. There may be no documentation or the documentation may not be current. This set of macros allows standard and automatic data exploration and documentation. It also provides an easy method to verify analysis tables and files with the optional ‘by’ processing to subset data and the option of adding proc univariate descriptive statistics to the default analysis. All the data is directly from the data source itself so it does not depend on external documentation.

70 : Did the load work?
Lisa Eckler, Lisa Eckler Consulting Inc.

After a series of database tables are loaded from multiple data sources and before using the data to feed automated reports and BI tools, we want to know whether the load was complete and successful. This goes beyond confirming that the jobs ran without errors. The more precise concerns are, “Were all of the data sources ingested? Were the right number of rows of data added or updated in each table? Were all of the appropriate columns populated? Do the data values make sense? Are the values of categorical variables different than expected?”. This paper describes a fully automated process for comparing a new set of data loaded to one or more tables with previous sets to check for reasonableness and completeness and highlight potential problems. It can be used more generally as a data comparison tool.

71 : Using SAS® Gplot Overlay to Effectively Visualize and Compare COVID-19-Sepsis versus Sepsis Post-hospital Discharge Locations Over Time
Xiao Qing Wang, Michigan Medicine
Sarah Seelye, Veterans Affairs Center for Clinical Management Research, HSR&D Ann Arbor, MI
Brenda McGrath, OCHIN, Inc
Hallie Prescott, Michigan Medicine
Theodore Iwashyna, Michigan Medicine
Elizabeth Viglianti, Michigan Medicine

In a world of Big Data and results-oriented clinical research, complex results can be difficult to absorb quickly and easily. This is particularly true during the COVID-19 pandemic, when vast amounts of data and statistics were made available to the public. Effective data visualization can offer quicker data insights and processing, help better communicate findings, and facilitate meaningful discussions. We aim to illustrate the effectiveness of visualization techniques in understanding the unique outcomes of distinct populations. We do this by creating overlay plots using SAS® Gplot to compare post-hospital discharge locations among U.S. Veterans in 2020 with either A) sepsis and COVID-19+ or B) sepsis-only. The plots show clear proportional differences in post-hospital discharge mortality and readmission into acute care or nursing home facilities over time. Our paper describes the data preparation needed for creating Gplots and assessing differences in patient outcomes. The methods presented here have broad applications for diverse fields. Health care professionals may especially benefit by learning effective visualization tools to understand important clinical outcomes for different patient populations.

80 : Calling for Backup When Your One-Alarm Becomes a Two-Alarm Fire: Developing SAS® Data-Driven Concurrent Processing Models through Control Tables and Dynamic Fuzzy Logic
Troy Hughes, Datmesis Analytics

In the fire and rescue service, a box alarm (or, simply, “alarm”) describes the severity of a fire. As an alarm is elevated from a one-alarm fire to a multi-alarm fire, additional, predetermined resources (e.g., personnel and apparatuses) are summoned to combat the blaze more aggressively. Thus, a five-alarm fire—or its equivalent “five-alarm” Cincinnati chili—represents something extremely hot and dangerous. After extinguishment, and as smoke and embers recede and overhaul begins, fire and rescue resources are released “back into service” so they can be utilized elsewhere if necessary. Essential to managing complex fireground operations, this load balancing paradigm is also common in grid and cloud computing environments in which additional computational resources can be shifted temporarily to an application or process to maximize its performance and throughput. This text instead demonstrates a programmatic approach in which SAS® extract-transform-load (ETL) operations are decomposed and modularized and subsequently directed (for execution) by control tables. If increased throughput is required, additional instances of the ETL program can be invoked concurrently, with each software instance performing various operations on different data sets, thus decreasing overall runtime. A shared control table provides the communications backbone for all SAS sessions by tracking incomplete, in-progress, and completed operations for all data sets. A configuration file allows end users to specify prerequisite processes (that must be completed before an operation can commence), thus facilitating the dynamic fuzzy logic that autonomously selects the specific ETL operations to be executed. This data-driven design approach ensures that the execution of operations can be prioritized, optimized, and, to the extent possible, run in parallel to maximize performance and throughput.

84 : Should I Wear Pants in the Portuguese Expanse? Automating Business Rules and Decision Rules Through Reusable Decision Table Data Structures that Leverage SAS Arrays
Troy Hughes, Datmesis Analytics
Louise Hadden, Abt Associates Inc.

Decision tables operationalize one or more contingencies and the respective actions that should be taken when contingencies are true. Decision tables capture conditional logic in dynamic control tables rather than hardcoded programs, facilitating maintenance and modification of the business rules and decision rules they contain—without the necessity to modify the underlying code (that interprets and operationalizes the decision tables). This text introduces a flexible, data-driven SAS® macro that ingests decision tables—maintained as comma-separated values (CSV) files—into SAS to dynamically write conditional logic statements that can subsequently be applied to SAS data sets. This metaprogramming technique relies on SAS temporary arrays that can accommodate limitless contingency groups and contingencies of any content. To illustrate the extreme adaptability and reusability of the software solution, several decision tables are demonstrated, including those that separately answer the questions Should I wear pants and Where should I travel in the Portuguese expanse? The DECISION_TABLE SAS macro is included and is adapted from the author’s text: SAS® Data-Driven Development: From Abstract Design to Dynamic Functionality (Hughes, 2019).

HOW

37 : Macro Programming Essentials for New SAS Users
Kirk Paul Lafler, sasNerd

The SAS® Macro Language is a powerful tool for extending the capabilities of the SAS System. This hands-on workshop teaches essential macro coding concepts, techniques, tips and tricks to help beginning users learn the basics of how the Macro language works. Using a collection of proven Macro Language coding techniques, attendees learn how to write and process macro statements and parameters; replace text strings with macro (symbolic) variables; generate SAS code using macro techniques; manipulate macro variable values with macro functions; create and use global and local macro variables; construct simple arithmetic and logical expressions; interface the macro language with the SQL procedure; store and reuse macros; troubleshoot and debug macros; and develop efficient and portable macro language code.

76 : Understanding Administrative Healthcare Data sets using SAS programming tools.
Jayanth Iyengar, Data Systems Consultants LLC

Changes in the healthcare industry have highlighted the importance of healthcare data. The volume of healthcare data collected by healthcare institutions, such as providers and insurance companies is massive and growing exponentially. SAS programmers need to understand the nuances and complexities of healthcare data structures to perform their responsibilities. There are various types and sources of administrative healthcare data, which include Healthcare Claims (Medicare, Commercial Insurance, & Pharmacy), Hospital Inpatient, and Hospital Outpatient. This hands-on workshop seminar will give attendees an overview and detailed explanation of the different types of healthcare data, and the SAS programming constructs to work with them. The workshop will engage attendees with a series of SAS exercises involving healthcare datasets.

92 : Commit early, commit often! A gentle introduction to the joy of Git and GitHub
Isaiah Lankham, University of California Office of the President
Matthew Slaughter, Kaiser Permanente Center for Health Research

In recent years, the social coding platform GitHub has become synonymous with open-source software development. Behind the scenes, GitHub uses software called Git, which was originally developed as a distributed version control system for managing contributions of thousands of developers to the Linux kernel. In this hands-on workshop, we’ll introduce you to the joy of Git and GitHub for managing changes to codebases of any size, whether working alone or as part of a team. We’ll also practice using the GitHub website, and we’ll utilize Git from the command line within a web-based Google Colab environment. Topics will include basic Git/GitHub concepts, like forking and cloning code repositories, as well as best practices for using branches and commits to create a well-organized history of code changes. We’ll also talk about how to sync code changes between local and remote environments like GitHub. Finally, we’ll use the GitHub web interface for pull requests, which are the standard mechanism for contributing to open-source projects. No knowledge of Git or GitHub will be assumed, and no software will need to be installed. In order to work through interactive examples, accounts will be needed for GitHub and Google. Complete setup steps, including how to set up and practice Git on a local machine after the workshop, will be provided at https://github.com/saspy-bffs/wuss-2022-how

99 : SASJS the coolest SAS code tool since Proc Sort!
Zeke Torres, Code629

Allan Bowe has created SASJS, and it’s a cool open toolset. Learn about this great resource that has a growing community contributing to its features and success. Supplemental YouTube media and resources will be delivered in time for the fall conference series.

103 : ODS Document & Item Stores: A New Beginning
Bill Coar, Axio, a Cytel Company

Over the years, there seems to be a constant need to improve processes for creating a single-file deliverable from multiple (tens or hundreds?) tables, listings, and figures. However, item stores, ODS Document, and Proc Document are available tools that often go unnoticed. Many options have been presented and thoroughly discussed, but relatively few discuss these techniques that are available with Base SAS. An item store is a SAS library member that consists of pieces of information (i.e., procedure output) that can be accessed independently. With item stores, procedure output can be created at one point in time and accessed at a later point in time. Item stores are created using ODS Document statements and accessed using Proc Document. Before diving into using item stores for combining tables, listings, and figures, we propose heading back to basics with an introduction to item stores, ODS Document, and Proc Document in a more general setting using basic procedures such as Proc Means, Proc Freq, and Proc Univariate. We then take these concepts and extend them to Proc Report, including the use of by-group processing. This hands-on workshop will introduce the user to item stores, ODS Document, and Proc Document. By the end of the workshop, the user will have enough insights to use the technique to obtain a single file containing a set of TLFs. The use of ODS is required in this application using SAS 9.4 in a Windows environment.
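
For orientation before the workshop, a hedged sketch of the basic round trip (the item store name and procedure are placeholders):

/* Hypothetical sketch: write procedure output to an item store, then replay it */
ods document name=work.mydoc(write);         /* open the item store                */
proc means data=sashelp.class;
   var height weight;
run;
ods document close;

proc document name=work.mydoc;
   list / levels=all;                        /* inspect what was captured          */
   replay;                                   /* re-render to the open destinations */
run;
quit;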

119 : Simmering Data: Using Beautiful Soup and Python to scrape data from web pages
Joe Matise, NORC

Ever look at a table on a web page and wish you had it in a data set? You probably took one look at the source of the web page and then decided it wasn’t worth the hassle. SAS has some tools to help, but oftentimes tables have too much complexity to parse without hours of work. Instead, parse the web page on easy mode using Python – even if you don’t know any Python! In this paper, we will show an option using a Python library, Beautiful Soup, that allows users to easily navigate even fairly complex web pages and quickly pull tables into Pandas DataFrames. Once in a Pandas DataFrame, that data can be easily uploaded to SAS for further processing. This paper can be useful for users at any level of SAS programming, and assumes no knowledge of Python.

124 : Map It Out: Using SG Attribute Maps for Precise Control of PROC SGPLOT Output
Josh Horstman, Nested Loop Consulting

The SGPLOT procedure, part of the ODS Statistical Graphics package, allows for extensive customization of nearly all aspects of plot output. These capabilities are commonly used to distinguish between groups or categories being compared through the use of distinct plot attributes, such as symbols and colors. However, there are times when it is advantageous to be able to associate specific plot attributes with specific data values. SG attribute maps provide functionality that does exactly that. This hands-on workshop will provide an introduction to the use of SG attribute maps in conjunction with PROC SGPLOT. A series of examples will demonstrate how attribute maps are used and why they are useful as a programming tool. Both discrete and range attribute maps will be used to modify a variety of plot attributes, such as plot marker symbols and colors, line styles and fill patterns.
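
To give a flavor of the syntax before the workshop, a hedged sketch of a discrete attribute map (the map name, values, and colors are illustrative assumptions):

/* Hypothetical sketch: a discrete attribute map pins colors to specific data values */
data work.attrmap;
   length id $10 value $6 fillcolor markercolor $10;
   id = 'sexmap';
   value = 'F'; fillcolor = 'pink';      markercolor = 'pink';      output;
   value = 'M'; fillcolor = 'lightblue'; markercolor = 'lightblue'; output;
run;

proc sgplot data=sashelp.class dattrmap=work.attrmap;
   vbar age / group=sex attrid=sexmap;
run;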

137 : Proc Report Step by Step with Styles
Jane Eslinger, SAS Institute

In this hands-on workshop, you will learn to write PROC REPORT code to control styling aspects of each portion of a report table. Each style option will be explained, then you will spend time writing code to create your own report. WUSS will not be providing laptops. Please bring your own device so you can get the most out of this workshop.

Open Source

61 : Using LaTeX document class sugconf to write your paper
Ronald Fehd, Fragile-Free Software

SAS software user group conferences now accept papers written with the LaTeX document preparation system, which produces a .pdf. This paper illustrates the use of the LaTeX document class sugconf, shows a basic paper template, and provides references to basic and advanced usage of LaTeX.

83 : Data-Driven Robotics: Leveraging SAS® and Python to Virtually Build LEGO MINDSTORMS Gear Trains for the EV3 Brick
Troy Hughes, Datmesis Analytics

LEGO MINDSTORMS Evolution 3 (EV3) represents the third-generation programmable “Brick,” a hand-held computer developed by the LEGO Group that intelligently drives and forms the cornerstone of LEGO robotics. Released in 2013, EV3 leverages LEGO Group-built sensors (including haptic, gyroscopic, ultrasonic, infrared, and others) and servomotors—rotary motors that track speed, degrees, and angle of rotation—to interpret, interact with, and respond to environmental and user stimuli. Although EV3 robotics locomotion begins with large and medium LEGO servomotors, gears and gear trains facilitate complex actions, movements, and the increase of speed or torque. To this end, this paper introduces LEGO gears and simple gear trains, and includes SAS® code that programmatically identifies how (and how well) LEGO gears mesh with each other in a two-dimensional (2D) plane. Data-driven software evaluates a table of 41 LEGO gears and programmatically determines where on a virtual 9×9 LEGO stud plane the gears can be placed to mesh. Moreover, by modifying additional tables, the 9×9 stud plane can be replaced with other LEGO Technic beams (or other bricks) to demonstrate where gears can be placed. Additionally, a FUZZ parameter enables the user to specify the number of millimeters of gap or overlap permitted between gears. This data-driven design maximizes software flexibility and configurability, providing dynamic output to meet the needs of different users by modifying tables and parameters only—not code. Finally, the interoperability of data-driven design is showcased in that equivalent SAS and Python code are included, both of which rely on the same parameters and control tables. For more information, please consult the unabridged 69-page text (https://communities.sas.com/t5/SAS-Communities-Library/Data-Driven-Robotics-Leveraging-SAS-and-Python-Software-to/ta-p/641990) and the 30-minute 4K video (https://youtu.be/rvFS0rj6ml4).

93 : Friends are better with Everything: Using PROC FCMP Python Objects in Base SAS
Matthew Slaughter, Kaiser Permanente Center for Health Research
Isaiah Lankham, University of California Office of the President

Flexibly combining the strengths of SAS and Python allows programmers to choose the best tool for the job and encourages programmers working in different languages to share code and collaborate. Incorporating Python into everyday SAS code gives SAS users access to extensive libraries developed and maintained by the open-source Python community. The Python object in PROC FCMP embeds Python functions within SAS code, passing parameters and code to the Python interpreter and returning the results to SAS. User-defined SAS functions executing Python code can be called from the DATA step or any context where SAS functions are available. User-defined call routines can also be used in DATA steps. This paper provides an overview of the syntax of FCMP Python objects and practical examples of useful applications incorporating Python functions into SAS processes. For example, we will demonstrate incorporating Python packages into SAS code for leveraging complex API calls such as validating email addresses, geocoding street addresses, generating highly formatted Excel files, and converting a YAML file into a SAS dataset. All examples from this paper are available at https://github.com/saspy-bffs/wuss-2022-proc-fcmp-python
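
A minimal sketch of the FCMP Python object pattern the paper describes (the function and result-key names are made up, and SAS must already be configured to find a Python interpreter via the documented environment variables):

    proc fcmp outlib=work.funcs.py;
       function py_hypot(a, b);
          declare object py(python);
          submit into py;
    def hypot(a, b):
        "Output: RESULT"
        import math
        return math.hypot(a, b)
          endsubmit;
          rc = py.publish();
          rc = py.call('hypot', a, b);
          return (py.results['RESULT']);
       endfunc;
    run;

    options cmplib=work.funcs;
    data _null_;
       h = py_hypot(3, 4);      /* returns 5 */
       put h=;
    run;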

104 : SAS and Open Source Playing Nicely Together
Jim Box, SAS Institute
Samiul Haque, SAS Institute

Open source languages like R and Python are immensely popular and quite useful. Did you know you could write code blocks of Python and R inside of SAS programs? You can also invoke SAS analytics from open source programs. In this presentation, we will summarize all the ways SAS and open source can be used together to solve problems.

117 : Unlocking the Web With Python and SAS: Shortcuts to accessing data using Python and SAS
Joe Matise, NORC

Learning that you need to access your data through a REST API can be intimidating for data scientists: the syntax can be complicated, authorization keys or login credentials must be managed, and for most APIs there is no helpful SAS code anywhere to be seen. The good news is that you don’t have to be a web guru to use them – you just need the right tools. In this paper, we will show examples of using Python pulled directly from API documentation to connect to REST APIs, and then show how to get that data into SAS or Viya. No prior knowledge of Python or REST APIs is expected or required. Most SAS programmers capable of writing basic data step code should understand the level of code we use in this example.
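
The paper’s examples are in Python, but for readers who want a pure-SAS point of comparison, a hedged sketch using PROC HTTP and the JSON libname engine (the URL and response layout are placeholders):

    filename resp temp;
    proc http
       url="https://api.example.com/v1/records"   /* placeholder endpoint */
       method="GET"
       out=resp;
       headers "Accept"="application/json";
    run;

    libname apidata json fileref=resp;   /* expose the JSON response as tables */
    proc print data=apidata.alldata(obs=10);
    run;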

151 : Data mining for the online retail industry: Customer segmentation and assessment of customers using RFM and k-means
Gowtham Varma Bhupathiraju, Oklahoma State University

This study identifies potential wholesalers and provides relevant, timely data to the company, enabling it to understand its customers and conduct efficient customer-centric promotion. Customers are scored on Recency, Frequency, and Monetary value (RFM) and then segmented into significant groups using the k-means clustering algorithm, and the primary attributes of the customers in each segment are determined. Accordingly, the company is supplied with a set of suggestions on customer-centric marketing and advertising.
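
A minimal sketch of the k-means step described above (the dataset and variable names are assumptions, and the number of clusters is arbitrary):

    proc stdize data=rfm out=rfm_std method=std;   /* put R, F, M on a common scale */
       var recency frequency monetary;
    run;

    proc fastclus data=rfm_std maxclusters=4 out=segments;
       var recency frequency monetary;
    run;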

153 : Using Visual Studio Code for SAS Programming
Chris Hemedinger, SAS

Learn how to write your next SAS program in everyone’s favorite new coding tool, VS Code! SAS has published an official VS Code extension for SAS programmers to the Visual Studio Code Marketplace. It’s easy to install and free to use! During this session we’ll show how to get started developing and running SAS programs in VS Code. We’ll go into the special features of this extension and show how to get support from the community of developers who maintain it.

156 : Making survey systems talk with analytics software: Comparing connections to SAS and SAS Viya
Miriam McGaugh, Oklahoma State University

What happens when you have multiple people writing code in different languages, spread across the country, all trying to accomplish a portion of a single goal? First of all, you rethink your anything-goes policy on programming languages. Then you sit down and make it work. This presentation will walk you through how you can bring information from Qualtrics into SAS 9.4 for analysis through open-source languages like Python. The same process will then be compared using pipelines in SAS Viya.

Pharma and Healthcare

43 : Standardized, Customized or Both? Defining and Implementing (MedDRA) Queries in ADaM Data Sets
Richann Watson, DataRich Consulting
Karl Miller, IQVIA

Investigation of drug safety issues for clinical development will consistently revolve around the experience and impact of important medical occurrences throughout the conduct of a clinical trial. As a first step in the data analysis process, Standardized MedDRA Queries (SMQs), a unique feature of MedDRA, provide a consistent and efficient structure to support safety analysis and reporting, and also address important topics for regulatory and industry users. One variation in working with SMQs is the ability to limit the scope to the analysis need (e.g., “Broad” or “Narrow”); outside of the specific SMQs, there is also the ability to develop Customized Queries (CQs). With the introduction of the ADaM Occurrence Data Structure (OCCDS) standard, the incorporation of these SMQs, along with potential CQs, solidified the need for consistent implementation, not only across studies, but across drug compounds and even within a company itself. Working with SMQs, one may have numerous questions: What differentiates an SMQ from a CQ, and which one should be used? Are there any other considerations in implementing the OCCDS standards? Where does one begin? Right here…

44 : Standardised MedDRA Queries (SMQs): Beyond the Basics; Weighing Your Options
Richann Watson, DataRich Consulting
Karl Miller, IQVIA

Ordinarily, the aim of Standardised MedDRA Queries (SMQs) is to group specific MedDRA terms for a defined medical condition or area of interest at the Preferred Term (PT) level, which most would consider to be the basic use of SMQs. However, what if your study looks to implement SMQs in a way that goes beyond the basic use? Whether grouping through algorithmic searching, with or without weighted terms, or through hierarchical relationships, this paper covers advanced searches that will take you beyond the basics of working with SMQs. Gaining insight into this process will help you become more familiar with working with all types of SMQs and will put you in a position to become the “go-to” person for helping others within your company.

50 : Time Since Last Dose: Anatomy of a SQL Query
Derek Morgan, Bristol Myers Squibb

Even though much of what SAS programmers need to accomplish can be done using the DATA step, SQL can provide an alternative for large amounts of data manipulation. First, this paper will walk you through how SQL joins two datasets. Second, we will walk through the case of determining time since last (or first) dose and present an SQL solution in place of multiple SORTs, MERGEs, TRANSPOSEs, and use of the LAG function. Not only is the code economical, but the execution is also economical. This may also help to increase your understanding of how the WHERE statement and the SQL GROUP BY and HAVING clauses work.
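
Not the paper’s code, but a sketch of the kind of query it builds up to, assuming numeric SAS dates and hypothetical dataset names ADAE (events) and ADDOSE (doses):

    proc sql;
       create table last_dose as
       select ae.usubjid,
              ae.aestdt,
              max(ds.exstdt)                   as lastdose format=date9.,
              ae.aestdt - calculated lastdose  as days_since_dose
       from adae as ae
            left join addose as ds
              on  ae.usubjid = ds.usubjid
              and ds.exstdt <= ae.aestdt        /* only doses on or before the event */
       group by ae.usubjid, ae.aestdt;
    quit;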

63 : Child Data Set: An Alternative Approach for Analysis of Occurrence and Occurrence of Special Interest
Lindsey Xie, Kite Pharma, Inc.
Richann Watson, DataRich Consulting
Jinlin Wang, Kite Pharma, Inc
Lauren Xiao, Kite Pharma, Inc.

Due to the various data needed for safety occurrence analyses, the use of a child data set that contains all the data for a given data point aids in traceability and supports the analysis. Adverse Events of Special Interest (AESI) represent adverse events (AEs) that are of particular interest in the study. These AESI could potentially have symptoms associated with them. AESI could be captured as clinical events (CEs) in the CE domain, while the associated symptoms for each CE are captured as AEs in the AE domain. The relationship between CEs and associated AE symptoms is an important part of the safety profile of a compound in clinical trials. These relationships are not always readily evident in the source data or in a typical AE analysis data set (ADAE). The use of a child data set can help demonstrate this relationship, which provides enhanced data traceability to support the review effort for both the sponsor and regulatory agency reviewers. This paper provides examples of using a child data set of ADAE to preserve the relationship between CEs and associated AE symptoms. In addition, the paper will show that ADAE and its child data set serve as analysis-ready data sets for the summary of AEs and AESI.

89 : Study of cause and effect in medical research via SAS Statistical package
Oleg Korovyanko, University of California, Davis

We examined how significantly loneliness and social support scores (LS and SSC in Refs. [Dong 2008, Dong 2009]) for seniors are associated with elder mistreatment (EM). The TTEST procedure was applied to the social support score (SSC) and loneliness score (LS). We found that SSC and LS differ significantly between the EM and control groups (both Pr > |t| p-values were <0.0001). After we matched subjects with the PSMATCH procedure, we found a strong association for SSC (Pr > |t| p-value of 0.0034) and a much weaker association for LS (Pr > |t| p-value of 0.1968). We also looked at how autism diagnosis (Dx2) in children correlates with length of breastfeeding (C_BFMo) and the corresponding categorical variable C_BFlength (1 = child was breastfed 0 to 3 months, 2 = 3+ months). We had to limit both Dx2 and C_BFlength to two categories because of the nature of PROC PSMATCH. We analyzed results of the CHARGE study [CHARGE 2006] for 1268 children, 778 with an ASD diagnosis and 490 children in the control TD group. We uncover the role of socioeconomic factors, together with gestational age and parents’ mental health variables, in how significantly autism diagnosis (Dx2) is associated with length of breastfeeding (C_BFMo). Thus, propensity score matching was tested as a tool to indirectly address causation.
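
A schematic of the two analysis steps (all dataset and variable names here are hypothetical, not taken from the study):

    proc ttest data=seniors;
       class em_group;                         /* mistreatment vs. control */
       var ssc ls;
    run;

    proc psmatch data=seniors region=allobs;
       class em_group sex;
       psmodel em_group(treated='1') = age sex income;
       match method=greedy(k=1) caliper=0.25;
       output out(obs=match)=matched;
    run;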

100 : Cautionary Notes when Working Interim Data
Bill Coar, Axio, a Cytel Company

Constructing a hypothesis, conducting an experiment, collecting and analyzing data, and sharing results are fundamental components of the scientific method. In many experiments (such as clinical trials), collecting data may take years. Furthermore, some analyses may be performed along the way with interim data which is incomplete and often erroneous. While there are statistical concerns with analyses and interpretation of interim data, we do not attempt to discuss them here. Rather, we focus on working with interim data as it relates to the conduct of the experiment. Interim data review can be critical to ensuring the experiment goes as expected, mitigating unforeseen circumstances if needed. This helps maintain the integrity of the original experiment. When working with interim/incomplete/sometimes erroneous data, programmers and statisticians should be cautious, as definitions of expected data values, expected data structures, and derivations needed in analyses may differ from what may be found at the end of the experiment. Furthermore, programming with interim data should include additional defensive programming efforts, as it is nearly impossible to predict what may be seen in future data. In the pharmaceutical industry, interim versions of data may be used to support activities such as manual data review, data cleaning, annual safety updates required by regulatory agencies, and support of Data Monitoring Committees (DMCs). Manual data review and data cleaning result in high-quality data. Safety updates and support of DMCs are critical to ensure patient safety. Such analyses cannot wait until the end of the experiment, yet the expected data values, expected data structures, and derivations at interim timepoints during study conduct may differ from those used in the final analysis. The focus of this presentation is to emphasize the cautionary use of early and interim data, and lessons learned from many years of supporting DMCs.

101 : Finding Your Latest Date
Bill Coar, Axio, a Cytel Company

During a clinical trial, participants are often required to visit a clinic multiple times throughout the course of treatment as well as for post-treatment follow-up. Some measurements are taken that day and thus have an association with that visit date. Other data, such as events that occur between visits, will have different dates, usually the dates on which the events occurred. There is often a need to look for the last known date on which data was collected for a patient. This can involve scanning 20-30 different datasets that contain dates and then identifying the latest date for which we have data. This quick tip presentation will utilize macro programming to compartmentalize the identification of the most recent date, including the dataset and variable from which it was obtained. When using standardized data structures, simple imputation can also be performed for partial dates to provide reasonable approximations of a date when working with interim data. The programming concepts presented in this quick tip can be extended to further automate finding a latest date without specifying dataset or variable names. Programming is done using SAS 9.4M6 in a Windows environment. A CDISC standardized data structure will be used to demonstrate the application, though its use extends to other data structures.
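
A stripped-down sketch of the idea (the full presentation automates this across many datasets; the dataset and variable names below are placeholders):

    %macro latest_date(data=, var=);
       %local maxdt;
       proc sql noprint;
          select put(max(&var), date9.) into :maxdt trimmed
          from &data
          where not missing(&var);
       quit;
       %put NOTE: Latest &var in &data is &maxdt;
    %mend latest_date;

    %latest_date(data=adsl, var=lstalvdt)
    %latest_date(data=adae, var=aestdt)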

110 : Complex heatmaps in Statistical analysis of Biomarkers and cancer genomics
Yoganand Budumuru, IQVIA

The National Cancer Institute (NCI) found that cancer patients in the United States bear a huge share of cancer care costs. In 2019 alone, over $21 billion went toward patients’ cancer care, and the costs of cancer treatment will continue to rise. Billions of dollars are spent on clinical trials every year for treating cancer, with an average success rate of merely 10%. However, biomarkers offer huge potential to address this failure-rate challenge and increase the likelihood of approval fivefold (Discovery, 2021). The emergence of genomic biomarkers promises oncology trials with increased efficacy and safety, changing the conventional clinical trial landscape in the modern era of cancer genomics. This paradigm shift from conventional treatments to targeted therapeutic approaches has many implications for all stakeholders within the health care industry. With evolving biomarkers, statistical analyses of cancer genomics data have become increasingly complex, as biomarker data are not reported in a consistent manner across clinical trials. Various heatmaps, including heat panels and clustering heatmaps generated by SAS, offer an excellent approach for visualizing and analyzing patterns in complex, high-dimensional biomarker trial data in the drug development industry. This paper will explain the power of complex heatmaps programmed in SAS to understand and interpret real patterns and associations in multivariate analyses, with real-world examples that help a great deal in developing every step of simple or complex heatmaps for statistical analyses of biomarkers and genomic alterations.
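
As a point of reference (not code from the paper), PROC SGPLOT’s HEATMAPPARM statement draws a basic heatmap from pre-summarized data; the dataset and variable names below are hypothetical:

    proc sgplot data=biomarker_matrix;
       heatmapparm x=gene y=usubjid colorresponse=expression /
                   colormodel=(blue white red);
    run;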

120 : The Functional Service Provider Model: A Comprehensive and Collaborative Solution
Jim Baker, K3-Innovations
David Polus, K3-Innovations
John Kurtz, K3-Innovations

The Functional Service Provider model, commonly known as FSP, is more specialized than complete outsourcing and more comprehensive and collaborative than staffing. There are many advantages to a well-run FSP relationship, both to the Sponsor and Provider companies. First and foremost, the project work is completely transparent as the work is performed on the client systems. Intellectual capital is also retained as co-employment issues are mitigated. This paper, drawing on 60+ years of FSP management experience, will explore the strategic advantages to utilization of an FSP approach to biometrics work, including the customization available, setup, maintenance, and ongoing governance. The reader will better understand the difference between staffing and a service solution.

134 : TrackCHG: A SAS Macro to Colorize and Track Changes Between Data Transfers in Subject-Level Safety Listings
Inka Leprince, PharmaStat, LLC
Elizabeth Li, PharmaStat, LLC
Carl Chesbrough, PharmaStat, LLC

Most ongoing clinical trials have a data monitoring committee and/or an internally motivated sponsor who periodically reviews the accumulating safety data of enrolled subjects. Subject-level listings and patient profiles are essential aids in the review of clinical data. To ease the burden on data reviewers, sponsors increasingly request color-coded outputs in order to easily identify old/deleted records from the previous data transfer and highlight new records from the current data transfer. At PharmaStat, we have developed a SAS macro, TrackCHG, to meet this increasingly popular request for color-coded subject-level listings and patient profiles. This paper explains the components and functions of the TrackCHG SAS macro within the context of generating an adverse event listing. The provided sample datasets and SAS code allow readers to experiment, incorporate the TrackCHG macro into existing listing-generation programs, and produce their own color-coded outputs.
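
The TrackCHG macro itself is described in the paper; purely as an illustration of the underlying mechanism, a conditional row-level style override in PROC REPORT might look like this (the variable names and colors are assumptions):

    proc report data=ae_listing;
       column usubjid aeterm aestdtc recstat;
       define recstat / display 'Record Status';
       compute recstat;
          if recstat = 'NEW' then
             call define(_row_, 'style', 'style={background=cxCCFFCC}');
          else if recstat = 'DELETED' then
             call define(_row_, 'style', 'style={background=cxFFCCCC}');
       endcomp;
    run;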

161 : Learn the Basics About the Pharmaceutical Industry in 20 Minutes
Jim Box, SAS Institute
Matt Becker, SAS

The pharmaceutical industry encompasses the discovery, development, and manufacturing of medications by public and private organizations. The creation of these medications spans centuries, if not millennia. But what are some of the key aspects of the pharmaceutical industry that are important to SAS users? In this paper and presentation, we will spend time on the key aspects of the industry and how analytics is prominent in medical research and development.

162 : Quick Jumpstart into Pharmaceutical SAS Programming
Jim Box, SAS Institute
Matt Becker, SAS

SAS analytics is used across many industries: retail, gaming, finance, insurance, and many others. But what is different when it comes to the pharmaceutical industry? What kind of SAS PROCs do we use for baseline, safety, efficacy in clinical development? What do baseline, safety, and efficacy in clinical development mean? In this paper and presentation, we will cover common topics and SAS programming methodologies used in clinical R&D analysis.

Professional Development

58 : So You Want to be a Successful Statistical Programmer?: The Importance of People Skills
Carey Smoak, Retired

Technical programming skills are a must for a statistical programmer! However, I will demonstrate, in this paper, that people skills are at least as important as technical skills for a successful statistical programmer. The days of simply sitting in a cubicle and programming are fading. Even statistical programmers who work remotely need people skills. In this rapidly changing world of varied communication methods, people skills are an integral part of the success of a statistical programmer. I will challenge the notion that you are born with people skills. Moreover, I will show how people skills can be developed. The savvy statistical programmer would indeed be wise to pay attention to developing their people skills in order to ensure a successful career.

107 : Adventures in Independent Consulting: Perspectives from Two Veteran Consultants Living the Dream
Josh Horstman, Nested Loop Consulting
Richann Watson, DataRich Consulting

While many statisticians and programmers are content in a traditional employment setting, others yearn for the freedom and flexibility that come with being an independent consultant. In this paper, two seasoned consultants share their experiences going independent. Topics include the advantages and disadvantages of independent consulting, getting started, finding work, operating your business, and what it takes to succeed. Whether you’re thinking of declaring your own independence or just interested in hearing stories from the trenches, you’re sure to gain a new perspective on this exciting adventure.

122 : A Statistical Programmer’s Growth Journey: It is More than Learning New Code
Janette Garner, Bristol Myers Squibb

Statistical programmers often find themselves on the receiving end of directions provided by other functional groups. In such an environment, what could growth look like? An immediate answer would be to expand one’s programming toolbox. However, growth encompasses more than just technical skills. It also includes a way of thinking and behaving. This introspective paper explores the author’s experiences as a statistical programmer. Various scenarios will be explored, identifying initial thinking and subsequent reflection.

164 : Building a LinkedIn That Stands Out
Missy Hannah, SAS

Find out the best ways to step up your LinkedIn profile with this interactive workshop from an expert on the SAS Corporate Social Media team. Plus, learn tips for using other social media platforms for technology content, not only to improve your digital presence for job searches, but also ways to network and build connections in the tech industry.

Programming

19 : Form(at) or Function? A Celebratory Exploration of Encoding and Symbology
Louise Hadden, Abt Associates Inc.

The concept of encoding is built into SAS® software in a number of forms, including PROC FORMAT, which transforms values in variables for reporting and to create new variables. Similarly, symbol tables are built into SAS software to communicate with different platforms and systems. This quick tip demonstrates how to use PROC FORMAT, and by extension PROC FCMP, to create a system that converts user-provided text into Morse code, and then converts that Morse code word into sounds, all using SAS. This fun exploration is highly informative about the sort sequence used by SAS software on different platforms, and it also demonstrates the use of PROC FORMAT, PROC FCMP, and sound generation in SAS.
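
A small sketch in the same spirit (a handful of letters only; CALL SOUND is Windows-specific, and the frequencies and durations are arbitrary):

    proc format;
       value $morse 'S' = '...'  'O' = '---';
    run;

    data _null_;
       length code $ 40;
       word = 'SOS';
       do i = 1 to length(word);
          code = catx(' ', code, put(char(word, i), $morse.));
       end;
       put word= code=;
       do j = 1 to length(compress(code));               /* beep the dots and dashes */
          if char(compress(code), j) = '.' then call sound(880, 100);
          else call sound(880, 300);
       end;
    run;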

21 : Designing and Implementing Reporting Meeting Federal Government Accessibility Standards with SAS®
Louise Hadden, Abt Associates Inc.

SAS® software provides a number of tools with which to build in and enhance accessibility / 508 compliance for data visualization and reporting, including ODS PDF options, ODS HTML5 options, the CALL SLEEP and CALL SOUND routines, and the SAS Graphics Accelerator. As amazing as these tools are, successfully implementing accessible reporting requires planning and design, and some creative use of additional SAS tools. This paper and presentation will demonstrate how to plan for success with accessible visualization design using the above mentioned tools, and more.

22 : SAS® PROC GEOCODE and PROC SGMAP: The Perfect Pairing for COVID-19 Analyses
Louise Hadden, Abt Associates Inc.

The new SAS® mapping procedure PROC SGMAP is adding capability with every release. PROC SGMAP was introduced in SAS 9.4M5 as an extension of ODS Graphics techniques to render maps and then overlay plots such as text, scatter, or bubble plots. It has brought to Base SAS a lot of functionality that used to be reserved for SAS/GRAPH users, including PROC GEOCODE. PROC GEOCODE has been available in SAS/GRAPH since Version 8.2 and became available in Base SAS, along with a number of other tools, in Version 9.4 Maintenance Release 5. SAS provides a link to files required for street-level geocoding and more on SAS MAPSONLINE. The ongoing COVID-19 pandemic has produced massive amounts of epidemiological and surveillance data, much of which can be linked to geography as countries, including the United States, grapple with how to address the constantly transmuting contagion. The combination of PROC SGMAP and PROC GEOCODE is well positioned to help researchers address and visualize COVID-19 data. This paper and presentation will walk through the graphic representation of publicly available COVID data sources using PROC SGMAP and PROC GEOCODE.
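
In outline, the pairing looks something like this (the input dataset and variables are hypothetical; ZIP-method geocoding assumes a ZIP variable and the SASHELP.ZIPCODE lookup):

    proc geocode method=zip data=covid_sites out=sites_geo;
    run;

    proc sgmap plotdata=sites_geo;
       openstreetmap;
       bubble x=x y=y size=cases;   /* X/Y are the longitude/latitude written by GEOCODE */
    run;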

24 : Talking to Your Host: Interacting with the Operating System and File System from SAS
Kurt Bremser, Retired

The presentation will show how to interact with the file system underneath SAS, using SAS functions to retrieve metadata, scan directories, copy/move/delete files, and so on. In the second part, the different methods for running external commands from SAS code are explored, with examples of executing a series of commands in a single step. Products used: Base SAS, with XCMD enabled (for the second part). Intended for users with basic SAS coding knowledge.
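
A quick taste of the first part, using only Base SAS I/O functions (the directory path is a placeholder):

    data dir_listing;
       length fname $ 256;
       rc  = filename('mydir', '/tmp');        /* assign a fileref to a directory */
       did = dopen('mydir');
       if did > 0 then do;
          do i = 1 to dnum(did);
             fname = dread(did, i);            /* name of the i-th member */
             output;
          end;
          rc = dclose(did);
       end;
       rc = filename('mydir');                 /* clear the fileref */
       keep fname;
    run;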

27 : Essential Programming Techniques Every SAS® User Should Learn
Kirk Paul Lafler, sasNerd

SAS® software boasts countless functions, algorithms, procedures, options, methods, code constructs, and other features to help users automate and deploy solutions for specific tasks and problems, as well as to access, transform, analyze, and manage data. This paper identifies and shares essential SAS programming techniques that the pragmatic user and programmer should learn. Topics include determining the number of by-group levels that exist within classification variables; data manipulation with the family of CAT functions; sorting data; merging / joining multiple tables of data; performing table lookup operations with user-defined formats; creating single-value and value-list macro variables with PROC SQL; examining and processing the contents of value-list macro variables; determining FIRST, LAST, and Between by-group rows; and using fuzzy matching techniques.
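
One of the listed techniques, creating a value-list macro variable with PROC SQL, in its simplest form:

    proc sql noprint;
       select distinct name
          into :namelist separated by ' '
          from sashelp.class;
    quit;
    %put &=namelist;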

28 : Ten Rules for Better Charts, Figures and Visuals
Kirk Paul Lafler, sasNerd

The production of charts, figures, and visuals requires displaying data in the best way possible. However, this process is far from direct or automatic. There are so many different ways to represent the same data – histograms, scatter plots, bar charts, line charts and pie charts, to name just a few. Furthermore, a visual may be perceived very differently depending on the application of various elements, including color, shading, and gradients. To ensure the production of the most useful charts, figures, and visuals, it is essential to adhere to the following guidelines: use the best visual for your data, present information in a non-confusing manner, and keep it neat and readable so the information is easily understood. This presentation highlights the work of Nicolas P. Rougier, Michael Droettboom, and Philip E. Bourne by sharing ten rules to improve the production of charts, figures, and visuals.

41 : What Kind of WHICH Do You CHOOSE to be?
Richann Watson, DataRich Consulting
Louise Hadden, Abt Associates Inc.

A typical task for a SAS® practitioner is the creation of a new variable that is based on the value of another variable or string. This task is frequently accomplished by the use of IF-THEN-ELSE statements. However, manually typing a series of IF-THEN-ELSE statements can be time-consuming and tedious, as well as prone to typos or cut and paste errors. Serendipitously, SAS has provided us with an easier way to assign values to a new variable. The WHICH and CHOOSE functions provide a convenient and efficient method for data-driven variable creation.
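
A hedged sketch of the pattern (the dataset and values are made up, not the paper’s example):

    data graded;
       set scores;                         /* assumed input with a character GRADE variable */
       length label $ 10;
       rank = whichc(grade, 'A', 'B', 'C', 'D', 'F');   /* 1..5, or 0 if no match */
       if rank > 0 then
          label = choosec(rank, 'Excellent', 'Good', 'Average', 'Poor', 'Failing');
    run;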

42 : “Bored”-Room Buster Bingo – Create Bingo Cards Using SAS® ODS Graphics
Richann Watson, DataRich Consulting
Louise Hadden, Abt Associates Inc.

Let’s admit it! We have all been on a conference call that just … well to be honest, it was just bad. Your misery could be caused by any number of reasons – or multiple reasons! The audio quality was bad, the conversation got sidetracked and the focus of the meeting was no longer what was intended, there could have been too much background noise, someone hadn’t muted their laptop and was breathing heavily – the list goes on ad nauseam. Regardless of why the conference call is less than satisfactory, you want it to end, but professional etiquette demands that you remain on the call. We have the answer – SAS®-generated Conference Call Bingo! Not only is Conference Call Bingo entertaining, but it also keeps you focused on the conversation and enables you to obtain the pertinent information the conference call may offer. This paper and presentation introduce a method of using SAS to create custom Conference Call Bingo cards, moving through brainstorming and collecting entries for Bingo cards, random selection of items, and the production of bingo cards using SAS reporting techniques and the Graphic Template Language (GTL). (You are on your own for the chips and additional entries based on your own painful experiences)! The information presented is appropriate for all levels of SAS programming and all industries.

45 : If it’s not broke, don’t fix it; existing code and the programmers’ dilemma
Jayanth Iyengar, Data Systems Consultants LLC

In SAS shops and organizational environments, SAS programmers have the responsibility of working with existing processes and SAS code that projects depend on to produce periodic output and results and meet deadlines. Some programming teams still cling to the old adage: if it’s not broke, don’t fix it. They’ve come to depend on code that runs clean and is reliable. However, besides processing with no errors and warnings, there are other criteria by which to judge the quality of a SAS program. Programming guidelines dictate that code should be well-documented, readable, efficient, and conform to best practices. This paper challenges the conventional wisdom that code which works shouldn’t be modified.

53 : Best Practices for Efficiency and Code Optimization in SAS programming
Jayanth Iyengar, Data Systems Consultants LLC

There are multiple ways to measure efficiency in SAS programming: programmers’ time, processing or execution time, memory, input/output (I/O), and storage space considerations. As data sets grow larger in size, efficiency techniques play a larger and larger role in the programmer’s toolkit. This need has been compounded further by the need to access and process data stored in the cloud and, due to the pandemic, by programmers finding themselves working remotely in distributed teams. As a criterion for evaluating code, efficiency has become as important as producing a clean log or expected output. This paper explores best practices in efficiency from a processing standpoint, as well as others.
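
One representative technique from this space, subsetting as early as possible with dataset options rather than reading the whole table (dataset and variable names are made up):

    data west_sales;
       set sales(keep=region product amount
                 where=(region = 'WEST'));   /* read only the needed columns and rows */
    run;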

55 : SAS as a Tool in Data Curation: A Case Example with the Inter-university Consortium for Political and Social Research
Piotr Krzystek, ICPSR at the University of Michigan

The Inter-university Consortium for Political and Social Research (ICPSR) at the University of Michigan specializes in archiving social science data by utilizing various practices, such as data curation, to make its objectives possible. Data curation is the practice of taking deposited data and documentation, running various forms of quality checks to “clean up” the information, and then releasing the materials online for interested individuals to download and analyze on their own. Various software tools and statistical packages are utilized in the data curation process, including SAS. The purpose of this paper is to discuss how SAS is utilized as part of the data curation process in the setting of ICPSR. The planned outline for this paper includes discussing what ICPSR and data curation are, the SAS procedures that are utilized with each step of the data curation process, how issues that may arise in using SAS are resolved, and the potential for SAS to benefit and further improve the data curation process in the future. The version of SAS utilized to perform the data curation tasks is 9.4, and users do not need to have prior experience with the program to edit and run code.

64 : A Configuration File Companion: testing and using environment variables and options; templates for startup-only options initstmt and termstmt
Ronald Fehd, Fragile-Free Software

The startup process of SAS software reads one or more configuration files, *.cfg, which contain allocations of environment variables whose values are used in SAS startup-only options to provide access to libraries, that is, sets of folders containing files that SAS uses for functions, macros, and procedures. This paper provides programmers and advanced users with programs to review the default configuration files; procedures, options, and SQL to discover options; and a suite of programs to use in Test-Driven Development (TDD) to trace and verify user-written configuration files.
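
Two of the discovery techniques alluded to above, sketched briefly (output will vary by site and configuration):

    proc options option=config value;   /* which configuration file(s) were used */
    run;

    proc sql;
       select optname, setting
       from dictionary.options
       where optname in ('SASAUTOS', 'FMTSEARCH');
    quit;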

65 : An Autoexec Companion, Allocating Location Names during Startup
Ronald Fehd, Fragile-Free Software

Like other computer languages SAS software provides a method to automatically execute statements during startup of a program or session. This paper examines the names of locations chosen in the filename and libname statements and the placement of those names in options that enable all programs in a project to have standardized access to format and macro catalogs, data sets of function definitions and folders containing reusable programs and macros. It also shows the use of the global symbol table to provide variables for document design. The purpose of this paper is to examine the default values of options, suggest naming conventions where missing, and provide both an example autoexec and a program to test it.
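
A compact sketch of the kind of autoexec the paper discusses (every path and name below is a placeholder):

    %let projroot = C:\projects\demo;
    libname projdat "&projroot\data";
    libname library "&projroot\formats";      /* format catalog location */
    filename pgmlib  "&projroot\macros";
    options fmtsearch=(library work)
            sasautos=(pgmlib sasautos) mautosource;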

115 : SAS Log Parsing: SAS Logs without the slog
Drew Metz, NORC

As any SAS programmer has undoubtedly noticed, the size of the SAS log can grow exponentially with the complexity of the program. This is even more true and unpredictable when your SAS program is driven by an external source – such as when an Excel file is read in line by line to control the operations of a program. Picking out useful information from the log can become burdensome when thousands of lines are generated. Further, there are potentially important issues SAS will flag as NOTES within the log that can be lost among hundreds of innocuous notes. Programming a SAS log parser can pay off when the SAS log is growing too large to manually review. Additionally, SAS has several options and system-level macro variables related to the log that are useful for interacting with the log in a meaningful way. There are two key reasons why SAS log parsing is an approachable task. First, a small bit of SAS code can output the log as an external text file (ideal for parsing). Second, the patterns of SAS logs are largely predictable. We know that they adhere to conventions, such as errors always being flagged at the start of the log line with “ERROR:”. A basic understanding of what messages are output to the log can be utilized to write a program that reports important information from the log.
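
A bare-bones version of the approach (the log path is a placeholder and the NOTE patterns are just examples):

    proc printto log="C:\temp\run1.log" new;   /* route the log to a file */
    run;

    /* ... the program being monitored runs here ... */

    proc printto;                               /* restore the default log destination */
    run;

    data log_issues;
       infile "C:\temp\run1.log" truncover;
       input line $char256.;
       if index(line, 'ERROR:') = 1
          or index(line, 'WARNING:') = 1
          or prxmatch('/^NOTE:.*(uninitialized|repeats of BY values)/', line)
       then output;
    run;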

145 : Hash Tables Like You’ve Never Seen Them Before
Mathieu Gaoette, Prospective MG

Hash tables have been a thing in SAS for over 10 years now. They have since been widely adopted by experienced SAS users. Most of these users stick with the data lookup functionality, so hash objects can be thought of as a one-trick-pony kind of tool. They really have a lot more to offer. This paper will guide you through different hash object techniques, along with progressively more complex examples, making it both accessible to intermediate users and intriguing to more advanced practitioners.
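
One example of a beyond-lookup use: writing an ordered, de-duplicated table straight from a hash object (the dataset and key names are hypothetical):

    data _null_;
       if 0 then set rawdata;                       /* define the host variables */
       dcl hash h(dataset: 'rawdata', ordered: 'a');
       h.defineKey('id');
       h.defineData(all: 'y');
       h.defineDone();
       h.output(dataset: 'deduped');                /* unique keys, written in ID order */
       stop;
    run;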

146 : Using SAS Formats
Tom Kari, Tom Kari Consulting

Formats are a secret weapon in SAS, for the initiated. But there are a lot of moving parts…What does a format look like? Where are my formats stored? How can I replace a format? How can I decide which format will be used? Answers to these questions, and a few well-travelled tips for solving common programming problems with formats, will provide some additions to your toolkit that will let you get even more use out of this versatile tool.
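
Two of the questions above, answered in miniature:

    proc format library=work.formats;
       value agegrp
          low -< 18 = 'Child'
          18  -< 65 = 'Adult'
          65 - high = 'Senior';
    run;

    options fmtsearch=(work.formats);      /* controls which catalogs are searched, and in what order */

    proc format library=work.formats fmtlib;
       select agegrp;                      /* show where the format lives and what it holds */
    run;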

157 : 5 secrets of the SQL Goddess
Charu Shankar, SAS Institute

Are you a SQL beginner? Have you wanted a SQL goddess to show you the way? 🙂 Look no further: the SAS SQL goddess to the rescue! In this session, Charu will draw on wisdom culled from teaching SAS to thousands of learners for over 15 years, distilling her SQL knowledge to bring its pure essence to you so that you don’t have to research best practices yourself. Not only will she share code, but she will also share memory recall techniques so that you will always remember the content. Secret #1: the best way to learn and recall the syntax order. Secret #2: how SQL works under the covers. Secret #3: how to summarize data using the fancy Boolean. Secret #4: joining tables. Plus Secret #5, which will be revealed during the presentation. What are you waiting for? Come hang out with the SQL goddess and get all your SQL questions answered. It will be time well spent, we guarantee; otherwise, money back.
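
Secret #3 in miniature (the “fancy Boolean”): summing a logical expression counts the rows where it is true.

    proc sql;
       select sex,
              count(*)         as n,
              sum(height > 60) as taller_than_60
       from sashelp.class
       group by sex;
    quit;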

159 : A Brief Introduction to DS2
Mark Jordan, SAS Institute, Inc.

DS2 is a SAS programming language available in Base SAS 9.4+ and in all versions of SAS Viya. The language is designed for simple, safe parallel processing using an intriguing combination of SQL and traditional DATA step syntax. DS2 is also capable of accessing and processing database high-precision numeric values such as BIGINT and DECIMAL without any loss of precision. The paper discusses the benefits of converting a base SAS DATA step to DS2 when the DATA step process is CPU-bound and shows the amazing boost in processing speed that can be achieved by running the process in CAS on SAS Viya.
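
A first look at the syntax, for readers who have never seen a DS2 data program (a toy example, not from the paper):

    proc ds2;
       data work.squares / overwrite=yes;
          dcl double x sq;
          method run();
             do x = 1 to 5;
                sq = x**2;
                output;
             end;
          end;
       enddata;
       run;
    quit;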

160 : Fun with FILENAMEs
Mark Jordan, SAS Institute, Inc.

The FILENAME statement is one of the most versatile tools in the SAS programmer’s toolbox. It provides an easy to use, consistent interface to a wide variety of protocols for accessing and working with data in innovative ways. In this presentation you’ll see how to: Process raw data files directly from a remote FTP server; Process text files directly from a web server; Use PROC HTTP to download any type of file from a web server for local processing; Write to and read from files in ZIP archives; Even have SAS send emails on your behalf.
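
Two of the listed tricks, sketched with placeholder locations:

    filename src url "https://example.com/data/sample.csv";   /* placeholder URL */
    data sample;
       infile src dsd firstobs=2 truncover;
       input name :$20. value;
    run;

    filename zipin zip "C:\temp\archive.zip" member="report.txt";
    data _null_;
       infile zipin;
       input;
       put _infile_;                          /* echo the archived member to the log */
    run;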

e-Posters

26 : Exploring the Skills Needed by the Data Scientist
Kirk Paul Lafler, sasNerd

As 2.5 quintillion bytes (1 with 18 zeros) of new data are created each and every day, the age of big data has taken on new meaning. More and more organizations across industries are embracing Data Science / Computer Research Scientist skills, resulting in an emerging demand for qualified and experienced talent. According to the Bureau of Labor Statistics (BLS), the number of data science jobs is expected to grow 19 percent over the next two decades – nearly three times as fast as the average growth rate for all jobs. Energized by this employment outlook, students and professionals across job functions are preparing for tomorrow’s growing data science / analytic demands by acquiring a comprehensive skill set. To prepare for this growing demand, many colleges, junior colleges, universities, and vocational training organizations offer comprehensive degrees and certificate programs to fulfill the increasing demand for analytical skills. This presentation explores the skills needed by the Data Scientist / Analytics professional, including non-technical skills such as critical thinking, business acumen, and verbal/written communications; and technical skills such as data access; data wrangling; statistics; use of statistical programming languages like SAS®, Python, and R; Structured Query Language (SQL); Microsoft Excel; and data visualization.

77 : From %let to %local; Methods, Use, and Scope of Macro Variables in SAS Programming
Jayanth Iyengar, Data Systems Consultants LLC

Macro variables are one of the powerful capabilities of the SAS system. Utilizing them makes your SAS code more dynamic. There are multiple ways to define and reference macro variables in your SAS code, from %LET and CALL SYMPUT to PROC SQL INTO. There are also several kinds of macro variables, differing in scope and in other ways. Not every SAS programmer is knowledgeable about the nuances of macro variables. In this paper, I explore the methods for defining and using macro variables. I also discuss the nuances of macro variable scope and the kinds of macro variables, from user-defined to automatic.
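
The main creation methods and scopes in one small sketch:

    %let cutoff = 2022;                                       /* %LET: global scope */

    data _null_;
       call symputx('run_date', put(today(), date9.), 'G');   /* DATA step, global scope */
    run;

    proc sql noprint;
       select count(*) into :nobs trimmed from sashelp.class; /* PROC SQL INTO */
    quit;

    %macro demo;
       %local temp;                      /* visible only inside the macro */
       %let temp = &nobs;
       %put &=cutoff &=run_date &=temp;
    %mend demo;
    %demo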

81 : Failure To EXIST: Why Testing Data Set Existence with the EXIST Function Is Inadequate for Serious Software Development in Asynchronous, Multiuser, and Parallel Processing Environments
Troy Hughes, Datmesis Analytics

The Base SAS® EXIST function evaluates the existence (or lack thereof) of a SAS data set. Conditional logic routines commonly rely on EXIST to validate data set existence or absence before subsequent (dependent) processes can be dynamically executed, circumvented, or terminated based on business rules. In synchronous software design and where data sets cannot be accessed by other processes or users, EXIST is both a sufficient and reliable solution. However, because EXIST captures only a split-second snapshot of file state, it provides no guarantee of file state persistence. Thus, in asynchronous, multiuser, and parallel processing environments, data set existence can be evaluated by one process and be instantaneously modified thereafter by a concurrent process that creates or deletes the evaluated data set; this creates a race condition in which EXIST technically returns the correct results, but those results are invalidated before subsequent statements can execute. Because of this vulnerability, most classic implementations of the EXIST function within SAS literature are insufficient for evaluating data set existence in these more complex environments and scenarios. This text demonstrates reliable and secure methods to test SAS data set existence prior to performing conditional tasks in asynchronous, multiuser, and parallel processing environments.
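
Not the author’s solution, but one illustration of why a lock-then-verify pattern is more defensive than EXIST alone (SHARED is a placeholder libref for a multiuser library):

    %macro safe_append(base=, data=);
       lock &base;                                   /* try to acquire exclusive access */
       %if &syslockrc = 0 %then %do;
          proc append base=&base data=&data; run;
          lock &base clear;
       %end;
       %else %put WARNING: &base is unavailable (SYSLOCKRC=&syslockrc);
    %mend safe_append;

    %safe_append(base=shared.results, data=work.newrows)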

113 : Heatmaps for Hot Housing Markets: SAS, an ultimate tool for analyzing real estate data
Harshita Budumuru, Green Level High School

The pandemic has introduced a new dynamic of societal uncertainty for stakeholders and home buyers, fueled by rapid inflation and soaring house prices. The Federal Reserve has stepped in this past month by increasing interest rates to limit credit flows. Some analysts are predicting another housing bubble, while others expect logistic growth due to stagnant supply chains. Will this volatile housing market crash, or are we in a perpetual state of exponential growth? The question of where and when to invest has become a challenging discussion, but is there a solution? Scientific literature on the use of heat maps for statistical analysis has increased in almost every industry and field. Of late, heat maps have become increasingly popular in real estate, especially as a useful analytical tool for making informed decisions in the housing market. This paper aims to illustrate how SAS can be used to generate these heat maps, which present complex data as simple visual aids. SAS can aid in making the analysis of these hot housing markets readily available to entrepreneurs, mortgage institutions, and policy makers. In this article, real estate data will be used to identify key parameters such as employment growth, population growth, first-time home buyers, listing prices, listing quantities, rental prices, occupancy rates, and much more. The future of the housing market landscape can be demonstrated, and the differences between a spiraling recession and the booming pandemic market can be addressed.

114 : Analysis of Chemicals in Beauty Products and its Impact on Consumers
Chhavi Nijhawan, Student

The cosmetic industry is a market of 80.73 billion USD worldwide. The share of makeup in the cosmetics industry is 16%. The predicted worth of skincare in this industry is around 145.2 billion USD. In 2020, the worldwide cosmetic chemicals market was estimated to be worth USD 19.38 billion. Today, more than 10,000 chemicals are used to create cosmetics, with only 11 being restricted by the FDA (Food and Drug Administration). With critical ingredients being water, thickeners, colors, and fragrances, these chemical compounds, and the substances they are made of, have significant impacts on the skin. By understanding the many chemicals used daily, we can see their impact on people and identify alternatives and better chemicals to include in skincare. Cosmetic chemicals are synthetically created chemical compounds that are the most often utilized components in personal care and cosmetic goods. Popular cosmetic chemicals are colorants, surfactants, rheology control agents, emulsifiers, emollients, and preservatives. Increased consumer awareness of beauty and skincare products and rising demand for goods containing active ingredients are assisting market expansion. The industry is also likely to profit from increased customer demand for natural components, which will open up opportunities for technological innovation. We will analyze the chemicals most commonly found in cosmetics today and gather data from product reviews to determine the effects of these chemicals on consumer skin.

116 : Generate Complex SAS code from File
Paul Silver, NORC at University of Chicago

This paper presents a general-purpose macro, %RepeatMaskOverFileRows, which reads through a SAS file and writes out a “mask” expression once per row, with each row’s variable values replacing placeholders in the mask, in order to generate dynamic SAS code that can be used anywhere in a SAS program. Instead of coding something like: proc sql; select catx('=', var1, var2) into :varrenamelist separated by ' ' from infile; quit; and then using the macro variable in the code below it, you could simply write: data a; set b; rename %repeatMaskOverFileRows(infile, _mask=&var1=&var2, _sep_text=%str( )); run; or something similar. The mask can be almost any SAS expression or set of statements that does not include macro statements.