WUSS 2023 will feature nearly 100 paper presentations, posters, and hands-on workshops. Papers are organized into 9 academic sections and cover a variety of topics and experience levels.
Note: This information is subject to change. Last updated 27-Oct-2023.
Advanced Programming & Techniques
Analytics & Statistics
Beginner’s Techniques
No. | Author(s) | Paper Title (click for abstract) |
BT-112 | Kirk Paul Lafler & Shaonan Wang & Nuoer Lu & Zheyuan Walter Yu & Daniel Qian & Swallow Xiaozhe Yan | Data Access Made Easy Using SAS Studio |
BT-130 | Derek Morgan | The Essentials of SAS Dates and Times |
BT-193 | Kirk Paul Lafler | The 5 CATs in the Hat: Sleek Concatenation String Functions |
Data Management & Administration
Hands-on Workshops
No. | Author(s) | Paper Title (click for abstract) |
HOW-111 | Kirk Paul Lafler & Stephen Sloan | Application of Fuzzy Matching Techniques Using SAS Software |
HOW-134 | Louise Hadden | Zip Code 411: An In-Depth Approach to Analyzing, Visualizing and Reporting on Zip Code Level Data |
HOW-155 | Kirk Paul Lafler & Richann Watson & Josh Horstman & Charu Shankar | The Battle of the Titans: DATA Step versus PROC SQL |
HOW-163 | Isaiah Lankham & Matthew Slaughter | Everything is Better with Friends: Using SAS in Python Applications with SASPy and Open-Source Tooling (Getting Started) |
HOW-198 | Josh Horstman | Getting Started with the SGPLOT Procedure |
HOW-208 | Bart Jablonski | Share your code with SAS Packages – a Hands-on-Workshop |
HOW-224 | Wendy Christensen | Open-Source Essentials HOW – R and RStudio |
HOW-225 | Zeke Torres | SAS Users GIT Started! A HOW Introduction to GIT. |
Open Source
Pharma and Healthcare
No. | Author(s) | Paper Title (click for abstract) |
PH-116 | Hongbo Li | Another Glance at Good Programming Practice from the Perspective of FDA Reviewers |
PH-160 | Matt Zhou & Hui Zhou & Jaejin An | A SAS Macro for 30-Year Cardiovascular Risk Prediction |
PH-173 | Yanwei Han | How to Understand Therapeutic Area User Guide for Reactogenicity Events in Vaccine Studies |
PH-180 | Yoganand Budumuru | Demystifying the define.xml: Overcoming the challenges of CRT Package. |
PH-192 | Sarvar Khamidov | Navigating SAS(r) and CDISC Certification with Apprenticeship |
PH-194 | Bill Coar | Quality Control – Defining an Acceptable Quality Standard without Achieving Perfection |
PH-206 | Ballari Sen | Generation of AE (Adverse Events) summary tables by worst CTC Grade utilizing SAS |
PH-218 | Matt Becker | Visualizing Insights, Empowering Discoveries: SAS Viya Unleashed in Life Science Analytics |
Professional Development
No. | Author(s) | Paper Title (click for abstract) |
PD-106 | Thomas Mannigel | Mining for SAS Gold |
PD-123 | Stephen Sloan | Developing and running an in-house SAS Users Group |
PD-132 | Derek Morgan | Effective Presentations: More than Just PowerPoint |
PD-199 | Josh Horstman & Richann Watson | Adventures in Independent Consulting: Perspectives from Two Veteran Consultants Living the Dream |
PD-202 | Mahsa Tahmasebi Ghorabi & Leon Davoody | A Beginner’s Step-by-Step Guide to Digital Marketing Data Mining and Analysis Using SAS |
e-Posters
Abstracts
Advanced Programming & Techniques
APT-101 : Going Command(o): Power(Shell)ing Through Your Workload
Richann Watson, DataRich Consulting
Louise Hadden, Abt Associates Inc.
Thursday, 10:00 AM – 10:50 AM, Location: America’s Cup B/C
Simplifying and streamlining workflows is a common goal of most programmers. The most powerful and efficient solutions may require practitioners to step outside of normal operating procedures and outside of their comfort zone. Programmers need to be open to finding new (or old) techniques to achieve efficiency and elegance in their code: SAS by itself may not provide the best solutions for such challenges as ensuring that batch submits preserve appropriate log and lst files; documenting and archiving projects and folders; and unzipping files programmatically. In order to adhere to such goals as efficiency and portability, there may be times when it is necessary to utilize other resources, especially if colleagues may need to perform these tasks without the use of SAS software. These and other data management tasks may be performed via the use of tools such as command-line interpreters and Windows PowerShell (if available to users), used externally and within SAS software sessions. We will also discuss the use of additional tools, such as WinZip, used in conjunction with the Windows command-line interpreter.
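As a minimal sketch of the kind of external tooling the paper discusses, a PowerShell command can be run from within a SAS session through an unnamed pipe. The directory path and PowerShell command here are illustrative placeholders, and this assumes a Windows host with PowerShell available:

```sas
/* Run a PowerShell command from SAS and read its output back in */
/* (Windows only; C:\temp is a placeholder path)                 */
filename cmd pipe 'powershell -Command "Get-ChildItem C:\temp | Select-Object -ExpandProperty Name"';

data files;
   infile cmd truncover;
   length fname $256;
   input fname $char256.;   /* one file name per record */
run;

filename cmd clear;
```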
APT-104 : Calculate Physical Length of String in RTF file with ‘Times New Roman’ by SAS
Yueming Wu, Astex Pharmaceuticals
Steven Li, Medtronic PLC.
Thursday, 9:00 AM – 9:20 AM, Location: Coronado D
While using PROC REPORT with ODS RTF for output table/listing generation, people often wish to know the ‘Physical Length’ of a string, instead of the number of characters in the string. The ‘Physical Length’ of a string is the length which is displayed on the RTF output file. The physical length of a string could be used to decide the column’s minimum width without word wrapping. This paper creates a custom SAS function through PROC FCMP to calculate the physical length of either the ‘Times’ or ‘Times New Roman’ font, and a concept of the ‘characteristic width’ of a character is introduced.
APT-105 : 10 Tips for Getting Tipsy with SAS
Lisa Mendez, Catalyst Clinical Research
Richann Watson, DataRich Consulting
Thursday, 11:00 AM – 11:20 AM, Location: America’s Cup B/C
There are many useful tips that do not warrant a full paper, but put some cool SAS code together and you get a cocktail of SAS code goodness. This paper will provide ten great coding techniques that will help enhance your SAS programs. We will show you how to 1) add quotes around each token in a text string, 2) create column headers based on values using the TRANSPOSE procedure (PROC TRANSPOSE), 3) set missing values to zero without using a bunch of IF-THEN-ELSE statements, 4) use shorthand techniques for assigning lengths of consecutive variables and for initializing them to missing, 5) calculate the difference between a visit’s actual study day and the target study day while accounting for there being no day zero (0), 6) use the SQL procedure (PROC SQL) to read variables from a data set into a macro variable, 7) use the XLSX engine to read data from multiple Microsoft Excel worksheets, 8) test whether a file, such as a Microsoft Excel file, exists before trying to open it, 9) use the DIV function to divide so you don’t have to check whether the divisor is zero, and 10) use abbreviations for frequently used SAS code.
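Two of the listed tips can be sketched briefly. The file path below is a placeholder; the open-code %IF requires SAS 9.4M5 or later:

```sas
/* Tip 8: test that a file exists before trying to open it */
%let xlfile = C:\data\report.xlsx;   /* placeholder path */
%if %sysfunc(fileexist(&xlfile)) %then %do;
   /* ... PROC IMPORT or LIBNAME XLSX code here ... */
%end;

/* Tip 9: DIV returns 0 when the divisor is 0, so no pre-check is needed */
data ratios;
   num = 10;
   den = 0;
   ratio = div(num, den);   /* 0 instead of a divide-by-zero note */
run;
```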
APT-107 : How to Assemble Macros from Metadata and Execute them
Karen Walker, Walker Consulting LLC
Thursday, 8:30 AM – 8:50 AM, Location: America’s Cup B/C
This paper dives into the world of natural language processing using macros built from metadata text stored in a database. It will examine the use of multiple types of program code, while leveraging the capability of SAS Studio. Be ready to see under the distributive processing veil. Be ready to build parameterized routine calls from the SAS/TOOLKIT, the X statement, the %SYSEXEC macro, PIPEs, and an assortment of file descriptors.
APT-109 : The Great Escape(char)
Louise Hadden, Abt Associates Inc.
Wednesday, 2:00 PM – 2:50 PM, Location: America’s Cup B/C
SAS provides programmers with many ways to enhance ODS output in addition to the use of both SAS-supplied and user-written ODS styles. Inline formatting of titles, footnotes, text fields, and table cells, as well as formatting of data within the DATA step and SAS procedures, can be accomplished by using ODS ESCAPECHAR. Combining user-defined formats with ODS ESCAPECHAR allows users to use one variable to format another for effective traffic-lighting. A quick overview of syntax for some of the many possibilities for enhancing ODS output with ODS ESCAPECHAR currently available in SAS 9.4M7 will be presented, including underlines, pre- and post-images, special functions, line feeds, and super- and subscripts.
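A minimal sketch of one of the techniques mentioned, superscripted footnote markers via ODS ESCAPECHAR (the output file name is a placeholder):

```sas
ods escapechar='^';
ods rtf file='demo.rtf';   /* placeholder output path */

/* ^{super n} renders as a superscript in titles, footnotes, and cells */
title 'Mean Change from Baseline^{super 1}';
footnote '^{super 1}Based on observed cases only';

proc print data=sashelp.class(obs=5);
run;

ods rtf close;
```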
APT-121 : Running Parts of a SAS Program while Preserving the Entire Program
Stephen Sloan, Accenture
Tuesday, 3:30 PM – 3:50 PM, Location: America’s Cup B/C
The Challenge: We have long SAS programs that accomplish a number of different objectives. We often only want to run parts of the programs while preserving the entire programs for documentation or future use. Some of the reasons for selectively running parts of a program are: Part of it has run already and the program timed out or encountered an unexpected error. It takes a long time to run so we don’t want to re-run the parts that ran successfully. We don’t want to recreate data sets that were already created. This can take a considerable amount of time and resources and can also occupy additional space while the data sets are being created. We only need some of the results from the program currently, but we want to preserve the entire program. We want to test new scenarios that only require subsets of the program.
APT-128 : Creating Dated Archive Folders Automatically with SAS
Derek Morgan, Bristol Myers Squibb
Wednesday, 11:30 AM – 11:50 AM, Location: Coronado E
When creating patient profiles, it can be useful for clinical scientists to compare current data with previous data in real time without having to request those data from an Information Technology (IT) source. This is a method for using SAS to perform the archiving via a daily scheduled job. The primary advantage of SAS over an operating script is its date handling ability, removing many difficult calculations in favor of intervals and functions. This paper details an application that creates dated archive folders and copies SAS data sets into those dated archives, with automated aging and deletion of old data and folders. The application allows clinical scientists to customize their archive frequency (within certain limits.) It also keeps storage requirements to a minimum as defined by IT. This replaced a manual process that required study programmers to create the archives, eliminating the possibility of missed or incorrectly dated archives.
APT-129 : Demystifying Intervals
Derek Morgan, Bristol Myers Squibb
Thursday, 10:00 AM – 10:50 AM, Location: Coronado D
Intervals have been a feature of base SAS for a long time, allowing SAS users to work with commonly (and not-so-commonly) defined periods of time such as years, months, and quarters. With the release of SAS 9, there are more options and capabilities for intervals and their functions. This paper will first discuss the basics of intervals in detail, and then we will discuss several of the enhancements to the interval feature, such as the ability to select how the INTCK() function defines interval boundaries and the ability to create your own custom intervals beyond multipliers and shift operators.
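Two of the enhancements described above can be sketched in a few lines; the `'c'` (continuous) method and the multiplier/shift notation are standard Base SAS features:

```sas
data _null_;
   /* Default (discrete) method counts interval boundaries crossed */
   d1 = intck('month', '31jan2023'd, '01feb2023'd);        /* 1 */

   /* CONTINUOUS method measures from the starting day of the month */
   c1 = intck('month', '31jan2023'd, '01feb2023'd, 'c');   /* 0 */
   put d1= c1=;

   /* Multiplier and shift: MONTH3.2 = quarters beginning in February */
   q = intnx('month3.2', '15mar2023'd, 0, 'b');
   put q= date9.;
run;
```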
APT-145 : With a View to Make Your Metadata Function(al): Exploring SAS Sources of Information on SAS Formats
Louise Hadden, Abt Associates Inc.
Thursday, 9:00 AM – 9:20 AM, Location: America’s Cup B/C
Many SAS programmers are accustomed to using SAS metadata on SAS data stores and processes to enhance their coding and promote data-driven processing. SAS metadata is most frequently accessed through SAS “V” functions, SAS View tables, and dictionary tables. Information can be gained regarding SAS data stores and, drilling down, the attributes of columns in data files. However, few programmers are aware of SAS’s similar resources and capabilities with respect to SAS formats. Additionally, the FORMAT procedure itself provides valuable information, along with PROC CATALOG. This presentation will detail how to curate data from various resources to create a dictionary of formats, and then how to use PROC FCMP to create a function to leverage that dictionary.
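One of the metadata sources the paper draws on can be queried directly; this sketch assumes the standard DICTIONARY.FORMATS table available in PROC SQL:

```sas
/* Build a simple format dictionary from SAS's own metadata */
proc sql;
   create table fmt_dictionary as
   select fmtname, source, minw, maxw
      from dictionary.formats
      where fmttype = 'F';   /* 'F' = formats, 'I' = informats */
quit;
```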
APT-150 : Wrangling Excel Worksheets
Tom Kari, Unknown
Thursday, 9:30 AM – 9:50 AM, Location: Coronado D
Ah, Excel. Sadly, it’s Excel’s world and we only live in it. Anybody experienced with answering questions on the SAS Communities has seen the full gamut of problems that are encountered with worksheets: the data is in the wrong column; the data is in the wrong row; I’m expecting a number in a cell, but it’s a character value; the cell should contain a date, but it doesn’t; I have too many or too few rows or columns. Yes, it’s true that on occasion importing data into SAS from an Excel workbook can fail, but those are the GOOD results. Worse are the cases where it looks like the import worked, but you got the wrong data in the wrong place. I was faced with the requirement to import multiple Excel worksheets monthly, in a production environment. Of course, I was ASSURED that “there wouldn’t be any problems with the worksheets”, and promptly ran into all of the problems described above. To deal with these problems, I developed Base SAS code to do the following: import all of the data as character, and then edit it and convert to numeric when appropriate; after importing the data from Excel, transpose it into a one-variable SAS dataset; and use a template file to determine whether the contents of a cell should be empty, a constant, or a variable, and the type of variable. These techniques allow for a number of “bullet proofing” techniques, where we can emerge with some confidence that we have the right data.
APT-156 : Table Lookup by Enclosing Hash in FCMP
Wenyu Hu, Merck Sharp & Dohme
Wednesday, 10:30 AM – 10:50 AM, Location: America’s Cup B/C
Table lookup or searching is a common task performed in SAS. There are many lookup methods such as SET with KEY=, arrays, SQL joins, formats, MERGE with a BY statement, and the DATA step hash object. Beginning with SAS 9.3, hashing is available to user-defined subroutines through the SAS Function Compiler (FCMP) procedure. FCMP enables programmers to create and store frequently used complex and repetitive code in reusable and customized functions and subroutines. Embedding a hash object in PROC FCMP functions can enhance a user’s capability to handle large data by improving performance while retaining the programs’ simplicity. This paper introduces table lookup using hashing in PROC FCMP and demonstrates how it can improve performance and streamline an existing program.
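A rough sketch of the idea, assuming SAS 9.3+ and a hypothetical lookup data set WORK.CODES with variables CODE and LABEL (the function and data set names are illustrative, not from the paper):

```sas
proc fcmp outlib=work.funcs.lookup;
   function code_label(code $) $ 40;
      length label $40;
      /* Hash object declared inside the FCMP function */
      declare hash h(dataset: 'work.codes');
      rc = h.defineKey('code');
      rc = h.defineData('label');
      rc = h.defineDone();
      if h.find() = 0 then return(label);   /* key found */
      else return(' ');                     /* key not found */
   endsub;
run;

/* Make the function visible to subsequent DATA steps */
options cmplib=work.funcs;
```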
APT-157 : TARDIS: Tracking Alterations and Record Differences in SAS, a Macro to Add Color to Your Dataset Comparisons
Inka Leprince, PharmaStat, LLC
Tuesday, 4:30 PM – 4:50 PM, Location: America’s Cup B/C
Time Lords are an ancient race of time- and space-travelers from the planet Gallifrey. They have taken on the vital role of safeguarding time-travel technology to protect and preserve order in the universe. The bureaucratic part of this responsibility requires that the High Council of Time Lords keeps a record of events and continually cross-references the current version of events against previous iterations of events. To describe the task of monitoring individuals visiting (or potentially invading) planet Earth, let alone the rest of the universe, as “challenging” would be putting it lightly. The Tracking Alterations and Record Differences in SAS (TARDIS) macro was developed to ease the High Council’s burden by dynamically coloring records to identify antiquated or expunged incidents from the preceding version of events when compared against the current version of events. Additionally, TARDIS applies text formatting to highlight new events that had previously not occurred. After reading this paper, SAS users will acquire the knowledge to seamlessly integrate the TARDIS macro into existing listing programs, generate color-coded listings, and preserve order in the universe.
APT-166 : Undo SAS Fetters with Getters and Setters: Supplanting Macro Variables with More Flexible, Robust PROC FCMP User-Defined Functions That Perform In-Memory Lookup and Initialization Operations
Troy Martin Hughes, Datmesis Analytics
Tuesday, 4:00 PM – 4:20 PM, Location: America’s Cup B/C
Getters and setters are common in some object-oriented programming (OOP) languages such as C++ and Java, where “getter” functions retrieve values and “setter” functions initialize (or modify) variables. In Java, for example, getters and setters are constructed as conduits to private classes, and facilitate data encapsulation by restricting variable access. Conversely, the SAS language lacks classes, so SAS global macro variables are typically utilized to maintain and access data across multiple DATA steps and procedures. Unlike an OOP program that can categorize variables across multiple user-defined classes, however, SAS maintains only one global symbol table in which global macro variables can be maintained. Additionally, maintaining and accessing macro variables can be difficult when quotation marks, ampersands, percentage signs, and other special characters exist in the data. This text introduces user-defined getter functions and setter subroutines designed using the FCMP procedure, which enable data lookup and initialization operations to be performed within DATA steps. Among other benefits, user-defined getters and setters can facilitate the evaluation of complex Boolean logic expressions that leverage data stored across multiple data sets, all concisely performed in a single SAS statement! Getters and setters are thoroughly demonstrated in the author’s text: PROC FCMP User-Defined Functions: An Introduction to the SAS Function Compiler. (Hughes, 2023)
APT-171 : Picking Scabs and Digging Scarabs: Refactoring User-Defined Decision Table Interpretation Using the SAS Hash Object To Maximize Efficiency and Minimize Metaprogramming
Troy Martin Hughes, Datmesis Analytics
Louise Hadden, Abt Associates Inc.
Wednesday, 10:00 AM – 10:20 AM, Location: America’s Cup B/C
Decision tables allow users to express business rules and other decision rules within tables rather than coded statically as conditional logic statements. In the first author’s 2019 book, SAS Data-Driven Development, he describes how decision tables embody the data independence that data-driven programming requires, and demonstrates a reusable solution that enables decision tables to be interpreted and operationalized through the SAS macro language. In their 2019 white paper Should I wear pants?, the authors demonstrate the configurability and reusability of this solution by utilizing the same data structure and underlying code to interpret unrelated business rules that describe unrelated domain data: pants wearing and vacationing in the Portuguese expanse. Finally, in the current paper, the authors refactor this code by replacing metaprogramming techniques and macro statements with a user-defined function (that leverages a dynamic hash object) that performs the decision table lookup. The new hash-based interpreter is unencumbered by inherent macro metaprogramming limitations. This anecdotal “scab picking” (the subtle refactoring of software to expand functionality or improve performance) yields a more flexible interpreter that is more robust to diverse or difficult data, including special-character-laden data sets. In recognition of the authors’ combined love for all-things-archaeological, the decision rules in this text separately model Mayan ceramics excavation and Egyptian scarab analysis.
APT-172 : Automating Reports Using Macros and Macro Variables
Ekaterina Roudneva, UC Davis
Tuesday, 3:00 PM – 3:20 PM, Location: America’s Cup B/C
Report generation can involve manually updating code when you get new information. This paper will go over some techniques that can help you automate your reports and remove the need to manually change code when a dataset is updated or when working with different data. Topics include using PROC SQL to create macro variables and data driven programs. This will be applied to a real-world example that automates the creation of a data dictionary and a frequency report of all variables for specified datasets. In the example, macro variables in combination with %DO loops and macro arrays are used to create dynamic code that links variables with defined formats and outputs a report of all frequencies for every single variable. Using macros and %IF %THEN branching logic can further make the code more flexible and validate the input. This paper is intended for an audience that has basic knowledge of the SAS macro language.
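The core pattern described above, SELECT INTO plus a %DO loop, can be sketched briefly; SASHELP.CLASS stands in for a study data set:

```sas
/* Capture every variable name of a data set in one macro variable */
proc sql noprint;
   select name
      into :varlist separated by ' '
      from dictionary.columns
      where libname = 'SASHELP' and memname = 'CLASS';
quit;

/* Data-driven loop: one PROC FREQ per variable */
%macro freq_all;
   %local i var;
   %do i = 1 %to %sysfunc(countw(&varlist));
      %let var = %scan(&varlist, &i);
      proc freq data=sashelp.class;
         tables &var / missing;
      run;
   %end;
%mend freq_all;
%freq_all
```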
APT-174 : Waterfall vis a vis Spider plots: Complex oncology efficacy endpoint made simpler.
Yoganand Budumuru, IQVIA
Tuesday, 2:30 PM – 2:50 PM, Location: America’s Cup B/C
Over the past 50 years, cancer diagnosis has transformed medicinal research and advancement. Such changes have only increased the complexity of clinical trials. To analyze such data from oncology studies, scientists measure certain efficacy endpoints such as Best Overall Response (BOR) to evaluate the effect of treatments on tumor response. Among various graphical representations, Waterfall plots and Spider plots have become increasingly efficient in visually depicting tumor shrinkage. However, each of these competing techniques have their share of benefits and drawbacks, causing perplexity within the clinical domain. Waterfall plots, on one hand, display a series of bars, each representing a patient’s change in tumor size. While they are clear and concise at allowing scientists to assess differences, they are limited to exhibiting smaller cohorts of patients. On the other hand, Spider plots are a series of web-like projections that represent the change from Baseline for tumors in different groups of individuals. While they provide for simultaneous comparison of multiple variables, they can often oversimplify complex relationships. As there is no comprehensive review comparing the merits of Waterfall and Spider plots, this paper aims to explore appropriate SAS code to generate these figures and to determine the best method of measuring BOR.
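A hedged sketch of a basic waterfall plot with PROC SGPLOT; the data set WORK.TUMOR with variables SUBJID and PCHG (best percent change from baseline) is hypothetical, and the paper's own code may differ:

```sas
proc sgplot data=work.tumor;
   /* One bar per patient, sorted by descending change from baseline */
   vbar subjid / response=pchg categoryorder=respdesc;
   /* RECIST partial-response threshold shown as a reference line */
   refline -30 / axis=y lineattrs=(pattern=dash);
   xaxis display=(nolabel novalues);
   yaxis label='Best % change from baseline';
run;
```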
APT-187 : Leveraging Teradata ClearScape Analytics to Improve ROI
Gregory Goralski, Teradata
Paul Segal, Teradata
Tuesday, 5:30 PM – 6:20 PM, Location: America’s Cup B/C
Teradata recently introduced ClearScape Analytics, a branding that encompasses all of Teradata’s analytical capabilities: in-database functions (covering profiling, feature engineering, machine learning, time series analysis, and digital signal processing) as well as ModelOps, which includes the ability to take models built outside of Teradata Vantage (including SAS models) and deploy them inside the database, complete with model monitoring and input monitoring. It also encompasses open-source tooling (such as R and Python) and integration with CSP analytical toolkits (such as AWS SageMaker, Azure ML, and Vertex AI). In this session you will be introduced to all these concepts, and then shown how to make use of these functions from within SAS 9.x, SAS Viya, and Python. You will also see how Teradata ClearScape Analytics has benefited business and significantly increased ROI (and decreased TCO).
APT-189 : A SAS Code Hidden in Plain Sight
Bart Jablonski, yabwon
Wednesday, 11:00 AM – 11:50 AM, Location: America’s Cup B/C
Sometimes there is a need to share a SAS macro in a way that the execution of the macro is possible, but at the same time the source code is not “readable” (e.g., the code is a proprietary solution). In this article, a programming-based solution utilizing encrypted user-written functions (FCMP) and SAS data sets, which gives the requested effect, will be explained. Furthermore, the solution (in contrast to operating-system-dependent SAS catalogs) allows for code exchange between different operating systems! As a cherry on top, the GSM (Generate Secure Macros) package, which allows users to create secured macros stored in SAS PROC FCMP functions and share them between different operating systems without showing their code, will be presented.
APT-209 : List Processing using SQL Select Into to Replace Call Symputx Creating Indexed Arrays of Macro Variables
Ronald Fehd, Fragile-Free Software
Thursday, 11:30 AM – 11:50 AM, Location: America’s Cup B/C
SAS software provides the ability to allocate a macro variable in a DATA step with its CALL SYMPUTX routine. This routine can be used to make a sequentially numbered series of macro variables mvar1 through mvarN, which is referred to as an indexed array of macro variables. The purpose of this paper is to examine the algorithm of macro variable array usage and provide SQL consolidations of the various tasks.
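The two approaches being compared can be sketched side by side; SASHELP.CLASS stands in for a real data set, and the open-ended `into :mvar1-` range requires SAS 9.3 or later:

```sas
/* DATA step approach: CALL SYMPUTX builds mvar1..mvarN plus a count */
data _null_;
   set sashelp.class end=last;
   call symputx(cats('mvar', _n_), name);
   if last then call symputx('mvar_n', _n_);
run;

/* SQL consolidation: one statement creates the series,  */
/* and &SQLOBS supplies the count                         */
proc sql noprint;
   select name into :mvar1- from sashelp.class;
   %let mvar_n = &sqlobs;
quit;
```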
APT-210 : A Batch Processing Companion, how to write Windows *.bat and *.cmd files for my-program.sas
Ronald Fehd, Fragile-Free Software
Wednesday, 8:30 AM – 8:50 AM, Location: America’s Cup B/C
This paper reviews issues in writing Windows batch and command files for processing SAS software *.sas programs. The purpose of this paper is to provide programmers and users with information about Windows operating system environment variables and how to use them to write a .cmd file, sas.cmd, which creates a date+time-stamped .log file.
APT-214 : List Processing Macro Call-Macro
Ronald Fehd, Fragile-Free Software
Thursday, 8:30 AM – 8:50 AM, Location: Coronado D
This paper provides an explanation of the general-purpose list processing routine Call-Macro in SAS software. The parameters of this macro are a data set name and a macro name. It reads each row of the data set and generates a call to the macro named in its parameter. The purpose of this paper is to show how to use the %SYSFUNC macro function with the SCL data set functions that open a data set, read and process each variable in an observation, and then close the data set.
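A simplified single-variable sketch of the idea (the paper's macro processes every variable in each observation; the macro and parameter names here are illustrative):

```sas
/* For each row of DATA=, call macro MNAME= with the value of VAR= */
%macro call_macro(data=, mname=, var=);
   %local dsid rc;
   %let dsid = %sysfunc(open(&data));
   %do %while(%sysfunc(fetch(&dsid)) = 0);
      %&mname(%sysfunc(getvarc(&dsid, %sysfunc(varnum(&dsid, &var)))))
   %end;
   %let rc = %sysfunc(close(&dsid));
%mend call_macro;

/* Example target macro */
%macro show(value);
   %put NOTE: processing &value;
%mend show;

%call_macro(data=sashelp.class, mname=show, var=name)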
APT-227 : Tapping the Power of PRX Functions in SAS 9.4 and in SAS Viya
John LaBore, SAS
Edwin Xie, SAS
Wednesday, 4:00 PM – 4:50 PM, Location: America’s Cup B/C
SAS programmers learn to use many basic SAS functions within the DATA step, but typically only a few advanced users learn about the SAS PRX (Perl Regular Expression) functions and call routines. The SAS PRX functions provide a powerful means to handle complex string manipulations by enabling the same end result with fewer lines of code, or by enabling the analysis of data previously out of reach of the basic string manipulation functions. The PRX functions and call routines that became available in SAS version 9 are accessible within the DATA step, and are tools that every advanced SAS programmer should have in their toolkit. A subset of these PRX functions is already available in SAS Viya. Examples are provided in SAS 9 and in SAS Viya to give a quick introduction to the syntax, along with a review of the resources available to the programmer getting started with PRX functions.
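A quick taste of the syntax covered; the sample text and patterns are illustrative only:

```sas
data _null_;
   text = 'Contact: jane.doe@example.com or (555) 123-4567';

   /* PRXMATCH: position of the first e-mail-like substring (0 = no match) */
   if prxmatch('/\w[\w.]*@[\w.]+\w/', text) > 0 then
      put 'E-mail address found';

   /* PRXCHANGE: substitute; -1 means replace all occurrences */
   masked = prxchange('s/\(\d{3}\) \d{3}-\d{4}/(XXX) XXX-XXXX/', -1, text);
   put masked=;
run;
```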
APT-228 : Super Demo: Getting Started with Bayesian Analysis
Danny Modlin, SAS
Thursday, 9:30 AM – 9:50 AM, Location: America’s Cup B/C
This presentation introduces the audience to the realm of Bayesian analyses and concepts. Participants will be able to see the difference between the Bayesian approach and the classical approach to statistics. Convergence diagnostics, images, and a simple example of PROC MCMC will be shared.
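A minimal PROC MCMC syntax sketch (not the demo's own code): estimating a normal mean and variance with weakly informative priors, for a hypothetical data set WORK.Y containing a variable Y:

```sas
proc mcmc data=work.y seed=27513 nmc=20000 nbi=2000;
   parms mu 0 s2 1;                         /* initial values        */
   prior mu ~ normal(0, var=1e4);           /* diffuse prior on mean */
   prior s2 ~ igamma(shape=0.01, scale=0.01);
   model y ~ normal(mu, var=s2);            /* likelihood            */
run;
```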
APT-229 : Super Demo: Sandwich Your SAS Data Set to Excel Pivot Tables
Charu Shankar, SAS Institute
Tuesday, 5:00 PM – 5:20 PM, Location: America’s Cup B/C
Excel is universally loved. SAS has a way to bring Excel into SAS so that you can analyze your data. Users now ask, “Great, I can analyze my data in SAS, but my end users don’t have SAS on their desktops. How can I give them SAS data in Excel form?” We’ll go even further: instead of taking SAS into a standard Excel workbook, what if you could take SAS to an Excel pivot table? Now you can. In this demo, watch how quickly you can take a SAS dataset to Excel pivot tables. See how in minutes, the Excel table shapes and forms right under your own eyes.
APT-230 : Super Demo: Missing Data and Proc MCMC
Danny Modlin, SAS
Wednesday, 3:00 PM – 3:20 PM, Location: America’s Cup B/C
This presentation will show the participants how to incorporate missing data into the Bayesian analysis and not be subjected to complete case analysis. Posterior distributions for the missing values will be generated and the uncertainty of the missing will be captured within the final model.
APT-231 : Super Demo: Tips and Tricks for Sending Emails Using SAS
Chris Hemedinger, SAS
Wednesday, 11:00 AM – 11:20 AM, Location: Regatta B/C
An email message is a great way to send a notification when a SAS job completes, or to distribute a result from SAS as an attached report or spreadsheet. Learn how the SAS programming language allows you to send email as an output via the FILENAME EMAIL method. You will learn how to: Send email with a SAS program. Configure SMTP to send email. Send email with attachments or embedded images. Send email to multiple recipients, customizing the message for each. Send a text message using SAS.
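A minimal sketch of the FILENAME EMAIL access method described above; the addresses, SMTP host, and attachment path are placeholders, and the OPTIONS settings assume site-specific SMTP configuration:

```sas
/* Point SAS at the site's SMTP server (placeholder host/port) */
options emailsys=smtp emailhost=smtp.example.com emailport=25;

filename msg email
   to='analyst@example.com'
   subject='Nightly SAS job finished'
   attach='C:\reports\summary.xlsx';   /* placeholder attachment */

/* The message body is written with ordinary PUT statements */
data _null_;
   file msg;
   put 'The nightly SAS job completed successfully.';
   put "Run date: &sysdate9.";
run;

filename msg clear;
```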
APT-232 : Super Demo: How Do I Debug in SAS Data Step
Chris Hemedinger, SAS
Thursday, 11:30 AM – 11:50 AM, Location: Regatta B/C
PUT statements might be “good enough” for some coders, but you can learn how to use the SAS DATA step debugger to save time and be a smart SAS programmer. You will learn: what the SAS DATA step debugger is and where to find it in SAS Enterprise Guide, SAS Studio, and Base SAS; how to use the debugger to set breakpoints, watch values, and solve logic problems; and what the limitations of the debugger are.
Analytics & Statistics
AS-102 : Blinding Indexes – Generalized and Unified Framework – a SAS Macro
Eduard Poltavskiy, NA
Rima Nandi, Herbert Wertheim School of Public Health, University of California, San Diego, USA
Jeehyoung Kim, Department of Orthopedic Surgery, Seoul Sacred Heart General Hospital, Seoul, Korea
Heejung Bang, Division of Biostatistics, Department of Public Health Sciences, University of California, Davis, CA, USA
Thursday, 11:30 AM – 11:50 AM, Location: Coronado D
Currently, James’ and Bang’s blinding indexes are considered the reference-standard measure to analyze blinding-related data (Howick et al., 2020) in clinical trials, and computing modules are available in Microsoft Excel, R, SAS, and Stata (Chen, 2008; Kim, 2014, 2016; Kim et al., 2021; Schwartz & Mercaldo, 2022; Wu & Zhu, 2022). However, the capabilities of these modules are somewhat limited to simplistic scenarios with different functions available in different software; e.g., for two treatment arms and three possible responses; no option for re-asking the subject who answered ‘Don’t know’ (e.g., ancillary or validation purpose); no option for user-defined weights; and/or no selection of the direction of confidence intervals (CI). This paper reviews the background and context and provides a new macro for these two common indexes for blinding assessment for comprehensive data/settings in a unified framework. In order to make this macro work, you need to verify that the SAS/IML component is licensed at your site.
AS-108 : Introduction to Machine Learning – Descriptions and Best Practices
Jim Box, SAS Institute
Tuesday, 5:00 PM – 5:50 PM, Location: America’s Cup D
Artificial Intelligence and Machine Learning (AI & ML) are THE hot topics these days. In this talk, we’ll define ML and go over the many types of models and explain how they work and what they are used for. We’ll also cover the best practices around using ML models, including how to identify the best model. Finally, we will explore some of the concerns about implementing ML models – specifically covering how to analyze the model process for human biases that can have dramatic impacts on society.
AS-113 : Regression Analysis Made Easy Using SAS Studio
Kirk Paul Lafler, sasNerd
Zheyuan Walter Yu, Optimus Dental Supply
Nuoer Lu, University of North Carolina at Chapel Hill
Nicklas (Rebel) Yee, University of Washington, Seattle
Yanzhang Gavin Chen, University of California, Los Angeles
Kai Kang, University of California, Berkeley
Yixuan (Jason) Xiang, University of California, Davis
Tuesday, 4:30 PM – 5:20 PM, Location: Coronado B
SAS OnDemand for Academics (ODA) provides students, faculty, and SAS learners with free access to SAS software and the SAS Studio user interface using a web browser. SAS Studio provides a comprehensive and customizable integrated development environment (IDE) for all SAS users. To showcase SAS Studio’s many features, numerous techniques will be introduced to access, clean, transform, analyze, and visualize data using the point-and-click features found in SAS Studio’s Navigation Pane’s Tasks and Utilities. Plus, we’ll demonstrate the generated SAS code that is automatically produced from the point-and-click techniques. To obtain a high-level understanding of the datasets being used, we’ll demonstrate tasks associated with exploratory data analysis (EDA) to identify missing values, explore outliers, and evaluate trends in the data. Two types of regression will be demonstrated: simple linear regression, where one independent variable is used to explain or predict the outcome of the dependent variable, and multiple linear regression, where two or more independent variables are used to explain or predict the outcome of the dependent variable, to assist with decision-making activities. Key takeaways will be provided to assist in learning regression analysis techniques using effective examples.
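The point-and-click Linear Regression task generates code along these lines; SASHELP.CLASS stands in for a user data set:

```sas
/* Simple linear regression: one predictor */
proc reg data=sashelp.class;
   model weight = height;
run; quit;

/* Multiple linear regression: two or more predictors */
proc reg data=sashelp.class;
   model weight = height age;
run; quit;
```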
AS-117 : Dashboard Development: From Prepping to Visualizing Data
Bert Cisneros, Arizona Supreme Court – Administrative Office of the Courts
Richard Rivera, Arizona Supreme Court
Tuesday, 5:30 PM – 6:20 PM, Location: Coronado B
Enhance your career and knowledge in data dashboard development. Technological innovations such as data dashboards help organizations track performance and key performance indicators (KPIs) and have utility for policy and decision making. SAS is a powerful analytical tool for preparing data for any data visualization software. We will discuss the stages of dashboard development, such as how to prepare data in SAS 9.5 on Windows, how to frame data to enhance visualizations, and what to consider in creating efficient, practical, and accessible data dashboards. We will illustrate these concepts via two dashboards: (a) Superior Court Case Activity: filing, disposition, and sentencing; and (b) Court Personnel: personnel characteristics incorporating two external data sources (Census and State Bar demographics). See https://www.azcourts.gov/statistics/Interactive-Data-Dashboards. Data dashboard development will improve and expand your job performance and increase job opportunities.
AS-119 : Advanced Project Management beyond Microsoft Project, Using PROC CPM, PROC GANTT, and Advanced Graphics
Stephen Sloan, Accenture
Lindsey Puryear, SAS Institute
Tuesday, 3:00 PM – 3:20 PM, Location: Coronado B
The Challenge: Instead of managing a single project, we had to craft a solution that would manage hundreds of higher- and lower-priority projects, taking place in different locations and different parts of a large organization, all competing for common pools of resources. Our Solution: Develop a Project Optimizer tool using the CPM procedure to schedule the projects and using the GANTT procedure to display the resulting schedule. The Project Optimizer harnesses the power of the delay analysis feature of PROC CPM and its coordination with PROC GANTT to resolve resource conflicts, improve throughput, clearly illustrate results and improvements, and more efficiently take advantage of available people and equipment.
AS-126 : A unique and innovative end-to-end demand planning and forecasting process using a collection of SAS products
Stephen Sloan, Accenture
Kevin Gillette, Accenture Federal Services
Thursday, 11:00 AM – 11:20 AM, Location: Coronado D
Forecasting demand can be a very tricky process. Questions arise about which statistical algorithms to use when forecasting based on past sales, how to incorporate business knowledge into the forecast, planning for unforeseen events, and planning for unique events that would not be predictable from sales history. As an example, COVID caused many forecasts to be wrong about quantities when purchasing switched from services to goods, from in-store sales to remote purchases, and from work in the office to remote work. Even when the total quantities of a product were forecasted incorrectly, the percentage distribution of the sales for some of the subcategories within the larger categories was often accurate, and vice versa. An example of this is the switch from eating and drinking in restaurants to take-out, where the total quantities of the items might have been the same, but the packaging changed (fewer kegs, more six-packs). The solution, then, is to leverage the useful information from the statistical forecasts while allowing the people who know the business to make individual or mass updates. All of this can be accomplished using existing SAS products: using SAS EG and SAS DI to read in, manipulate, and output data; using the High-Performance Forecasting (HPF) SAS PROCs, which underpin SAS Enterprise Miner, to create statistical forecasts; and using SAS Financial Management (SAS FM), which incorporates Excel features while remaining within SAS, to allow users to make individual and mass changes to SAS data sets.
AS-158 : Univariate Outlier Detection Using SAS
Fan Yang, Johnson and Johnson Vision
Wednesday, 9:00 AM – 9:20 AM, Location: Coronado B
Outlier detection is an important task for many data analysis projects. It can help identify unusual observations in a dataset that may affect the statistical or analytical method used or have a large influence on the analysis results. This paper will introduce five statistical methods to detect outliers in a univariate dataset. These five methods range from a simple use of the mean and standard deviation statistics to the use of more advanced statistical techniques such as Robust Regression and the Medcouple statistic. This paper also covers some programming aspects of how to implement these five methods in SAS. A custom SAS macro is built using the SAS/IML language to implement an advanced “FAST” algorithm to calculate the Medcouple statistic for outlier detection, something that seems not to have been done yet in the SAS community. Finally, the sensitivity of outlier detection for these five methods is compared in a simulation study, with some recommendations drawn. The target audience for this paper is SAS practitioners or data analysts who may want to refresh or enrich their knowledge of statistical methods for outlier detection and who may benefit from learning how to apply these methods in SAS.
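The simplest of the five approaches, flagging values far from the mean, can be sketched as follows (a hedged illustration only; the dataset, variable, and 3-SD cutoff are assumptions, not taken from the paper):

```sas
/* Compute mean and standard deviation, then flag observations
   more than 3 standard deviations from the mean. */
proc means data=sashelp.heart noprint;
   var cholesterol;
   output out=stats mean=xbar std=s;
run;

data flagged;
   if _n_ = 1 then set stats(keep=xbar s);  /* load statistics once */
   set sashelp.heart;
   outlier = (abs(cholesterol - xbar) > 3*s);
run;
```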
AS-164 : The Delta Method in Statistical Inference, with Applications in the SAS IML Procedure
Carter Sevick, Division of Health Care Policy and Research, School of Medicine, University of Colorado Anschutz Medical Campus, Aurora, Colorado, USA
Camille Moore, Center for Genes, Environment and Health, National Jewish Health, Denver, Colorado, USA
Samantha MaWhinney, Department of Biostatistics, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, Colorado, USA
Wednesday, 8:30 AM – 8:50 AM, Location: Coronado B
Testing of substantive hypotheses may involve complex computations on estimated model parameters without clear, or immediate, standard errors. Possible examples include a difference in group proportions, from a multivariable logistic regression, or summary effects from a pattern mixture model. Inference can often be readily carried out using resampling methods, such as the bootstrap, but these strategies may be impractical in cases of large data, computationally intensive models or both. The Delta Method has a long and rich history in statistics and may be able to deliver valid and reliable inferences within acceptable time frames when resampling methods would not. In this work, the Delta Method is reviewed with attention to cases when it works well and when other strategies would be advisable. Included examples are graded from simple to complex, and detailed code using the IML procedure is provided.
AS-170 : Sorting a Bajillion Variables: When SORTC and SORTN Subroutines Have Stopped Satisfying, User-Defined PROC FCMP Subroutines Can Leverage the Hash Object to Reorder Limitless Arrays
Troy Martin Hughes, Datmesis Analytics
Tuesday, 2:30 PM – 2:50 PM, Location: Coronado B
The SORTC and SORTN built-in SAS subroutines sort character and numeric data, respectively, and are sometimes referred to as “horizontal sorts” because they sort variables rather than observations. That is, all elements within a SORTC or SORTN sort must be maintained in a single observation. A limitation of SORTC and SORTN is their inability to sort more than 800 variables when called inside the FCMP procedure. To overcome this disagreeable, arbitrary threshold, user-defined subroutines can be engineered that leverage the hash object to sort limitless variables. The hash object orders values that are ingested into it using the ORDERED argument, which can specify either ASCENDING or DESCENDING. This text demonstrates three failure patterns that occur when the OF operator specifies an array inside the FCMP procedure, which affect both character and numeric arrays, and which cause all built-in functions and subroutines to fail with runtime errors.
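Outside of FCMP, the hash object behavior the paper builds on can be illustrated with a minimal DATA step sketch (not from the paper): items added in arbitrary order come back in key order when the object is declared with the ORDERED argument.

```sas
/* The ORDERED:'a' argument makes the hash iterator return
   items in ascending key order, regardless of insertion order. */
data _null_;
   length val 8;
   declare hash h(ordered:'a');
   h.definekey('val');
   h.definedone();
   do val = 5, 1, 4, 2, 3;
      rc = h.add();
   end;
   declare hiter hi('h');
   rc = hi.first();
   do while (rc = 0);
      put val=;          /* values emerge in ascending order */
      rc = hi.next();
   end;
run;
```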
AS-175 : What’s next after model comparison? Model selection and model averaging in SAS
Chong Ho Yu, Hawaii Pacific University
Charlene Yang, Azusa Pacific University
Wednesday, 10:30 AM – 10:50 AM, Location: Coronado B
To enhance the generalizability of the findings and to overcome the replication crisis, it is a common practice to run multiple models under the framework of ensemble methods, instead of drawing a conclusion based on a single analysis. At the stage of model comparison, the analyst can choose between model selection and model averaging. In model selection, the best model is retained based on predictive accuracy, error rates, or parsimony, such as F1 score, generalized R-square, RASE, AIC and BIC. By doing so, the second-best and other less adequate models are totally discarded. On the other hand, in model averaging, information from all models is utilized in several ways, such as averaging the prediction estimates from all models, selecting the highest estimates of all models, and returning the proportion of the models that can substantially contribute to the outcome. While these options are available in SAS Viya, SAS Enterprise Miner, and JMP Pro, no consensus exists on how model selection and model averaging should be properly used under different situations. There have been prior research studies that found, in most cases, model averaging and model selection lead to comparable results. In contrast, another study found that model averaging can yield more accurate results than model selection when researchers study complex models, such as nonlinear mixed-effect models. Further, some researchers argue that model averaging generally outperforms a single best model, especially when the analyst does not have information on the relative performance of the candidate models. Nonetheless, model averaging is based on the assumption that all models have certain merits that can contribute to the assembly of the final model, but in reality, it might not always be the case. In this presentation, the merits and disadvantages of both approaches will be discussed and illustrated using SAS software applications.
AS-178 : Unraveling the Layers within Neural Networks: Designing Artificial and Convolutional Neural Networks for Classification and Regression Tasks using TensorFlow in Python
Ryan Lafler, Premier Analytics Consulting, LLC
Anna Wade, Premier Analytics Consulting, LLC
Wednesday, 2:00 PM – 2:50 PM, Location: Coronado B
Capable of accepting and mapping complex relationships hidden within structured and unstructured data, Neural Networks are composed of layers of neurons with functions that interact, preserve, and exchange information with each other to develop highly flexible and robust predictive models. Neural Networks are versatile in their applications to real-world problems; capable of regression, classification, and generating entirely new data from existing data sources, Neural Networks are accelerating breakthroughs in deep learning methodologies. Given the recent advancements in graphical processing unit (GPU) cards, cloud computing, and the availability of interpretable APIs like the Keras interface for TensorFlow, Neural Networks are quickly moving from development to deployment in industries including finance, healthcare, climatology, movies, video streaming, business analytics, and marketing, given their versatility in modeling complex problems using structured, semi-structured, and unstructured data. This paper offers users an intuitive, example-oriented guide to designing basic Artificial Neural Network and Convolutional Neural Network architectures in Python for non-parametric regression and image classification tasks.
AS-196 : Creating and Customizing High-Impact Excel Workbooks from SAS with ODS EXCEL
Josh Horstman, Nested Loop Consulting
Thursday, 8:30 AM – 9:20 AM, Location: Coronado B
Love it or hate it, Microsoft Excel is used extensively through the business world. As a SAS user, you can enhance the impact of your work by using the ODS EXCEL destination to create high-quality, customized output in Excel format directly from SAS. This paper walks through a series of examples demonstrating the flexibility and power of this approach. In addition to complete control over visual attributes such as fonts, colors, and borders, the ODS EXCEL destination allows the SAS user to take advantage of Excel features such as multiple tabs, frozen or hidden rows and columns, and even Excel formulas to deliver the high-impact results you and your customers want!
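A small hedged sketch of the kind of customization described (filename and options are illustrative, not the paper’s examples):

```sas
/* Write a custom Excel workbook directly from SAS:
   a named tab, frozen header row, and autofilters. */
ods excel file="class.xlsx"
    options(sheet_name="Students"
            frozen_headers="on"
            autofilter="all");

proc print data=sashelp.class noobs;
   var name age height weight;
run;

ods excel close;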
AS-203 : Survey Data Analysis in SAS
Denis Nyongesa, Kaiser Permanente
Marianne Jurasic, Boston University
Gregg Gilbert, University of Alabama at Birmingham
Tamara Lischka, Kaiser Permanente Center for Health Research
Thursday, 11:30 AM – 11:50 AM, Location: Coronado B
While few surveys use a simple random sampling design to collect data, regular statistical software analyzes data as if the data were collected using simple random sampling. Ignoring the sampling design in analysis of survey data may lead to incorrect point estimates and their standard errors. This paper provides an overview of the SAS procedures used in analyzing survey data. The SAS procedures handled in this paper include PROC SURVEYMEANS, PROC SURVEYFREQ, and PROC SURVEYLOGISTIC. The STRATA and WEIGHT statements for the procedures mentioned above will be discussed in relation to the sampling design used during the generation of the weighted estimates. Both univariate and multivariable weighted logistic regression models will be discussed. The cross-sectional questionnaire data from the National Dental Practice-Based Research Network study entitled “Deep Caries Removal Strategies: Findings from the National Dental Practice-Based Research Network” will be used to demonstrate the above-mentioned SAS procedures in SAS 9.4.
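The design statements discussed above follow a common pattern, sketched here with placeholder dataset and variable names (not from the study):

```sas
/* STRATA and WEIGHT describe the complex sampling design so that
   point estimates and standard errors are computed correctly. */
proc surveymeans data=survey;
   strata region;
   weight samp_wt;
   var outcome;
run;

proc surveylogistic data=survey;
   strata region;
   weight samp_wt;
   class exposure (ref='No') / param=ref;
   model event(event='Yes') = exposure age;   /* multivariable model */
run;
```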
AS-216 : D-I-D the policy have an impact? Difference-in-difference methods applied to survey data in SAS
Melanie Dove, UC Davis
Thursday, 9:30 AM – 10:20 AM, Location: Coronado B
Difference-in-difference (DID) methods can be used to examine the impact of health policies by comparing changes in outcomes over time between a population exposed to a health policy and a population that is not. These methods are commonly used with survey data, as survey data is usually collected on a regular basis over time. In this paper, we will describe how we used DID methods to examine the impact of flavored tobacco sales restrictions on youth e-cigarette use in SAS, using data from the California Healthy Kids Survey. PROC SURVEYLOGISTIC was used to estimate DID models to compare how youth e-cigarette use changed pre-policy (2017/2018) to post-policy (2019/2020), and between students exposed to a flavor restriction relative to unexposed students. To obtain the DID odds ratio, we included an interaction term between year (2019/2020 compared with 2017/2018) and exposure group (flavor restriction: yes or no) on the MODEL statement. ESTIMATE statements were used to obtain an odds ratio with the appropriate comparisons. One of the assumptions of DID models is that trends in the outcome in the exposed and unexposed groups were parallel prior to the policy, called the parallel trends assumption. We will describe how we assessed this assumption by 1) visually examining the percent of high school students who used an e-cigarette from 2015/2016 to 2019/2020 for students exposed to a flavor restriction and unexposed students, and 2) including an interaction term (year × flavor restriction), using 2015/2016 as the referent category, in logistic regression models.
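The DID setup described above can be sketched roughly as follows; all dataset and variable names here are invented placeholders, not the authors’ actual code:

```sas
/* DID model: the POST*FLAVOR_BAN interaction carries the
   difference-in-difference effect on the log-odds scale. */
proc surveylogistic data=chks;
   strata stratum;
   weight wt;
   class post(ref='0') flavor_ban(ref='0') / param=ref;
   model ecig_use(event='1') = post flavor_ban post*flavor_ban;
   estimate 'DID odds ratio' post*flavor_ban 1 / exp;
run;
```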
AS-219 : Data Wrangling and Descriptive Statistics
Tom Grant, SAS
Wednesday, 11:00 AM – 11:50 AM, Location: Coronado B
AS-221 : Tales from a Tech Support Guy: The Top Ten Most Impactful Reporting and Data Analytic Features for the SAS Programmer
Chevell Parker
Thursday, 10:30 AM – 11:20 AM, Location: Coronado B
AS-223 : Quick Start to Mixed Modeling
Danny Modlin, SAS
Tuesday, 3:30 PM – 4:20 PM, Location: Coronado B
In this presentation, you learn aspects of linear mixed models, including code and results. The emphasis is on a quick-start approach to learning the SAS programming techniques and interpreting the results. Examples of mixed models will be shown.
AS-226 : Leveraging Logistic Regression and Bootstrap Sampling to Assess a Decision Threshold in Classification Predictive Analytics
Colleen McGahan, BC Cancer Agency
Wednesday, 9:30 AM – 10:20 AM, Location: Coronado B
This presentation will focus on a healthcare example to illustrate the practical application of logistic regression in SAS in combination with bootstrap validation and the utilization of ROC curve data to obtain sensitivity and specificity which are used to assess a decision threshold as part of the prediction capability of the model. Logistic regression is a statistical technique commonly used for classification and predictive analytics. It enables us to estimate the probability of an event occurring based on various input variables and with the use of a decision threshold can be used as a classification tool. Its utility extends across many application areas, including natural language processing, manufacturing, wildlife habitat, finance, healthcare, and marketing.
Beginner’s Techniques
BT-112 : Data Access Made Easy Using SAS Studio
Kirk Paul Lafler, sasNerd
Shaonan Wang, Columbia University
Nuoer Lu, University of North Carolina at Chapel Hill
Zheyuan Walter Yu, Optimus Dental Supply
Daniel Qian, Ten Square International, Inc.
Swallow Xiaozhe Yan, President, US Education Without Borders
Tuesday, 2:30 PM – 3:20 PM, Location: America’s Cup D
SAS OnDemand for Academics (ODA) gives students, faculty, and SAS learners free access to SAS software and the SAS Studio user interface using a web browser. SAS Studio provides a comprehensive and customizable integrated development environment (IDE) for all SAS users. A number of techniques will be introduced to showcase SAS Studio’s ability to access a variety of data files; the application of point-and-click techniques using the Navigation Pane’s Tasks and Utilities; the importation of external delimited (or tab-delimited) text, comma-separated values (CSV), and Excel spreadsheet data files by accessing Import Data under Utilities; the reading of JSON data files; and the viewing of the results and SAS data sets that are produced. We also provide key takeaways to help users learn through the application of tips, techniques, and effective examples.
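The Import Data utility mentioned above generates code along these lines (the file path and dataset name here are hypothetical):

```sas
/* Import a CSV file into a WORK data set; GUESSINGROWS=MAX
   scans the whole file before assigning variable types. */
proc import datafile="/home/userid/data/sales.csv"
            out=work.sales
            dbms=csv
            replace;
   guessingrows=max;
run;
```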
BT-130 : The Essentials of SAS Dates and Times
Derek Morgan, Bristol Myers Squibb
Wednesday, 4:00 PM – 4:50 PM, Location: Coronado B
The first thing you need to know is that SAS software stores dates and times as numbers. However, this is not the only thing that you need to know. This presentation gives you a solid base for working with dates and times in SAS. It introduces you to functions and features that enable you to manipulate your dates and times with surprising flexibility. This paper shows you some of the possible pitfalls with dates (and times and datetimes) in your SAS code and how to avoid them. We show you how SAS handles dates and times through examples, including the ISO 8601 formats and informats and how to use dates and times in TITLE and FOOTNOTE statements. The presentation closes with a brief discussion of Excel conversions.
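A few of the essentials covered above can be summarized in a short sketch (illustrative values, not the paper’s examples):

```sas
/* SAS dates count days since 01JAN1960; times count seconds
   since midnight; datetimes count seconds since 01JAN1960. */
data _null_;
   d  = '15MAR2023'd;                          /* date literal */
   t  = '09:30't;                              /* time literal */
   dt = dhms(d, 9, 30, 0);                     /* build a datetime */
   next_month = intnx('month', d, 1, 'same');  /* advance one month */
   months_elapsed = intck('month', '01JAN2023'd, d);
   put d= date9. next_month= yymmdd10. dt= datetime19.;
run;
```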
BT-193 : The 5 CATs in the Hat Sleek Concatenation String Functions
Kirk Paul Lafler, sasNerd
Tuesday, 6:00 PM – 6:20 PM, Location: America’s Cup D
SAS functions are an essential component of Base SAS software. Representing a variety of built-in and callable routines, functions serve as the “work horses” in the SAS software, providing users with “ready-to-use” tools designed to ease the burden of writing and testing often lengthy and complex code for a variety of programming tasks. The advantage of using SAS functions is evident in their relative ease of use and their ability to provide a more efficient, robust, and scalable approach to simplifying a process or programming task. SAS functions span several functional categories, including character, numeric, character-string matching, data concatenation, truncation, data transformation, search, date and time, arithmetic and trigonometric, hyperbolic, state and zip code, macro, random number, statistical and probability, financial, SAS file I/O, external files, external routines, and sort, to name a few. This SAS programming tip highlights the old, alternate, and new methods of concatenating strings and/or variables together.
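The five CAT functions, alongside the older concatenation operator, can be compared in one small sketch (values are illustrative):

```sas
/* Old method (|| operator) versus the CAT family of functions. */
data _null_;
   first = 'Kirk  ';
   last  = '  Lafler';
   c0 = first || last;               /* old: blanks preserved as stored   */
   c1 = cat(first, last);            /* CAT:  concatenates as-is          */
   c2 = cats(first, last);           /* CATS: strips leading/trailing blanks */
   c3 = catt(first, last);           /* CATT: strips trailing blanks only */
   c4 = catx(' ', first, last);      /* CATX: strips blanks, adds delimiter */
   c5 = catq('d', ',', first, last); /* CATQ: quotes items containing the delimiter */
   put c2= / c4=;
run;
```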
Data Management & Administration
DMA-110 : How To Easily Ingest (Lots Of) External Data Into The SAS Ecosystem By Only Setting One Parameter – Welcome to Data Ingestion Auto Pilot (DIAP)
Stephan Weigandt, SAS Institute
Wednesday, 8:30 AM – 8:50 AM, Location: Coronado D
This article is about the “Data Ingestion Auto Pilot – DIAP” custom step, which is freely available for download in the public GitHub SAS Studio Custom Step repository and runs on Viya 4. One of its highlights is that DIAP automatically determines the separator (comma, semicolon, pipe, tab, exclamation mark, hash, blank) when ingesting a text file, and it can deal with XLSX, JMP, SHP, JSON, XML, CSV, and TXT files; this list is constantly expanding. DIAP can handle most exceptions, which will be outlined in more detail in this paper. It has been interesting for me to observe (first in myself, but then also in others) that the task of ingesting external files often comes with a degree of frustration and can become a stumbling block in the progress of a project. Even though there are many out-of-the-box capabilities for importing external files (making it supposedly easy), in many cases there is more to be done. It quickly becomes cumbersome when there are many different files to be read in. Sometimes reading in even only a few files can cause headaches, because there are impurities in the file (at the meta level, but also in the actual content) that can cause unexpected outcomes. And in some cases, it might be required to go through a whole directory tree to read in all the files. Dealing with similar issues repeatedly, I was determined to create a tool to resolve them. My nature (I must automate anything I have to do twice) took over, and DIAP, the Data Ingestion Auto Pilot, was born. Its purpose is to take care of those recurring headaches in an automated way.
DMA-120 : Reducing the space requirements of SAS data sets without sacrificing any variables or observations
Stephen Sloan, Accenture
Wednesday, 9:00 AM – 9:20 AM, Location: Coronado D
The efficient use of space can be very important when working with large SAS data sets, many of which have millions of observations and hundreds of variables. We often need to fit the data sets into a fixed amount of space. Many SAS data sets are created by importing Excel or Oracle data sets or delimited text files and the default length of the variables in the SAS data sets can be much larger than necessary. When the data sets don’t fit into the available space, we sometimes need to make choices about which variables and observations to keep, which files to zip, and which data sets to delete and recreate later. There are things we can do to make the SAS data sets more compact and use our space more efficiently. These things can be done in a way that allows us to keep all the desired data sets without sacrificing any variables or observations. SAS has compression algorithms that can shrink the space of the entire data set. There are also tests we can run to shrink the length of different variables and evaluate whether they are more efficiently stored as numeric or character variables. These techniques often save a significant amount of space; sometimes as much as 90% of the original space is recouped. We can use macros so that data sets with large numbers of variables can have their space reduced by applying the above tests to all the variables in an automated fashion.
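Two of the techniques described can be sketched briefly (a hedged illustration; SASHELP.CLASS and the variable chosen are stand-ins for real data):

```sas
/* 1. Dataset-level compression via the COMPRESS= option.
   2. Trimming a character variable to its longest actual value. */
proc sql noprint;
   select max(lengthn(name)) into :maxlen trimmed
   from sashelp.class;
quit;

data class_small (compress=yes);
   length name $ &maxlen;   /* shorter length; SAS notes the change */
   set sashelp.class;
run;
```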
DMA-122 : Getting a Handle on All of Your SAS 9.4 Usage
Stephen Sloan, Accenture
Wednesday, 3:30 PM – 3:50 PM, Location: Coronado D
SAS is popular, versatile, and easy to use, so it proliferates rapidly through an organization. It can handle systems integration, data movement, and advanced statistics and AI, has links to a large number of file types (Oracle, Excel, text, and others), and has functionality for almost every need. In addition to Base SAS, there are a large number of specialized SAS products, some of the most popular of which are SAS STAT, SAS OR, SAS Graph, SAS Enterprise Miner, and SAS Forecast Server. SAS EG allows for quick creation of useful artifacts and facilitates saving the generated code for later use as part or all of another program. As a result, it is sometimes difficult to track all the SAS programs and artifacts being used across an organization; economies of scale can be overlooked, and repetition and “reinventing the wheel” sometimes take place. Programs and macros developed in one area can be useful in other areas, and they can be improved by this internal crowd-sourcing. Understanding all the places where SAS is used is also important when upgrading a system that makes heavy use of SAS, or when upgrading SAS itself to a new version like Viya. It can also help an organization identify which SAS products it is using and how much use these products are getting. To accomplish the above, we’ve developed a set of programs to search a Unix server or a Windows server or machine to find, catalog, and identify the SAS usage on the machine.
DMA-161 : Processing Large Volumes of Communicable Disease Data: SAS Techniques from the COVID-19 Experience in San Diego County
Whitney Webber, County of San Diego
Nathaly Moran, County of San Diego
Jacob Ritz, County of San Diego
Fatema Sakha, County of San Diego
Jacquelyn Ho, County of San Diego
Jennifer Nelson, County of San Diego
Jeffrey Johnson, County of San Diego
Wednesday, 11:30 AM – 11:50 AM, Location: Coronado D
San Diego County is one of very few counties in California that maintains its own electronic communicable disease reporting and surveillance system. Epidemiologists routinely extract and analyze data from the system to identify and report on the status of diseases in the county. The large influx of COVID-19 lab reports during the pandemic challenged the system and surveillance operations and the ability of epidemiologists to provide daily counts of COVID-19 cases, hospitalizations, and deaths in a timely, efficient manner. This paper will examine how the COVID-19 surveillance epidemiologists in San Diego County, primarily new users of SAS, reimagined how to manage and analyze COVID-19 data with Base SAS to meet the information needs of leaders (for priority-setting and policy decisions) and the public. SAS techniques from the COVID-19 experience in San Diego County include PROC APPEND, PROC DATASETS, macros and arrays, KEEP= option with Colon (:) Modifier, ODBC libraries, PROC IMPORT, FIRST. and LAST. processing, IF-THEN-DO statements, LAG, and INTCK. This paper will also explore the successes and challenges of using SAS for COVID-19 surveillance during the pandemic, and how SAS is now being leveraged to respond to current communicable disease outbreaks.
DMA-167 : Make You Holla’ Tikka Masala: Creating User-Defined Informats Using the PROC FORMAT OTHER Option To Call User-Defined FCMP Functions That Facilitate Data Ingestion Data Quality
Troy Martin Hughes, Datmesis Analytics
Wednesday, 3:00 PM – 3:20 PM, Location: Coronado D
You can’t just let any old thing inside your tikka masala; you need to carefully curate the ingredients of this savory, salty, sometimes spicy delicacy! Thus, when reviewing a data set that contains potential tikka masala ingredients, an initial data quality evaluation should differentiate approved from unapproved ingredients. Cumin, yes please; chicken, the more meat the merrier; coriander, of course; turmeric, naturally; yeast, are you out of your naan-loving mind?! Too often, SAS practitioners first ingest a data set in one DATA step, and rely on subsequent DATA steps to clean, standardize, and format those data. This text demonstrates how user-defined informats can be designed to ingest, validate, clean, and standardize data in a single DATA step. Moreover, it demonstrates how the FORMAT procedure can leverage the OTHER option to create a user-defined informat that calls a user-defined FCMP function to perform complex data evaluation and transformation while simulating _ERROR_ variable functionality. Control what you put inside your tikka masala with this straightforward solution that epitomizes data-driven software design while leveraging the flexibility of the FORMAT and FCMP procedures!
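A heavily hedged sketch of the pattern described (the function, informat, and spice values are all invented for illustration; the paper’s own implementation may differ):

```sas
/* An FCMP function registered via OPTIONS CMPLIB can be referenced
   from the OTHER range of a user-defined informat. */
proc fcmp outlib=work.funcs.ingest;
   function clean_spice(raw $) $ 32;
      return (propcase(strip(raw)));   /* standardize the incoming value */
   endfunc;
run;

options cmplib=work.funcs;

proc format;
   invalue $spice (default=32)
      'CUMIN', 'CORIANDER', 'TURMERIC' = _same_   /* approved as-is */
      other = [clean_spice()];                    /* everything else is cleaned */
run;
```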
DMA-207 : SAS Outputs as the Hub of Your Organization’s Data Ecosystem
Michael Aleman, WUSS
Wednesday, 3:00 PM – 3:50 PM, Location: Coronado B
Are data outputs just for viewing and analyzing, or can outputs do more? One of my sore points in a business environment is that SAS is a diverse platform of data inputs, throughputs, and outputs. As SAS users, analysts, and programmers, the community tends to output data in spreadsheets, databases, and infographics. We can provide more value to our organizations by creating processes that feed other platforms with the organization’s vital data as well as provide world-class data delivery. The presentation will cover some common data outputs, such as Excel, JSON, and XML, and their benefits and drawbacks in delivering data or creating products through data. I will elaborate more on XML (one of my favorite ways to deliver data) by using XSLs (stylesheets), XSDs, PROC XSL, and the SAS XML Mapper to transform row-level XML SAS output into a universal data format that other platforms can ingest, making SAS the main hub of your data ecosystem.
DMA-213 : Q&A with the Macro Maven: Do we need Macros? An Essay on the Theory of Application Development
Ronald Fehd, Fragile-Free Software
Wednesday, 10:30 AM – 11:20 AM, Location: Coronado D
This paper examines the theoretical steps of applications development (ApDev) of routines and subroutines in SAS software. It compares and contrasts the benefits of using the %INCLUDE statement versus macros. It examines the methods of calling subroutines, e.g., SQL, CALL EXECUTE, and macro loops. The purpose of this paper is to highlight the benefits of using macros to support unit and integration testing, and to ease searching for and finding issues during maintenance.
DMA-220 : The SAS Viya ETL Playbook
Charu Shankar, SAS Institute
Wednesday, 9:30 AM – 10:20 AM, Location: Coronado D
DMA-222 : How Do I Read and Write Excel Files Using SAS?
Chris Hemedinger, SAS
Wednesday, 2:00 PM – 2:50 PM, Location: Coronado D
Hands-on Workshops
HOW-111 : Application of Fuzzy Matching Techniques Using SAS Software
Kirk Paul Lafler, sasNerd
Stephen Sloan, Accenture
Thursday, 8:30 AM – 9:45 AM, Location: Regatta B/C
Data comes in all forms, shapes, sizes, and complexities. Stored in files and datasets, SAS users across industries recognize that data can be, and often is, problematic and plagued with a variety of issues. Data files can be joined without problem when each file contains identifiers, or “keys”, with unique values. However, many files do not have unique identifiers and need to be joined on character values, like names or email addresses. These identifiers might be spelled differently, or use different abbreviation or capitalization protocols. This paper illustrates datasets containing a sampling of data issues; popular data cleaning and user-defined validation techniques; data transformation techniques; traditional merge and join techniques; an introduction to SAS character-handling functions for phonetic matching, including SOUNDEX, SPEDIS, COMPLEV, and COMPGED; and an assortment of SAS programming techniques to resolve key identifier issues and to successfully merge, join, and match less-than-perfect, or “messy”, data. Although the programming techniques are illustrated using SAS code, many, if not most, can be applied to any software platform that supports character handling.
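The four character-handling functions named above can be compared side by side in a short sketch (the two name values are illustrative):

```sas
/* Phonetic and edit-distance comparisons of two similar names. */
data _null_;
   a = 'Lafler';
   b = 'Laffler';
   s1  = soundex(a);      /* phonetic code for a */
   s2  = soundex(b);      /* phonetic code for b */
   sp  = spedis(a, b);    /* asymmetric spelling distance, 0 = exact */
   lev = complev(a, b);   /* Levenshtein edit distance */
   ged = compged(a, b);   /* generalized edit distance */
   put s1= s2= sp= lev= ged=;
run;
```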
HOW-134 : Zip Code 411: An In-Depth Approach to Analyzing, Visualizing and Reporting on Zip Code Level Data
Louise Hadden, Abt Associates Inc.
Wednesday, 3:30 PM – 4:45 PM, Location: Regatta B/C
The SASHELP.ZIPCODE file is a SAS provided data set containing ZIPCODE level information for the United States including ZIPCODE centroids (x, y coordinates), Area Codes, city names, FIPS codes, and more. The file is indexed on ZIPCODE to facilitate processing, and is updated on a regular basis by SAS. SAS Maps Online is a part of the SAS documentation specific to mapping with SAS. It contains a number of useful tools regarding mapping, including both traditional and the GfK map data sets and updates, related data sets (such as SASHELP.ZIPCODE, SASHELP.MAPFMTS, the new GfK WORLD_CITIES_ALL, and WORLDCTS), Geocoding and sample programs. SAS MAPS ONLINE archives older map data sets and versions of SASHELP.ZIPCODE so that users can match their response data and/or software version appropriately. This HOW will demonstrate the various analytic, reporting, and geographic uses of SAS Maps Online and the SASHELP.ZIPCODE file, including producing camera ready visualizations of zip code data. All SAS products used are a part of BASE SAS. Sample data sets and exercises will be provided.
HOW-155 : The Battle of the Titans: DATA Step versus PROC SQL
Kirk Paul Lafler, sasNerd
Richann Watson, DataRich Consulting
Josh Horstman, Nested Loop Consulting
Charu Shankar, SAS Institute
Wednesday, 2:00 PM – 3:15 PM, Location: Regatta B/C
Should I use the DATA step or the SQL procedure to process my data? Which approach will give me the control, flexibility, and scale to process data exactly the way I want it? Which approach is easier to use? Which approach offers the greatest power and capabilities? And which approach is better? If you have these and other questions about the pros and cons of the DATA step versus PROC SQL, this presentation is for you. We will discuss, using real-life scenarios, the strengths (and even a few weaknesses) of the two most powerful and widely used data processing approaches used in SAS (as we see it). We will provide you with the knowledge you need to make that difficult decision about which approach to use to process all that data you have.
HOW-163 : Everything is Better with Friends: Using SAS in Python Applications with SASPy and Open-Source Tooling (Getting Started)
Isaiah Lankham, Legacy Health
Matthew Slaughter, Kaiser Permanente Center for Health Research
Tuesday, 2:30 PM – 3:45 PM, Location: Regatta B/C
Interested in learning Python? How about learning to make Python and SAS work together? In this hands-on training, we’ll practice writing Python scripts using Google Colab (https://colab.research.google.com/), which is a free online implementation of JupyterLab, and also link to SAS OnDemand for Academics (https://welcome.oda.sas.com/) to access the SAS analytical engine. We’ll also learn to use the popular pandas package, whose DataFrame objects are the Python equivalent of SAS datasets. Along the way, we’ll work through common data-analysis tasks using both regular SAS code and Python together with the SASPy package, highlighting important tradeoffs for each and emphasizing the value of being a polyglot programmer fluent in multiple languages. This will include a beginner-friendly overview of Python syntax and data structures. SASPy is a module developed by SAS Institute for the Python programming language, providing an alternative interface to the SAS system. With SASPy, SAS procedures can be executed in Python scripts using Python syntax, and data can be transferred between SAS datasets and their Python DataFrame equivalent. This allows SAS programmers to take advantage of the flexibility of Python for flow control, and Python programmers can incorporate SAS analytics into their scripts and applications. This class is aimed at SAS programmers of all skill levels, including those with no prior experience using Python or JupyterLab. Accounts for Google and SAS OnDemand for Academics will be needed to interact with code examples. All class materials, including complete setup instructions, will be made available through https://github.com/saspy-bffs/wuss-2023-how
HOW-198 : Getting Started with the SGPLOT Procedure
Josh Horstman, Nested Loop Consulting
Thursday, 10:00 AM – 11:15 AM, Location: Regatta B/C
Do you want to create highly-customizable, publication-ready graphics in just minutes using SAS? This workshop will introduce the SGPLOT procedure, which is part of the ODS Statistical Graphics package included in Base SAS. Starting with the basic building blocks, you’ll be constructing basic plots and charts in no time. We’ll work through several different plot types and learn some simple ways to customize each one.
HOW-208 : Share your code with SAS Packages – a Hands-on-Workshop
Bart Jablonski, yabwon
Wednesday, 8:30 AM – 9:45 AM, Location: Regatta B/C
When working with SAS code, especially as it becomes more and more complex, there is a point in time when a developer decides to break it into small pieces. The developer creates separate files for macros, formats/informats, and for functions or data too. Eventually the code is ready and tested, and sooner or later you will want to share it with another SAS programmer. Maybe a friend has written a bunch of cool macros that will help you get your work done faster. Or maybe you have written a pack of functions that would be useful to your friend. There is a chance you have developed a program using local PC SAS, and you want to deploy it to a server, perhaps with a different OS. If your code is complex (with dependencies such as multiple macros, formats, datasets, etc.), it can be difficult to share. Often when you try to share code, the receiver will quickly encounter an error because of a missing helper macro, missing format, or whatever… A small challenge, isn’t it? How nice it would be to have it all (i.e., the code and its structure) wrapped up in a single file – a SAS package – which could be copied and deployed, independent of OS, with a one-liner like %loadPackage(MyPackage). The presentation shares an idea of how to create such a SAS package in a fast and convenient way using the SAS Packages Framework. We will discuss: 1) the concept of a package, 2) the framework, 3) an overview of the process, and 4) how to build a package. The intended audience for the presentation is intermediate or advanced SAS developers (i.e., with good knowledge of Base SAS and practice in macro programming) who want to learn how to share their code with others.
HOW-224 : Open-Source Essentials HOW – R and RStudio
Wendy Christensen, University of Colorado Anschutz Medical Campus
Tuesday, 4:00 PM – 5:15 PM, Location: Regatta B/C
In this Open-Source Essentials Hands-On Workshop, attendees will learn the basics of using the R/RStudio environment to create and import data sets, manipulate data, combine data sets, and produce summary statistics. Previous experience with these topics in SAS or other software is useful but not required, and no previous experience with R or RStudio is assumed; all levels of experience are welcome. Attendees of this interactive workshop will access workshop material and complete hands-on exercises in RStudio through Posit Cloud (free account required). Posit Cloud requires an internet-connected computer and access to a web browser; no installation is required. After the workshop, materials will be available through GitHub.
HOW-225 : SAS Users GIT Started! A HOW Introduction to GIT.
Zeke Torres, Code629
Wednesday, 10:00 AM – 10:50 AM, Location: Regatta B/C
This introduction gives a SAS user the chance to start working smarter and more efficiently with Git. It covers the installation, basics, and operation of Git and local repositories. We will install basic Git components on users’ machines and create a user profile on GitHub. Where one or a few users run into difficulties, the HOW will still proceed, with other examples to supplement. We will review the remote repositories that users will begin to encounter. There is also a brief overview of team-related Git topics, which are covered in the Git Team course: team standards to consider for commits and code edits, as well as code review. Users will find this course essential as they create SAS (or any) code, especially when the code is shared and/or maintained by more than one person. The examples of best practices and suggested “concepts” come from a SAS code point of view that embraces a team-oriented, collaborative code ecosystem. Regardless of which software (SAS, Python, R, others) or which interface (SAS EG, GitHub Desktop, etc.) is used, this course centers on the user and the benefits Git brings them, whether or not the rest of the team is using Git. A typical user will see improvement in their code and its documentation, solving issues like “why did I change that?”. The use of Git improves the testing and development of code and iterative work, and reduces code inaccuracy by building code review into the overall workflow.
Open Source
OS-143 : Panel Discussion: Benefits, Challenges, and Opportunities with Open Source Software (OSS) Integration
Kirk Paul Lafler, sasNerd
Ryan Lafler, Premier Analytics Consulting, LLC
Joshua Cook, Andrews Research & Education Foundation
Anna Wade, Premier Analytics Consulting, LLC
Stephen Sloan, Accenture
Wednesday, 4:00 PM – 4:50 PM, Location: Coronado E
The open-source world is alive, well, and growing in popularity. This panel discussion assembles a group of open-source software (OSS) experts to share insightful perspectives, opinions, experiences, and ah-ha moments about the benefits, trends, challenges, and opportunities that OSS integration offers. Attendee questions are encouraged, so be prepared for an engaging session where ideas are bounced off each other to highlight the many benefits of open-source software, including its flexibility, agility, and talent attraction; the challenges, including compatibility and vulnerability issues, security limitations, intellectual property issues, warranty issues, and inconsistent developer practices; and the many opportunities coming out of the open-source community, including cloud architecture, open standards, and the collaborative nature of community.
OS-154 : From Interactive Mapmaking to Beautiful Geospatial Visualizations: Harnessing the Power of Python and Google Earth Engine for Extracting, Analyzing, and Visualizing High Resolution Spatiotemporal Data
Ryan Lafler, Premier Analytics Consulting, LLC
Anna Wade, Premier Analytics Consulting, LLC
Thursday, 9:30 AM – 10:20 AM, Location: Coronado E
Google Earth Engine is a powerful cloud-based storage platform for accessing publicly available geospatial data from third-party sources, including satellite imagery, geophysical, socioeconomic, climatological, census, and meteorological data measured over time for academic, personal, research, and business applications. Through a combination of beautiful visualizations and easy-to-implement Python code, users will be given the tools to conduct their own analysis with Google Earth Engine. Using the intuitive Python API, along with a suite of visualization packages and map-making libraries available for Python, this paper showcases methods for accessing, querying, extracting, and visualizing Earth Engine’s spatiotemporal data to develop interactive maps. Optimized techniques permitting intensive spatiotemporal analysis on large, complex datasets are introduced through server-side operations in Google Earth Engine. By the end of this paper, users will feel comfortable setting up, configuring, and linking Earth Engine to Python; become acquainted with commonly used formats for storing various types of spatial data; understand methods for querying, selecting, uploading, and exporting datasets from Earth Engine; effectively visualize high-resolution spatiotemporal data using Python’s `Geemap` package; and be able to conduct analysis using server-side operations to efficiently complete resource-intensive tasks.
OS-159 : SAS Log Parsing Made Easy with Python
Erin O’Dea, NORC at the University of Chicago
Thursday, 10:30 AM – 10:50 AM, Location: Coronado E
The intention of this presentation is to provide an educational experience for SAS Users who are interested in learning more about using Python in their work. Many SAS users will regularly run the exact same program daily, weekly, or monthly – sometimes they will run 50 or 100 SAS programs on a regular basis. Each of these programs might output a log, and many of those logs may have an error or warning that the user expects to see. While best practice in coding is to completely remove these warnings or errors, there are many situations where it would require a significant amount of re-writing to remove them. SAS users may have inherited this code or may be under time constraints in their work. Python has the ability to read log outputs, and with some clever engineering, can compare those outputs to previous runs and output a list of new warnings or errors. This cuts down substantially on the time it takes to read and interpret SAS log outputs. Without needing to install additional programs or libraries, it is possible (and relatively easy) to parse hundreds of SAS logs within seconds using Python. This is intended for users with a basic understanding of Python programming including loops and arrays.
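The idea the abstract describes might be sketched roughly as follows; this is a hypothetical, standard-library-only illustration (the pattern, helper names, and sample log lines are invented, not the presenter's code):

```python
import re

# SAS writes issue lines starting with "ERROR" or "WARNING" at column 1.
ISSUE_PATTERN = re.compile(r"^(ERROR|WARNING)")

def extract_issues(log_text):
    """Collect every line of a SAS log that begins with ERROR or WARNING."""
    return {line for line in log_text.splitlines()
            if ISSUE_PATTERN.match(line)}

def new_issues(current_log, previous_log):
    """Report only the issues that were not present in the previous run."""
    return sorted(extract_issues(current_log) - extract_issues(previous_log))

previous = "NOTE: ok\nWARNING: Apparent symbolic reference X not resolved.\n"
current = ("NOTE: ok\nWARNING: Apparent symbolic reference X not resolved.\n"
           "ERROR: File WORK.DEMO.DATA does not exist.\n")
issues = new_issues(current, previous)  # the expected WARNING is suppressed
```

Wrapping the comparison in a loop over a directory of `.log` files would give the "hundreds of logs in seconds" workflow the presentation describes.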
OS-165 : A Dash of SAS , a Pinch of R: Cooking up Dashboards Using Two Very Different Programming Languages
Joshua Cook, Andrews Research & Education Foundation
Wednesday, 3:00 PM – 3:20 PM, Location: Coronado E
Dashboards are often utilized by businesses, schools, and hospitals as a key tool to organize, analyze, and visualize data that is subject to frequent updates and changes. In this way, effective dashboards enable leaders to make data-informed decisions regarding their organizations. However, there are many approaches to “cooking up” effective dashboards that utilize both proprietary and open-source software, including the SAS and R programming languages. Using Base SAS code as your primary ingredient, one might build an effective dashboard using a colorful combination of the Output Delivery System (ODS), PROC PRINT, PROC SGPLOT, and PROC SGPIE. In someone else’s kitchen, they might include R packages as the main ingredients for their dashboard, such as quarto, readxl, tidyverse, and gt. As Julia Child once said, “You don’t have to cook fancy or complicated masterpieces, just good food from fresh ingredients.” With the same notion, this paper offers a direct comparison of two dashboards prepared from two fresh, but very different ingredients: SAS and R. This paper was written with the intent to duplicate the dishes as much as possible given the flavor (strengths) and aromas (weaknesses) of each ingredient, with emphasis being placed on comparing the setup, steps, and code of each.
OS-168 : What’s black and white and sheds all over? The Python Pandas DataFrame, the Open-Source Data Structure Supplanting the SAS Data Set
Troy Martin Hughes, Datmesis Analytics
Thursday, 11:00 AM – 11:50 AM, Location: Coronado E
Python is a general-purpose, object-oriented programming (OOP) language, consistently rated among the most popular and widely utilized languages, owing to powerful processing, user-friendly syntax, and an unparalleled, abundant open-source community of developers. The Pandas library, a freely downloadable resource, extends Python functionality, and has become the predominant Python analytic toolkit. The Pandas DataFrame is the primary Pandas data structure, akin to the SAS data set in the SAS ecosystem. Just as SAS built-in procedures, functions, subroutines, and statements manipulate and interact with SAS data sets to transform data and to deliver business value, so too do Python and Pandas methods, functions, and statements deliver similar functionality. And what’s more, Python does it for free!!! This text demonstrates basic data manipulation and analysis performed on US Census and Centers for Disease Control and Prevention (CDC) data, providing functionally equivalent SAS (9.4M7) and Python (3.10.5) syntax, with the goal of introducing SAS practitioners to open-source alternatives. Discover the fattest counties and states in the US, and do so while learning Python Pandas!
OS-184 : From Data Access to Exploratory Data Analysis My Journey into the World of Python
Leon Davoody, student
Wednesday, 3:30 PM – 3:50 PM, Location: Coronado E
Kids who learn to code with Python can improve their critical and logical thinking and problem-solving skills. They can better understand everything by breaking complex tasks into smaller steps. And by building something from scratch, they can exercise their creativity and see firsthand the result of their efforts. In this paper I will teach you how to compute basic statistics using Python.
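A first step along those lines might look like the following sketch using Python's built-in statistics module; the sample scores are invented for illustration.

```python
import statistics

# A small, made-up sample: six test scores.
scores = [88, 92, 79, 85, 94, 90]

mean = statistics.mean(scores)      # arithmetic average
median = statistics.median(scores)  # middle value of the sorted sample
stdev = statistics.stdev(scores)    # sample standard deviation (n - 1)

print(f"mean={mean}, median={median}, stdev={stdev:.2f}")
```

No third-party packages are needed, which makes this a gentle entry point before moving to pandas or NumPy.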
OS-185 : An Overview of the SASSY System
David Bosak, r-sassy.org
Thursday, 8:30 AM – 9:20 AM, Location: Coronado E
This presentation will provide an overview for a set of R language packages called the SASSY system. The SASSY system is designed to make R easier for SAS Programmers. The system includes tools to create a log, declare a libname, perform a datastep, define a format, generate a report, and more. The overall experience makes writing code in R more similar to writing code in SAS. If you are a SAS programmer who wants to be more comfortable and productive in R, then this presentation will be of interest to you. The presentation will be given by David J. Bosak, who is the author of the system.
OS-201 : How Do We Git There? Best Practices for using Git with SAS
Joe Matise, NORC
Wednesday, 2:00 PM – 2:50 PM, Location: Coronado E
Git is a powerful tool for managing version control that is commonly used in professional programming environments as well as by researchers, academics, and students across the globe. It can be, however, somewhat overwhelming for new programmers, particularly those who do not have exposure to it through other languages. In this presentation, we will give a brief introduction to Git, tour a few tools for helping to manage Git workflows, and explain best practices for using Git in a SAS programming environment. This presentation is aimed at novice Git users, regardless of SAS programming level; it requires no previous experience with SAS or Git.
Pharma and Healthcare
PH-116 : Another Glance at Good Programming Practice from the Perspective of FDA Reviewers
Hongbo Li, Amylyx Pharmaceuticals Inc.
Wednesday, 10:30 AM – 10:50 AM, Location: Coronado E
Statistical programs/code are part of submission packages to regulatory agencies for new drug/biological product applications in the pharmaceutical industry. At the PharmaSUG 2023 conference in San Francisco, one of the FDA panel presentations raised some critical issues with submitted SAS programs from the FDA reviewers’ perspective: for example, lack of sufficient comments, lack of necessary information in the program header, not all macros called in the program having been submitted, all analysis performed in one program, and lack of usefulness. It is therefore essential that statistical programmers in this industry follow the guidelines of Good Programming Practice (GPP). The PhUSE GPP Steering Board, a voluntary industry group with representatives from a diverse array of health and life sciences organizations, developed guidance on GPP in March 2014, which can be found on the PhUSE Wiki. It highlights the main standards that should be followed and what should be avoided. However, this GPP guidance does not seem to be widely known across the industry, so FDA reviewers continue to find the common issues above, which could easily be avoided if programmers followed the principles and coding conventions of GPP. This paper will summarize the essentials of GPP in light of the issues raised by FDA reviewers, illustrate the fundamental principles and coding conventions of GPP, clarify what to do and what not to do in a program, and explore appropriate layout and header formats, showing examples of good and bad code. It will be helpful to all statistical programmers, especially beginners, and also to managers who would like to embed GPP into SOP or Work Practice documentation.
PH-160 : A SAS Macro for 30-Year Cardiovascular Risk Predication
Matt Zhou, Kaiser Permanente
Hui Zhou, Kaiser Permanente
Jaejin An, Kaiser Permanente
Wednesday, 11:00 AM – 11:20 AM, Location: Coronado E
Cardiovascular disease (CVD) is the leading cause of death globally, resulting in over 17.9 million deaths annually. It is critical to accurately predict both short-term (10-year) and long-term (30-year or lifetime) cardiovascular risk for earlier detection and intervention. While the 10-year cardiovascular risk prediction tool is commonly used in clinical practice, use of the 30-year risk prediction tool is limited, partly because there were no automated programs or applications to assess predictions at a population level. The original 30-year risk calculator was provided as four separate Excel forms that calculate 30-year cardiovascular risk for a single person, depending on the predicted events (either the hard CVD outcome or the full range of CVD outcomes) and on body mass index (BMI) availability. Specifically, the cumulative incidence at each time point (1,053 rows) was provided to predict a hard CVD outcome (coronary death, myocardial infarction, fatal or non-fatal stroke), separately for the models with and without BMI. To predict a full CVD outcome (hard CVD plus coronary insufficiency, angina pectoris, transient ischemic attack, intermittent claudication, or congestive heart failure), 1,340 rows of cumulative incidence at each time point were provided, again separately for the models with and without BMI. However, these forms are not intuitive to apply to a population in a systematic way. To address this gap, we developed a SAS macro to automate the prediction calculations at a population level after importing a parameter file including sex, systolic blood pressure, age, diabetes, smoking, treated hypertension, total cholesterol, high-density lipoprotein, and BMI. The macro is straightforward to use and can produce 30-year risk of both hard and full CVD outcomes, whether or not BMI is available. We believe this macro will help develop a population-level approach to assessing 30-year CVD risk and intervention.
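The general shape of such a population-level risk calculation (linear predictor fed into a baseline-survival formula, applied row by row to a parameter file) can be sketched as follows. The coefficients, baseline survival, and cohort below are entirely hypothetical illustrations, NOT the published Framingham 30-year model values that the paper's macro encodes.

```python
import math

# Hypothetical coefficients for illustration only -- not the published model.
COEFS = {"age": 0.08, "sbp": 0.02, "smoker": 0.6, "diabetes": 0.7}
BASELINE_SURVIVAL = 0.85  # hypothetical 30-year baseline survival

def risk_30yr(age, sbp, smoker, diabetes):
    """Score one person: centered linear predictor -> cumulative 30-year risk."""
    lp = (COEFS["age"] * (age - 50) + COEFS["sbp"] * (sbp - 120)
          + COEFS["smoker"] * smoker + COEFS["diabetes"] * diabetes)
    return 1 - BASELINE_SURVIVAL ** math.exp(lp)

# Apply to a whole cohort, the way the macro applies its parameter file.
cohort = [dict(age=55, sbp=130, smoker=1, diabetes=0),
          dict(age=48, sbp=118, smoker=0, diabetes=0)]
risks = [risk_30yr(**person) for person in cohort]
```

In the actual macro the lookup of cumulative incidence by time point replaces the simple survival formula used here.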
PH-173 : How to Understand Therapeutic Area User Guide for Reactogenicity Events in Vaccine Studies
Yanwei Han, CSLSeqirus
Wednesday, 8:30 AM – 8:50 AM, Location: Coronado E
Vaccine clinical trial safety data submissions should include reactogenicity events, unsolicited AEs, medically attended adverse events (MAAEs), and deaths. Reactogenicity is a set of pre-defined adverse events collected within a prespecified time frame, typically on either diary cards or reactogenicity case report forms. Reactogenicity events can be classified as either administration site events (e.g., redness, itching, and pain) or systemic events (e.g., fever, fatigue, vomiting, and headache). Reactogenicity data are required to be represented primarily in the SDTM Clinical Events (CE) domain, with the Findings About Clinical Events (FACE) domain and Vital Signs (VS) domain providing the specific information for each event. In this paper, I will discuss how to understand the three modeling strategies in the Therapeutic Area Data Standards User Guide for Vaccines (TAUG-VAX) – the “Flat Model”, “Nested Model”, and “Highly Nested Model” – when mapping diary cards into the FACE, VS, and CE domains, and how to represent SDTM data when reactogenicity events continue beyond the planned assessment period.
PH-180 : Demystifying the define.xml: Overcoming the challenges of CRT Package.
Yoganand Budumuru, IQVIA
Wednesday, 9:00 AM – 9:20 AM, Location: Coronado E
New regulatory guidelines for electronic submissions have significantly changed the statistical programmer’s work, which can be tedious and inefficient, particularly when creating a Case Report Tabulation (CRT) package. Regulatory submissions are the most crucial part of clinical trials data analysis, and an accurate CRT package is needed during submission to accelerate research investment and bring the benefits of new drugs and treatments to patients. Submitting a poor-quality CRT package would risk the outcome of the trial and hamper years of research and development, including cost escalation, as sponsors invest huge efforts and billions of dollars in new technologies and solutions to optimize processes and improve the quality of electronic submissions. Creating a CRT can come with several challenges. Ensuring consistency and harmonization across different data sources is crucial for accurate CRTs. Creating a comprehensive and well-structured CRT requires clear and concise reporting; it can be challenging to present complex medical information in a standardized format that effectively communicates the key findings, treatment outcomes, and adverse events. Ensuring consistency in presentation style and adherence to reporting guidelines is essential. Generating CRTs also involves significant time and resources. Researchers and healthcare professionals may face constraints due to limited availability, competing priorities, or restricted budgets; these limitations can affect the depth and quality of the case report, leading to potential gaps or limitations in the CRT. Overcoming these challenges requires meticulous planning, and addressing common errors and warnings can improve the quality of CRTs.
This paper will describe the preparation steps, including cleaning specifications and addressing common errors in specification development, and provide recommendations for best practices in planning and preparing CRT packages for regulatory submissions, enabling speedier review by regulatory agencies and a better outcome sooner.
PH-192 : Navigating SAS(r) and CDISC Certification with Apprenticeship
Sarvar Khamidov, EDAClinical
Wednesday, 9:30 AM – 9:50 AM, Location: Coronado E
EDA is developing an apprenticeship program to address a shortage of SAS and Study Data Tabulation Model (SDTM) specialists. The program focuses on achieving SAS and Clinical Data Interchange Standards Consortium (CDISC) certification in a condensed timeframe of 12 to 24 weeks through immersive learning. The primary aim is to equip participants with the essential knowledge and skills for success as a clinical trials programmer. In this paper, I discuss the effectiveness of this accelerated approach, which is characterized by intensive training sessions, hands-on exercises, group projects, and mentorship, and which has spanned more than three months thus far. I share the challenges and achievements of my journey as I evaluate the program’s structure. This case study aims to provide guidance for the development of future clinical talent programs to enhance the workforce and the industry.
PH-194 : Quality Control – Defining an Acceptable Quality Standard without Achieving Perfection
Bill Coar, Axio, a Cytel Company
Wednesday, 3:30 PM – 3:50 PM, Location: America’s Cup B/C
In statistical programming, we spend vast amounts of time trying to have a zero-error rate. But time and time again we see that errors still occur even with the gold standard of independent programming. Risk based approaches to quality control have been proposed to emphasize focusing on what matters most. It is human to make mistakes even though there is an inherent drive to achieve perfection. We often spend time trying to be perfect even with the understanding that some issues may be minor and/or inconsequential. Mistakes will continue to happen, but that does not necessarily imply poor quality. In Juran’s Quality Handbook, quality is defined as “fitness for purpose” [reference]. The author suggests that to be fit for purpose, products or services must meet two criteria: (1) the product/service must have the right features to satisfy their needs, and (2) they must be free from failure. While quality control has been a topic of many programming presentations, focus tends to be on (2) without any mention of (1). The purpose of this presentation is to further expand on the concept of “fitness for purpose” and how it can be applied in our day-to-day environment as statisticians and programmers in the pharmaceutical industry. The end goal should be to develop acceptable quality standards to ensure a high quality product or service recognizing that perfection is not achievable. To date, objective criteria have not yet been identified. Our hope is to begin discussion with other industry leaders recognizing that perfection is not required to have a product or service that allows our customers to use our products or services with confidence.
PH-206 : Generation of AE (Adverse Events) summary tables by worst CTC Grade utilizing SAS
Ballari Sen, Bristol Myers Squibb
Wednesday, 10:00 AM – 10:20 AM, Location: Coronado E
Adverse event (AE) analysis is an essential piece of safety assessment, and AEs are collected in almost every trial and clinical study report [1]. Hence, it is essential that we are clear and very careful when creating these tables and displaying correct counts. AE summary tables by worst CTCAE (Common Terminology Criteria for Adverse Events) grade are tabulated and programmed using MedDRA System Organ Class (SOC) and Preferred Term (PT), and each subject is counted only once within each SOC and within each PT. Sorting is by descending frequency of SOC and PT within a treatment arm of interest [1]. AE tables with additional CTCAE sub-categories (e.g., Grades 3-4, Grade 5), percentage cuts (e.g., >= 5% frequency), and long MedDRA dictionary text (i.e., up to 40 characters) create additional programming complexity [2]. This paper presents a practical approach for generating different types of AE toxicity grade tables using SAS. A set of guidelines is presented to simplify the programming process. The methodology aims to facilitate the analysis and interpretation of AE data, enabling researchers and clinicians to make informed decisions regarding patient safety. This paper discusses the approach in detail and shares code for getting the work done efficiently. It also addresses some programming validation edit-checks that help cross-check the counts that are generated and displayed.
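The core "count each subject once at the worst grade" roll-up can be sketched in a few lines; this is a hypothetical illustration (invented subject IDs, terms, and grades, and Python rather than the paper's SAS code) of the logic, not the author's program:

```python
from collections import defaultdict

# One row per reported event: (subject, preferred_term, ctc_grade)
events = [("S01", "Nausea", 2), ("S01", "Nausea", 3),
          ("S01", "Fatigue", 1), ("S02", "Nausea", 1)]

# Keep each subject once per preferred term, at the worst (highest) grade.
worst = {}
for subj, pt, grade in events:
    key = (subj, pt)
    worst[key] = max(worst.get(key, 0), grade)

# Count subjects per preferred term at each worst grade.
counts = defaultdict(int)
for (subj, pt), grade in worst.items():
    counts[(pt, grade)] += 1
```

In SAS the same effect is typically achieved with a sort by subject, term, and descending grade followed by taking the first record per subject/term before tabulating.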
PH-218 : Visualizing Insights, Empowering Discoveries: SAS Viya Unleashed in Life Science Analytics
Matt Becker, SAS
Wednesday, 9:00 AM – 9:50 AM, Location: America’s Cup B/C
The life sciences industry has recently produced a tremendous amount of data at a quick rate from a variety of sources, including clinical trials, genomics, proteomics, and patient records. For improvements in research, medication discovery, and patient care, it is essential to draw insightful conclusions from this data and make defensible decisions. In this setting, SAS Visual Analytics (SAS VA), which provides sophisticated data visualization, analytics, and exploration capabilities, emerges as a potent tool. This paper explores the application of SAS Visual Analytics in the life sciences sector, highlighting its benefits, features, and real-world use cases.
Professional Development
PD-106 : Mining for SAS Gold
Thomas Mannigel, Self
Wednesday, 4:00 PM – 4:50 PM, Location: Coronado D
SAS applications can be a gold mine for SAS developers, their managers, and their customers. The author, Tom Mannigel, knows this because he spent over four decades working as a SAS programmer, building successful SAS applications, some earning millions of dollars. In this paper, he divulges how to find those million-dollar and other high-value SAS applications. The presentation will help you locate those golden SAS opportunities by describing several places to look (dig).
PD-123 : Developing and running an in-house SAS Users Group
Stephen Sloan, Accenture
Tuesday, 4:30 PM – 4:50 PM, Location: America’s Cup D
Starting an in-house SAS Users Group can pose a daunting challenge in a large worldwide organization. However, once formed, the SAS Users Group can also provide great value to the enterprise. SAS users (and those interested in becoming SAS users) are often scattered and unaware of the reservoirs of talent and innovation within their own organization. Sometimes they are Subject Matter Experts (SMEs); other times they are new to SAS but provide the only available expertise for a specific project in a specific location. In addition, there is a steady stream of new products and upgrades coming from SAS Institute and the users may be unaware of them or not have the time to explore and implement them, even when the products and upgrades have been thoroughly vetted and are already in use in other parts of the organization. There are often local artifacts like macros and dashboards that have been developed in corners of the enterprise that could be very useful to others so that they don’t have to “reinvent the wheel”.
PD-132 : Effective Presentations: More than Just PowerPoint
Derek Morgan, Bristol Myers Squibb
Tuesday, 5:30 PM – 6:20 PM, Location: Regatta B/C
Ever dreaded giving a presentation? Have you ever wondered why you get so many blank looks during a presentation? Have you ever looked at someone else’s PowerPoint slides and wished you could communicate as effectively? Have you ever run out of time before you get halfway through a presentation? Then this is the seminar for you. Topics covered will include how to start, organizing your presentation, developing your presentation, tips for making a bigger impact with PowerPoint, and speaking skills.
PD-199 : Adventures in Independent Consulting: Perspectives from Two Veteran Consultants Living the Dream
Josh Horstman, Nested Loop Consulting
Richann Watson, DataRich Consulting
Tuesday, 3:30 PM – 4:20 PM, Location: America’s Cup D
While many statisticians and programmers are content in a traditional employment setting, others yearn for the freedom and flexibility that come with being an independent consultant. In this paper, two seasoned consultants share their experiences going independent. Topics include the advantages and disadvantages of independent consulting, getting started, finding work, operating your business, and what it takes to succeed. Whether you’re thinking of declaring your own independence or just interested in hearing stories from the trenches, you’re sure to gain a new perspective on this exciting adventure.
PD-202 : A Beginner’s Step-by-Step Guide to Digital Marketing Data Mining and Analysis Using SAS
Mahsa Tahmasebi Ghorabi, Digital Marketing SAS User
Leon Davoody, student
Wednesday, 11:30 AM – 11:50 AM, Location: Regatta B/C
Digital marketing leverages search engine optimization (SEO), data analytics, and content management to capture and nurture customer leads. The most used digital channels involve search engines, social media, and email marketing. Each channel generates important customer data which can be analyzed to glean business insights and develop effective inbound and outbound marketing campaigns. This paper will present the “hard skills” in terms of the SAS tools and techniques as well as the “soft skills” in terms of the essential personal skills for beginning digital marketers. It is important to develop a balanced combination of both types of skills so that aspiring digital marketers can get a head start on a successful career.
e-Posters
PO-114 : Soft Skills to Gain a Competitive Edge in the 21st Century Job Market
Kirk Paul Lafler, sasNerd
The 21st-century economy is converging existing technologies such as software, the Internet, robotics, and the Cloud with emerging technologies such as artificial intelligence (AI), machine learning, IoT, nanotechnology, and biotechnology. As we collect, curate, and process vast quantities of information at breakneck speeds, the boundaries between humans and machines are blurring. A consequence of this technological revolution is an economy that is changing faster than ever, and organizations, along with today’s workforce, must learn how to adapt. Today’s economy requires its workforce to acquire two types of skills: hard skills, or job-related knowledge and abilities that help us perform specific job responsibilities effectively, and soft skills, or personal qualities that help us thrive in the workplace. So, what are examples of hard skills? Examples of hard skills include SAS and Python programming, data analysis, project management, and market research. Soft skills, on the other hand, are not always measurable and consist of non-technical skills that describe the characteristics, attributes, and traits associated with one’s personality. Soft skills enable effective and harmonious interaction with others in the workplace and are acquired from the roles and/or experiences one has had. The good news is that soft skills can be learned and, more importantly, provide one with a competitive edge in today’s demanding and evolving workplace.
PO-136 : A Deep Dive into Enhancing SAS/GRAPH and SG Procedural Output with Templates, Styles, Attributes, and Annotation
Louise Hadden, Abt Associates Inc.
Enhancing output from SAS/GRAPH has been the subject of many a SAS paper over the years, including my own, some written with co-authors. The more recent graphic output from the SG procedures is often “camera-ready” without any user intervention, but occasionally there is a need for additional customization. SAS/GRAPH is a separate SAS product for which a specific license is required, while the SG procedures and the Graph Template Language (GTL) are available to all BASE SAS users. This presentation will explore both new opportunities within BASE SAS for creating remarkable graphic output as well as creating visualizations with SAS/GRAPH. Techniques in SAS/GRAPH, the SG procedures, and GTL (including PROC TEMPLATE, PROC GREPLAY, PROC SGRENDER, SAS-provided annotation macros, and the concept of “ATTRS” in SG procedures) will be explored, compared, and contrasted. As background, a discussion of the evolution of the SG procedures and the rise of GTL will be provided. Sample data and programs will be provided via GitHub.
PO-139 : Great Time to Learn GTL
Richann Watson, DataRich Consulting
Kriss Harris, SAS Specialists Ltd
It’s a Great Time to Learn GTL! Do you want to be more confident when producing GTL graphs? Do you want to know how to layer your graphs using the OVERLAY layout and build upon your graphs using multiple LAYOUT statements?
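A minimal GTL sketch of the layering idea mentioned above: two plot statements stacked inside one LAYOUT OVERLAY. The template name and the use of sashelp.class are illustrative assumptions, not the authors' example:

```sas
/* Hypothetical sketch: layering plots with LAYOUT OVERLAY in GTL */
proc template;
   define statgraph overlay_demo;
      begingraph;
         entrytitle "Layering plots with LAYOUT OVERLAY";
         layout overlay;
            scatterplot x=height y=weight;          /* base layer */
            regressionplot x=height y=weight;       /* layered on top */
         endlayout;
      endgraph;
   end;
run;

proc sgrender data=sashelp.class template=overlay_demo;
run;
```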
PO-140 : When ANY Function Will Just NOT Do
Richann Watson, DataRich Consulting
Karl Miller, IQVIA
Have you ever been working on a task and wondered whether there might be a SAS function that could save you some time? Let alone, one that might be able to do the work for you? Data review and validation tasks can be time-consuming efforts. Any gain in efficiency is highly beneficial, especially if you can achieve a standard level where the data itself can drive parts of the process. The ANY and NOT functions can help alleviate some of the manual work in many tasks such as data review of variable values, data compliance, data formats, and derivation or validation of a variable’s data type. The list goes on. In this poster, we cover the functions and their details and use them in an example of handling date and time data and mapping it to ISO 8601 date and time formats.
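A hedged sketch of the ANY/NOT function family in a date-cleaning context like the one described above. The raw variable name and the simple two-branch logic are assumptions for illustration, not the poster's code:

```sas
/* NOTDIGIT returns the position of the first non-digit (0 if all digits);
   ANYALPHA returns the position of the first letter (0 if none). */
data check;
   length rawdt $10 iso $10;
   input rawdt $;
   if notdigit(strip(rawdt)) = 0 and length(strip(rawdt)) = 8 then
      /* all digits, 8 long: read as YYYYMMDD, write as ISO 8601 */
      iso = put(input(rawdt, yymmdd8.), yymmdd10.);
   else if anyalpha(rawdt) then
      iso = '';   /* contains letters: flag for manual review */
   datalines;
20230115
15JAN23
;
run;
```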
PO-144 : Ten Rules for Better Charts, Figures and Visuals
Kirk Paul Lafler, sasNerd
The production of charts, figures, and visuals should follow a process of displaying data in the best way possible. However, this process is far from direct or automatic. There are so many ways to represent the same data: histograms, scatter plots, bar charts, and pie charts, to name just a few. Furthermore, the same data, using the same type of plot, may be perceived very differently depending on who is looking at the figure. A more inclusive definition to produce charts, figures, and visuals would be a graphical interface between people and data. This paper highlights and expands upon the work of Nicolas P. Rougier, Michael Droettboom, and Philip E. Bourne by sharing ten rules to improve the production of charts, figures, and visuals.
PO-152 : Let’s Get FREQy with our Statistics: Data-Driven Approach to Determining Appropriate Test Statistic
Richann Watson, DataRich Consulting
Lynn Mullins, PPD
As programmers, we are often asked to program statistical analysis procedures to run against the data. Sometimes the specifications we are given by the statisticians outline which statistical procedures to run. But other times, the choice of statistical procedure needs to be data dependent. Running these procedures based on the results of previous procedures’ output requires a little more preplanning and programming. We present a macro that dynamically determines which statistical procedure to run based on previous procedure output. The user can specify parameters (for example, fshchi, plttwo, catrnd, bimain, and bicomp), and the macro returns counts, percentages, and the appropriate p-value for Chi-Square versus Fisher’s Exact, and the p-value for Trend and Binomial CI, if applicable.
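A simplified sketch of the data-driven selection logic described above: run PROC FREQ, inspect the smallest expected cell count, and choose Chi-Square versus Fisher's Exact accordingly. The dataset and variable names (adsl, trt01p, respfl) and the standalone structure are illustrative assumptions, not the authors' macro:

```sas
/* Produce expected cell counts alongside the test statistics */
proc freq data=adsl noprint;
   tables trt01p*respfl / chisq fisher expected outexpect
          out=expct(keep=expected);
   output out=stats pchi fisher;
run;

/* Common rule of thumb: any expected count < 5 => use Fisher's Exact */
data _null_;
   set expct end=eof;
   retain minexp 1e9;
   minexp = min(minexp, expected);
   if eof then do;
      if minexp < 5 then call symputx('test', 'FISHER');
      else call symputx('test', 'CHISQ');
   end;
run;
```

Downstream reporting code can then branch on the &test macro variable to pull the appropriate p-value from the stats dataset.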
PO-169 : Five Reasons To Swipe Right on PROC FCMP, the SAS Function Compiler for Building Modular, Maintainable, Readable, Reusable, Flexible, Configurable User-Defined Functions and Subroutines
Troy Martin Hughes, Datmesis Analytics
PROC FCMP (aka the SAS function compiler) empowers SAS practitioners to build our own user-defined functions and subroutines: callable software modules that containerize discrete functionality and effectively extend the Base SAS programming language. This presentation, taught by the author of the 2023 SAS Press PROC FCMP book, explores five high-level problem sets that user-defined functions can solve. Learn how to hide a hash object (and its complexity) inside a function, how to design a format (or informat) that calls a function, and even how to run a DATA step (or procedure) inside a DATA step using RUN_MACRO! Interwoven throughout the discussion are the specific software quality characteristics such as modularity, flexibility, configurability, reusability, and maintainability that can be improved through the design and implementation of PROC FCMP user-defined functions!
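A hedged sketch of one technique the abstract mentions, hiding a hash object inside a PROC FCMP user-defined function. The library, function, dataset, and variable names are illustrative assumptions, not the presenter's code:

```sas
/* Define a lookup function whose hash-object plumbing is invisible
   to the calling DATA step */
proc fcmp outlib=work.funcs.demo;
   function lookup_rate(key $) $ 20;
      declare hash h(dataset: 'work.rates');  /* load the lookup table */
      rc = h.definekey('key');
      rc = h.definedata('rate');
      rc = h.definedone();
      length rate $ 20;
      if h.find() = 0 then return (rate);
      else return ('NOT FOUND');
   endsub;
run;

options cmplib=work.funcs;

data out;
   set work.input;
   rate = lookup_rate(key);   /* hash complexity hidden behind the call */
run;
```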
PO-204 : Who’s Bringing That Big Data Energy? A 47-Year Longitudinal Analysis of 30,000 Presentations in the SAS User Community To Elucidate Top Contributors and Rising Stars
Troy Martin Hughes, Datmesis Analytics
This analysis examines presentations at SAS user group conferences between 1976 and 2023. It includes presentations referenced on www.LexJansen.com (aka “the LEX”) during this timeframe, which are drawn from multiple conferences, including: SAS User Group International (SUGI, may she rest in peace), SAS Global Forum (SGF), SAS Explore, Western Users of SAS Software (WUSS), Midwest SAS Users Group (MWSUG), South Central SAS Users Group (SCSUG), Southeast SAS Users Group (SESUG), Northeast SAS Users Group (NESUG), Pacific Northwest SAS Users Group (PNWSUG), and Pharmaceutical Software Users Group (PharmaSUG). This analysis identifies top contributors, including authors who have presented most abundantly at specific conferences, as well as across all conferences. For example, the SAS superstars and most prolific presenters of all time (in order) are recognized: Kirk Paul Lafler, Arthur L. Carpenter, Louise Hadden, Charlie Shipp, and Ronald J. Fehd! Rising stars, who may be new to the conference scene yet are contributing significantly, are also identified. In addition to quantifying and extolling the contributions of these authors, this analysis aims to assist the leaders of future conferences in identifying key speakers to invite. Finally, and perhaps with some irony, Python 3.11.5 (and Pandas 2.1.0) was exclusively used to ingest, clean, transform, and analyze all data.