Business Analytics Track

Analyzing factors influencing viewer count of TED Talks through Text Analytics ** BEST CONTRIBUTED PAPER **

Yasha Pastaria

Business Analytics Best Paper Winner

The objective of this research paper was to explore the TED Talks data and generate insights, including the popularity trends of TED Talks over the years in terms of views, comments and ratings. In addition, this project explored possible drivers of those trends, such as the occupation of the speaker, the duration of the talk and the number of speakers. The analysis will help consumers understand where TED Talks are heading. It will ultimately help them design the best TED Talks and avoid the mistakes of the worst ones.
The data source for the analysis was a Kaggle dataset, TED Talk Data. The main dataset contained metadata about every TED Talk hosted on the website until September 21, 2017. There were 2,550 rows and 14 variables, where each row contained data for a particular TED Talk.
SAS Viya and SAS Enterprise Miner were used for data preparation and cleaning, text analytics, and sentiment analysis; the sentiment analysis was conducted to determine how viewers felt about the talks. A descriptive analysis examined the factors that affect viewer count and the trends observed in TED Talks over the years.

Decision Trees: a Gentle Introduction

Richard Hector

Every car is a vehicle, but not every vehicle is a car. Similarly, classification and regression trees (CART) and decision trees look similar. Both begin with a single node followed by an increasing number of branches. However, they serve different purposes, just as the purpose of a family automobile is different from that of a giant mining truck. This paper is a gentle introduction to decision trees using PROC DTREE. When you need to explore the relationship between factors and an outcome, CART is a useful non-parametric tool. The branching algorithm facilitates moving through the variables in the data to determine their effect on the outcome. In contrast, in decision trees the variables are chosen because the decision maker knows or can infer their effects on the outcome. Also, the decision maker knows or can estimate their distribution among relevant groups. Finally, the result (reward or pay-off) of each pathway is known or can be estimated. The goal of a decision tree is to ascertain the most desirable outcome given the combination of variables and costs (in other words, the best pathway). In addition, the amount of risk the decision maker is willing to accept can be incorporated in a decision tree analysis. This paper focuses on an example from medical care. Intermediate-level familiarity with the DATA step is sufficient for understanding this paper.
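The idea of finding the best pathway can be illustrated with a minimal sketch in Python. It does not reproduce PROC DTREE or the paper's medical example: the tree, the option names ("surgery", "medication"), the probabilities and the pay-offs are all invented for illustration, and risk attitude is ignored. The sketch "rolls back" the tree by taking the probability-weighted average at chance nodes and the highest expected value at decision nodes.

```python
# Hypothetical decision tree: every label, probability and pay-off is made up.
def rollback(node):
    """Return (expected value, best option label or None) for a tree node."""
    kind = node["type"]
    if kind == "end":
        # Terminal node: the pay-off of this pathway is known or estimated.
        return node["payoff"], None
    if kind == "chance":
        # Chance node: weight each branch by its probability.
        ev = sum(p * rollback(child)[0] for p, child in node["branches"])
        return ev, None
    # Decision node: keep the option with the highest expected value.
    best_label, best_ev = None, float("-inf")
    for label, child in node["options"].items():
        ev = rollback(child)[0]
        if ev > best_ev:
            best_label, best_ev = label, ev
    return best_ev, best_label


tree = {
    "type": "decision",
    "options": {
        "surgery": {"type": "chance", "branches": [
            (0.9, {"type": "end", "payoff": 20.0}),
            (0.1, {"type": "end", "payoff": 0.0}),
        ]},
        "medication": {"type": "chance", "branches": [
            (0.6, {"type": "end", "payoff": 25.0}),
            (0.4, {"type": "end", "payoff": 5.0}),
        ]},
    },
}

best_value, best_option = rollback(tree)
print(best_option, best_value)
```

With these invented numbers, the first option has the higher expected pay-off (about 18 versus 17), so it is the preferred pathway under a pure expected-value criterion.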

Loan Default Prediction

Amarjeet Cheema and Sandeep Chitoor

Interest on loans and associated fees are the biggest revenue sources for most banks and credit unions. More than 44 million borrowers collectively owe about $3.5 trillion in total outstanding consumer credit as of October 2015. However, more than 1 million people default on loans each year. A report from The Urban Institute, a non-profit research institute, found that nearly 40% of borrowers are expected to default on their student loans by 2023.
Considering the magnitude of risk and financial loss involved, it is essential for banks to give loans to credible applicants who are highly likely to pay back the loan amount. The objective of this project is to assess the likelihood of loan default based on customer demographics and financial data. Furthermore, as an outcome of this project, we found the most significant variables that contribute to determining loan default. The project will also categorize loan types by risk of default. Such insights will help banks significantly reduce the risk of losing money on loans. The original dataset had 887,000 observations and 24 columns; however, it was filtered to only those rows for which the loan status was either fully paid or default. The resulting dataset had 209,000 rows with 24 variables. SAS Enterprise Miner was used to create logistic regression and decision tree models with different configurations, and SAS Viya was used for data visualization.

Telecom Industry: Customer Churn Prediction

Aakash Dwivedi and Amritha Purushotham

Nowadays, the telecom industry faces fierce competition in satisfying its customers. With the advent of newer technology, the services offered by telecom companies have expanded from calls only to calls, data and web services. This means a constant struggle to strike a balance between services and their pricing. In order to survive in this market, telecom companies need to innovate, offer better services and increase their customer base. With newer companies entering the market and customers increasingly free to switch telecom companies, it is becoming increasingly important to focus resources on retaining existing customers. According to an article in Harvard Business Review (Gallo, 2014), the cost of acquiring a customer is five to twenty-five times more than retaining an existing one. Furthermore, increasing retention by five percent can lead to an increase in profits of twenty-five to ninety-five percent.
This paper aims to segment customers and find the factors contributing to churn in each customer segment. Customer churn rate was defined as the percentage of customers who end their relationship with a company in a particular period. Additionally, this paper discusses a churn prediction model developed to identify those customers who are likely to churn. About 71 thousand records were available for analysis. SAS Enterprise Miner was used for the analysis. Using the insights from the customer segmentation and prediction models, an action plan was developed for each segment.
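The churn rate definition above can be written out as a short Python sketch. It is only an illustration of the arithmetic, not the authors' SAS Enterprise Miner workflow; the function name, the segment names and all counts are hypothetical, and the rate is assumed to be measured against the customer count at the start of the period.

```python
def churn_rate(customers_at_start, customers_lost):
    """Percentage of customers who ended their relationship during a period.

    Assumes the denominator is the customer count at the start of the period.
    """
    if customers_at_start <= 0:
        raise ValueError("customers_at_start must be positive")
    return 100.0 * customers_lost / customers_at_start


# Hypothetical segments: (customers at start of period, customers lost).
segments = {
    "prepaid": (1200, 120),
    "postpaid": (800, 30),
}

rates_by_segment = {
    name: churn_rate(start, lost) for name, (start, lost) in segments.items()
}
overall_rate = churn_rate(
    sum(start for start, _ in segments.values()),
    sum(lost for _, lost in segments.values()),
)
print(rates_by_segment, overall_rate)
```

Computing the rate per segment as well as overall mirrors the abstract's point that churn drivers are examined within each customer segment rather than only in aggregate.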

Who Do We Need to Follow-Up With? Developing a Process to Track Study Response Rates

Vincent Chan

Low response rates are often a challenge in randomized controlled trials that base an outcome measure on data collected through a survey or other forms. In this paper, we go over the process of developing a SAS program to create response rate reports in the context of an evaluation of a teacher professional development program. In this education study, a literacy assessment is hosted online by a testing company, but the project team needs to continually follow up with participating teachers to ensure their students take the assessment during the testing window. Each week, and sometimes ad hoc, the team (nonprogrammers) requires a generated report to check on test response rates calculated at the treatment, course, teacher, school, and district level. This report is used to determine whether additional follow-up is needed and whether a certain course has reached an acceptable percentage of completed assessments.

In addition to the development of the process, this paper will briefly highlight macro techniques used to automate response rate calculations and how we utilize the Dynamic Data Exchange (DDE) method to output these rates into an accessible and easy-to-update Microsoft Excel spreadsheet. Lastly, the paper will share successes and challenges in the process, lessons learned in quality assurance and data validation, and possible improvements.
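The core calculation behind such a report, response rates at several levels plus a follow-up flag, might look like the following Python sketch. It is not the authors' SAS macro code or their DDE export: the record layout, the field names, the sample rows and the 80 percent threshold are all assumptions made for illustration.

```python
from collections import defaultdict


def response_rates(records, level):
    """Percent of completed assessments for each value of `level`."""
    totals = defaultdict(int)
    completed = defaultdict(int)
    for rec in records:
        key = rec[level]
        totals[key] += 1
        completed[key] += rec["completed"]  # True counts as 1, False as 0
    return {key: 100.0 * completed[key] / totals[key] for key in totals}


def needs_follow_up(rates, threshold=80.0):
    """Groups whose rate is below the (assumed) acceptable threshold."""
    return sorted(key for key, rate in rates.items() if rate < threshold)


# Hypothetical student-level records; one row per expected assessment.
records = [
    {"district": "D1", "school": "S1", "teacher": "T1", "completed": True},
    {"district": "D1", "school": "S1", "teacher": "T1", "completed": False},
    {"district": "D1", "school": "S1", "teacher": "T2", "completed": True},
    {"district": "D1", "school": "S2", "teacher": "T3", "completed": True},
]

teacher_rates = response_rates(records, "teacher")
school_rates = response_rates(records, "school")
print(needs_follow_up(teacher_rates))
```

Passing the grouping level as a parameter plays a role similar to the macro parameterization the abstract mentions: one routine is reused for the teacher, school and district reports.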