Excel is the American cheese of analytics

Joshua Wu, PhD
May 26, 2020

Excel is the American cheese of the analytics world.

Like American cheese, Excel is underappreciated and derided as inferior to more sophisticated data management and statistical analysis solutions. Excel is a limited software platform that should not be used for purposes for which it is not well suited. But it is useful and can play important roles in your analytics workflow. Here are the three ways I incorporate Excel in my day-to-day statistical analyses.

Store and share results from statistical software. First, Excel is a useful place to store the results of statistical analyses from your data science platform. Sharing screenshots of model results or copy-pasting results is not only inefficient but can also confuse team members who are not accustomed to reading and interpreting model results. Learn how to export model results directly to Excel. Once results are stored in Excel, you have the flexibility to format them for effective sharing. For example, instead of sharing full model results with senior stakeholders, highlight the statistically significant predictors of interest in Excel and hide model output that is not essential to substantive interpretation of the results (such as control or interaction terms used to optimize model specification, the standard errors of coefficients, or overall model fit). Or use Excel’s conditional formatting to “color” result tables and highlight the findings with the largest or smallest effects.
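As a rough illustration of exporting results directly, here is a minimal Python sketch. It assumes the pandas library with an Excel engine such as openpyxl installed; the function names, sheet names, column names, and the 0.05 significance threshold are all hypothetical choices for illustration, not something this workflow prescribes.

```python
import pandas as pd


def build_results_table(terms, coefs, std_errs, p_values):
    """Collect model output into one tidy table (column names are illustrative)."""
    return pd.DataFrame(
        {
            "term": terms,
            "coefficient": coefs,
            "std_error": std_errs,
            "p_value": p_values,
        }
    )


def significant_only(results, alpha=0.05):
    """Keep only predictors below the (assumed) significance threshold,
    and drop the standard errors a senior stakeholder may not need."""
    keep = results["p_value"] < alpha
    columns = ["term", "coefficient", "p_value"]
    return results.loc[keep, columns].reset_index(drop=True)


def export_results(results, path, alpha=0.05):
    """Write the full table and a trimmed summary to separate sheets.

    Writing .xlsx files requires an Excel engine such as openpyxl.
    """
    with pd.ExcelWriter(path) as writer:
        results.to_excel(writer, sheet_name="full_results", index=False)
        significant_only(results, alpha).to_excel(
            writer, sheet_name="summary", index=False
        )
```

Calling `export_results(table, "model_results.xlsx")` on a table built with `build_results_table` would produce one workbook with a complete sheet for analysts and a shorter sheet for sharing. Highlighting and conditional formatting can then be applied in Excel itself.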

Intake and request form. Second, Excel is an effective way for others to share information with me. I ask collaborators to put their data processing and analytics specs into an Excel form. Depending on the function and frequency of use, this request form is templatized with dropdowns and other formatting features to make it easier to complete.

Using Excel as an “order” sheet is beneficial in four ways. First, by enumerating modeling requests and data processing notes clearly on individual survey questions, it facilitates clarity about specific analyses requested. Second, it helps with version control; by using comments and duplicating new tabs to indicate updates, it is easy to track modifications to the analyses by iteration and date. Third, the form is a checklist to ensure that requested analyses have been completed by my team. And finally, much like a patient chart helps different care providers quickly understand the current status of a patient, so an Excel file that tracks previous iterations and already completed analyses helps project handoff and ensures better continuation as different people work on the analyses.

Back-of-the-envelope cocktail napkin. Third, I use Excel to show back-of-the-envelope math and statistical reasoning. It is my virtual cocktail napkin where I can show the math and assumptions used to calculate the final result. This is especially useful when contextualizing results to show real-world costs.

Take for example a hypothetical client interested in the costs of absenteeism in their workforce.

From the results of an employee survey, employees report missing an average of 1.21 days a month; annualized, this is equivalent to 14.52 days. I then calculate annual costs per worker by multiplying the median hourly wage ($30.07) by the length of an average workday (8 hours) by the average number of days missed per year (14.52). Finally, by multiplying annual costs per worker (~$3,493) by the number of workers (31,491), I calculate a total absenteeism cost of nearly $110 million. By using Excel, I can show the arithmetic used in each step of the calculation and show how I arrived at the final result. The Excel sheet also enables quick, on-the-fly recalculation of final outcomes. By inputting a lower possible rate of absenteeism in the above example, I can calculate the cost savings per year so that the client can use them in a cost-benefit analysis to determine which workplace interventions will have a net-positive fiscal impact.
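The same napkin math can be written out step by step in code. This is only a sketch that mirrors the spreadsheet logic using the figures from the example; the function name is made up, and the lower absenteeism rate of 1.0 days per month is a hypothetical scenario value, not a survey result.

```python
def annual_absenteeism_cost(days_missed_per_month, hourly_wage, hours_per_day, workers):
    """Return (days missed per year, annual cost per worker, total annual cost)."""
    days_per_year = days_missed_per_month * 12
    cost_per_worker = hourly_wage * hours_per_day * days_per_year
    total_cost = cost_per_worker * workers
    return days_per_year, cost_per_worker, total_cost


# Inputs taken from the example in the text.
days, per_worker, total = annual_absenteeism_cost(
    days_missed_per_month=1.21,
    hourly_wage=30.07,
    hours_per_day=8,
    workers=31_491,
)
print(f"Days missed per year:   {days:.2f}")
print(f"Annual cost per worker: ${per_worker:,.0f}")
print(f"Total annual cost:      ${total:,.0f}")

# Hypothetical scenario: an intervention lowers absenteeism to 1.0 days a month.
_, _, lower_total = annual_absenteeism_cost(1.0, 30.07, 8, 31_491)
print(f"Estimated annual savings: ${total - lower_total:,.0f}")
```

Changing a single input and rerunning plays the same role as editing one cell in the spreadsheet and watching the totals update.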

Excel is a useful tool to store and share model results, to record and track requests from others, and to serve as a virtual napkin for back-of-the-envelope math. Like American cheese, which should not be used for some purposes (cheese plate, pizza, soup) but whose accessibility and ease of use make it ideal for others (grilled cheese, classic cheesesteak, mac and cheese), Excel is a limited tool with defined use cases. So instead of being quick to dismiss Excel, consider how it could improve your analytics workflow.
