Beyond Desktop Databases

We recently spoke with an experienced auditor about how his organization has made analytics its top audit-related priority. This is a very reasonable decision, given that analytic work can be automatically executed, documented, and reused as a “continuous audit” procedure. The efficiencies are quite appealing to auditors who repeat much of their work on a quarterly or annual cycle. Ironically, our colleague noted that their primary tool is Microsoft Access (R), which delivers none of these most valuable benefits. Let’s examine why. Access (R) cannot natively:

  • Produce sharable analytic procedures to import, clean, and transform data;
  • Automatically execute analytic procedures to perform work without human action;
  • Produce logs of multiple processing steps that serve as audit documentation; and
  • Restrict access to private information.

So where is an aspiring analytic auditor to start?

What tools can an experienced decision scientist recommend to developing analysts? We feel SQL is a valuable first step in any analytics career. SQL (Structured Query Language) is so pervasive that the International Organization for Standardization (ISO) has standardized it. In today’s digital world, with massive amounts of data being gathered every day and stored in databases, knowing how to query and program with SQL is the most useful skill we can imagine for an analytic auditor. Lots of people use it, so it’s a transferable skill. Furthermore, SQL solutions are strong in many areas that are key to analytic auditing. They can:

  • Connect to multiple SQL data sources, which are a popular home for operational data;
  • Produce scripts that perform multiple processing actions and can be shared among different individuals and retained as audit documentation; and
  • Provide for access controls to databases, tables, and individual records.
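
To make the second point concrete, here is a minimal sketch of a shareable script that imports, cleans, and transforms data while keeping a log of each step. It uses Python's built-in sqlite3 module only so the SQL can run anywhere; the `invoices` table, its columns, and the sample rows are hypothetical and exist purely for illustration.

```python
import sqlite3

# Hypothetical raw data: (invoice_id, vendor, amount stored as text)
RAW_ROWS = [
    (1, " Acme ", "100.50"),
    (2, "acme", "200.00"),
    (3, "Globex", None),  # missing amount
    (4, "Globex", "75.25"),
]

# Each step pairs a plain-English description with the SQL that performs it,
# so the script itself doubles as documentation of the work performed.
STEPS = [
    ("Clean: trim and upper-case vendor names",
     "UPDATE invoices SET vendor = UPPER(TRIM(vendor))"),
    ("Clean: remove rows with a missing amount",
     "DELETE FROM invoices WHERE amount IS NULL"),
    ("Transform: total the invoices by vendor",
     "CREATE TABLE vendor_totals AS "
     "SELECT vendor, SUM(CAST(amount AS REAL)) AS total "
     "FROM invoices GROUP BY vendor"),
]


def count_invoices(conn):
    """Return the current number of rows in the invoices table."""
    return conn.execute("SELECT COUNT(*) FROM invoices").fetchone()[0]


def run_audit_script(conn):
    """Run every step in order and return a log of (description, row count)."""
    log = []
    conn.execute(
        "CREATE TABLE invoices "
        "(invoice_id INTEGER PRIMARY KEY, vendor TEXT, amount TEXT)"
    )
    conn.executemany("INSERT INTO invoices VALUES (?, ?, ?)", RAW_ROWS)
    log.append(("Import: load raw invoices", count_invoices(conn)))
    for description, sql in STEPS:
        conn.execute(sql)
        log.append((description, count_invoices(conn)))
    conn.commit()
    return log


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    for description, rows_remaining in run_audit_script(connection):
        print(f"{description} -> {rows_remaining} invoice rows")
```

Because every step is an ordinary SQL statement, the same list could be handed to a colleague or re-run next quarter, and the log shows how many invoice rows survived each step.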

There are multiple “flavors” of SQL. It is used by Microsoft SQL Server, Oracle, MySQL, Amazon Redshift, and many other popular platforms. Each of these solutions uses a slightly different version of the SQL language because each vendor has developed custom functions to differentiate its product. But the good news is that these functions are not necessary to perform the basic steps in the analytic process. If your organization uses a type of SQL, then we suggest you begin using it; almost all of the skills you learn will be transferable to the other solutions! The most important decision is the decision to begin using SQL if you are pursuing a career in analytics. Learning is not supposed to be comfortable, so just get started! To help you on this journey, we’ve compiled some useful resources:

For more free and valuable content, subscribe to this blog on the top right.

Should You Trust Analytics II: Data Provenance

The process of turning data into information that can be presented simply can be incredibly complex. I believe this irony arises primarily because most available data is not formatted for analysis. Building a large, custom data set with the exact list of features you desire to analyze (Design of Experiments) can be very expensive. If you have pockets as deep as Big Pharma or are ready to dedicate years to a PhD, it’s definitely a great way to go.

Our last blog on trusting data analytics explored how the industry practice of “data cleaning” can spoil the reliability of an entire analysis.  But problems can also occur with perfect, clean, complete, and reliable data.  In this post we will explore the topic of data provenance and how the complexities of data storage can sabotage your data analytics.

The truth is… business data is structured and formatted for business operations and efficient storage.  Observations are usually:

  • Recorded when it is convenient to do so, resulting in time increments that may not represent the events we actually want to measure;
  • Structured efficiently for databases to store and recall, resulting in information on real world events being shattered across multiple tables and systems; and
  • Described according to the IT departments’ naming conventions, resulting in the need to translate each coded observation.
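
As a small illustration of the last two points, the sketch below reassembles one real-world event (an order) from tables that store it in pieces, and translates a coded status into plain language through a lookup table. Everything here is hypothetical: the table names (`ord_hdr`, `ord_ln`, `stat_lkp`), the column names, and the sample rows are invented to mimic typical IT naming conventions. It runs on Python's built-in sqlite3 module.

```python
import sqlite3

# Hypothetical operational schema: one order is split across a header table,
# a line-item table, and a lookup table that decodes the status code.
SCHEMA_AND_DATA = """
    CREATE TABLE ord_hdr  (ord_id INTEGER PRIMARY KEY, cust_nm TEXT, stat_cd TEXT);
    CREATE TABLE ord_ln   (ord_id INTEGER, sku TEXT, qty INTEGER, unit_prc REAL);
    CREATE TABLE stat_lkp (stat_cd TEXT PRIMARY KEY, stat_desc TEXT);

    INSERT INTO ord_hdr  VALUES (1, 'Acme', 'S'), (2, 'Globex', 'C');
    INSERT INTO ord_ln   VALUES (1, 'A-100', 2, 10.0),
                                (1, 'B-200', 1, 5.0),
                                (2, 'A-100', 3, 10.0);
    INSERT INTO stat_lkp VALUES ('S', 'Shipped'), ('C', 'Cancelled');
"""

# Join the pieces back into one row per order and rename the coded columns
# so the result reads like the business event it describes.
REASSEMBLE_SQL = """
    SELECT h.ord_id                AS order_id,
           h.cust_nm               AS customer,
           s.stat_desc             AS status,
           SUM(l.qty * l.unit_prc) AS order_total
    FROM ord_hdr AS h
    JOIN ord_ln   AS l ON l.ord_id  = h.ord_id
    JOIN stat_lkp AS s ON s.stat_cd = h.stat_cd
    GROUP BY h.ord_id, h.cust_nm, s.stat_desc
    ORDER BY h.ord_id
"""


def build_example_db():
    """Create an in-memory database holding the hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA_AND_DATA)
    return conn


def reassemble_orders(conn):
    """Return one tuple per order: (order_id, customer, status, order_total)."""
    return conn.execute(REASSEMBLE_SQL).fetchall()


if __name__ == "__main__":
    for row in reassemble_orders(build_example_db()):
        print(row)
```

Each join is also a place where provenance can go wrong: an order with no line items or an unknown status code would silently drop out of this result, so it is worth comparing row counts before and after the joins.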

Free eBooks for those getting started! (Limited Time)

Learning is a constant part of decision science and, for those looking to advance their analysis skills, it never hurts to have some extra resources. Eric Ligman, Microsoft (R) Director of Sales Excellence, is offering TONS of free eBooks on Microsoft products.

Graphic SQL Reference for Data Wrangling

I don’t mind text-based technical references, but they aren’t for everyone.  So a graphic SQL cheat sheet may help the 65% of the population who are visual learners.

In the spirit of collaboration, the downloadable versions are available in PNG and PDF formats on my GitHub repository (click “desktop version” from phone). The flowchart covers many common T-SQL data manipulation commands that my team uses regularly, in a format that can help quickly build statements from left to right, with fields, rows, values, etc. color coded for easy reference.