About the Certified Analytics Professional (CAP)

About four months ago I decided to take my passion for decision science to a new level by pursuing the Certified Analytics Professional (CAP) certification.


Because I come from a non-technical background, some people (particularly those with computer science backgrounds) were skeptical of my ability to work with large amounts of data and build predictive models. (Ironically, one of those same data scientists with a heavy CS background inspired a separate post on the pitfalls of common data cleaning procedures.) I see a relevant certification as a good way to give others confidence in my foundation in data analytics.

The CAP appears to be the best-branded, most widely recognized, and best-sponsored of the data science certifications. In a July 2014 CIO magazine article titled “16 big data certifications that will pay off,” the CAP exam was listed first.

Statistical Version of 100 Year War

After 100+ years of silence on the inadequacies of the statistic behind many “statistically significant” conclusions, the American Statistical Association (ASA) last week published a statement online that harshly criticizes p-values. The full article is available for those who are interested at http://amstat.tandfonline.com/doi/abs/10.1080/00031305.2016.1154108, and a short synopsis follows.

The ASA’s statement itself begins on page 8 and includes the following:

“Researchers often wish to turn a p-value into a statement about the truth of a null hypothesis, or about the probability that random chance produced the observed data. The p-value is neither.” Ouch.
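To make the quoted point concrete, here is a minimal Python sketch of an exact two-sided binomial test on a hypothetical example (15 heads in 20 flips of a coin assumed fair under the null hypothesis). The function names and the example are illustrative only and do not come from the ASA statement. Notice that every probability in the calculation is computed assuming the null hypothesis is true, so the result describes how surprising the data would be *if* H0 held. It is not the probability that H0 is true.

```python
from math import comb


def binomial_pmf(k, n, p=0.5):
    """Probability of exactly k successes in n trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def two_sided_p_value(k_obs, n, p=0.5):
    """Exact two-sided binomial p-value.

    Sums the null-hypothesis probabilities of every outcome that is no more
    likely than the observed one. Every term assumes H0 (success prob = p) is true.
    """
    p_obs = binomial_pmf(k_obs, n, p)
    # Small relative tolerance so outcomes tied with the observed one are included.
    return sum(
        binomial_pmf(k, n, p)
        for k in range(n + 1)
        if binomial_pmf(k, n, p) <= p_obs * (1 + 1e-9)
    )


if __name__ == "__main__":
    # Hypothetical data: 15 heads in 20 flips, H0: the coin is fair.
    print(two_sided_p_value(15, 20))
```

Here the p-value comes out near 0.041. That says results this lopsided would occur about 4% of the time with a fair coin. It does not say there is a 4% chance the coin is fair.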

Visualized Correlations

One interesting approach to root cause analysis is to correlate descriptive variables about errors with one another. I created this correlogram to visualize every pairwise correlation coefficient among the variables recorded for observations from a large information system. Each square at the intersection of two variable numbers represents the correlation between those two variables across hundreds of observations.


Blue indicates a positive correlation, red a negative one, and darker saturation a stronger relationship. Which trends might give insight into the root causes? I chose to explore variables 14 (vertical blue trend), 25 (horizontal), and 27 (horizontal).

The analysis was performed in Excel and also in R using the correlogram package.
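For readers who want to reproduce the idea without Excel or R, here is a minimal Python sketch of the computation behind a correlogram. It assumes each variable is a numeric column of equal length with no missing values. It builds the matrix of pairwise Pearson coefficients, maps each cell to the color convention described above (blue positive, red negative, saturation equal to |r|), and lists the strongly correlated pairs worth investigating. All function names and the 0.5 threshold are hypothetical choices for illustration. This is not the workflow used for the original chart, and the indices are 0-based, unlike the variable numbers in the post.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when a column is constant
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """k x k matrix of pairwise correlations for k variable columns."""
    k = len(columns)
    return [[pearson(columns[i], columns[j]) for j in range(k)] for i in range(k)]


def cell_style(r):
    """Correlogram color convention: blue if positive, red if negative, saturation = |r|."""
    if math.isnan(r):
        return ("grey", 0.0)
    return ("blue" if r >= 0 else "red", abs(r))


def strong_pairs(matrix, threshold=0.5):
    """(i, j, r) for each distinct variable pair whose |r| meets the threshold."""
    pairs = []
    k = len(matrix)
    for i in range(k):
        for j in range(i + 1, k):  # upper triangle only: the matrix is symmetric
            r = matrix[i][j]
            if not math.isnan(r) and abs(r) >= threshold:
                pairs.append((i, j, r))
    return pairs


if __name__ == "__main__":
    # Toy data: column 1 rises with column 0, column 2 falls with it.
    cols = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1]]
    m = correlation_matrix(cols)
    print(strong_pairs(m))
    print(cell_style(m[0][2]))
```

A variable that shows up in many strong pairs corresponds to a long row or column of saturated squares in the chart, which is the kind of trend that led to variables 14, 25, and 27.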