About four months ago I decided to take my passion for decision science to a new level by pursuing the Certified Analytics Professional (CAP) certification.
Because I come from a non-technical background, some people (particularly those with computer science backgrounds) were skeptical of my knowledge and my ability to work with large amounts of data and write predictive models. (Ironically, one of those same data scientists with a heavy CS background inspired a separate post on the pitfalls of common data cleaning procedures.) I feel a relevant certification is a great way to give others confidence in my foundation of knowledge in data analytics.
The CAP seems to be the best-branded, most widely recognized, and best-sponsored option among data science certifications. In a July 2014 CIO magazine article titled “16 big data certifications that will pay off,” the CAP exam was the first item on the list.
Continue reading “About the Certified Analytics Professional (CAP)”
The process of turning data into information to present it in a simple manner can be incredibly complex. I believe this irony is primarily because most available data is not formatted for analysis. Building a large, custom data set with the exact list of features you desire to analyze (Design of Experiments) can be very expensive. If you have pockets as deep as big Pharma or are ready to dedicate years to a PhD, it’s definitely a great way to go.
Our last blog on trusting data analytics explored how the industry practice of “data cleaning” can spoil the reliability of an entire analysis. But problems can also occur with perfect, clean, complete, and reliable data. In this post we will explore the topic of data provenance and how the complexities of data storage can sabotage your data analytics.
The truth is… business data is structured and formatted for business operations and efficient storage. Observations are usually:
- Recorded when it is convenient to do so, resulting in time increments that may not represent the events we actually want to measure;
- Structured efficiently for databases to store and recall, resulting in information on real world events being shattered across multiple tables and systems; and
- Described according to the IT department’s naming conventions, resulting in the need to translate each coded observation.
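
To make the three points above concrete, here is a minimal sketch in Python using the standard-library `sqlite3` module. Everything in it is hypothetical and invented for illustration: the table names (`ord_hdr`, `ord_stat`, `stat_lkp`), the column names, the status codes, and the rows are assumptions, not taken from any real system. It shows one business event (an order and its outcome) split across tables, a coded status that needs translating, and a status timestamp recorded in a month-end batch rather than when the event actually happened.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with hypothetical, IT-style table names."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE ord_hdr  (ord_id INTEGER PRIMARY KEY, cust_cd TEXT, ord_dt TEXT);
        CREATE TABLE ord_stat (ord_id INTEGER, stat_cd TEXT, rec_dt TEXT);
        CREATE TABLE stat_lkp (stat_cd TEXT PRIMARY KEY, stat_desc TEXT);

        INSERT INTO ord_hdr  VALUES (1, 'C017', '2016-03-01'), (2, 'C042', '2016-03-02');
        -- Statuses are recorded in a month-end batch, not when the event occurred.
        INSERT INTO ord_stat VALUES (1, 'SH', '2016-03-31'), (2, 'CN', '2016-03-31');
        INSERT INTO stat_lkp VALUES ('SH', 'Shipped'), ('CN', 'Cancelled');
        """
    )
    return conn


def reassemble_orders(conn):
    """Join the scattered tables back into one row per order, decoding the status."""
    query = """
        SELECT h.ord_id, h.cust_cd, h.ord_dt, l.stat_desc, s.rec_dt
        FROM ord_hdr AS h
        JOIN ord_stat AS s ON s.ord_id = h.ord_id
        JOIN stat_lkp AS l ON l.stat_cd = s.stat_cd
        ORDER BY h.ord_id
    """
    return conn.execute(query).fetchall()


conn = build_demo_db()
rows = reassemble_orders(conn)
for row in rows:
    print(row)
```

Even in this toy case, the analyst has to know which tables to join, what `SH` and `CN` mean, and that `rec_dt` reflects when the batch ran rather than when the order shipped or was cancelled. None of that is visible in the data itself, which is why provenance matters.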
Continue reading “Should You Trust Analytics II: Data Provenance”
Lack of trust in source data is a common concern with data analytic solutions. A friend of mine is a product manager for a large software company that uses analytics for insights into product sales. He told me the first thing executives and managers do when new analytic products are released in his NYSE-traded, multi-billion dollar company is… manually recalculate key metrics. Why would a busy manager or executive spend valuable time opening up a spreadsheet to recalculate a metric? Because he or she has been burned before by unreliable calculations.
I’ve been exploring the subject of unreliable data since a recent survey of CEOs revealed that only one-third trust their data analytics. I have also been studying for an exam next week to earn the Certified Analytics Professional designation and formalize my knowledge on the subject. While studying each step in the analytics process defined by INFORMS, the sponsoring organization for the Certified Analytics Professional exam, I’ve considered how things could go wrong and result in an unreliable outcome. In the spirit of Lean process improvement (an area I specialized in earlier in my career), I pulled those potential pitfalls together in a fishbone diagram:
Continue reading “Should You Trust Analytics III: Analytics Process”
Just one-third of 400 surveyed CEOs responded that they trust their data analytics, according to KPMG’s 2016 CEO Outlook article. This is an astonishingly low rate, and the decision science industry should take it as a wake-up call to shine a light on its processes. Auditors should also take note because most of these CEOs are from companies that have invested heavily in data analytics.
Continue reading “Should You Trust Analytics?”
Learning is a constant part of decision science and, for those looking to advance their analysis skills, it never hurts to have some extra resources. Eric Ligman, Director of Sales Excellence at Microsoft (R), is offering TONS of free eBooks on Microsoft products.
Continue reading “Free eBooks for those getting started! (Limited Time)”
I don’t mind text-based technical references, but they aren’t for everyone. So a graphic SQL cheat sheet may help the 65% of the population who are visual learners.
In the spirit of collaboration, downloadable versions are available in PNG and PDF formats on my GitHub repository (click “desktop version” from phone). The flowchart covers many common T-SQL data manipulation commands that my team uses regularly, in a format that helps you quickly build statements from left to right, with fields, rows, values, etc. color coded for easy reference.