About four months ago I decided to take my passion for decision science to a new level by pursuing the Certified Analytics Professional (CAP) certification.
Because I come from a non-technical background, some people (particularly those with computer science backgrounds) were skeptical of my knowledge and my ability to work with large amounts of data and write predictive models. (Ironically, one of those same data scientists with a heavy CS background inspired a separate post on the pitfalls of common data cleaning procedures.) I feel a relevant certification is a great way to give others confidence in my foundation of knowledge in data analytics.
The CAP seems to be the best-branded, most widely recognized, and best-sponsored option among data science certifications. In a July 2014 CIO magazine article titled “16 big data certifications that will pay off,” the CAP exam was the first item on the list.
Continue reading “About the Certified Analytics Professional (CAP)”
Lack of trust in source data is a common concern with data analytics solutions. A friend of mine is a product manager for a large software company that uses analytics for insights into product sales. He told me the first thing executives and managers do when new analytics products are released in his NYSE-traded, multibillion-dollar company is… manually recalculate key metrics. Why would a busy manager or executive spend valuable time opening up a spreadsheet to recalculate a metric? Because he or she has been burned before by unreliable calculations.
I’ve been exploring the subject of unreliable data since a recent survey of CEOs revealed that only one-third trust their data analytics. I have also been studying for an exam next week to earn a Certified Analytics Professional designation and formalize my knowledge on the subject. While studying each step of the analytics process defined by INFORMS, the sponsoring organization for the Certified Analytics Professional exam, I’ve considered how things could go wrong at each step and result in an unreliable outcome. In the spirit of Lean process improvement (an area I specialized in earlier in my career), I pulled those potential pitfalls together in a fishbone diagram:
Continue reading “Should You Trust Analytics III: Analytics Process”
I recently used logistic regression on supersamples from 400,000,000 paired invoices in a payment system to identify the factors that best predict whether an invoice was submitted more than once. Some less scrupulous business partners do this in hopes of getting paid twice for the same job. In the graph, each point is a model coefficient: positive values increase the probability of an erroneous payment, negative values decrease that probability, and the width of the line surrounding each point shows a 95% confidence interval based on the observations.
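For readers who want to see how coefficients and 95% intervals like these can be produced, here is a minimal sketch in plain Python. It is not the model from this post: the data are synthetic, the feature names (`amount_under_5k`, `same_invoice_number`) and the "true" coefficients are made up for illustration, and the fit uses textbook Newton-Raphson with Wald intervals (estimate ± 1.96 standard errors), which may differ from whatever a given statistics package reports.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def grad_and_info(X, y, beta):
    """Gradient of the log-likelihood and the information matrix X'WX."""
    k = len(beta)
    grad = [0.0] * k
    info = [[0.0] * k for _ in range(k)]
    for row, target in zip(X, y):
        p = sigmoid(sum(b * x for b, x in zip(beta, row)))
        for i in range(k):
            grad[i] += (target - p) * row[i]
            for j in range(k):
                info[i][j] += p * (1 - p) * row[i] * row[j]
    return grad, info

def fit_logistic(X, y, iterations=15):
    """Newton-Raphson fit; returns coefficients and their standard errors."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(iterations):
        grad, info = grad_and_info(X, y, beta)
        beta = [b + s for b, s in zip(beta, solve(info, grad))]
    # Standard errors are the square roots of the diagonal of info^-1.
    _, info = grad_and_info(X, y, beta)
    se = []
    for i in range(k):
        unit = [1.0 if j == i else 0.0 for j in range(k)]
        se.append(math.sqrt(solve(info, unit)[i]))
    return beta, se

# Synthetic invoices: hypothetical features and coefficients, not real data.
random.seed(0)
names = ["intercept", "amount_under_5k", "same_invoice_number"]
true_beta = [-1.5, 1.0, 0.5]
X, y = [], []
for _ in range(2000):
    row = [1.0, float(random.random() < 0.5), float(random.random() < 0.5)]
    p = sigmoid(sum(b * x for b, x in zip(true_beta, row)))
    X.append(row)
    y.append(1 if random.random() < p else 0)

beta, se = fit_logistic(X, y)
results = [(n, b, b - 1.96 * s, b + 1.96 * s) for n, b, s in zip(names, beta, se)]
for name, coef, low, high in results:
    print(f"{name:>20}: {coef:+.3f}  95% CI [{low:+.3f}, {high:+.3f}]")
```

Each printed row corresponds to one point and its surrounding line in a coefficient plot: an interval that sits entirely above zero suggests the factor raises the odds of a duplicate submission, and one entirely below zero suggests it lowers them.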
I expected the invoice number to have a much larger coefficient, but it looks like that number is a popular one to “fudge” for those who are trying to squeeze an extra payment out of a business partner. It also looks like questionable invoices are more often submitted at values less than $5K, which suggests businesses aren’t willing to take the same risks on high-value invoices. Is this consistent with what your company has experienced? Has your company used methods other than logistic regression and gotten different results? I’d love to hear about it!