Journal of Accountancy on Analytics

While trying to get this blog to appear higher in internet searches, I ran into an interesting article on the Journal of Accountancy discussing Analytic Auditing.  The article is written from an external audit perspective and focuses on two benefits.  First analytics provides auditors with greater insights into their clients’ business that help to quickly get up to speed on the external audit customer’s business model.  Also the article mentions how analytics provide better service to Continuous Auditing Processclients.

The article also states a position that I’ve had since learning analytics as an external auditor… that external auditors’ use of analytics lags far behind that of internal auditors.  I think a key reason for this is access to and familiarity with data.  As an external auditor it took several weeks for me to gain access to a new client system. Once the client granted my access, I didn’t have much time to pull something useful together.  Rarely did my projects have more than one or two models.  As an internal auditor, I’ve had the same difficulties getting initial access to the system.  But once granted access I can continue to develop models as long as my results are useful to the organization.  On certain projects/systems this period lasted for several years and allowed for deep exploration and understanding.

To me one of the most impactful sentences from this article is “The profession [external auditing] needs to achieve a “quantum leap” to redesign audit processes using today’s technology, rather than using information technology to computerize legacy audit plans and procedures. ”

See more at: http://www.journalofaccountancy.com/issues/2015/apr/data-analytics-for-auditors.html#sthash.QvfD7Lqt.dpuf”

Visualized Correlations

One interesting approach to root cause analysis is to correlate descriptive variables about errors with one another.  I created this correlogram to visualize every possible combination of correlation coefficients among observations from a large information system.  At the intersection of two numbers is a square that represents the correlation of those two variables across hundreds of observations.

2015-05-12.correlogram2

Blue shows a positive correlation, red represents a negative, and darker saturation signifies a stronger relationship.  What trends that might give insights to the root causes?  I chose to explore variables 14 (vertical blue trend), 25 (horizontal), and 27 (horizontal).

The analysis was performed in Excel and also in R using the correlogram package.

Using Regression to Predict Duplicate Payments

Recently  used logistic regression on supersamples from 400,000,000 paired invoices in a payment system to identify the factors that best predict if an invoice was submitted more than once.  Some less scrupulous business partners do this in hopes of getting paid twice for the same job.  Positive values in the graph increase the probability of an erroneous payment, negative values decrease that probability, and the width of the line surrounding each point provides a 95% confidence interval that is based on the observations.

Duplicate_Pmt_Diagnostics2

I expected the invoice number to be a much larger coefficient but it looks like that number is popular to “fudge” for those that are trying to squeeze an extra payment out of a business partner.  It also looks like questionable invoices are more often submitted at values less than $5K, so businesses aren’t willing to take the same risks on high value invoices.  Is this consistent with what your company has experienced?  Has your company used methods other than logistic regression to get different results?  I’d love to hear about it!

Graphic SQL Reference for Data Wrangling

I don’t mind text-based technical references, but they aren’t for everyone.  So a graphic SQL cheat sheet may help the 65% of the population who are visual learners.

SQL Data Manipulation Language Cheat SheetIn the spirit of collaboration the downloadable versions are available in PNG and PDF formats on my GitHub repository (click “desktop version” from phone). The flowchart covers many common T-SQL data manipulation commands that my team uses regularly in a format that can help quickly build statements from left-to-right with fields, rows, values, etc. that are color coded for easy reference.