I recently used logistic regression on supersamples from 400,000,000 paired invoices in a payment system to identify the factors that best predict whether an invoice was submitted more than once. Some less scrupulous business partners do this in hopes of getting paid twice for the same job. In the graph, positive values increase the probability of an erroneous payment, negative values decrease it, and the width of the line around each point shows the 95% confidence interval estimated from the observations.
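For anyone who wants to read a graph like this one, here is a minimal Python sketch of how a coefficient and its standard error turn into an interval around a point, and how the sign of a coefficient moves the predicted probability. The feature names, coefficients, standard errors, and intercept are made-up placeholders, not results from this analysis, and the Wald (normal-approximation) interval is an assumption about how such intervals are typically computed.

```python
import math

# Two-sided 95% quantile of the standard normal distribution.
Z_95 = 1.959963984540054


def wald_interval(coef, std_err, z=Z_95):
    """Return (lower, upper) Wald confidence bounds for a coefficient."""
    return coef - z * std_err, coef + z * std_err


def probability(log_odds):
    """Logistic function: convert log-odds into a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical values for illustration only: name -> (coefficient, standard error).
intercept = -3.0
estimates = {
    "invoice_number_match": (0.15, 0.12),
    "amount_under_5k": (1.20, 0.10),
}

baseline = probability(intercept)
for name, (coef, se) in estimates.items():
    lower, upper = wald_interval(coef, se)
    # An interval that contains zero means the sign of the effect is uncertain.
    spans_zero = lower <= 0.0 <= upper
    # Probability when this indicator is 1 and every other feature is 0.
    shifted = probability(intercept + coef)
    print(f"{name}: coef={coef:+.2f}, 95% CI=({lower:+.3f}, {upper:+.3f}), "
          f"spans zero={spans_zero}, p: {baseline:.3f} -> {shifted:.3f}")
```

A positive coefficient pushes the probability above the baseline and a negative one pulls it below. When the interval crosses zero, the data do not settle which direction the factor works in.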
I expected the invoice number to have a much larger coefficient, but it looks like that number is a popular one to “fudge” for those trying to squeeze an extra payment out of a business partner. It also looks like questionable invoices are more often submitted at values below $5K, which suggests businesses aren’t willing to take the same risks on high-value invoices. Is this consistent with what your company has experienced? Has your company used methods other than logistic regression and gotten different results? I’d love to hear about it!