The days of exploring data, hoping to stumble across a fraud scheme, have ended. Auditors are now expected to integrate fraud detection into the audit program. Leonard Vona has used fraud data analytics for more than three decades and shares his proven methodology through books, hands-on courses, presentations, and now, his blog.
The Uncovering Fraud Using Fraud Data Analytics blog series provides 10 steps for using data analytic techniques to identify fraud schemes. Step six provides best practices for building search routines.
While Leonard Vona refers to programmers, we know that an experienced IDEA user can accomplish the same goals with IDEA’s powerful functionality and VBA-style scripting language.
To start from the beginning and review previous steps, visit: https://www.leonardvona.com/blog.
What are the steps to designing a fraud data analytics search routine?
In my third book, I used the analogy that building a fraud data analytic is like building a house. The fraud risk statements are the foundation of the house, and the fraud data analytics plan is the blueprint. Step six is the process of actually building the house.
In step six, we are providing the programmer with the system design specifications for writing the code to interrogate the data. The following eight steps are necessary to build data interrogation routines:
Data interrogation steps one through four were discussed in the previous blogs, so this blog focuses on steps five through eight. I need to stress that jumping directly to step eight is a recipe for disaster.
Data is not perfect. Anomalies are caused by many factors, and the goal of this step is to anticipate the types of errors that will occur. The plan should determine whether false positives can be minimized through the data interrogation routine or whether the auditor will need to resolve them through document examination.
Logical errors will occur because of input errors or data integrity problems. Different employees enter data in different ways, and the day-to-day pressure to process transactions leads to shortcuts. All of these factors create false positives. If you do not have a plan for resolving false positives at the programming stage, the field auditor spends time hunting down false positives instead of finding evidence of the fraud risk statement.
The inclusion/exclusion theory is a critical step in building the fraud data analytics plan. The inclusion is the data that are consistent with the fraud data profile, and the exclusion is the data that are not. The theory is consistent with shrinking the haystack. Whether the fraud auditor actually creates separate files is a matter of style, but the concept of inclusion/exclusion is necessary for identifying anomalies.
The importance of the inclusion and exclusion step varies by the nature of the inherent fraud scheme, the fraud data analytics strategy and the size of the data file.
Let’s assume the vendor master file has 50,000 vendors, of which 5,000 are inactive. The first homogeneous data set would be only the active vendors. The fraud risk statement is a shell company created by an internal source. The data interrogation procedure focuses on missing data as the primary selection criterion. This test identifies 100 vendors meeting the search criteria.
The transaction file contains a million vendor invoices. Should we test all million invoices for shell company attributes, or only the invoices for vendors flagged by the missing data test? The inclusion theory would select only the transactions for the 100 vendors identified in the missing data analysis.
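The vendor example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the routine used on any actual engagement: the field names (vendor_id, active, phone, street, tax_id), the sample records, and the threshold of two missing fields are all hypothetical.

```python
# Hypothetical vendor master records; field names are assumptions for illustration.
vendors = [
    {"vendor_id": "V001", "active": True, "phone": "555-0100", "street": "12 Main St", "tax_id": "11-1111111"},
    {"vendor_id": "V002", "active": True, "phone": "", "street": "", "tax_id": ""},
    {"vendor_id": "V003", "active": False, "phone": "", "street": "", "tax_id": ""},
    {"vendor_id": "V004", "active": True, "phone": "555-0104", "street": "PO Box 9", "tax_id": ""},
]

# Hypothetical invoice (transaction) records.
invoices = [
    {"invoice_id": "I1", "vendor_id": "V001", "amount": 1200.00},
    {"invoice_id": "I2", "vendor_id": "V002", "amount": 4800.00},
    {"invoice_id": "I3", "vendor_id": "V002", "amount": 4950.00},
    {"invoice_id": "I4", "vendor_id": "V003", "amount": 300.00},
    {"invoice_id": "I5", "vendor_id": "V004", "amount": 75.00},
]

PROFILE_FIELDS = ("phone", "street", "tax_id")  # fields checked for missing data (assumed)
MIN_MISSING = 2  # assumed threshold; the real plan would set this


def is_missing(value):
    """Treat None, empty strings, and whitespace-only strings as missing."""
    return value is None or str(value).strip() == ""


def missing_count(vendor):
    """Count how many profile fields are missing for one vendor."""
    return sum(is_missing(vendor.get(field)) for field in PROFILE_FIELDS)


# Step 1: the first homogeneous data set is active vendors only.
active_vendors = [v for v in vendors if v["active"]]

# Step 2: missing data is the primary selection criterion.
flagged_ids = {v["vendor_id"] for v in active_vendors if missing_count(v) >= MIN_MISSING}

# Step 3: inclusion theory, so keep only invoices for the flagged vendors.
included_invoices = [inv for inv in invoices if inv["vendor_id"] in flagged_ids]

print(sorted(flagged_ids))
print([inv["invoice_id"] for inv in included_invoices])
```

The order of the steps is the point. The inactive vendor never reaches the missing data test, and the invoice file is only read for vendors that were already flagged, which is what shrinks the haystack.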
In the selection criteria, there are two fundamental strategies. The first is to identify all entities or transactions that meet one criterion. The purpose of the test is to exclude all data that do not meet the criterion. Since the test operates on one criterion, the sample population tends to be large, although much smaller than the total population. The auditor can then use either random selection or auditor judgment to select the sample. The advantage is that the auditor has improved the odds of selecting a fraudulent transaction.
The second strategy is to select all data that meet the testing criteria, referred to as the fraud data profile. The selected strategy is a key criterion in selecting the sample:
So, what is the difference between the two strategies? The first strategy uses an exclusion theory to reduce the population, whereas the second strategy uses an inclusion theory as a basis for sample selection. Remember, after identifying all transactions meeting the criteria, data filtering can be used to shrink the population.
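To make the contrast concrete, here is a minimal Python sketch of the two strategies side by side. Everything specific in it is an assumption for illustration: the vendor_missing_data flag, the 5,000 approval limit, the "just below the limit" band, the sample size of two, and the invoice records themselves.

```python
import random

# Hypothetical invoices; "vendor_missing_data" marks invoices whose vendor
# failed an earlier missing data test (an assumed field, for illustration).
invoices = [
    {"invoice_id": "A1", "vendor_missing_data": True, "amount": 4900.00},
    {"invoice_id": "A2", "vendor_missing_data": True, "amount": 120.00},
    {"invoice_id": "A3", "vendor_missing_data": False, "amount": 4990.00},
    {"invoice_id": "A4", "vendor_missing_data": True, "amount": 4750.00},
    {"invoice_id": "A5", "vendor_missing_data": True, "amount": 2300.00},
    {"invoice_id": "A6", "vendor_missing_data": False, "amount": 310.00},
]

APPROVAL_LIMIT = 5000.00  # assumed approval threshold
NEAR_LIMIT_FLOOR = 4500.00  # assumed lower bound for "just below the limit"


def vendor_missing_data(inv):
    return inv["vendor_missing_data"]


def just_below_limit(inv):
    return NEAR_LIMIT_FLOOR <= inv["amount"] < APPROVAL_LIMIT


# Strategy one: exclude everything that fails a single criterion,
# then sample from the reduced (but still sizable) population.
reduced_population = [inv for inv in invoices if vendor_missing_data(inv)]
rng = random.Random(42)  # fixed seed so the selection can be reproduced
sample = rng.sample(reduced_population, k=min(2, len(reduced_population)))

# Strategy two: include only records that meet every test in the
# fraud data profile; each match is itself a candidate for examination.
FRAUD_DATA_PROFILE = (vendor_missing_data, just_below_limit)
profile_matches = [
    inv for inv in invoices if all(test(inv) for test in FRAUD_DATA_PROFILE)
]

print(len(reduced_population), [inv["invoice_id"] for inv in sample])
print([inv["invoice_id"] for inv in profile_matches])
```

In this sketch, strategy one leaves four invoices from which the auditor still has to pick, while strategy two returns only the two invoices that satisfy the whole profile. A real profile would likely carry more tests, and the thresholds would come from the fraud data analytics plan rather than from constants in the code.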
It is interesting to see how different individuals program the software to create data interrogation routines. Since programming is software dependent, I offer the following strategies to avoid faulty logic in the design of the search routine:
A note of appreciation: I have had the opportunity to work with two of the best programmers, Jill Davies and Carol Ursell from Audimation. Their skill set was critical to the success of my fraud data analytics projects.