Uncovering Fraud Using Fraud Data Analytics


The days of exploring data and hoping to stumble across a fraud scheme have ended. In fact, auditors are now expected to integrate fraud detection into the audit program. Leonard Vona has used fraud data analytics for more than three decades and shares his proven methodology through books, hands-on courses, presentations, and now his blog.

The Uncovering Fraud Using Fraud Data Analytics blog series provides 10 steps for using data analytic techniques to identify fraud schemes. Step six covers best practices for building search routines.

While Leonard Vona refers to programmers, we know that an experienced IDEA user can accomplish the same goals with IDEA’s powerful functionality and VBA-style scripting language.

To start from the beginning and review previous steps, visit: https://www.leonardvona.com/blog.
 

Step 6 – Building Search Routines

What are the steps to designing a fraud data analytics search routine?
In my third book, I used the analogy that building a fraud data analytics routine is like building a house: the fraud risk statements are the foundation of the house, and the fraud data analytics plan is the blueprint. Step six is the process of actually building the house.

In step six, we provide the programmer with the system design specifications for writing the code to interrogate the data. The following eight steps are necessary to build data interrogation routines:

  1. Identify the components of the fraud risk statement—the person committing, the type of entity, and the action statement.
  2. Identify the data that relates to the fraud risk statement.
  3. Select the strategy consistent with the scope of the audit, the sophistication of concealment, the degree of accuracy (exact, close, or related), and the nature of the test.
  4. Based on the availability, reliability, and usability of the data, cleanse the data set for overt errors.
  5. Identify the logical errors that will occur with the test.
  6. Create your homogeneous data sets using the inclusion and exclusion theory.
  7. Establish the primary selection criteria, followed by the remaining selection criteria.
  8. Create the test using the programming routines to identify all entities or transactions that meet the testing criteria.

Data interrogation steps one through four were discussed in the previous blogs, so this blog will focus on steps five through eight. I need to stress that jumping directly to step eight is a recipe for disaster.
 

5. Logical Errors

Data is not perfect. Anomalies are caused by many factors, and the goal of this step is to anticipate the types of errors that will occur. The plan should determine whether false positives can be minimized within the data interrogation routine or whether the auditor will need to resolve them through document examination.

Logical errors arise from input errors, data integrity problems, differences in how employees enter data, and the day-to-day pressure to process transactions, which encourages shortcuts. All of these factors create false positives. If you do not have a plan for resolving false positives at the programming stage, the field auditor will spend time hunting down false positives rather than finding evidence of the fraud risk statement.
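To make this concrete, here is a minimal sketch in Python using the pandas library of anticipating one logical error before running a duplicate-address test. The file name, column names, and abbreviation list are hypothetical; an experienced IDEA user would express the same logic with IDEA's built-in functions or IDEAScript. Normalizing the field first keeps formatting differences from flooding the field auditor with false positives.

  import pandas as pd

  # Hypothetical vendor master extract with VENDOR_ID and ADDRESS columns
  vendors = pd.read_csv("vendor_master.csv")

  # Anticipated logical error: the same street address keyed differently
  # ("123 Main Street" vs "123 MAIN ST.") creates false positives/negatives.
  def normalize_address(addr) -> str:
      addr = str(addr).upper().strip()
      for raw, std in [("STREET", "ST"), ("AVENUE", "AVE"), (".", ""), (",", "")]:
          addr = addr.replace(raw, std)
      return " ".join(addr.split())  # collapse repeated whitespace

  vendors["ADDR_KEY"] = vendors["ADDRESS"].map(normalize_address)

  # Run the duplicate test on the normalized key, not the raw field
  dupes = vendors[vendors.duplicated("ADDR_KEY", keep=False)]
  print(f"{len(dupes)} vendor records share a normalized address")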
 

6. Homogeneous Data Sets

The inclusion/exclusion theory is a critical step in building the fraud data analytics plan. The inclusion set is the data consistent with the fraud data profile; the exclusion set is the data that is not. The theory is about shrinking the haystack. Whether the fraud auditor actually creates separate files is a matter of style, but the concept of inclusion/exclusion is necessary for identifying anomalies.

The importance of the inclusion and exclusion step varies with the nature of the inherent fraud scheme, the fraud data analytics strategy, and the size of the data file.

Let’s assume the vendor master file has 50,000 vendors, of which 5,000 are inactive. The first homogeneous data set would be active vendors only. The fraud risk statement is a shell company created by an internal source. The data interrogation procedure focuses on missing data as the primary selection criteria. This test identifies 100 vendors meeting the search criteria.

The transaction file contains a million vendor invoices. Should we test all million invoices for shell company attributes, or only the invoices for vendors that meet the missing-data test? The inclusion theory would select only the transactions for the 100 vendors identified in the missing-data analysis.
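A minimal Python/pandas sketch of this inclusion/exclusion example follows. The file names, column names, and status values are hypothetical stand-ins for whatever the extract actually contains; IDEA's extraction and join features accomplish the same shrinking of the haystack.

  import pandas as pd

  vendors = pd.read_csv("vendor_master.csv")     # ~50,000 vendors
  invoices = pd.read_csv("vendor_invoices.csv")  # ~1,000,000 invoices

  # Exclusion: drop the ~5,000 inactive vendors
  active = vendors[vendors["STATUS"] == "ACTIVE"]

  # Primary selection criteria: missing data consistent with a shell company
  missing = active[
      active["TAX_ID"].isna()
      | active["PHONE"].isna()
      | (active["ADDRESS"].fillna("").str.strip() == "")
  ]
  print(f"{len(missing)} vendors meet the missing-data criteria")  # e.g., 100

  # Inclusion: interrogate only the invoices of the flagged vendors,
  # not all one million transactions
  suspect = invoices[invoices["VENDOR_ID"].isin(missing["VENDOR_ID"])]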
 

7. Selection Criteria

In the selection criteria, there are two fundamental strategies. The first is to identify all entities or transactions that meet a single criterion; the purpose of the test is to exclude all data that do not meet it. Since the test operates on one criterion, the resulting population tends to be large, although much smaller than the total population. The auditor can then use either random selection or auditor judgment to select the sample. The advantage is that the auditor has improved the odds of selecting a fraudulent transaction.

The second strategy is to select all data that meet the full testing criteria, referred to as the fraud data profile. The fraud data analytics strategy selected in step three determines how the sample is drawn:

  • Specific identification. The sample should be the transactions that meet the criteria.
  • Control avoidance. The sample should be the transactions that circumvent internal control.
  • Data Interpretation. The sample is based on the auditor’s judgment.
  • Number anomaly. The sample is based on the number of anomalies identified and auditor judgment.

So, what is the difference between the two strategies? The first strategy uses an exclusion theory to reduce the population, whereas the second strategy uses an inclusion theory as the basis for sample selection. Remember, after identifying all transactions meeting the criteria, data filtering can be used to shrink the population further.
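The contrast between the two strategies can be sketched in Python/pandas as follows. The invoice fields, the $10,000 approval limit, and the sample size are hypothetical illustrations, not criteria from the original plan.

  import pandas as pd

  invoices = pd.read_csv("vendor_invoices.csv")

  # Strategy 1 (exclusion theory): one criterion removes everything that
  # does not meet it, leaving a smaller population to sample randomly
  # or by auditor judgment
  below_limit = invoices[invoices["AMOUNT"].between(9000, 9999)]  # just under the limit
  sample_1 = below_limit.sample(n=min(25, len(below_limit)), random_state=1)

  # Strategy 2 (inclusion theory): keep only rows matching the full
  # fraud data profile; the matches themselves are the basis for the sample
  profile = invoices[
      (invoices["AMOUNT"] < 10000)
      & invoices["PO_NUMBER"].isna()                            # no purchase order
      & (invoices["INVOICE_DATE"] == invoices["PAYMENT_DATE"])  # paid same day
  ]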
 

8. Start the Programming

It is interesting to see how different individuals program the software to create data interrogation routines. Since programming is software-dependent, I offer the following strategies for avoiding faulty logic in the design of the search routine:

  • Flowchart the decision process prior to writing the search routine. The order of the searching criteria will impact the sample selection process.
  • Create record counts of excluded data and then reconcile the new control count to the calculated control count. It is easy to reverse the selection criteria, thereby excluding what should have been included. The reconciliation process helps avoid this error (see the sketch after this list).
  • Perform a visual review of the output. Ask yourself, does the result seem consistent with your expectations?
  • Create reports that can function as a working paper. Remember, there must be sufficient information to locate the source documents. Reports with too many columns are difficult to read both on screen and in print.
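Here is a minimal Python sketch of the record-count reconciliation suggested above, assuming a hypothetical status criterion; the same check can be done with IDEA's control totals.

  import pandas as pd

  vendors = pd.read_csv("vendor_master.csv")
  total = len(vendors)

  mask = vendors["STATUS"] == "ACTIVE"  # hypothetical selection criterion
  included = vendors[mask]
  excluded = vendors[~mask]

  # Reconcile the new control count to the calculated control count; if the
  # criterion was accidentally reversed, the lopsided counts make it obvious.
  print(f"included={len(included)}, excluded={len(excluded)}, total={total}")
  assert len(included) + len(excluded) == total, "control counts do not reconcile"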

A note of appreciation: I have had the opportunity to work with two of the best programmers, Jill Davies and Carol Ursell from Audimation. Their skill sets were critical to the success of my fraud data analytics projects.

Visit Leonard Vona’s Fraud Auditing, Detection, and Prevention Blog.


Best Practices, Fraud



By Leonard Vona
Leonard W. Vona is the CEO of Fraud Auditing, Inc. He is a forensic accountant with more than 38 years of diversified fraud auditing experience, including a distinguished 18-year private industry career. His firm, Fraud Auditing, Inc., advises clients in areas of litigation support, financial investigations, fraud detection and fraud prevention.

