Fighting Fraud with Data Mining & Analysis

Throughout our lives, we search for things. As children, games such as “hide and seek” or “Where’s Waldo?” taught us search-and-find skills that we now use to strategically find our reading glasses or car keys. Those of us who do need reading glasses might also remember looking for the hidden pictures in Highlights magazine, by scanning the pictures back and forth, or in a haphazard way. As auditors searching for fraud in today’s sophisticated business systems, we must take those search-and-find skills to the next level.

Fraud auditing is a proactive approach to detecting fraud. There are two key components to fraud auditing: (1) using a fraud data mining plan, and (2) using fraud audit procedures. The two work closely together: if the sample does not include a fraudulent transaction, the audit procedure cannot reveal the fraudulent transaction.

While a few individuals in the history of auditing have possessed the unique ability to detect fraud, often referred to as “fraud hound dogs,” the rest of us need a step-by-step methodology to build a fraud data mining plan. The key to fraud detection is to look where the fraud occurs. It sounds simple, but a properly designed fraud data mining plan begins with simply looking for instances where a fraud scenario is most likely to occur, much like a search-and-find game.

Effective fraud data mining also requires awareness, or the ability to interpret the data for the indicators, of the fraud scenario. While the simple fraud scenarios can be detected via a properly designed fraud data procedure, a fraud scenario with a sophisticated concealment strategy requires the ability to see through the concealment strategy. To illustrate this concept, consider the following scenario:

  • When a perpetrator sets up a false vendor and uses their home address, this can be detected with a simple matching routine. The data mining plan would match vendor address to an employee’s address.
  • When a perpetrator uses a PO Box in an out-of-state city, using a mail forward technique, the matching routine would not detect the scheme. The data mining plan would be required to interpret the vendor database on the frequency and pattern of vendor invoice numbers, dates, and amounts. The routine would search for a sequential pattern of vendor invoice numbers or identify a range of vendor invoice numbers that is illogical based on the total dollars or the nature, type or perceived size of the vendor. The routine would correlate the pattern and frequency to a manager or a cost center.
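The simple matching routine in the first bullet can be sketched in a few lines of Python. This is a minimal illustration, not the IDEA routine itself: the record layout (dicts with `id` and `address` keys) and the normalization rules are assumptions made for the example.

```python
import re

def normalize_address(address):
    """Lowercase, drop punctuation, and collapse whitespace so that trivial
    formatting differences do not hide a match."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", (address or "").lower())
    return " ".join(cleaned.split())

def match_vendor_to_employee(vendors, employees):
    """Return (vendor_id, employee_id) pairs that share a normalized address.

    Both arguments are lists of dicts with hypothetical keys 'id' and 'address'.
    Blank addresses are skipped so that two empty fields never count as a match.
    """
    by_address = {}
    for emp in employees:
        key = normalize_address(emp["address"])
        if key:
            by_address.setdefault(key, []).append(emp["id"])
    matches = []
    for ven in vendors:
        key = normalize_address(ven["address"])
        for emp_id in by_address.get(key, []):
            matches.append((ven["id"], emp_id))
    return matches

vendors = [
    {"id": "V100", "address": "12 Elm St., Albany, NY"},
    {"id": "V200", "address": "PO Box 934, Reno, NV"},
]
employees = [{"id": "E7", "address": "12 elm st albany ny"}]
print(match_vendor_to_employee(vendors, employees))
```

Only the home-address vendor (V100) is returned. The PO Box vendor passes through untouched, which is exactly the limitation the second bullet describes.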

The concept of fraud auditing, or responding to the risk of fraud, is a relatively new concept for auditors. In reviewing the professional literature and audit standards, authors are using words interchangeably to define the same concept. For example, what is the difference between an identified fraud risk, a fraud scheme or a fraud scenario? This article helps define some of these terms and concepts.

What is Fraud Audit?

Fraud audit is the application of audit procedures to a population of business transactions in a manner that increases the likelihood of identifying fraud. The first step in a fraud audit is to identify fraud scenarios within the audit scope. By analyzing the likelihood and significant factors linked to a fraud scenario, the auditor then decides which fraud scenarios require an audit response. At this time, the auditor should start developing a fraud data analysis to interrogate the database for transactions that are consistent with the data profile of the fraud scenario. This is an important point. Too many auditors are trying to perform fraud data mining routines without truly understanding what exactly the routine is intended to search for and find. More importantly, what variation of the fraud scenario is the data mining routine missing?

What is Fraud Data Analysis?

Fraud data analysis is the process of extracting and interpreting information to identify transactions that are consistent with a fraud data profile that links to an identified fraud scenario. The goal of the analysis is to identify a discrete number of transactions that can be examined using fraud audit procedures. In essence, the sampling approach is a focused bias sampling approach designed to identify a specific fraud scenario. It’s also referred to as a form of discovery sampling. To illustrate the concept, the auditor starts with a million transactions in the population. Through data mining, we filter the population through a funnel and identify the 100 transactions that are consistent with our fraud theory. Then, through fraud audit procedures, we identify the fraudulent vendor. The analysis requires both methodology and data interpretation skills. The first step is to create the data mining report, and the second step is to visually review the report for the red flags of your fraud theory.

What is a Fraud Data Profile?

Developing a fraud data profile is the process of drawing a picture of a fraud scenario with data. The clarity of the picture will depend on the availability and integrity of the information in the database. The fraud data profile will focus on the master file description, the transaction, or both the master file and the transaction.

What is a Fraud Scenario?

The fraud scenario starts with the inherent scheme. The fraud scenario is how the inherent scheme would occur in the company’s business system. The key consideration is to understand the variations of the scenario that are caused by the fraud opportunity, entity variation, transaction variation and scheme variation. For example, one variation of a false billing scheme through a false company is when an accounts payable employee takes over the identity of a dormant vendor on the database and charges invoices to a large cost center. A second variation is when a manager creates a front company and uses their authority to trigger accounts payable to process a false invoice and charge the invoice to their cost center. Good data mining develops a specific plan for each scenario. In developing the plan, there will be a logical overlap between the scenarios and the various plans.

Fraud Data Mining Methodology

Our fraud data mining methodology is a structured step-by-step approach to identifying transactions consistent with a fraud scenario, as described through the fraud data profile.

Identify the Inherent Fraud Scheme.

The first step is to establish the scope of the fraud audit. Each business system has five to seven inherent fraud schemes. The audit plan should identify which inherent fraud schemes are within the fraud audit scope. This article focuses on the fictitious vendor and the false billing scheme to illustrate the methodology. False billing is paying for goods or services not provided, with the vendor serving as a front company.

Build the Fraud Scenario.

The fraud scenarios are how an inherent fraud scheme would occur in a specific company. The auditor must consider the variations of the fraud scheme based on the variations of the entity, opportunity and transactional consideration.

Data mining must be driven by the fraud scenario rather than by the data mining routine. A good example is when the auditor matches the vendor master file to the employee master file. The purpose of the match is to identify fictitious vendors. However, the only fictitious vendors identified are those where the perpetrator was obtuse enough to use their home address. The design of the routine excludes all fictitious vendors with a concealed address.

Obtain the Data.

In a sense, the concept sounds easy. However, one of the greatest impediments to fraud data mining is the identification and extraction of the data from the IT environment. While IDEA® – Data Analysis Software may be used to convert the data format, you must also consider storage capacity, table identification, data location and IT cooperation to build an effective data mining environment.

Identify and Link the Data to the Fraud Scenario.

One of the most critical stages of the data mining plan is to understand the available data and how to use the data to identify fraud scenarios, often referred to as the data mapping phase of the plan. Data mapping is the process of starting with each field in the database, understanding how the data correlates to the fraud scenario and how to search the data for indicators that link to the scheme. In essence, data mapping is the process of drawing a picture of a fraud scenario with data. Auditors should focus on both master file data and the transactional data associated with the business system.

To illustrate this concept, consider the following scenario where front companies and false billing are used to commit fraud.

The vendor master file fields, such as vendor name, address, telephone number, government identification number and bank account number, are useful for identifying false vendors. The auditor would search for vendors with missing key information, illogical information, and information that matches other key databases. An inherent assumption is that the accounting department populates the database and the information has integrity. Using the vendor telephone number illustrates the concept of data mapping to identify fraud:

  • Missing telephone number is an indicator of a false vendor
  • Matching telephone number to an employee is an indicator of a false vendor
  • An area code that is not consistent with the vendor address is an indicator of the mail forward technique of a false vendor
  • The first three numbers of a telephone number can be correlated to cell phone number exchanges
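The first three telephone indicators above can be expressed as a small Python check. This is a sketch under stated assumptions: the field names (`phone`, `state`), the ten-digit North American number format, and the tiny area code table are all illustrative, and a real review would need a complete, current area code reference.

```python
import re

def digits_only(phone):
    """Strip formatting and a leading country code '1' from a phone number."""
    digits = re.sub(r"\D", "", phone or "")
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    return digits

def phone_red_flags(vendor, employee_phones, area_codes_by_state):
    """Return the telephone-based red flags for one vendor record.

    employee_phones: set of digits-only employee numbers (hypothetical input).
    area_codes_by_state: dict mapping a state code to a set of area codes.
    """
    number = digits_only(vendor.get("phone"))
    if not number:
        return ["missing telephone number"]
    flags = []
    if number in employee_phones:
        flags.append("matches an employee telephone number")
    expected = area_codes_by_state.get(vendor.get("state"), set())
    if expected and number[:3] not in expected:
        flags.append("area code inconsistent with vendor address")
    return flags

employee_phones = {"5185550101"}
area_codes_by_state = {"NY": {"212", "518"}}  # sample data only, not complete

for vendor in (
    {"id": "V1", "phone": "", "state": "NY"},
    {"id": "V2", "phone": "(518) 555-0101", "state": "NY"},
    {"id": "V3", "phone": "775-555-0199", "state": "NY"},
):
    print(vendor["id"], phone_red_flags(vendor, employee_phones, area_codes_by_state))
```

Each flag is only an indicator. A vendor with a legitimate out-of-state sales office will trip the area code test, so the output is a list to review, not a conclusion.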

The transactional data or the vendor invoice file would be searched for the frequency and pattern of vendor invoices. The key fields are the vendor invoice number, invoice date, and invoice amount. The corresponding purchase order information can provide information regarding vendor invoices that are circumventing the procurement process. Correlating the vendor invoice pattern to the circumvention theory would be an effective data mining routine.
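One way to tie the invoice pattern to the circumvention theory is to summarize, per vendor, how many invoices were paid without a purchase order. The sketch below assumes each invoice is a dict with hypothetical keys `vendor_id`, `amount` and `po_number`; your invoice file will have its own field names.

```python
from collections import defaultdict

def po_bypass_summary(invoices):
    """Per vendor: invoice count, count without a purchase order, total amount."""
    summary = defaultdict(lambda: {"invoices": 0, "no_po": 0, "total": 0.0})
    for inv in invoices:
        row = summary[inv["vendor_id"]]
        row["invoices"] += 1
        row["total"] += inv["amount"]
        if not inv.get("po_number"):  # blank or missing PO reference
            row["no_po"] += 1
    return dict(summary)

invoices = [
    {"vendor_id": "V1", "amount": 4800.0, "po_number": ""},
    {"vendor_id": "V1", "amount": 4900.0, "po_number": None},
    {"vendor_id": "V2", "amount": 12000.0, "po_number": "PO-77"},
]
for vendor_id, row in po_bypass_summary(invoices).items():
    print(vendor_id, row)
```

A vendor whose invoices consistently arrive without a purchase order is not necessarily fraudulent, but it is a vendor whose invoice numbers, dates and amounts deserve a closer look.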

Developing the Data Interrogation Procedures.

The data interrogation plan starts with a fraud scenario. The second step is to determine the extent of data interpretation within the audit process to search for fraudulent transactions. In other words, will the data interrogation look for a fraud scenario with a low sophistication concealment strategy or a high level of sophistication? Our plan focuses on developing data interrogation based on the following concepts:

1. Pattern and frequency. The analysis creates statistical reports by vendor, customer, employee, and transaction type in an attempt to identify an anomaly within the data. You can create these reports using IDEA by utilizing the “Summarize,” “Sort” and “Data Extraction” features.

2. Circumvention strategies. The analysis searches for transactions that exhibit a pattern or frequency which suggests someone was processing transactions below the control threshold. Creating averages, maximums and minimums, and then drilling down on the particular line, will highlight these situations.

3. Duplicate analysis. The search routine searches for duplicate information within the data file, or external to the data file, that should not exist. Proceed with caution as this analysis often produces false positives. The duplicate search routines within a file or comparing two files will identify these transactions.

4. Changes. The analysis searches for changes in data that would be consistent with a fraud scenario. Transaction types such as new, delete, update, and void may all be signs of changes. These transactions may be located via transaction codes or by comparing files at two points in time.

5. Illogical. The analysis searches for transactions that do not fit the normal frequency or pattern that would be expected in the data file.

6. Trends. The increasing or decreasing nature of the activity is not consistent with the established norm.

7. Mistakes or an unsophisticated perpetrator. In this phase, the data interrogation is simply looking for errors that would be indicative of a fraud scenario.

8. Data interpretation challenge or the sophisticated perpetrator. In this phase, the auditor will need to create several reports and study the reports for small signs of a fraudulent transaction. An analogy would be trying to see someone in the distance on a foggy day.

9. Master file. The analysis searches for duplications, changes, missing or illogical data patterns.

10. Transactional history. The analysis searches for patterns which are illogical, changes or trends consistent with the fraud scenario.

11. Overt versus covert. Data can be concealed by the way the information is recorded in a data file (e.g., Postal Box 934, PO Box 934, P.O. Box 934 or Box 934). Using the features of IDEA, the auditor can focus on the number, 934, to provide a match.
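Two of the concepts above, circumvention strategies (item 2) and overt versus covert recording (item 11), lend themselves to a short Python sketch. Neither function reproduces an IDEA feature. The regular expression, the 10% band below the threshold, and the field names are assumptions chosen for illustration.

```python
import re

def po_box_number(address):
    """Return the box number from a PO Box style address, or None.

    Keys on the word 'box' followed by digits, so the spelling of the
    prefix (Postal, PO, P.O., or none) does not matter.
    """
    match = re.search(r"\bbox\s*#?\s*(\d+)", address or "", flags=re.IGNORECASE)
    return match.group(1) if match else None

def just_below_threshold(invoices, threshold, band=0.10):
    """Return invoices whose amount sits within `band` below the threshold."""
    lower = threshold * (1 - band)
    return [inv for inv in invoices if lower <= inv["amount"] < threshold]

variants = ["Postal Box 934", "PO Box 934", "P.O. Box 934", "Box 934"]
print({po_box_number(a) for a in variants})  # all four reduce to one key

invoices = [
    {"number": "1001", "amount": 4950.0},
    {"number": "1002", "amount": 4990.0},
    {"number": "1003", "amount": 1200.0},
    {"number": "1004", "amount": 5200.0},
]
print([inv["number"] for inv in just_below_threshold(invoices, threshold=5000)])
```

Matching on the box number alone will also pair unrelated boxes in different cities, so in practice the number would be combined with the city or postal code before treating two records as the same address.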

The end result of the data mining plan is to identify a fictitious vendor for which invoices were submitted by an operations manager for services not performed. Creating reports around the preceding theory will make this process simple.

Normalize the Data.

The intent of this step is to shrink the population through the use of the exclusion and inclusion theory. The exclusion theory is intended to create data files that have a high degree of commonality. In this way, an anomaly becomes more obvious. In the inclusion theory, the search routines are designed to search for data characteristics or red flags consistent with the fraud scenario.
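One possible reading of the exclusion and inclusion theory is a two-stage filter: exclusion rules remove records that are out of scope for this particular analysis, and inclusion rules keep only the records showing at least one red flag. The sketch below follows that reading. The rule functions and field names are hypothetical.

```python
def shrink_population(records, exclusions, inclusions):
    """Apply exclusion rules first, then keep records matching any inclusion rule.

    exclusions, inclusions: lists of functions that take a record and return bool.
    """
    remaining = [r for r in records if not any(rule(r) for rule in exclusions)]
    return [r for r in remaining if any(rule(r) for rule in inclusions)]

vendors = [
    {"id": "V1", "temporary": False, "created_year": 2023, "phone": ""},
    {"id": "V2", "temporary": True, "created_year": 2023, "phone": ""},
    {"id": "V3", "temporary": False, "created_year": 2010, "phone": ""},
    {"id": "V4", "temporary": False, "created_year": 2022, "phone": "5185550101"},
]
exclusions = [
    lambda v: v["temporary"],            # temporary vendors: separate analysis
    lambda v: v["created_year"] < 2021,  # older vendors: separate analysis
]
inclusions = [lambda v: not v["phone"]]  # red flag: missing telephone number

print([v["id"] for v in shrink_population(vendors, exclusions, inclusions)])
```

Keeping the rules as separate small functions makes it easy to document which vendors a given analysis did not look at, which matters when deciding what the next analysis should cover.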

Interpret the Data.

Data mining does not have the mathematical precision of an algebraic formula. The intent is to shrink the population and create a report that the auditor can review. The key is the auditor’s ability to interpret the data for signs of frequency and patterns consistent with the identified fraud scenario.

Respond to the Indicators.

The auditor will need to develop an audit response to a specific fraud scenario. In the case of a fictitious vendor, the auditor may start with covert audit procedures, such as telephone pretext calls or Web searches. The overt procedure would be a site visit to verify the existence of the vendor.

Illustration of a Fraud Data Mining Plan.

Our fraud scenario is false vendors invoicing for services not performed, and our inclusion theory will focus on four different considerations. The first analysis will focus on vendors created by managers within the last four years. The second analysis will focus on vendors missing vendor master file data. The third analysis will focus on matching vendor master file data to our employee master file. The fourth analysis will focus on duplicate vendor master file data. The purpose of the four-part data analysis plan is to create a list of vendors that appear suspicious based on the master file data.

By the nature of the inclusion theory, our analysis will not focus on vendors more than four years old, vendors with changes to key fields, temporary vendors, or vendors established through identity theft schemes. The intent of the exclusion theory is not to exclude the vendor from the audit, but rather exclude the vendor from this specific data analysis. Remember, once one fraudulent vendor is located, the audit will expand to look for other fraudulent vendors.
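The four-part master file analysis can be sketched as a single pass that attaches reasons to each vendor. This is an illustration only: the field names, the `created_by_role` value `"manager"`, and the choice of key fields are assumptions, and the four-year window is passed in as an explicit cutoff date.

```python
from collections import Counter
from datetime import date

KEY_FIELDS = ("address", "phone", "tax_id", "bank_account")  # hypothetical names

def flag_vendors(vendors, employees, cutoff):
    """Return {vendor_id: [reasons]} for vendors hit by any of the four analyses."""
    employee_values = {
        f: {e[f] for e in employees if e.get(f)} for f in KEY_FIELDS
    }
    vendor_counts = {
        f: Counter(v[f] for v in vendors if v.get(f)) for f in KEY_FIELDS
    }
    flagged = {}
    for v in vendors:
        reasons = []
        # Analysis 1: vendors created by managers on or after the cutoff date
        if v.get("created_by_role") == "manager" and v["created_on"] >= cutoff:
            reasons.append("created by a manager on or after the cutoff")
        for f in KEY_FIELDS:
            value = v.get(f)
            if not value:  # Analysis 2: missing master file data
                reasons.append(f"missing {f}")
                continue
            if value in employee_values[f]:  # Analysis 3: employee match
                reasons.append(f"{f} matches an employee")
            if vendor_counts[f][value] > 1:  # Analysis 4: duplicate vendor data
                reasons.append(f"duplicate {f}")
        if reasons:
            flagged[v["id"]] = reasons
    return flagged

vendors = [
    {"id": "V1", "created_on": date(2023, 5, 1), "created_by_role": "manager",
     "address": "PO Box 934", "phone": "", "tax_id": "11-1111111", "bank_account": "A1"},
    {"id": "V2", "created_on": date(2015, 3, 9), "created_by_role": "clerk",
     "address": "12 Elm St", "phone": "5185550101", "tax_id": "22-2222222", "bank_account": "B2"},
    {"id": "V3", "created_on": date(2016, 7, 1), "created_by_role": "clerk",
     "address": "40 Oak Ave", "phone": "5185550102", "tax_id": "22-2222222", "bank_account": "C3"},
    {"id": "V4", "created_on": date(2012, 1, 15), "created_by_role": "clerk",
     "address": "9 Pine Rd", "phone": "5185550103", "tax_id": "44-4444444", "bank_account": "D4"},
]
employees = [{"id": "E7", "address": "77 Lake Dr", "phone": "5185550199", "bank_account": "B2"}]

for vendor_id, reasons in flag_vendors(vendors, employees, cutoff=date(2021, 1, 1)).items():
    print(vendor_id, reasons)
```

Vendors with no reasons, such as V4 here, simply drop out of this analysis. As noted above, that is not the same as clearing them for the audit as a whole.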

The second tier of data mining is based on the vendor transactional data. For the selected vendors, the data analysis would focus on a pattern and frequency of vendor invoices that are sequential or illogical. The sequential pattern is easy to find with IDEA, whereas the illogical pattern will require a significant amount of auditing work. The next section illustrates this process.

Illustration Using the Vendor Invoice Number.

The vendor invoice number is created by the perpetrator of the scheme. Experience shows the invoice number will have a pattern and frequency when the number is created by an individual.


The vendor invoice numbers will follow one of four patterns: sequential, interval, special symbol or letter, or random. Sequentially issued invoices are a red flag of a created vendor. The interval approach is based on incrementing the next invoice number by a set number. The special symbol or letter may be used by a perpetrator to distinguish false invoices from real invoices. The random approach is used by a more sophisticated individual who understands that auditors search for patterns.
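A rough classifier for these patterns might look like the following Python sketch. It assumes one vendor's invoice numbers are supplied as non-empty strings in invoice-date order, and the labels and the minimum of three invoices are choices made for the example. Note that many legitimate vendors also use letter prefixes, so that label is a prompt for review and nothing more.

```python
import re

def classify_invoice_numbers(numbers):
    """Label the numbering pattern of one vendor's invoices.

    numbers: invoice numbers as strings, in invoice-date order.
    """
    if len(numbers) < 3:
        return "too few invoices"
    if any(re.search(r"[^0-9]", n) for n in numbers):
        return "special symbol or letter"
    values = [int(n) for n in numbers]
    steps = {later - earlier for earlier, later in zip(values, values[1:])}
    if steps == {1}:
        return "sequential"
    if len(steps) == 1 and next(iter(steps)) > 1:
        return "interval"
    return "no simple pattern"

print(classify_invoice_numbers(["1001", "1002", "1003"]))
print(classify_invoice_numbers(["100", "110", "120"]))
print(classify_invoice_numbers(["A-17", "A-18", "A-19"]))
print(classify_invoice_numbers(["4821", "1093", "7750"]))
```

The last label is deliberately not called "random": a list with no simple pattern may come from a sophisticated perpetrator or from a genuine vendor with many other customers, and only the follow-up analysis can tell them apart.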

Frequency of Vendor Invoices.

The number of instances is typically a byproduct of what financial pressures are impacting the individual, the individual’s dollar approval level, the individual’s ability to record the disbursement in multiple accounts and duration that the scheme has occurred. In most cases, the number of invoices will be less than 52 invoices a year. Remember, there are no absolutes regarding the frequency. Variables such as the duration of the scheme, pressures facing the perpetrator, and the perpetrator’s management autonomy will impact the frequency the individual commits the scheme in one year.
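Because there are no absolutes regarding frequency, a sensible first report simply counts invoices per vendor per year and leaves the judgment to the reviewer. The sketch below assumes each invoice is a dict with hypothetical keys `vendor_id` and `invoice_date`.

```python
from collections import Counter
from datetime import date

def invoices_per_year(invoices):
    """Count invoices for each (vendor_id, calendar year) pair."""
    return Counter((inv["vendor_id"], inv["invoice_date"].year) for inv in invoices)

invoices = [
    {"vendor_id": "V1", "invoice_date": date(2023, 1, 15)},
    {"vendor_id": "V1", "invoice_date": date(2023, 2, 15)},
    {"vendor_id": "V1", "invoice_date": date(2024, 1, 15)},
    {"vendor_id": "V2", "invoice_date": date(2023, 6, 30)},
]
for (vendor_id, year), count in sorted(invoices_per_year(invoices).items()):
    print(vendor_id, year, count)
```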

Range of the Invoice Numbers.

Within a fiscal year, a company would expect to issue “x” number of invoices. The nature of the industry, size of the company, and the company’s billing practices would all impact the number of invoices issued by the company. In this phase, the beginning invoice number, the ending invoice number, the number of invoices and the dollar amount of the invoices are identified.
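The four values named above can be gathered for one vendor with a few lines of Python. The sketch assumes purely numeric invoice numbers and hypothetical keys `number` and `amount`. The `share_of_range` figure is an added assumption, not part of the methodology described here: it shows what fraction of the vendor's apparent number range was billed to our company.

```python
def invoice_range_summary(invoices):
    """Summarize one vendor's invoices for a fiscal year.

    invoices: non-empty list of dicts with an int 'number' and a numeric 'amount'.
    """
    numbers = [inv["number"] for inv in invoices]
    first, last = min(numbers), max(numbers)
    count = len(numbers)
    span = last - first + 1  # how many numbers the vendor appears to have issued
    return {
        "first_number": first,
        "last_number": last,
        "invoice_count": count,
        "total_amount": sum(inv["amount"] for inv in invoices),
        "share_of_range": count / span,
    }

invoices = [
    {"number": 2001, "amount": 1000.0},
    {"number": 2002, "amount": 1000.0},
    {"number": 2003, "amount": 1000.0},
    {"number": 2004, "amount": 1000.0},
]
print(invoice_range_summary(invoices))
```

A share close to 1.0 suggests the vendor billed almost no one else during the year, which may be illogical for a vendor of its perceived size. Whether it actually is illogical depends on the nature of the industry and the vendor, as discussed above.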

In conclusion, effective data mining begins with a properly defined fraud scenario. Search routines help focus on concealment strategies and the resulting “red flags” of the fraud scenario. Data mining can help narrow the population to a manageable size, enabling the application of fraud audit procedures. By using data interpretation, you can develop reports or documentation and interpret the data. More importantly, fraud can be detected using sophisticated “search and find” techniques and effective data analysis software, such as IDEA.

Leonard W. Vona is a financial investigator with more than 30 years of diversified auditing and forensic accounting experience, including a distinguished 18-year private industry career. His firm, Fraud Auditing, Inc., advises clients in areas of litigation support, financial investigations, and fraud prevention. He may be reached at [email protected] or 518-784-2250.
