Raw data is often incomplete, messy and inconsistent. One of the first steps in the data preparation process is the detection and removal of duplicate records. This Tech Tip provides step-by-step instructions for identifying duplicates to improve data quality and integrity.
This example uses an INVOICE FILE; however, the same steps can be applied to other files.
Open: INVOICE FILE
Data tab: APPEND FIELDS
ABS_AMOUNT
Type: Numeric
Size: 2 decimal places
Parameter: @ABS(AMOUNT)
Description: Absolute Amount
PREC_NO
Type: Numeric
Size: 0 decimal places
Parameter: @PRECNO()
Description: Physical Record Number
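For readers who want to see what the two appended fields contain, here is a minimal Python sketch of the same idea. It is not IDEA code: the `append_fields` function and the sample rows are hypothetical, and the record number is assumed to be 1-based.

```python
from decimal import Decimal

def append_fields(records):
    """Return copies of the records with ABS_AMOUNT and PREC_NO added."""
    out = []
    for position, rec in enumerate(records, start=1):
        row = dict(rec)
        # Analogue of @ABS(AMOUNT): the amount without its sign.
        row["ABS_AMOUNT"] = abs(row["AMOUNT"])
        # Analogue of @PRECNO(): position of the record in the file (assumed 1-based).
        row["PREC_NO"] = position
        out.append(row)
    return out

# Hypothetical sample rows: an invoice and its reversal.
invoices = [
    {"VENDOR": "V001", "AMOUNT": Decimal("150.00")},
    {"VENDOR": "V001", "AMOUNT": Decimal("-150.00")},
]
prepared = append_fields(invoices)
```

ABS_AMOUNT lets an invoice and its reversal share a match value even though their signs differ, and PREC_NO gives every record a unique key for the later join.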
Analysis tab: DUPLICATE KEY – EXCLUSION *
Create new file: INVOICE REVERSALS
Fields to match (example)
VENDOR
ABS_AMOUNT
VENDOR_INVOICE_NO
INVOICE_DATE
PO_NO
Fields that must be different
AMOUNT
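The exclusion step can be sketched in Python as follows. This is a rough analogue under stated assumptions, not IDEA's implementation: the function name `duplicate_key_exclusion`, the `row` helper and the sample data are hypothetical, and the sketch flags every record in a group whose match fields agree but whose AMOUNT values differ.

```python
from collections import defaultdict
from decimal import Decimal

MATCH_FIELDS = ("VENDOR", "ABS_AMOUNT", "VENDOR_INVOICE_NO", "INVOICE_DATE", "PO_NO")

def row(prec_no, vendor, amount, invoice_no, invoice_date, po_no):
    """Hypothetical helper that builds one invoice record."""
    amount = Decimal(amount)
    return {"PREC_NO": prec_no, "VENDOR": vendor, "AMOUNT": amount,
            "ABS_AMOUNT": abs(amount), "VENDOR_INVOICE_NO": invoice_no,
            "INVOICE_DATE": invoice_date, "PO_NO": po_no}

def duplicate_key_exclusion(records, match_fields=MATCH_FIELDS, differ_field="AMOUNT"):
    """Return records whose match fields agree while differ_field does not."""
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec[f] for f in match_fields)].append(rec)
    flagged = []
    for group in groups.values():
        # More than one distinct AMOUNT in the group means +X and -X are both present.
        if len({rec[differ_field] for rec in group}) > 1:
            flagged.extend(group)
    return flagged

invoices = [
    row(1, "V001", "100.00", "INV-1", "2023-01-05", "PO-9"),
    row(2, "V001", "-100.00", "INV-1", "2023-01-05", "PO-9"),  # reversal of record 1
    row(3, "V002", "75.00", "INV-3", "2023-03-01", "PO-4"),
]
reversals = duplicate_key_exclusion(invoices)
```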
Analysis tab: JOIN*
Create new file: INVOICE FILE – NO REVERSALS
Primary File: INVOICE FILE
Secondary File: INVOICE REVERSALS
Match: PREC_NO
Match Type: RECORDS WITH NO SECONDARY MATCH
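The join above keeps only the primary records whose PREC_NO does not appear in the reversals file. A minimal Python sketch of that "no secondary match" behaviour, with a hypothetical function name and made-up record numbers:

```python
def records_with_no_secondary_match(primary, secondary, key="PREC_NO"):
    """Keep primary records whose key value is absent from the secondary file."""
    secondary_keys = {rec[key] for rec in secondary}
    return [rec for rec in primary if rec[key] not in secondary_keys]

# Hypothetical data: records 1 and 2 were identified as a reversal pair.
invoice_file = [{"PREC_NO": n} for n in (1, 2, 3, 4)]
invoice_reversals = [{"PREC_NO": 1}, {"PREC_NO": 2}]

no_reversals = records_with_no_secondary_match(invoice_file, invoice_reversals)
```

Matching on PREC_NO rather than on the invoice fields means only the exact records flagged as reversals are removed.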
Open: INVOICE FILE – NO REVERSALS
Analysis tab: DUPLICATE KEY – DETECTION
Create new file: INVOICE DUPLICATES
Fields to match (example)
VENDOR
AMOUNT
VENDOR_INVOICE_NO
INVOICE_DATE
PO_NO
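Duplicate detection on the cleaned file can be pictured as grouping records by the match fields and keeping any group with more than one member. The Python sketch below is an illustration only: `duplicate_key_detection`, the `row` helper and the sample data are hypothetical.

```python
from collections import defaultdict
from decimal import Decimal

MATCH_FIELDS = ("VENDOR", "AMOUNT", "VENDOR_INVOICE_NO", "INVOICE_DATE", "PO_NO")

def row(prec_no, vendor, amount, invoice_no, invoice_date, po_no):
    """Hypothetical helper that builds one invoice record."""
    return {"PREC_NO": prec_no, "VENDOR": vendor, "AMOUNT": Decimal(amount),
            "VENDOR_INVOICE_NO": invoice_no, "INVOICE_DATE": invoice_date,
            "PO_NO": po_no}

def duplicate_key_detection(records, match_fields=MATCH_FIELDS):
    """Return every record that shares all match fields with another record."""
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec[f] for f in match_fields)].append(rec)
    return [rec for group in groups.values() if len(group) > 1 for rec in group]

invoices = [
    row(3, "V001", "250.00", "INV-2", "2023-02-01", "PO-9"),
    row(4, "V001", "250.00", "INV-2", "2023-02-01", "PO-9"),  # same invoice entered twice
    row(5, "V002", "75.00", "INV-3", "2023-03-01", "PO-4"),
]
duplicates = duplicate_key_detection(invoices)
```

Note that this step matches on AMOUNT rather than ABS_AMOUNT, because the reversals have already been removed.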
* NOTE: Systems like SAP may have multiple reversals on a PO, so removing reversals can require more than one “pass” through these two steps (DUPLICATE KEY – EXCLUSION and JOIN), usually dropping INVOICE_DATE as a match field on at least one pass.
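The multi-pass idea in the note can be sketched as a loop over sets of match fields, where each pass finds reversals and then removes them by PREC_NO. This is a hedged illustration, not IDEA's behaviour: the function names, the `row` helper, the choice of passes and the sample data are all assumptions, and the sketch removes every record in a flagged group.

```python
from collections import defaultdict
from decimal import Decimal

def row(prec_no, vendor, amount, invoice_no, invoice_date, po_no):
    """Hypothetical helper that builds one invoice record."""
    amount = Decimal(amount)
    return {"PREC_NO": prec_no, "VENDOR": vendor, "AMOUNT": amount,
            "ABS_AMOUNT": abs(amount), "VENDOR_INVOICE_NO": invoice_no,
            "INVOICE_DATE": invoice_date, "PO_NO": po_no}

def find_reversals(records, match_fields, differ_field="AMOUNT"):
    """Records whose match fields agree while differ_field does not."""
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec[f] for f in match_fields)].append(rec)
    return [rec for group in groups.values()
            if len({r[differ_field] for r in group}) > 1
            for rec in group]

def remove_reversals(records, passes):
    """Run one exclusion-plus-join pass per entry in passes."""
    remaining = list(records)
    for match_fields in passes:
        reversal_ids = {rec["PREC_NO"] for rec in find_reversals(remaining, match_fields)}
        remaining = [rec for rec in remaining if rec["PREC_NO"] not in reversal_ids]
    return remaining

# Assumed pass order: the second pass drops INVOICE_DATE from the match fields.
PASSES = [
    ("VENDOR", "ABS_AMOUNT", "VENDOR_INVOICE_NO", "INVOICE_DATE", "PO_NO"),
    ("VENDOR", "ABS_AMOUNT", "VENDOR_INVOICE_NO", "PO_NO"),
]

invoices = [
    row(1, "V001", "100.00", "INV-1", "2023-01-05", "PO-9"),
    row(2, "V001", "-100.00", "INV-1", "2023-01-05", "PO-9"),  # same-day reversal
    row(3, "V001", "250.00", "INV-2", "2023-02-01", "PO-9"),
    row(4, "V001", "-250.00", "INV-2", "2023-02-10", "PO-9"),  # reversal dated later
    row(5, "V002", "75.00", "INV-3", "2023-03-01", "PO-4"),
]
after_first_pass = remove_reversals(invoices, PASSES[:1])
after_both_passes = remove_reversals(invoices, PASSES)
```

In this made-up data the first pass removes records 1 and 2, while the reversal dated later (records 3 and 4) is only caught once INVOICE_DATE is dropped from the match.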
Try it yourself! This mock data matches the column names used above. There are 14 reversals and 6 duplicates.
By Kris Willison
Kris joined the Professional Services team in January of 2015 as a Solutions Specialist. She has an extensive background in Software and Database Development accumulated from thirty years in IT support with twenty years’ experience in database development, cleanup, audit and migration using Microsoft Access. In her time with Audimation, she has received client praise for her “Top Tier” engagement on Monitor and Scripting projects. Kris enjoys looking at problems from new angles to determine the most efficient means of meeting the clients’ needs. Kris has been breeding/showing purebred Balinese cats since 1972 and Oriental Longhairs since 1996. She also hosts one of the largest online pedigree database sites for Siamese and related breeds with nearly 600 users worldwide.