
Tech Tip: Duplicate Identification


Raw data is often incomplete, messy and inconsistent. One of the first steps in the data preparation process is the detection and removal of duplicate records. This Tech Tip provides step-by-step instructions for identifying duplicates to improve data quality and integrity.

This example uses an INVOICE FILE; however, the same steps can be applied to other files.

  1. Open: INVOICE FILE
    1. Data tab: APPEND FIELDS
      1. ABS_AMOUNT
        1. Type: Numeric
        2. Size: 2 decimal places
        3. Parameter: @ABS(AMOUNT)
        4. Description: Absolute Amount
      2. PREC_NO
        1. Type: Numeric
        2. Size: 0 decimal places
        3. Parameter: @PRECNO()
        4. Description: Physical Record Number
    2. Analysis tab: DUPLICATE KEY – EXCLUSION*
      1. Create new file: INVOICE REVERSALS
        1. Fields to match (example)
          1. VENDOR
          2. ABS_AMOUNT
          3. VENDOR_INVOICE_NO
          4. INVOICE_DATE
          5. PO_NO
        2. Fields that must be different
          1. AMOUNT
    3. Analysis tab: JOIN*
      1. Create new file: INVOICE FILE – NO REVERSALS
        1. Primary File: INVOICE FILE
        2. Secondary File: INVOICE REVERSALS
        3. Match: PREC_NO
        4. Match Type: RECORDS WITH NO SECONDARY MATCH
  2. Open: INVOICE FILE – NO REVERSALS
    1. Analysis tab: DUPLICATE KEY – DETECTION
      1. Create new file: INVOICE DUPLICATES
        1. Fields to match (example)
          1. VENDOR
          2. AMOUNT
          3. VENDOR_INVOICE_NO
          4. INVOICE_DATE
          5. PO_NO

* NOTE: Systems like SAP may have multiple reversals on a PO, so removing reversals can require more than one “pass” through these two steps (DUPLICATE KEY – EXCLUSION and JOIN), usually dropping INVOICE_DATE as a match field on at least one pass.
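
For readers who want to see the logic outside of IDEA, the steps above can be sketched in Python with pandas. This is only an approximation of what the IDEA tasks do, not their actual implementation: the function names are hypothetical, the five sample rows are invented for illustration (they are not the sample file from this post), and Duplicate Key Exclusion is assumed to mean "same match fields, more than one distinct AMOUNT".

```python
import pandas as pd

# Match fields from the example above (AMOUNT / ABS_AMOUNT are added per step).
MATCH_FIELDS = ["VENDOR", "VENDOR_INVOICE_NO", "INVOICE_DATE", "PO_NO"]


def add_helper_fields(invoices):
    """Mimic APPEND FIELDS: absolute amount and physical record number."""
    out = invoices.copy()
    out["ABS_AMOUNT"] = out["AMOUNT"].abs().round(2)
    out["PREC_NO"] = list(range(1, len(out) + 1))
    return out


def find_reversals(invoices, match_fields):
    """Approximate DUPLICATE KEY - EXCLUSION: records that agree on the
    match fields and ABS_AMOUNT but whose AMOUNT differs within the group."""
    keys = list(match_fields) + ["ABS_AMOUNT"]
    distinct = invoices.groupby(keys, dropna=False)["AMOUNT"].transform("nunique")
    return invoices[distinct > 1]


def drop_reversals(invoices, reversals):
    """Approximate JOIN with 'records with no secondary match' on PREC_NO."""
    return invoices[~invoices["PREC_NO"].isin(reversals["PREC_NO"])]


def find_duplicates(invoices, match_fields):
    """Approximate DUPLICATE KEY - DETECTION on the match fields plus AMOUNT."""
    keys = list(match_fields) + ["AMOUNT"]
    return invoices[invoices.duplicated(subset=keys, keep=False)]


# Invented sample rows: records 1-2 are a reversal pair, 3-4 are duplicates.
invoice_file = add_helper_fields(pd.DataFrame({
    "VENDOR": ["V1", "V1", "V2", "V2", "V3"],
    "VENDOR_INVOICE_NO": ["INV1", "INV1", "INV2", "INV2", "INV3"],
    "INVOICE_DATE": ["2024-01-05", "2024-01-05", "2024-01-06",
                     "2024-01-06", "2024-01-07"],
    "PO_NO": ["PO1", "PO1", "PO2", "PO2", "PO3"],
    "AMOUNT": [100.00, -100.00, 250.00, 250.00, 75.50],
}))

reversals = find_reversals(invoice_file, MATCH_FIELDS)
no_reversals = drop_reversals(invoice_file, reversals)
duplicates = find_duplicates(no_reversals, MATCH_FIELDS)

print(reversals["PREC_NO"].tolist())
print(duplicates["PREC_NO"].tolist())
```

Because `find_reversals` and `drop_reversals` take the match fields as an argument, an additional pass like the one described in the note could be run by calling them again on `no_reversals` with INVOICE_DATE left out of the list. One caveat of this sketch: if a group contains a reversal pair plus an extra copy of the original invoice, all of those records are flagged as reversals, so results should be reviewed rather than removed blindly.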

Try it yourself! This mock data matches the column names used above. There are 14 reversals and 6 duplicates.

Sample Invoice Data File.xlsx






Posted by Kris Willison
Kris joined the Professional Services team in January of 2015 as a Solutions Specialist. She has an extensive background in Software and Database Development accumulated from thirty years in IT support with twenty years’ experience in database development, cleanup, audit and migration using Microsoft Access. In her time with Audimation, she has received client praise for her “Top Tier” engagement on Monitor and Scripting projects. Kris enjoys looking at problems from new angles to determine the most efficient means of meeting the clients’ needs. Kris has been breeding/showing purebred Balinese cats since 1972 and Oriental Longhairs since 1996. She also hosts one of the largest online pedigree database sites for Siamese and related breeds with nearly 600 users worldwide.

