Data Interrogation: What’s Hiding in Your Organization’s Data?

Due to the coronavirus pandemic of 2020, my presentation for the Institute of Internal Auditors (IIA) Chicago Chapter's 60th Annual Seminar was cancelled. It was an honor to have my proposal selected, and I hope to have many more opportunities to speak about data analytics in the future. In the meantime, and amidst the chaos, I figured I could at least share the knowledge contained in my presentation here!

Having been an IT internal auditor for the past two years, I have attended many seminars and training workshops throughout the Chicagoland area. A key trend I noticed beginning in 2018 was how many people were starting to talk about data analytics in the realm of audit. I found this very interesting, as I had just shifted my career from IT and software development to IT auditing. Fast forward two years, and I have developed over 50 analytics projects and hundreds of scripts. These projects range from analyzing millions of employee expense transactions to developing strategic metrics from more than $2 billion in accounts payable. I have also automated the control testing and sampling processes in the IT general control (ITGC) environment for change management, user access reviews and provisioning, employee terminations, etc., and used analytics to support observations for numerous operational audits.

Throughout my data analytics journey, I have found that most presentations given at audit seminars and training workshops do not go beyond the initial stages of building and gaining support for data analytics programs. Everyone knows you need management’s support, the right tools, and the right employees. You can easily find articles proclaiming the “Top 25 Analytics for every Organization” online and many of the analytics software vendors have active communities and provide an abundance of materials and support. With all of this in mind, I wanted to focus my presentation on what happens once you have an analytics program in place. How do you continue to grow the program and what are some innovative ways for beefing up the value-add factor? I wanted to share the most important lessons learned but mainly engage the audience, challenge their thinking, and spark new ideas for them to leave with and hopefully implement.

Agenda

  • Data analytics in the context of audit
  • Practicing good data hygiene
  • Interrogation tools and techniques
  • Visualizations – Don’t lose the value!
  • Lessons learned

Learning Objectives

  • Develop and deploy data interrogation techniques
  • Add value by analyzing decentralized data sets
  • Tell stories and visualize results effectively
  • Apply lessons learned from developing data analytics programs

“What gets measured, gets managed.” – Peter F. Drucker, Austrian-American management consultant and author

Data Analytics Meets Audit

How do we define data analytics as auditors?

  1. Data: A piece of information.
  2. Analytics: Discovering, interpreting, and communicating meaningful patterns in data.

In audit terms: Data analytics refers to discovering useful information and supporting decision-making through the process of extracting, cleansing, transforming, and visualizing data.

Prepping for Interrogation

Auditors may need basic knowledge of:

  1. Querying and manipulating data (SQL, Data Conversion)
  2. Cleaning and normalizing data (Errors, Duplication)
  3. Different data types (Structured vs. Unstructured)
  4. Establishing defined relationships between datasets (Keys and Joins)

Auditors should also be able to eliminate “noise,” visualize the data, understand correlation versus causation, and recognize when not to use data.

Focusing the Light

Use interrogation tactics to help determine the root cause of non-compliance, policy gaps, opportunities for enhancement, and where further investigation may be required.

Caution: Discuss the use of data analytics in relation to SOX testing with external auditors and be thorough in documenting procedures.

Developing Good Data Hygiene

The following items are useful techniques and common functions, available in most analytics tools, for normalizing data; a rough pandas sketch of the same steps follows the list.

  • Non-alphanumeric characters – REPLACE(), OMIT()
    • Replace or remove characters, e.g., "-" and "'"
  • Justification – ALLTRIM()
    • Removes leading and trailing spaces from characters
  • Varying lengths – SUBSTRING()
    • Shorten or lengthen field lengths
  • Leading zeros – ZONED()
    • Converts numeric data to character data and adds leading zeros
  • Varying decimal places – DECIMALS()
    • Specifies the number of decimal places for a numerical field
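The function names above are ACL-style and vary from tool to tool. As a minimal sketch of the same hygiene steps in Python with pandas (the vendor_id and unit_cost columns are hypothetical):

```python
import pandas as pd

df = pd.DataFrame({
    "vendor_id": [" 00123-A ", "456'B", "789"],
    "unit_cost": ["12.5", "7", "3.999"],
})

# Non-alphanumeric characters: remove "-" and "'" (REPLACE/OMIT equivalent)
df["vendor_id"] = df["vendor_id"].str.replace(r"[-']", "", regex=True)

# Justification: strip leading and trailing spaces (ALLTRIM equivalent)
df["vendor_id"] = df["vendor_id"].str.strip()

# Varying lengths: fix the field at eight characters (SUBSTRING equivalent)
df["vendor_id"] = df["vendor_id"].str[:8]

# Leading zeros: pad character data with zeros (ZONED equivalent)
df["vendor_id"] = df["vendor_id"].str.zfill(8)

# Varying decimal places: standardize to two decimals (DECIMALS equivalent)
df["unit_cost"] = df["unit_cost"].astype(float).round(2)
```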

Data Interrogation Tools

Data interrogation tools allow users to manipulate (interrogate) data without affecting the data’s integrity.

Basic Functions:

Common Data Analytics Functions
These functions are common in most data analytics tools.

Additional Features:

Many analytics tools offer audit logging, scripting, graphical user interfaces (GUIs), and database connectors.

Finding the Fraud

It is important to note that data interrogation is not ALWAYS focused on identifying fraud. There are many other ways in which data interrogation techniques can be applied to support the audit function. As shown below, internal audit departments account for only 15% of fraud detection.

Fraud Tips Percentages
Source: ACFE 2018 Report to the Nations

Cross-Examination

Distinguishing between two types of fraud:

  1. On the books – Easier to identify; it is in your data!
    • Control: Bidding process, etc.
  2. Off the books – Almost impossible to identify
    • Controls can be circumvented (under the table payments, etc.)

Data analytics can be used to assess the risk of off-the-books activity and add value by providing management with data trends and historical analysis.

Simple but Effective…

Data Trend for Historical Analysis
Data trend used in fraud investigation.

Data Interrogation Techniques

Iterating Through Data:

  • RECOFFSET() – Returns a field value from a record that is a specified number of records away from the current record.

Scenario: Project Spending Review

Identify when and how long a project was over budget.

  • Sort by project and ascending transaction date (oldest to newest)
  • Condition: (RECOFFSET(P,1) = P) AND Running_Total > Auth_Amt
  • Return Value: “Spend Greater than Approved”
RECOFFSET() Analysis
By sorting each project by spend date and calculating a running total, we can use the RECOFFSET() function to read the records down the table and identify at which point the running total surpasses the authorized amount column for each project number.
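RECOFFSET() is specific to ACL-style tools. A minimal pandas sketch of the same test, assuming a hypothetical extract with project, spend_date, amount, and auth_amt columns:

```python
import pandas as pd

tx = pd.read_csv("project_spend.csv")  # hypothetical extract
tx["spend_date"] = pd.to_datetime(tx["spend_date"])

# Sort by project and ascending transaction date (oldest to newest)
tx = tx.sort_values(["project", "spend_date"])

# Running total of spend within each project
tx["running_total"] = tx.groupby("project")["amount"].cumsum()

# Flag every point where cumulative spend exceeds the authorized amount
over = tx["running_total"] > tx["auth_amt"]
tx.loc[over, "flag"] = "Spend Greater than Approved"
```

Grouping by project replaces the RECOFFSET() check that the next record still belongs to the same project.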

Know Your Enemy:

Scenario: Are we effectively monitoring gift card purchases?

  • Create a list of known vendors who sell gift cards
  • Look for round dollar amounts
  • Cross-tab is your friend (to better visualize results; see the sketch below)
  • Monitor and update
Gift Card Vendors and Cross-tab Results
Left: List of known vendors who sell gift cards | Right: Cross-tab results
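As a sketch of this monitoring in pandas (vendor, month, and amount are hypothetical column names, and the round-dollar test here is one simple definition among many):

```python
import pandas as pd

expenses = pd.read_csv("pcard_transactions.csv")  # hypothetical extract
gift_card_vendors = {"VENDOR A", "VENDOR B"}      # maintained watch list

# Known gift card vendors with round-dollar amounts (e.g., $25/$50/$100)
hits = expenses[
    expenses["vendor"].str.upper().isin(gift_card_vendors)
    & (expenses["amount"] % 25 == 0)
]

# Cross-tab: vendors down the side, months across the top
summary = pd.crosstab(hits["vendor"], hits["month"],
                      values=hits["amount"], aggfunc="sum")
print(summary)
```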

Assigning a Risk Rating:

Using the STATS() function:

  1. Define variables
  2. Assign values
  3. Perform analysis
  4. Extract results

Building a risk rating into analytics scripts increases the value of both risk assessment and audit testing. In my experience, applying a risk rating methodology to sampling and testing led to a 30% increase in testing efficiency.

Risk Rating Script Excerpt
Building a risk rating into scripts using the standard deviation and average unit cost.
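STATS() is the ACL-style step; the idea transfers to any tool. A sketch of one possible rating scheme, scoring each purchase line by how far its unit cost sits from the item's average, in standard deviations (all column names hypothetical):

```python
import pandas as pd

items = pd.read_csv("purchase_lines.csv")  # hypothetical extract

# 1-2. Define variables and assign values: per-item average and standard
# deviation of unit cost (the STATS() step)
grp = items.groupby("item_no")["unit_cost"]
items["avg_cost"] = grp.transform("mean")
items["std_cost"] = grp.transform("std")

# 3. Perform analysis: distance from the item average in standard deviations
items["z"] = (items["unit_cost"] - items["avg_cost"]) / items["std_cost"]

# 4. Extract results: bucket the score into a simple risk rating
items["risk"] = pd.cut(items["z"].abs(), bins=[0, 1, 2, float("inf")],
                       labels=["Low", "Medium", "High"], include_lowest=True)
high_risk = items[items["risk"] == "High"]
```

Sampling can then be weighted toward the High bucket rather than drawn uniformly.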

The Law is on Your Side

Benford’s Law Analysis

Benford’s law is an observation about the leading digits of the numbers found in real-world data sets. Intuitively, one might expect these leading digits to be uniformly distributed, so that each of the digits from 1 to 9 is equally likely to appear. In fact, 1 occurs as a leading digit more frequently than 2, 2 more frequently than 3, and so on; the expected frequency of leading digit d is log10(1 + 1/d).

Scenario: T&E Expense Review – Looking for outliers

  • Leading two digits – (LEADING(Expense_Amount, 2) = "xx")
Benford's Law Analysis
Using Benford’s Law analysis on the first two digits of the expense amount, it is possible to identify transactions that deviate from the expected Benford distribution. We can apply our data interrogation techniques to these buckets to draw conclusions, e.g., the $30 and $40 transactions above.
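A sketch of the first-two-digits test in Python, using the log10(1 + 1/d) expectation (the amount column and the twice-expected screen are assumptions for illustration):

```python
import math
import pandas as pd

expenses = pd.read_csv("t_and_e.csv")  # hypothetical extract

def leading_two(x: float) -> int:
    """Return the first two significant digits of a positive number."""
    while x < 10:
        x *= 10
    while x >= 100:
        x /= 10
    return int(x)

amounts = expenses.loc[expenses["amount"] > 0, "amount"]
observed = amounts.map(leading_two).value_counts(normalize=True)

# Benford's expected frequency for a first-two-digits bucket d
expected = pd.Series({d: math.log10(1 + 1 / d) for d in range(10, 100)})

comparison = pd.DataFrame({"observed": observed,
                           "expected": expected}).fillna(0)
# Simple screen: buckets occurring at more than twice the expected rate
outliers = comparison[comparison["observed"] > 2 * comparison["expected"]]
```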

Expense Insights:

  1. Baggage/Change/Flight Fees (not airfare) = 27%
  2. Parking Fees and Tolls = 14%

What about the Report?

When implementing continuous monitoring, it is important to trend the results.

  • Keep it simple
  • Use unique record keys (primary keys)
  • Cross-tab
Trending Data Analytics Results
Top: Excerpt of script to flag transactions resulting from analytics. Bottom: Trending results using the crosstab method.
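A sketch of the trending step, assuming each run appends its flagged records (keyed by the source system's primary key) to a history file:

```python
import pandas as pd

flags = pd.read_csv("flag_history.csv")  # hypothetical accumulated results

# Unique record keys keep reruns from double-counting the same transaction
flags = flags.drop_duplicates(subset="transaction_id")

# Cross-tab: one row per analytic, one column per month, counts in the cells
trend = pd.crosstab(flags["analytic_name"], flags["run_month"])
print(trend)
```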

Interrogating Decentralized Data

Decentralized Data
Data can exist in a multitude of locations throughout an organization.

Overcome the Obstacles First

  • Inconsistent data
  • Duplicated data
  • Varying data formats
  • Data availability

Where is my Data Hiding?

  • Compare the file directories of multiple locations.
    • ITGC Application: Change Management
  • Compare each application’s users to the termination list periodically
    • ITGC Application: Terminations and Timeliness Review
  • Compare user access across multiple applications and platforms
    • ITGC Application: User Access Review
  • Compare customer agreement terms across business segments
    • Application: Enhanced Operational Review

Interrogating Decentralized Data Continued

The process below takes a text file containing an application's directory contents, produced by a PowerShell script, and formats it in Excel using the legacy text import wizard. After performing our data hygiene steps, we use a data interrogation tool to review change management for the application by comparing its files to those in the main production instance.

Change Management Analysis
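A sketch of the comparison step, assuming the PowerShell output has been cleaned into listings with hypothetical name, modified, and size columns:

```python
import pandas as pd

prod = pd.read_csv("prod_listing.csv")     # hypothetical cleaned listings
other = pd.read_csv("server2_listing.csv")

merged = prod.merge(other, on="name", how="outer",
                    suffixes=("_prod", "_other"), indicator=True)

# Files present in only one environment
unmatched = merged[merged["_merge"] != "both"]

# Files in both environments whose timestamps or sizes differ: candidates
# for changes made outside the change management process
changed = merged[
    (merged["_merge"] == "both")
    & ((merged["modified_prod"] != merged["modified_other"])
       | (merged["size_prod"] != merged["size_other"]))
]
```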

Interrogating Decentralized Data Continued

Fuzzy Logic and Levenshtein Distance

  • Effective when analyzing inconsistent data
  • Apply after normalization
  • Can vary the level of accuracy of results

Ex: [FUZZYJOIN LEVDISTANCE DISTANCE 2]

  1. 1234MCCLENNANCTCOLUMBUSIN47002
  2. 1234MCCLENNANSCTCOLUMBUSIN47002

Flag the results: MATCH(UPPER(ALLTRIM(ADDRESS1)), UPPER(ALLTRIM(ADDRESS2))) = F
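FUZZYJOIN LEVDISTANCE is the ACL-style command; the distance itself is easy to compute anywhere. A self-contained sketch using the two normalized addresses above:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

a = "1234MCCLENNANCTCOLUMBUSIN47002"
b = "1234MCCLENNANSCTCOLUMBUSIN47002"

# Join within the distance threshold, then flag non-exact matches
# (the MATCH() = F test above)
if levenshtein(a, b) <= 2 and a != b:
    print("Fuzzy match for review:", a, "|", b)
```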

Extracting the Value

Reflect and Assess

  • Did I ask enough questions to understand the business process?
  • Do my results make sense? (Did I involve process owners?)
  • What is the most effective visual for my data and my audience?
  • In what context should I present the results?
  • Are my analytics repeatable?
  • Are there actionable items from the results?

Telling the Story

Incorporating External Datasets

Scenario: Can we improve the efficiency of vendor spending relative to our geographic locations?

Process:

  • Map vendor master file addresses to latitude and longitude coordinates
    • US Cities Latitude and Longitude Database – SimpleMaps
  • Map company location addresses to latitude and longitude coordinates
  • Calculate distances using the Great-Circle Distance formula in Excel (a Python sketch of the formula follows this list)
  • Import into visualization tool
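The distance step does not have to live in Excel; a minimal haversine implementation (coordinates in decimal degrees, sample points hypothetical):

```python
from math import radians, sin, cos, asin, sqrt

def great_circle_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points via the haversine formula."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 3958.8 * asin(sqrt(a))  # 3958.8 = Earth's mean radius, miles

# Hypothetical vendor-to-location pair: Chicago, IL to Columbus, IN
print(great_circle_miles(41.88, -87.63, 39.20, -85.92))
```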

Creating Visualizations

There are many different visualization tools available, such as Microsoft Power BI and Tableau.

  • Be a resource
  • Give management a tool
  • Plug and play with new data
  • Approach from the value add standpoint
  • Let others draw their own conclusions
Interactive Vendor and Supplier Distance Map

Lessons Learned

  1. Ask questions about your data
  2. Verify the completeness and accuracy of your data
  3. Be as detailed as possible when requesting datasets
Concur Expense Form
Use the names of specific form fields when requesting datasets if possible.

Do not Trust the System!

Data Hygiene Example
By simply inserting a table before beginning the interrogation process, we discover some of the data has been shifted during extraction by the addition of two extra columns.
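A few up-front checks would catch an extract like this before interrogation begins; a sketch with hypothetical expected values taken from the data request and the source report:

```python
import pandas as pd

raw = pd.read_csv("extract.csv")  # hypothetical extract

EXPECTED_COLUMNS = 12             # hypothetical: per the data request
EXPECTED_ROWS = 48_213            # hypothetical: count quoted by the owner
CONTROL_TOTAL = 1_250_000.00      # hypothetical: total per the source report

assert raw.shape[1] == EXPECTED_COLUMNS, "Column count differs - shifted data?"
assert len(raw) == EXPECTED_ROWS, "Row count does not tie to the source"
assert abs(raw["amount"].sum() - CONTROL_TOTAL) < 0.01, "Control total broken"
```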

Live in Harmony

Bring together your data of varying file formats, naming conventions, and columns, and transform them into one cohesive dataset (see the sketch after the list below).

  • Create common fields with the same length
  • Pay attention to time and date formats
  • Have a standard number of decimal places
  • Track calculated and computed fields
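A sketch of that transformation, assuming two hypothetical extracts that carry a vendor ID, posting date, and amount under different names and formats:

```python
import pandas as pd

a = pd.read_csv("segment_a.csv").rename(
    columns={"Vend#": "vendor_id", "Amt": "amount", "Date": "post_date"})
b = pd.read_excel("segment_b.xlsx").rename(
    columns={"VendorNo": "vendor_id", "Total": "amount", "PostDt": "post_date"})

for df in (a, b):
    df["vendor_id"] = df["vendor_id"].astype(str).str.zfill(8)  # common length
    df["post_date"] = pd.to_datetime(df["post_date"])           # one date format
    df["amount"] = df["amount"].astype(float).round(2)          # standard decimals

combined = pd.concat([a, b], ignore_index=True)
combined["is_computed"] = False  # track calculated vs. source-system fields
```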

Iterate, Iterate, Iterate

  • Use scripts and macros
  • Have a standard project template
  • Use comments and version control
  • Leverage reports
  • Make the process repeatable
Script Template
Sample script template.

Conclusion

  1. Invest time in getting to know your data
  2. Use interrogation techniques to preserve data integrity
  3. Document and iterate
  4. Be skeptical

PCAOB – Data and Technology Task Force
