Enterprise Architecture

Hitting the Data Quality Bullseye

Exploring the forces that affect Data Quality and how training, process design and application logic can prevent data quality issues from arising.

DATA IS AT THE CORE of every business, and Data Quality is a consequence of many interacting factors. Applications act on data, processes act on applications, and people interact with applications through processes.

Figure 1 – Hitting the Data Quality ‘Bullseye’

Forces Affecting Data Quality

Figure 2 – Factors affecting Data Quality (abridged)


People have varying abilities, interest levels, workloads, levels of business understanding and experience. Their role in the Information System is critical: in most circumstances they are the source of the data and responsible for its input into (or consumption from) the Information System. Their ability to fulfil this important role varies, and is affected by:

  • Business Process Complexity
  • Training Levels
  • Availability of exemplars
  • Intuitive nature of processes
  • Variability of processes
  • Constraints within processes (e.g. number of paths, degree of control applied)
  • Quality of online help
  • Review processes and audits
  • Compliance mandates and consequences of non-compliance
  • Learning curves relating to processes and use of applications
  • Clear definitions of roles and responsibilities
  • Churn rates
  • Change (e.g. legislation, process changes, application changes)
  • Design quality of applications
  • An agreed common language


The Data Quality Blame Game

Data Quality is a consequence of interacting variables. When Data Quality drifts:

  • Application designers argue from a position of “Garbage in, garbage out”
  • Process and Application designers promote the “stupid or careless users” argument
  • Users blame process complexity and lack of intuitive functionality in applications (as well as over-work and lack of training)
  • Recipients of reports based on tainted data ‘panic’
  • Managers embark on a forlorn ‘quest for the Golden Hammer’, hoping to find a single, simple fix.

Promoting Data Quality


Tackling these issues requires an Information Systems focused approach, respecting key variables across People, Processes and Applications. Formulate Data Quality countermeasures by focusing iteratively on the weakest links. High-level steps:

  • Quantify the actual Data Quality issue (sometimes hearsay is a false amplifier of the real size of the problem)
  • Prioritise data items where Data Quality is paramount (for example those linked to KPIs, statutory reporting, financial reporting and forecasting). Non-business-critical data items can be tackled in later phases
  • Hypothesise causes of Data Quality drift for the prioritised data items
  • Substantiate hypotheses – e.g. check against actual circumstances (focus groups, user interviews, remediation programmes in place)
  • Check for compound problems – e.g. lack of training and only very occasional use of a specific process will often lead to inconsistent outcomes
  • Look for badly designed applications (semantic confusion, no input validation or mandatory fields)
  • Check governance and compliance processes. Are there any penalties for non-compliance with what is mandated?
  • Check for badly integrated applications/processes – e.g. manual re-keying of data or disconnected processes (e.g. system A doesn’t notify system B of an important data change)
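The first two steps above — quantifying the issue and prioritising fields — can be sketched in code. The following is a minimal illustration (Python standard library only); the field names and validation rules are hypothetical examples, not prescriptions:

```python
# Minimal sketch of quantifying Data Quality for prioritised fields.
# Field names and rules below are hypothetical examples.
import re

# Each rule maps a prioritised field to a validity predicate.
RULES = {
    "customer_id": lambda v: bool(re.fullmatch(r"C\d{6}", v or "")),
    "email":       lambda v: bool(re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v or "")),
    "country":     lambda v: v in {"GB", "IE", "FR", "DE"},
}

def quality_report(records):
    """Return per-field completeness and validity percentages."""
    report = {}
    total = len(records)
    for field, is_valid in RULES.items():
        present = sum(1 for r in records if r.get(field) not in (None, ""))
        valid = sum(1 for r in records if is_valid(r.get(field)))
        report[field] = {
            "completeness_pct": round(100 * present / total, 1),
            "validity_pct": round(100 * valid / total, 1),
        }
    return report

sample = [
    {"customer_id": "C123456", "email": "a@example.com", "country": "GB"},
    {"customer_id": "",        "email": "not-an-email",  "country": "GB"},
    {"customer_id": "C000001", "email": "b@example.com", "country": "ZZ"},
]
print(quality_report(sample))
```

Measured rates like these replace hearsay with numbers, and fields with the worst scores become candidates for the hypothesis and substantiation steps that follow.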

Most importantly, do not be drawn towards the simplistic conclusions of the “Data Quality Blame Game.”

Information Systems work (much like an orchestra) when all parts are in harmony. The interaction of users with processes and applications is non-trivial. Data Quality is a consequence, and hence a symptom, of misalignment among the variables listed above. Focus on what needs to be tuned, then tune again!


By Steve Nimmons

Steve is a Certified European Engineer, Chartered Engineer, Chartered Fellow of the British Computer Society, and Fellow of the Institution of Engineering and Technology, the Royal Society of Arts, the Linnean Society and the Society of Antiquaries of Scotland. He is an Electric Circle Patron of the Royal Institution of Great Britain, a Liveryman and Freeman of London, and serves on numerous industry panels. He is a member of Chatham House, the Royal United Services Institute and the Chartered Institute of Journalists.