Distorted Label

Description

  • This pattern refers to the existence of two or more values of an event attribute that do not match exactly but are strongly similar, both syntactically and semantically
  • This pattern typically occurs through incorrect data entry or inadvertent changing of attribute values during data pre-processing or extraction

Effect

  • The impacts of this pattern on process mining analysis are similar to the ones described for the synonymous label pattern
    • Poor readability and validity of results, because the discovered models include ‘behaviours’ that should have been merged (as they represent the same activity)
    • Distorted performance analysis, because two or more activities in the log that should have been treated as the same activity are considered separate activities

Data Quality Issues

I15 - Incorrect data: activity name
  • The activity label does not accurately reflect the process step that generated the log entry

Manifestation and Detection

  • ‘a/w inv to cls.’ vs ‘a/w inv to cls’
    • Note the existence of a period in the first value
  • ‘XX – Further Information Required’ vs ‘XX – Further Infomation Required’
    • Note the missing ‘r’ in the word ‘Infomation’ in the second value
  • Pattern signature: minor character-level differences between some attribute values, such as case, punctuation or missing letters, e.g. ‘Follow-up a call’ vs. ‘follow-up a call’.
  • The presence of this pattern can be detected by either (i) selecting all the distinct values of each attribute in an event log, sorting them alphabetically and checking for consecutive rows whose values are similar but not identical, or (ii) applying a string similarity search. If the search returns pairs of highly similar values, the pattern may be present.
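The two detection options above can be combined in a short script. The following is a minimal sketch, assuming the attribute values have already been extracted into a Python list. The function name `find_distorted_candidates` and the 0.9 threshold are illustrative assumptions, not part of the pattern definition, and `difflib.SequenceMatcher` is only one of several possible similarity measures.

```python
from difflib import SequenceMatcher
from itertools import combinations


def find_distorted_candidates(values, threshold=0.9):
    """Return (value_a, value_b, ratio) for distinct values that look alike.

    The threshold is an assumed starting point and should be tuned per log.
    """
    distinct = sorted(set(values))  # option (i): distinct values, sorted
    candidates = []
    for a, b in combinations(distinct, 2):
        # option (ii): string similarity, compared case-insensitively
        ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if ratio >= threshold:
            candidates.append((a, b, round(ratio, 3)))
    return candidates


# Example values taken from the manifestations listed above,
# plus one unrelated (hypothetical) label.
labels = [
    "a/w inv to cls.",
    "a/w inv to cls",
    "XX – Further Information Required",
    "XX – Further Infomation Required",
    "Follow-up a call",
    "follow-up a call",
    "Close claim",
]

pairs = find_distorted_candidates(labels)
for a, b, ratio in pairs:
    print(f"{ratio:.3f}  {a!r} <-> {b!r}")
```

Comparing every pair of distinct values is quadratic in the number of distinct values, which is usually acceptable for activity labels but may need a blocking or indexing step for attributes with many distinct values.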

Remedy

  • Capitalisation differences are straightforward to fix (e.g. convert all affected attribute values to lower case)
  • Where several factors are combined (e.g. missing characters and use of short-hand notations), use string similarity approaches to identify the variants and merge them
  • Manual intervention may be required to deal with values that are syntactically similar but semantically different
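As an illustration of the remedy steps above, the sketch below proposes a mapping from each label to a canonical form and lets an analyst exclude pairs that are similar in spelling but different in meaning. It is a sketch under stated assumptions rather than a prescribed procedure: the function `propose_label_mapping`, the `keep_separate` parameter, the choice of the most frequent variant as the canonical label, the 0.9 threshold and the ‘Request type …’ labels are all hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher


def propose_label_mapping(values, threshold=0.9, keep_separate=()):
    """Map each distinct label to a canonical label.

    Assumption: the most frequent variant is taken as the canonical form.
    keep_separate lists pairs that a reviewer has confirmed to be
    semantically different, so they are never merged.
    """
    counts = Counter(values)
    # Most frequent first; ties broken alphabetically for determinism.
    ordered = sorted(counts, key=lambda v: (-counts[v], v))
    blocked = {frozenset(pair) for pair in keep_separate}
    canonical, mapping = [], {}
    for value in ordered:
        target = value
        for candidate in canonical:
            if frozenset((value, candidate)) in blocked:
                continue  # manual decision: keep these two apart
            ratio = SequenceMatcher(None, value.lower(), candidate.lower()).ratio()
            if ratio >= threshold:
                target = candidate
                break
        if target == value:
            canonical.append(value)
        mapping[value] = target
    return mapping


log_labels = (
    ["Follow-up a call"] * 3 + ["follow-up a call"]
    + ["a/w inv to cls"] * 2 + ["a/w inv to cls."]
    # Hypothetical labels: similar spelling, different meaning.
    + ["Request type 1 review", "Request type 2 review"]
)

mapping = propose_label_mapping(
    log_labels,
    keep_separate=[("Request type 1 review", "Request type 2 review")],
)
cleaned = [mapping[label] for label in log_labels]
```

Reviewing the proposed `mapping` before applying it is the manual intervention step: any pair that should not be merged is added to `keep_separate` and the mapping is recomputed.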

Side-effects of Remedy

  • The side-effects of the remedies recommended above should be minimal, provided that the suggested manual review of syntactically similar values is carried out properly