Synonymous Labels

Description

  • The synonymous labels pattern refers to a group of values (of certain attributes in an event log) that are syntactically different but semantically similar
  • This pattern is commonly encountered when an event log has been constructed from multiple data sources that do not share a standard data schema, so that similar activities are recorded under different labels in different sources

Effect

  • Where this pattern affects the activity name, the readability and validity of process mining results are negatively impacted, because the discovered process models contain separate activities that should have been treated as one activity (as they have the same semantics)
  • This pattern may also distort performance analysis results, because two or more activities in the log that should be treated as the same activity are measured as separate activities

Data Quality Issues

I22 - Imprecise data: event attributes
  • The existence of multiple names for the same attribute creates ambiguity in an event log

Manifestation and Detection

  • Pattern signature: the existence of multiple values of a particular attribute that appear to share the same meaning but are nevertheless distinct
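
The signature above can be approximated for labels that differ only slightly in spelling. The following is a minimal sketch, assuming that string similarity on normalised labels is an acceptable first filter; the function names, the sample labels and the 0.8 threshold are illustrative assumptions, not part of the pattern definition. Labels that share a meaning but not a spelling will not be found this way and still need domain knowledge.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalise(label: str) -> str:
    """Lower-case the label and drop every non-alphanumeric character."""
    return "".join(ch for ch in label.lower() if ch.isalnum())


def candidate_synonyms(labels, threshold=0.8):
    """Return (label_a, label_b, score) for distinct labels that look alike.

    The threshold is an assumed cut-off; it should be tuned per event log.
    """
    distinct = sorted(set(labels))
    pairs = []
    for a, b in combinations(distinct, 2):
        score = SequenceMatcher(None, normalise(a), normalise(b)).ratio()
        if score >= threshold:
            pairs.append((a, b, round(score, 2)))
    return pairs


# Hypothetical activity labels, for illustration only.
activity_labels = ["Triage", "triage", "Triage ", "X-Ray", "Xray", "Discharge"]
suspects = candidate_synonyms(activity_labels)

for a, b, score in suspects:
    print(f"{a!r} ~ {b!r} (similarity {score})")
```

The output is a list of candidate pairs for an analyst to confirm, not a verdict that the labels are synonymous.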

Remedy

  • Where the syntactic differences between labels are minor, a text similarity search can be applied to group those events whose labels are strongly similar, and the labels in each group can then be replaced with a predefined value
  • Where the syntactic differences are quite substantial (e.g. ‘DrSeen’ vs. ‘Medical Assign’), the use of an ontology will allow replacement of the labels with just one value (in principle, any one of the synonyms can serve as the substitute label)
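
The two remedies can be combined in one relabelling pass. The sketch below is illustrative only: a plain dictionary (`synonym_map`) stands in for an ontology lookup, `canonical` is an assumed list of predefined values, and the 0.85 threshold is an arbitrary choice. Apart from the ‘DrSeen’ and ‘Medical Assign’ pair, the labels are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio between two labels (0.0 to 1.0)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def relabel(labels, canonical, synonym_map, threshold=0.85):
    """Map each label to a predefined value where a mapping can be justified.

    1. An explicit entry in synonym_map (the ontology stand-in) wins.
    2. Otherwise the closest canonical label is used if it is similar enough.
    3. Otherwise the label is left unchanged.
    """
    result = []
    for label in labels:
        if label in synonym_map:
            result.append(synonym_map[label])
            continue
        best = max(canonical, key=lambda c: similarity(label, c))
        result.append(best if similarity(label, best) >= threshold else label)
    return result


canonical = ["Medical Assign", "Blood Test"]   # assumed predefined values
synonym_map = {"DrSeen": "Medical Assign"}     # substantial syntactic difference
raw = ["DrSeen", "Medical Assign", "medical assign", "Blood Tests", "Discharge"]

cleaned = relabel(raw, canonical, synonym_map)
print(cleaned)
```

Keeping the original label in a separate attribute before overwriting it makes an incorrect mapping easier to trace and undo.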

Side-effects of Remedy

  • A label could be incorrectly mapped to another label, so that the replacement deviates from the original meaning (or intent) of the label. This is likely to happen when the ontology used is flawed or when two labels share strong syntactic similarities but differ semantically, e.g. ‘drawn’ vs. ‘dawn’.
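
The ‘drawn’ vs. ‘dawn’ case can be reproduced with a simple similarity measure. The sketch below assumes the same kind of ratio-based matching and a hypothetical merge threshold of 0.85; it shows why proposed merges are safer when routed to a reviewer than when applied automatically.

```python
from difflib import SequenceMatcher

MERGE_THRESHOLD = 0.85  # assumed cut-off, for illustration only

# Two labels that look alike but mean different things.
score = SequenceMatcher(None, "drawn", "dawn").ratio()
would_merge = score >= MERGE_THRESHOLD


def propose_merges(pairs, threshold=MERGE_THRESHOLD):
    """Return merge proposals for human review instead of replacing labels."""
    proposals = []
    for a, b in pairs:
        s = SequenceMatcher(None, a, b).ratio()
        if s >= threshold:
            proposals.append({"from": a, "to": b, "score": round(s, 3)})
    return proposals


review_queue = propose_merges([("drawn", "dawn")])
print(f"similarity = {score:.3f}, would merge automatically: {would_merge}")
print(review_queue)
```

Here the pair clears the threshold even though the labels are unrelated, so an automatic replacement would introduce exactly the side-effect described above.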