Managing Process Model Complexity via Abstract Syntax Modifications
Marcello La Rosa, Petia Wohed, Jan Mendling, Arthur H.M. ter Hofstede, Hajo A. Reijers and Wil M.P. van der Aalst
As a result of the growing adoption of Business Process Management (BPM) technology, different stakeholders need to understand and agree upon the process models that are used to configure BPM systems. However, BPM users have problems dealing with the complexity of such models. Therefore, the challenge is to improve the comprehension of process models. While a substantial amount of literature is devoted to this topic, there is no overview of the various mechanisms that exist to deal with managing complexity in (large) process models. It is thus hard to obtain comparative insight into the degree of support offered for various complexity reducing mechanisms by state-of-the-art languages and tools. This paper focuses on complexity reduction mechanisms that affect the abstract syntax of a process model, i.e. the structure of a process model. These mechanisms are captured as patterns, so that they can be described in their most general form and in a language- and tool-independent manner. The paper concludes with a comparative overview of the degree of support for these patterns offered by state-of-the-art languages and language implementations.
Index Terms—Process model, pattern, complexity, understandability, process metric.
Business Process Management (BPM) is increasingly recognized as an overarching approach to improve performance at an operational level. Companies typically utilize BPM technology to reduce costs, save cycle time, and react to changes in a more agile way. While many BPM concepts have already contributed to various business improvements in industrial practice, there are still significant challenges, which need to be addressed by BPM research.
One of the challenges in this context relates to complexity management of process models. How easily a process model can be understood plays an important role in the success of process redesign projects. Business process models in practice often contain dozens of activities and complex behavioral dependencies between them. An increase in size and complexity of a business process model beyond certain thresholds can lead to comprehension problems for its stakeholders. A complex model becomes difficult to validate, to maintain, and to use as a means of communication.
The empirical connection between complexity and process model understanding has been demonstrated in recent publications, and mechanisms have been proposed to alleviate specific aspects of complexity. However, what is lacking is a systematic classification of the various operations that exist for reducing complexity in process models. A comprehensive account of such mechanisms would contribute to improved support for complexity management in process modeling languages, standards and tools. A corresponding classification may be beneficial to research and practice, for instance to initiatives towards process modeling language standardization, to academic and industrial tool evaluation, and to vendors seeking to incorporate innovative features in their tools.
In this paper we address this research gap by compiling a collection of patterns, which define an extensive range of desired capabilities. The approach of capturing design knowledge as patterns has been used in various engineering disciplines including architecture, software engineering, and workflow modeling. The patterns described in this paper capture mechanisms for managing process model complexity. They stem from the literature, process modeling language specifications, and tool implementations.
Essentially, mechanisms for managing complexity of process models can be defined on two different levels: (a) concrete syntax of a model and (b) abstract syntax of a model. The concrete syntax of a process model deals with its visual appearance, including symbols, colors and position, and is also referred to as secondary notation. A collection of patterns for concrete syntax modifications has been presented in our earlier work. These patterns include mechanisms for arranging the layout, for highlighting parts of the model using enclosure, graphics, or annotations, for representing specific concepts explicitly or in an alternative way, and for providing naming guidance. The abstract syntax of a process model relates to the formal structure of process elements and the relationships among them. The patterns presented in this paper work on the abstract syntax and complement that patterns collection for concrete syntax modifications. They relate to model operations such as transforming a model into a set of modules or omitting elements to provide a more abstract view on the process. Clearly, a change to the structure of a process model also indirectly affects the model's visual appearance.
In this paper we aim for a language-independent description of abstract syntax related patterns. Each pattern is accompanied by a discussion of its intended effect on model complexity and of different realizations to clarify its scope and to demonstrate its relevance. The pattern description is complemented by an overview of its support in tools and modeling languages, which sheds light on its comparative strengths and weaknesses. Additionally, we evaluate each of the patterns from a usability perspective as perceived by BPM practitioners.
The paper is structured accordingly. Section II describes and justifies the methodology, which we used to identify the patterns. Section III presents the collection of patterns in detail. Section IV evaluates the pattern support of various process modeling languages and tools. Section V presents the results of a usability evaluation with BPM practitioners. Section VI discusses related work while Section VII concludes the paper.
In this paper we identify patterns to reduce the model complexity on the level of the abstract syntax, i.e., the goal is to simplify the structure of the process model.
The original idea to organize design knowledge in terms of patterns stems from the architect Christopher Alexander, who collected rules and diagrams describing methods for constructing buildings. In this context, a pattern provides a generic solution for a recurring class of problems. The general idea of design patterns has been introduced to information systems engineering by Gamma, Helm, Johnson and Vlissides, who describe a set of recurring problems and solutions for object-oriented software design. The design patterns by Gamma et al. inspired many patterns initiatives in computer science, including the Workflow Patterns Initiative.
The patterns for abstract syntax modifications, which are defined in this paper, have been collected through a series of steps. The starting point has been an extensive analysis of the BPM literature, as well as the specifications and standard proposals that are managed by organizations such as OASIS, OMG, W3C and WfMC. Subsequently, we inspected commercial BPM tools and the operations they offer for modifying abstract syntax. The initial set derived in this way was presented to a panel of experts, which resulted in the identification of two additional patterns. This extended set was evaluated with respect to its support by reported research approaches, languages, and tools, with the goal of distinguishing the patterns that are most frequently used. We decided to focus on those patterns that are supported by at least five research approaches/languages/tools, which resulted in a final set of 12 patterns. Finally, a group of nine BPM professionals evaluated this final set for ease of use and usefulness.
Each of the 12 patterns in the final set is illustrated in this paper by the use of BPMN (Business Process Model and Notation), an industry standard for modeling processes. Figure 1 shows the notational elements of BPMN which are used in this paper. The example models are intentionally kept simple such that they can be understood without deep knowledge of this standard.
When referring to a model, we use the term model element to indicate any element which has a corresponding concept in the language's meta-model. Model elements can be nodes (e.g. a task, a gateway or a business object) or arcs. We also use the term fragment to indicate a set of model elements in a process model that are organized via control-flow relations, and the term module to indicate a process model which is part of a larger business process (e.g. a subprocess or the portion of model enclosed by a lane).
We use a fixed format to document the twelve patterns and to discuss their support in languages, tools and in the literature. This format contains: (a) description, (b) purpose, (c) example, (d) metric, (e) rationale and (f) realization of a pattern. The purpose describes the use case in which the pattern is commonly used, while the rationale provides a justification, grounded in the literature, as to why a given pattern reduces the complexity of the process model it is applied to. Moreover, we relate each pattern that operates on a process model to a desired improvement of a structural metric. In fact, it has been shown that certain structural metrics can be related to ease of making sense of process models. For example, intuitively, the smaller the size of a model, the easier it is to understand it. We discuss the following metrics in this context:
- module size, the number of nodes in a module;
- model size, the summed size of all modules in a process model;
- repository size, the summed size of all models in a process model repository;
- models, the number of models in a process model repository;
- depth, the number of modular levels appearing in a process model;
- diameter, the longest path from a start to an end element in a process model;
- average gateway degree, the number of nodes a gateway in a specific process model is on average connected to;
- structuredness, the restructuring ratio of an unstructured model to a structured variant of it;
- modules overhead, the ratio between modules and model size;
- fan-in, the average number of references to a module.
For example, Fig. 5 shows a BPMN model consisting of three levels with one root module and three subprocess modules. This model has the following metrics: depth=3, modules=4, model size=25, average gateway degree=3, etc. Similarly, we can infer metrics for specific modules. For example the root module has module size=9 and diameter=8.
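As a rough illustration of how some of these metrics could be computed, the following Python sketch represents a process model as a dictionary of modules, each holding its nodes and the names of the modules it references. The `Module` class, the helper functions and the toy model are assumptions made for this illustration only; the toy model is not the model of Fig. 5. Two readings are also assumed: modules overhead is taken as the number of modules divided by model size, and fan-in is averaged over all modules except the root.

```python
from dataclasses import dataclass, field


@dataclass
class Module:
    """Hypothetical module: its nodes plus the names of the modules it references."""
    name: str
    nodes: list[str]
    calls: list[str] = field(default_factory=list)


def module_size(module: Module) -> int:
    # Module size: the number of nodes in a module.
    return len(module.nodes)


def model_size(modules: dict[str, Module]) -> int:
    # Model size: the summed size of all modules in the model.
    return sum(module_size(m) for m in modules.values())


def depth(modules: dict[str, Module], root: str) -> int:
    # Depth: number of modular levels (assumes references form no cycle).
    return 1 + max((depth(modules, c) for c in modules[root].calls), default=0)


def modules_overhead(modules: dict[str, Module]) -> float:
    # Assumed reading: number of modules divided by model size.
    return len(modules) / model_size(modules)


def fan_in(modules: dict[str, Module], root: str) -> float:
    # Assumed reading: average number of references per non-root module.
    referenced = [name for name in modules if name != root]
    if not referenced:
        return 0.0
    references = sum(
        m.calls.count(name) for name in referenced for m in modules.values()
    )
    return references / len(referenced)


# Toy model (illustrative only): a root module and three subprocess modules.
toy = {
    "root": Module("root", ["start", "call A", "call B", "end"], ["A", "B"]),
    "A": Module("A", ["task a1", "call C", "task a2"], ["C"]),
    "B": Module("B", ["task b1", "call C"], ["C"]),
    "C": Module("C", ["task c1", "task c2"]),
}

print("modules:", len(toy))
print("model size:", model_size(toy))
print("depth:", depth(toy, "root"))
print("modules overhead:", modules_overhead(toy))
print("fan-in:", fan_in(toy, "root"))
```

Metrics such as diameter, average gateway degree and structuredness need the control-flow arcs and gateway types of each module, which this simplified representation leaves out.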
III. PATTERNS FOR ABSTRACT SYNTAX MODIFICATION
From an analysis of relevant BPM languages, tools and approaches, we identified twelve patterns operating on the abstract syntax of a process model and classified them according to the hierarchy in Fig. 2. The patterns are categorized into two main groups: Model modification (including patterns that directly modify a process model or set thereof) and Language Modification (including patterns that have more profound changes because they affect the underlying process language). Model modification includes Behavior Abstraction and Behavior Neutral patterns. Behavior Abstraction includes those patterns that operate on a single model and provide a more abstract one as a result. Omission simply skips elements of the original model while Collapse aggregates a set of elements into a single, semantically more abstract element. Behavior Neutral patterns preserve the behavior being described in a single model or in a set of models, but organize this behavior in a different representation. Restructuring refers to transformations that reorganize the control-flow of a process model in a more understandable way, either in terms of Block-Structuring or Compacting the process model, while Duplication introduces model element redundancy in order to simplify its structure. Three Modularization patterns, Vertical, Horizontal and Orthogonal, capture different ways in which a process model is decomposed into modules. Two Integration patterns, namely Composition and Merging, refer to features for combining information which is scattered across different modules or models into a single one. While Composition uses references among different modules or models to achieve integration, Merging relies on an overlap of elements. Finally, Meta-model Modifications involve Restriction and Extension.
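To make the difference between the two Behavior Abstraction patterns concrete, the sketch below applies Omission and Collapse to a process model reduced to a set of control-flow arcs. This arc-set representation, the function names `omit` and `collapse`, and the rule of reconnecting the predecessors of an omitted node to its successors are assumptions for illustration. The patterns themselves are language-independent, and an actual realization would also have to respect gateway semantics.

```python
Arc = tuple[str, str]  # (source node, target node)


def omit(arcs: set[Arc], node: str) -> set[Arc]:
    """Omission: skip one node and reconnect its predecessors to its successors."""
    preds = {s for s, t in arcs if t == node and s != node}
    succs = {t for s, t in arcs if s == node and t != node}
    kept = {(s, t) for s, t in arcs if node not in (s, t)}
    return kept | {(p, q) for p in preds for q in succs}


def collapse(arcs: set[Arc], group: set[str], new_node: str) -> set[Arc]:
    """Collapse: aggregate a set of nodes into a single, more abstract node."""
    result: set[Arc] = set()
    for s, t in arcs:
        if s in group and t in group:
            continue  # arcs internal to the group disappear
        result.add((new_node if s in group else s,
                    new_node if t in group else t))
    return result


# A simple sequence: start -> A -> B -> C -> end
sequence = {("start", "A"), ("A", "B"), ("B", "C"), ("C", "end")}

without_b = omit(sequence, "B")                  # B is skipped
abstracted = collapse(sequence, {"A", "B"}, "AB")  # A and B become one node

print(sorted(without_b))
print(sorted(abstracted))
```

Both operations shrink the model, but Omission discards the element entirely, whereas Collapse keeps a placeholder that stands for the aggregated elements.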
As in our previous work, we now report the results of evaluating several languages and tools against their support for the identified patterns. The languages we selected for this evaluation are mainstream process modeling languages deriving from standardization efforts, large-scale adoptions or established research initiatives. Specifically, we chose three languages for conceptual process modeling (UML ADs 2.3, eEPCs and BPMN 2.0) and four languages for executable process modeling (BPMN 2.0, BPEL 1.2/2.0, YAWL 2.2 beta and Protos 8.0.21). For each language, we also evaluated one supporting modeling tool. For UML ADs we tested Sparx's Enterprise Architect 9; for eEPCs we tested ARIS Business Architect 7.2 from Software AG; for BPMN we tested Signavio Editor 5.0.0; for BPEL we tested Oracle's JDeveloper 22.214.171.124.0; for YAWL we tested the YAWL Editor and Rules Editor; and for Protos the Protos Editor 8.0.2/BPM|one from Pallas Athena.
Table 11 shows the results of the analysis, where tool evaluations are shown next to the evaluations of the supported languages, except for Protos, where the language cannot be separated from its tool because it is vendor specific. For a tool, we measured the extent to which it facilitates the support for a pattern, as it is offered by the corresponding language. We ranked a tool with a '-' if it offers no support for a pattern; with a '+/-' if the support is only partial or if it is the same as that offered by the corresponding language; and with a '+' if the support goes beyond that offered by the language, i.e. if the tool actually facilitates the application of a pattern. Accordingly, for Duplication we ranked Signavio, JDeveloper and Protos with a '-'. In particular, in Signavio call activity tasks cannot be linked with global tasks, so this pattern is not actually supported. For Vertical Modularization, we ranked the YAWL Editor with a '+/-' as it is not possible to navigate from a parent model to a subprocess. Enterprise Architect took a '+/-' for Horizontal Modularization as it supports the concept of UML Partition (to create parallel modules), but not that of Activity Edge Connector (to create sequential modules). For Restriction we gave a '+/-' to Signavio as it only provides two predefined meta-model filters which cannot be customized, while for Extension we gave a '+/-' to ARIS as it only allows renaming of elements and attributes of the eEPC meta-model, but not the addition of new concepts.
Six of the twelve patterns that we identified in this paper (i.e. Block-Structuring, Compacting, Composition, Merging, Omission and Collapse) are not applicable to languages since they refer to tool features only (we crossed the corresponding cells for the languages in Table 11). For these patterns, we rated a tool with a '+' if it supported the pattern, and with a '-' otherwise. As for BPEL, even if this language is essentially block-structured, tool features could still be provided to block-structure or compact the content of a Flow activity (e.g. by removing redundant Link arcs). However, since JDeveloper does not offer any such feature, we rated this tool with a '-' along these two patterns. In fact, these six patterns are not supported by any of the evaluated tools, except for ARIS which caters for Composition. This indicates a clear immaturity of process modeling tools for such advanced features, which have only been explored in research.
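The rating scheme used for the tools in this section can be summarized in a few lines of Python. This is only a restatement of the rules given in the text: the `Support` enumeration and the `rate_tool` function are hypothetical names, and treating any degree of support as a '+' for the six tool-only patterns is an assumption, since the text only distinguishes supported from unsupported there.

```python
from enum import Enum


class Support(Enum):
    """Hypothetical levels of tool support for a pattern."""
    NONE = "no support"
    PARTIAL = "partial support"
    SAME_AS_LANGUAGE = "same support as the language"
    BEYOND_LANGUAGE = "support beyond the language"


def rate_tool(support: Support, language_applicable: bool = True) -> str:
    """Return '-', '+/-' or '+' following the rating rules described in the text."""
    if not language_applicable:
        # Tool-only patterns: binary rating (assumption: any support counts as '+').
        return "-" if support is Support.NONE else "+"
    if support is Support.NONE:
        return "-"
    if support is Support.BEYOND_LANGUAGE:
        return "+"
    # Partial support, or no more than what the language itself offers.
    return "+/-"


print(rate_tool(Support.SAME_AS_LANGUAGE))                    # '+/-'
print(rate_tool(Support.NONE, language_applicable=False))     # '-'
```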
One would expect a tool to generally offer wider pattern support than the respective language. Indeed, this is what we observed in the benchmark of tools for the concrete syntax patterns. However, for the abstract syntax patterns the results are more varied. ARIS supports more patterns than eEPCs, Enterprise Architect supports as many patterns as UML ADs, while Signavio, JDeveloper and the YAWL Editors support fewer patterns than their respective languages. The reason for such different levels of sophistication among the tools evaluated may be twofold. First, these tools have different maturity (for example, ARIS and Enterprise Architect have been around much longer than the others). Second, the major difference in support between BPMN 2.0 and Signavio is likely due to the fact that BPMN 2.0 has only been standardized recently (a few months before the time of writing). Thus we cannot yet expect a high level of maturity for its supporting tools. This difference in patterns support between BPMN 2.0 and Signavio is even more evident because BPMN 2.0 is the language that supports the greatest number of patterns, out of the languages being evaluated. This is clearly a reflection of the evolution of business process modeling languages.
V. USABILITY EVALUATION
For the evaluation of the usability of the patterns we drew inspiration from the technology acceptance model and its adaptation to conceptual modeling. This theory postulates that actual usage of an information technology artifact (patterns, in the case of this paper) is mainly influenced by the perceptions of potential users regarding usefulness and ease of use. Accordingly, a potential user who perceives a pattern to be useful and easy to use is likely to actually adopt it.
We conducted a series of focus group sessions with professionals to discuss the patterns, in a similar vein to our earlier work. Altogether, 9 process modeling experts participated in these sessions, which took place in Stockholm (5 participants) and in Berlin (4 participants). On average, the participants had close to 10 years of experience with process modeling. When we asked them, they estimated that, in the past 12 months, each had on average analyzed slightly over 200 models and created over 60 models. A typical model of this kind would include 20 tasks. Due to this extensive involvement in both process model analysis and development, the participants can be considered as genuine experts in the field.
We used a questionnaire with seven-point scale items adopted from prior work to measure the participants' perceptions on usefulness and ease of use for each of the patterns. Our earlier pattern evaluation using the same set of questions already pointed at the high internal consistency of these questions, which is a measure for the reliability of this evaluation. Indeed, the computation of Cronbach's alpha as ex post reliability check for this evaluation provided the values 0.91 for usefulness and 0.89 for ease of use, which confirm our confidence in this instrument. The boxplots in Fig. 12 and 13 display the outcomes of the data analysis for the patterns' usefulness and ease of use respectively (in a boxplot the median value is shown as a horizontal line in a box, which represents the interval between the lower and upper quartiles).
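For readers unfamiliar with Cronbach's alpha, the following Python sketch shows the standard formula, alpha = k/(k-1) * (1 - sum of item variances / variance of the total scores), where k is the number of questionnaire items. The function name `cronbach_alpha` and the example scores are hypothetical and serve only to illustrate the computation; they are not the data collected in the focus group sessions.

```python
from statistics import variance


def cronbach_alpha(scores: list[list[float]]) -> float:
    """Cronbach's alpha for scores[r][i], the answer of respondent r to item i."""
    k = len(scores[0])  # number of items
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two respondents")
    items = list(zip(*scores))  # one tuple of answers per item
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical answers of four respondents to three seven-point items.
example = [
    [6, 7, 6],
    [4, 5, 4],
    [7, 7, 6],
    [3, 4, 4],
]
print(round(cronbach_alpha(example), 2))
```

Values close to 1 indicate that the items of a scale vary together across respondents, which is the sense in which the reported values point to high internal consistency.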
As displayed by Fig. 12, all patterns are perceived to be useful (each median at least equals 4). The patterns that received the highest scores are patterns 4 (Vertical Modularization) and 11 (Restriction), the latter even receiving the maximum appraisal by all but 2 of the participants. Figure 13 shows that ease of use is overall considered positively as well, with median values of 5 or more for all but patterns 6 (Orthogonal Modularization) and 8 (Merging).
In the follow-up discussion with the participants, we invited them to explain their scores. One of the Stockholm participants commented on Pattern 11 (Restriction), which received a high score on usefulness, stating that it is "very efficient" and often applied by this person in modeling workshops. The other pattern that received explicit praise for usefulness in the discussion was Pattern 3 (Compacting), as it was recognized to reduce confusion of name similarities in large models. By contrast, one of the Berlin participants was rather critical about Pattern 6, noting that it was complicated to understand and to apply. As he said, "If at all, this might be relevant for very large repositories of executable process models". We observe that the boxplot for this particular pattern covers a wide spectrum of the evaluation scale for usefulness (see Fig. 12), meaning that the opinions on this pattern vary. Indeed, two other participants expressed their belief in the value of this pattern, in particular to separate exceptions from the normal flow. These combined insights point at a potentially more restricted usefulness of Pattern 6 in comparison with the other patterns.
In summary, the focus group sessions support the statement that the patterns can be considered useful and in general easy to use. It is interesting to note that, in comparison to our earlier evaluation of patterns for concrete syntax modifications, tool support for the abstract syntax patterns was not considered an issue. This may very well point at a greater adoption of these patterns in practice.
VI. RELATED WORK
This paper should be seen as a continuation of our earlier work, in which we described and evaluated eight patterns for concrete syntax modification. The goal of these patterns is to reduce the perceived model complexity without changing the abstract syntax, i.e., the goal is to simplify the representation of the process model without changing its formal structure. An example of such a pattern is Layout Guidance, i.e., the availability of layout conventions or advice to organize the various model elements on a canvas. The other seven patterns described there are Enclosure Highlight, Graphical Highlight, Pictorial Annotation, Textual Annotation, Explicit Representation, Alternative Representation and Naming Guidance.
The twelve patterns presented in this paper complement those concrete syntax patterns, as they operate on the abstract syntax of process models. For example, duplicating a task to make the model structured changes the abstract syntax, whereas modifying the layout does not.
Many authors have worked on functionality related to abstract syntax modifications as is illustrated by the many references provided when describing the possible realizations of such patterns. However, we are not aware of other approaches that systematically collect patterns to improve the understandability of process models.
There have been other approaches to analyze the expressiveness or completeness of BPM languages and systems, e.g., the workflow patterns framework, Bunge, Wand and Weber's (BWW) framework, and the Semiotic Quality Framework (SEQUAL). The workflow patterns provide a language-independent description of control-flow, resource, data and exception handling aspects in workflow languages. Their development started as a bottom-up, comparative analysis of process modeling languages and tools, with the purpose to evaluate their suitability and determine similarities and differences among them. This analysis does not include mechanisms to improve the understandability and reduce the complexity of process models.
The BWW framework refers to Wand and Weber's tailoring and application of Bunge's ontology to information systems. It was initially used for the analysis and comparison of conceptual modeling languages, and later, it was also used for the analysis of process modeling languages. However, the BWW framework lacks conceptual structures central to process modeling such as various types of splits and joins, iteration and cancelation constructs, and different forms of concurrency restrictions. Thus, despite its utilization in practice, its suitability for evaluating process modeling languages can be questioned. Also, its application as a theoretical foundation for conceptual modeling has been criticized.
The SEQUAL framework introduces and reasons about different aspects relevant to model quality. These aspects span different quality notions, including physical, empirical, syntactic, semantic, pragmatic and social quality. Particularly relevant for our work are the empirical quality, which deals with readability matters such as graph aesthetics, and the pragmatic quality, which deals with the understanding of a model by its audience. SEQUAL has been used for the evaluation of process modeling languages, and has later been extended to deal specifically with the quality of process models. Nonetheless, the authors themselves acknowledge SEQUAL's "disability (sic) to facilitate precise, quantitative evaluations of models" [60, p. 101]. In contrast, the patterns collections presented in our earlier work and in this paper provide a concrete means to evaluate the pragmatic and empirical quality of process modeling languages and supporting tools (while focusing on understandability).
Finally, our work is related to a proposed theory of general principles for designing cognitively effective visual notations. Specifically, our Modularization patterns can be seen as an implementation of the Principle of Complexity Management, which prescribes the provision of modularization and hierarchical abstraction to deal with model complexity. However, our patterns collection also provides other mechanisms related to complexity management besides modularization. Our Restriction pattern is related to the Principle of Graphic Economy, according to which the number of different graphical symbols should be controlled in order to be "cognitively manageable". In fact, as a result of restricting a meta-model, the number of symbols available will also be restricted. This is particularly valid for languages with an extensive number of graphical symbols such as eEPCs and BPMN, where Restriction could be applied to filter out irrelevant symbols for particular audiences. Further, Extension is related to the Principle of Cognitive Fit, which prescribes the use of different dialects for different audiences. In fact, this pattern can be used to create an audience-specific process modeling dialect by extending a process meta-model.
The main contribution of this paper is a systematic analysis of abstract syntax modifications for reducing process model complexity, as they occur in the literature, in process modeling languages, and tool implementations. This analysis took the form of a collection of frequently recurring patterns. These twelve patterns, combined with the eight patterns presented in our earlier work, provide a comprehensive overview of existing mechanisms and language features to improve the understandability of process models by reducing complexity. Those earlier patterns focused on changes to the concrete syntax (e.g., improving the layout) but did not consider changes to the abstract syntax.
After documenting these patterns, we evaluated state-of-the-art languages and language implementations in terms of these patterns, and conducted a usability test with practitioners. The results of the usability test demonstrate that all identified patterns are indeed perceived as relevant.
Although most tools provide some support for modifying models to improve their understandability, there is no real guidance on how to simplify and clarify the representation of process models. For example, many tools allow the duplication of model elements (e.g., two nodes referring to the same activity); however, automated support to suggest duplication for increasing the understandability is missing in the current generation of process model editors. Thus, one could argue that today's tools are good for model creation, but provide little support for model management and maintenance. Since processes change at an increasing pace and more and more variants of the same process need to be supported, this shortcoming is limiting the applicability of BPM technology. Thus, we hope that tool vendors will use our patterns as a guide to drive the development of better functionality. We also intend to extend existing research tools such as the YAWL Editor, and the process model repository AProMoRe, with innovative features to support the patterns identified.