Summary: | Business process models are conceptual models that depict the workflow of an organization. Process model matching (PMM) refers to the automatic identification of corresponding activities between a pair of process models, i.e., activities that exhibit similar or identical behavior. In recent years, PMM has received considerable research attention owing to its wide range of applications, such as clone detection and the harmonization of process models. Consequently, numerous PMM techniques have been developed. To evaluate the effectiveness of these techniques, experts have developed three benchmark datasets, collectively known as the PMMC'15 datasets. Furthermore, the process models in these datasets have been converted into OAEI'17 ontologies. These resources are a valuable asset for the PMM community. However, these resources (PMMC'15 and OAEI'17) are limited to a small number of models and only a handful of corresponding activities among them, which may not be sufficient to rigorously evaluate PMM techniques. To fill this gap, this paper provides a large, diverse, and carefully handcrafted collection of process models, together with benchmark correspondences between them. The process model collection and its benchmark correspondences are freely available to the community [1]. Our newly developed dataset, together with the existing resources, can be used for a thorough evaluation of PMM techniques, especially with respect to the vocabulary mismatch problem. Finally, we evaluated the characteristics of our dataset through a series of experiments using similarity measures that are widely used in PMM research. The results show that our dataset is larger, more diverse, and more challenging than existing datasets in the PMM domain.
|