Natural language processing-enhanced extraction of SBVR business vocabularies and business rules from UML use case diagrams

Abstract

Discovery, specification and proper representation of various aspects of business knowledge play a crucial part in model-driven information systems engineering, especially in the early stages of systems development. Model transformation, one of the most applicable and advanced features of model-driven development, could help improve one of the most time- and resource-consuming efforts in this process, namely the discovery and specification of business vocabularies and business rules within the problem domain. One of our latest developments in this area was a solution for the automatic extraction of SBVR business vocabularies and business rules from UML use case diagrams, arguably one of the most comprehensive publicly available developments of this kind. In this paper, we present an enhancement to that solution that introduces a novel natural language processing component. This enhancement provides more advanced extraction capabilities (such as recognition of entities, entire noun and verb phrases, and multinary associations) and better-quality extraction results than our previous solution. The main contributions of this paper are pre- and post-processing algorithms and two extraction algorithms that use a custom-trained part-of-speech (POS) tagger. Based on the related work findings, it is safe to state that the presented solution is novel and original in combining model-to-model (M2M) transformation of UML and SBVR models with natural language processing techniques in the field of model-driven information systems engineering.
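To make the idea of POS-driven extraction more concrete, here is a minimal sketch, assuming a use case name has the form "verb + noun phrase" and is associated with a single actor. It turns the pair into SBVR-style general concepts and a verb concept wording (e.g. "customer places order"). This is not the authors' algorithm: the tiny hand-written `LEXICON` is a hypothetical stand-in for the custom-trained POS tagger described above, and the names `tag`, `third_person`, `extract` and `Extraction` are illustrative only.

```python
from dataclasses import dataclass

# Hypothetical toy lexicon standing in for a trained POS tagger (not the paper's tagger).
LEXICON = {
    "place": "VB", "register": "VB", "cancel": "VB", "pay": "VB",
    "order": "NN", "customer": "NN", "invoice": "NN", "account": "NN",
    "new": "JJ", "online": "JJ",
}


@dataclass
class Extraction:
    general_concepts: list[str]  # candidate SBVR noun concepts
    verb_concept: str            # candidate SBVR verb concept (fact type) wording


def tag(text: str) -> list[tuple[str, str]]:
    """Tag lower-cased tokens; unknown words default to NN (a crude heuristic)."""
    return [(w, LEXICON.get(w, "NN")) for w in text.lower().split()]


def third_person(verb: str) -> str:
    """Naive third-person-singular inflection used to word the verb concept."""
    if verb.endswith(("s", "x", "ch", "sh", "o")):
        return verb + "es"
    if verb.endswith("y") and verb[-2:-1] not in "aeiou":
        return verb[:-1] + "ies"
    return verb + "s"


def extract(actor: str, use_case: str) -> Extraction:
    """Map (actor, use case name) to general concepts and a verb concept."""
    tagged = tag(use_case)
    # Assumption: use case names start with a verb followed by a noun phrase.
    if not tagged or tagged[0][1] != "VB":
        raise ValueError(f"use case name does not start with a verb: {use_case!r}")
    verb = tagged[0][0]
    # Everything after the verb (adjectives and nouns) is kept as one noun phrase.
    noun_phrase = " ".join(w for w, _ in tagged[1:])
    subject = actor.lower()
    concepts = [subject] + ([noun_phrase] if noun_phrase else [])
    wording = f"{subject} {third_person(verb)}"
    if noun_phrase:
        wording += f" {noun_phrase}"
    return Extraction(concepts, wording)


if __name__ == "__main__":
    print(extract("Customer", "Place order"))
    print(extract("Clerk", "Register new customer"))
```

Keeping the whole post-verb span as a single noun phrase mirrors the abstract's point about recognizing entire noun and verb phrases rather than single words; a real tagger would also be needed to handle multinary associations, which this sketch does not attempt.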

Keywords: SBVR business vocabulary and rules; UML use case diagram; Model-to-model transformation; Controlled natural language; Natural language processing; Information extraction

Article history: Received 16 June 2019; revised 10 February 2020; accepted 2 May 2020; available online 6 May 2020; Version of Record 5 August 2020.

Publisher URL: https://doi.org/10.1016/j.datak.2020.101822