Master Thesis

Automatic Extraction of Conceptual Interoperability Constraints from API Documentation

By

Mohammed Ismail Abujayyab

Abstract

Successfully integrating software systems requires fulfilling their conceptual interoperability constraints, which restrict their state or behavior. Typically, the only source of this information available to third-party clients is the API documentation. However, manually reading and analyzing the unstructured natural language (NL) text in such API documents is a tedious and time-consuming task that requires lexical and linguistic analysis skills. Moreover, it is prone to mistakes and misunderstandings, which lead to unexpected mismatches and to the cost of fixing them. This encouraged us to provide a means of supporting software analysts and architects, increasing their efficiency and effectiveness by identifying the conceptual interoperability constraints in API documentation text automatically rather than manually.

To achieve our goals in this research, we followed an empirical methodology that combines machine learning (ML) technologies with natural language processing (NLP) techniques. The main contributions of this thesis are embedded in this methodology. First, we manually developed a corpus, a collection of relevant sentences that we selected from real API documentation and then manually classified into different classes. This classification is based on the COnceptual Interoperability coNstraints (COIN) model, which has seven classes (i.e., NOT-COIN, Dynamic, Semantic, Structure, Syntax, Context, and Quality). Then, we built rules for these classes. Afterwards, we decided to explore the potential of ML classifiers, so we designed a classification model that captures the patterns and terms frequently used to express conceptual interoperability constraints in the NL text of API documents. By training the classification model on our corpus, we were able to run many text classification algorithms, and we achieved promising F-measure results. Finally, we implemented a plugin tool that uses the trained classifier and allows architects to classify any text into one of these seven classes.
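
To make the train-then-classify pipeline described above concrete, the following is a minimal sketch only. It assumes a bag-of-words multinomial Naive Bayes classifier with add-one smoothing, which is one common text classification algorithm and not necessarily among those evaluated in the thesis. The toy sentences, their labels, and all function names are hypothetical and invented for illustration; only the seven class names come from the abstract.

```python
import math
import re
from collections import Counter, defaultdict

# The seven class labels named in the abstract.
COIN_CLASSES = ["NOT-COIN", "Dynamic", "Semantic", "Structure",
                "Syntax", "Context", "Quality"]

# Hypothetical toy corpus: invented sentences and labels, not thesis data.
TOY_CORPUS = [
    ("You must call connect before sending any request", "Dynamic"),
    ("Call the init method first and then call start", "Dynamic"),
    ("The response is returned as a JSON object", "Syntax"),
    ("Dates are encoded as a string in ISO format", "Syntax"),
    ("This page gives an overview of the library", "NOT-COIN"),
    ("See the tutorial for more background", "NOT-COIN"),
]


def tokenize(sentence):
    """Lowercase the sentence and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def train(corpus):
    """Count sentences per class and words per class (bag of words)."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for sentence, label in corpus:
        class_counts[label] += 1
        for tok in tokenize(sentence):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_counts, word_counts, vocab


def classify(sentence, model):
    """Return the class with the highest Naive Bayes log score."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for tok in tokenize(sentence):
            if tok not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing over the vocabulary.
            score += math.log((word_counts[label][tok] + 1)
                              / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def f_measure(gold, predicted, label):
    """F1 for one class: harmonic mean of precision and recall."""
    tp = sum(1 for g, p in zip(gold, predicted) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, predicted) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, predicted) if g == label and p != label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


model = train(TOY_CORPUS)
```

With this toy model, `classify("You must call start before sending data", model)` selects the Dynamic class because of the overlapping ordering-related terms. A realistic setup would need a far larger labeled corpus covering all seven classes and a held-out test set for computing the F-measure.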

[1] Mohammed-Thesis (PDF)

[2] Mohammed-Thesis (PPT)