RESTBERTa

RESTBERTa stands for Representational State Transfer on Bidirectional Encoder Representations from Transformers. The approach is intended to support machines in processing the structured syntax and the unstructured natural language descriptions of semantics in Web API documentation. In detail, we use question answering to solve the generic task of identifying a Web API syntax element (answer) in a syntax structure (paragraph) that matches the semantics described in a natural language query (question). The identification and extraction of Web API syntax elements from Web API documentation is a common subtask of many Web API integration tasks, such as parameter matching and endpoint discovery. Thus, RESTBERTa can serve as a foundation for several Web API integration tasks. Technically, RESTBERTa covers the concepts for fine-tuning a Transformer encoder model, i.e., a pre-trained BERT model, to question answering with task-specific samples in order to prepare the model for a specific Web API integration task.

The paper "Semantic Parameter Matching in Web APIs with Transformer-based Question Answering" demonstrates the application of RESTBERTa on semantic parameter matching:

RESTBERTa for Semantic Parameter Matching

This repository contains the weights of a fine-tuned RESTBERTa model for the task of semantic parameter matching in Web APIs. For this, we formulate semantic parameter matching as a multiple-choice question answering task: given a query in natural language that describes the purpose and behavior of the target parameter, i.e., its semantics, the model should choose the matching parameter from a given schema, which consists of hierarchically organized parameters, e.g., a JSON or XML schema.

Note: BERT models are optimized for linear text input. We therefore serialize a schema, which is commonly a tree structure of hierarchically organized parameters, into linear text by converting each parameter into an XPath-like notation, e.g., "users[*].name" for a parameter "name" that is part of an object in the array "users". The result is an alphabetically sorted list of XPaths, e.g., "link.href link.rel users[*].id users[*].name users[*].surname".
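To illustrate this serialization, here is a minimal Python sketch; the dict-based schema representation and the function name are our own assumptions for demonstration purposes and are not part of RESTBERTa itself:

```python
def serialize_schema(schema, prefix=""):
    """Flatten a nested schema (hypothetical dict representation) into
    alphabetically sorted XPath-like strings, e.g., 'users[*].name'."""
    paths = []
    for name, value in schema.items():
        path = f"{prefix}.{name}" if prefix else name
        if isinstance(value, list):        # array of objects -> users[*].<child>
            paths.extend(serialize_schema(value[0], f"{path}[*]"))
        elif isinstance(value, dict):      # nested object -> link.<child>
            paths.extend(serialize_schema(value, path))
        else:                              # leaf parameter
            paths.append(path)
    return sorted(paths)

schema = {
    "users": [{"id": None, "name": None, "surname": None}],
    "link": {"href": None, "rel": None},
}
print(" ".join(serialize_schema(schema)))
# -> link.href link.rel users[*].id users[*].name users[*].surname
```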

Fine-tuning

We fine-tuned the pre-trained microsoft/codebert-base model to the downstream task of question answering with 1,085,051 question answering samples from 2,321 real-world OpenAPI documents. Each sample consists of a natural language query describing a parameter (question), a serialized schema (paragraph), and the matching parameter within this schema (answer).
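For illustration, a single sample could be represented as follows; the field names are illustrative, not the exact format of our dataset:

```python
sample = {
    # Natural language query describing the target parameter's semantics (question)
    "question": "The family name of a user",
    # Serialized schema (paragraph)
    "paragraph": "link.href link.rel users[*].id users[*].name users[*].surname",
    # Matching parameter as an XPath within the schema (answer)
    "answer": "users[*].surname",
}
```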

Inference:

RESTBERTa requires a special output interpreter that processes the predictions made by the model in order to determine the suggested parameter. We discuss the details in the paper.
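As a minimal sketch of how the model can be queried with the Hugging Face transformers library (the model identifier below is a placeholder for this repository's ID or a local checkpoint path; without the output interpreter described in the paper, the raw predicted span is returned as-is):

```python
from transformers import pipeline

# Placeholder: replace with this repository's model ID or a local checkpoint path.
model_id = "<this-repository>"

qa = pipeline("question-answering", model=model_id)

result = qa(
    question="The family name of a user",  # natural language query (question)
    context="link.href link.rel users[*].id users[*].name users[*].surname",  # serialized schema (paragraph)
)
print(result["answer"])  # raw predicted span, e.g., "users[*].surname"
```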

Hyperparameters:

The model was fine-tuned for ten epochs with a batch size of 16 on an Nvidia RTX 3090 GPU with 24 GB of memory. This repository contains the model checkpoint (weights) after five epochs of fine-tuning, which achieved the highest accuracy on our validation set.
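A minimal sketch of the corresponding training setup with the transformers Trainer API; only the number of epochs, the batch size, and the base model are taken from this model card, everything else (output directory, evaluation schedule, dataset preparation) is an assumption:

```python
from transformers import (AutoModelForQuestionAnswering, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
model = AutoModelForQuestionAnswering.from_pretrained("microsoft/codebert-base")

args = TrainingArguments(
    output_dir="restberta-parameter-matching",  # hypothetical output directory
    num_train_epochs=10,                        # ten epochs, as stated above
    per_device_train_batch_size=16,             # batch size of 16, as stated above
    evaluation_strategy="epoch",                # evaluate per epoch to pick the best checkpoint
    save_strategy="epoch",
)

# train_dataset and eval_dataset must hold tokenized (question, paragraph,
# answer span) samples; their preparation is omitted here.
trainer = Trainer(model=model, args=args)
# trainer.train()
```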

Citation:

@INPROCEEDINGS{10.1109/SOSE58276.2023.00020,
  author={Kotstein, Sebastian and Decker, Christian},
  booktitle={2023 IEEE International Conference on Service-Oriented System Engineering (SOSE)},
  title={Semantic Parameter Matching in Web APIs with Transformer-based Question Answering},
  year={2023},
  pages={114-123},
  doi={10.1109/SOSE58276.2023.00020}}