code linguistic antipatterns python

alBERTo

alBERTo is a model fine-tuned for the classification of linguistic antipatterns. It was trained on a dataset containing instances of the following bad practices: "Get" more than accessor, Not implemented condition, Method signature and comment are opposite, Attribute signature and comment are opposite.

Model Description

alBERTo is a model for recognizing linguistic antipatterns in Python code. It was built from the Microsoft CodeBERT model, which was fine-tuned to classify code as "clean" or as containing a linguistic antipattern. The antipattern classes it covers are those listed above.

Intended uses & limitations

This model can be used to classify the linguistic antipatterns described above. It still has limitations: because little training data was available, it makes classification errors, so its predictions should not be taken as definitive.

Usage

Here is how to use this model to classify a given code snippet in PyTorch:

from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained('alBERTo')
model = AutoModelForSequenceClassification.from_pretrained("alBERTo")

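# prepare input: a snippet whose comment contradicts the method name
# (single-quoted delimiters so the inner docstring quotes do not end the string)
text = '''
  """
    create a new object
  """
  def destroy_object():
'''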
encoded_input = tokenizer(text, return_tensors='pt')

# forward pass
output = model(**encoded_input)
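
The forward pass returns raw class scores (logits), not a label. As a minimal sketch of how those scores could be turned into a prediction, the helper below applies a softmax and picks the highest-scoring class. The function name predict_label, the two label names, and the example logits are hypothetical and chosen only for illustration; the model's actual label mapping is not documented here and should be read from model.config.id2label.

```python
import math

def predict_label(logits, id2label):
    """Return (label, probability) for the highest-scoring class."""
    # Numerically stable softmax: subtract the max logit before exponentiating.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Index of the most probable class.
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]

# Hypothetical label names and logits, for illustration only.
example_id2label = {0: "clean", 1: "antipattern"}
example_logits = [0.2, 1.7]
label, prob = predict_label(example_logits, example_id2label)
print(label, round(prob, 3))
```

With the real model, the same helper could be called as predict_label(output.logits[0].tolist(), model.config.id2label). Given the limitations noted above, the returned probability is best treated as a rough confidence signal, not as a guarantee.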