Classification of patent abstracts - "Green Plastics" or "No Green Plastics"

This model (distilbert-base-uncased-finetuned-greenplastics-3) classifies patents into "green plastics" or "no green plastics" by their abstracts.

The model is a fine-tuned version of distilbert-base-uncased on the green plastics dataset (11,196 samples of patent abstracts). The dataset was split into 70% training data and 30% test data (using ".train_test_split(test_size=0.3)"). The model achieves the following results on the evaluation set: loss 0.3435, accuracy 0.8574, F1 0.8573.

The maximum number of training steps was set to 200 to avoid overfitting. I considered an accuracy of 0.8574 to be suitable for the task. Further training would lead to a higher accuracy, but the model trained further gave unsatisfying results when tested on random examples. That is why I chose to limit the number of training steps.
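To get a feeling for how short this training run is, the figures in this card can be combined in a small calculation. This is only a rough sketch: the epoch value of 0.2 logged at step 200 is rounded, so the derived numbers are approximate, and the batch size itself is not stated anywhere in the card.

```python
# Figures taken from this card
total_samples = 11_196   # size of the green plastics dataset
train_fraction = 0.7     # 70% training split
max_steps = 200          # training was stopped here
epoch_at_stop = 0.2      # epoch value logged at step 200 (rounded)

train_samples = int(total_samples * train_fraction)   # training abstracts
samples_seen = round(train_samples * epoch_at_stop)   # abstracts seen in 200 steps
steps_per_epoch = max_steps / epoch_at_stop           # steps for one full pass
samples_per_step = train_samples / steps_per_epoch    # implied samples per step

print(train_samples, samples_seen, round(steps_per_epoch), round(samples_per_step, 1))
```

Under these numbers the model saw only about one fifth of the training abstracts, at roughly eight abstracts per step.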

EPO - CodeFest on Green Plastics

The model has been developed for submission to the CodeFest on Green Plastics by the European Patent Office (EPO).

The task:

"To develop creative and reliable artificial intelligence (AI) models for automating the identification of patents related to green plastics."

How to use the model

from transformers import pipeline

model_id = "cwinkler/distilbert-base-uncased-finetuned-greenplastics-3"
classifier = pipeline("text-classification", model=model_id)

# Replace the example below with your own abstract (as a string)
your_abstract = "The present disclosure relates to a process for recycling of plastic waste comprising: segregating plastic waste collected from various sources followed by cleaning of the segregated plastic waste to obtain segregated cleaned waste; grinding of the segregated cleaned waste to obtain grinded waste; introducing the grinded waste into an extrusion line having a venting extruder component as part of the extrusion line, to obtain molten plastic; and removing the impurities by vacuum venting of the molten plastic to obtained recycled plastic free from impurities. The present disclosure further relates to various articles like Industrial Post Recycled (IPR) plastic tubes, blow moulded bottles, pallates, manufactured from the recycled plastic waste."
# top_k=None returns the scores of all labels (it replaces the deprecated return_all_scores=True)
preds = classifier(your_abstract, top_k=None)
print(preds)
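The printed result is a list of label/score dictionaries. The helper below is a small sketch of how the most likely class could be picked from it. The function name `top_prediction`, the label names and the scores are made up for illustration; the real label names depend on the model configuration. Depending on the library version and the arguments used, the result for a single string may be wrapped in an extra list, so the helper accepts both shapes.

```python
def top_prediction(scores):
    """Return the (label, score) pair with the highest score."""
    # Unwrap the extra list that some pipeline calls put around a single result.
    if scores and isinstance(scores[0], list):
        scores = scores[0]
    best = max(scores, key=lambda item: item["score"])
    return best["label"], best["score"]


# Hypothetical output for one abstract (not produced by the real model).
example_scores = [
    {"label": "LABEL_0", "score": 0.12},
    {"label": "LABEL_1", "score": 0.88},
]

label, score = top_prediction(example_scores)
print(label, score)
```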

Examples

The following examples are randomly chosen abstracts from the patent literature.

"Green Plastics"

"No Green Plastics"

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy | F1 |
|---|---|---|---|---|---|
| No log | 0.2 | 200 | 0.3435 | 0.8574 | 0.8573 |
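For readers unfamiliar with the two metrics, the following sketch computes accuracy and F1 on a handful of made-up labels. The card does not state which F1 averaging was used; a support-weighted average over both classes is assumed here, and the toy labels are not taken from the evaluation set.

```python
def accuracy(y_true, y_pred):
    """Share of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_for_class(y_true, y_pred, cls):
    """F1 score when `cls` is treated as the positive class."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def weighted_f1(y_true, y_pred):
    """Per-class F1, weighted by how often each class occurs in y_true."""
    classes = sorted(set(y_true))
    return sum(
        f1_for_class(y_true, y_pred, c) * y_true.count(c) / len(y_true)
        for c in classes
    )


# Toy labels: 1 = "green plastics", 0 = "no green plastics" (assumed encoding).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]

print(accuracy(y_true, y_pred), weighted_f1(y_true, y_pred))
```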

Framework versions