
WG-BERT (Warranty and Goodwill BERT) is a pretrained encoder-based model for analyzing automotive entities in automotive-related texts. It is built by continually pretraining the BERT language model on a corpus of automotive workshop feedback texts using the masked language modeling (MLM) objective. WG-BERT is then fine-tuned for automotive entity recognition (a subtask of Named Entity Recognition (NER)) to extract components and their complaints from automotive texts. The continual-pretraining dataset consists of 1.8 million workshop feedback texts containing ~4 million sentences; the fine-tuning dataset consists of ~5,500 sentences gold-annotated by automotive domain experts. We use BERT-base-uncased as the base architecture.
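As a rough usage sketch, the fine-tuned NER checkpoint can be loaded for token classification with the Hugging Face transformers library. The Hub ID and the example sentence below are placeholders for illustration only, not part of the original release.

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

# Hypothetical Hub ID; replace with the actual WG-BERT NER checkpoint
model_id = "lukasweber/WG-BERT-NER"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(model_id)

# Aggregate sub-word predictions into entity spans (components and their complaints)
ner = pipeline(
    "token-classification",
    model=model,
    tokenizer=tokenizer,
    aggregation_strategy="simple",
)

# Example workshop-style sentence (illustrative, not from the training data)
print(ner("Customer reports a rattling noise from the rear left shock absorber."))
```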

Please contact Lukas Weber (lukas-weber[at]hotmail[dot]de / lukas.l.weber[at]mercedes-benz[dot]com) with any WG-BERT-related issues and questions.