This is a fine-tuned BERT classifier that separates jobs with an eco-friendly context (green jobs) from non-green jobs. It is part of Shuang Chen's Ph.D. job market paper “Green Investors and Green Transition Efforts: Talk the Talk or Walk the Walk?", available at https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4254894.
The interactive API returns the predicted value as 0 or 1: an output of 0 indicates the input is a non-green job, and 1 indicates a green job.
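For local use, the classifier can also be queried with the Hugging Face transformers pipeline. The sketch below is illustrative only: the model id is a placeholder for this repository's id, and the label names assume the library's default LABEL_0/LABEL_1 naming.

```python
from transformers import pipeline

# Load the fine-tuned classifier; replace the placeholder with this repository's model id.
classifier = pipeline(
    "text-classification",
    model="<this-repo-id>",  # placeholder, not the actual model id
)

# Illustrative job description snippets.
texts = [
    "Design and install residential solar photovoltaic systems.",
    "Prepare quarterly financial statements and manage payroll.",
]

for text, pred in zip(texts, classifier(texts)):
    # Assuming default label names: LABEL_1 = green job, LABEL_0 = non-green job.
    print(text, "->", pred["label"], round(pred["score"], 3))
```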
license: apache-2.0
The underlying machine learning algorithm is BERT (Bidirectional Encoder Representations from Transformers), a natural language processing model widely adopted in industry since its introduction in 2018 (Devlin et al., 2018), built on the Transformer architecture (Vaswani et al., 2017). It is a deep learning model in which every output element is connected to every input element, and the weights between them are calculated dynamically based on their connection. Training a BERT model from scratch requires very large training samples and long processing times. Fortunately, the model can be pre-trained on texts that are not specific to a given classification problem, and fine-tuning the pre-trained model with a small set of problem-specific training samples can achieve satisfactory accuracy. In this study, I use the widely used transformers library from Hugging Face, which provides pre-trained BERT models of various sizes for several languages. Specifically, I use the “bert-base-uncased” model as the starting point for fine-tuning.
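The fine-tuning step can be sketched roughly as follows with the transformers and datasets libraries. The toy training examples and hyperparameters below are illustrative assumptions, not the exact data or settings used in the paper.

```python
from datasets import Dataset
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments,
)

# Start from the pre-trained "bert-base-uncased" checkpoint and add a
# binary classification head (1 = green task description, 0 = non-green).
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# Toy stand-in for the DOL task descriptions used as the training sample.
train_data = Dataset.from_dict({
    "text": [
        "Install and maintain wind turbine components.",
        "Schedule meetings and maintain office supply inventories.",
    ],
    "label": [1, 0],
})

def tokenize(batch):
    # Truncate/pad task descriptions to a fixed length for batching.
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

train_data = train_data.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="green-job-bert",
    num_train_epochs=3,              # illustrative hyperparameters,
    per_device_train_batch_size=16,  # not the exact settings from the paper
    learning_rate=2e-5,
)

trainer = Trainer(model=model, args=args, train_dataset=train_data)
trainer.train()
```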
The training sample for fine-tuning directly affects the model parameters and prediction quality. I expect environment-related jobs to be much rarer than unrelated ones, as the economy is still in a transition stage. If a random group of job postings were used as the training sample, one class would dominate the other, and the model would be tempted to always predict the dominant class. Moreover, for occupations such as logistics manager, where the greenness of the position depends on the context, some work tasks are not related to environmental responsibility, and neither are the corresponding sentences in the job description. Fine-tuning is more effective when positive training samples are more distinct from negative training samples. Therefore, I use the descriptions of green tasks and non-green tasks as the positive and negative samples in the training sample.
The U.S. Department of Labor (DOL) lists the green tasks and non-green tasks involved in every green enhanced-skills occupation and green new-and-emerging occupation; there are 1,398 green tasks and 1,705 non-green tasks in total. This training sample, with a balanced proportion of positive and negative samples, helps relieve the concern raised above that the model simply learns to predict the dominant class. The DOL also provides a general description of each green occupation. For occupations whose greenness depends on the context, the general description does not indicate environmental responsibility, while the description of an always-green occupation does. So, I use the general descriptions of always-green occupations and of context-dependent green occupations as the positive and negative samples in the validation sample.
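A minimal sketch of how the training and validation samples described above could be assembled; the file names and column names are hypothetical placeholders for the DOL green occupation data, not the actual source files.

```python
import pandas as pd

# Hypothetical extracts of the DOL green occupation data (placeholder file names).
tasks = pd.read_csv("dol_green_occupation_tasks.csv")    # columns: task_description, is_green_task
occupations = pd.read_csv("dol_green_occupations.csv")   # columns: description, always_green

# Training sample: green task descriptions are positives (label 1),
# non-green task descriptions are negatives (label 0).
train = pd.DataFrame({
    "text": tasks["task_description"],
    "label": tasks["is_green_task"].astype(int),
})

# Validation sample: general descriptions of always-green occupations are positives;
# descriptions of context-dependent green occupations are negatives.
valid = pd.DataFrame({
    "text": occupations["description"],
    "label": occupations["always_green"].astype(int),
})

print(train["label"].value_counts())  # roughly 1,398 positives vs 1,705 negatives
```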
The performance of the BERT classifier is evaluated on four dimensions: accuracy (0.89), precision (0.73), recall (0.96), and F1 score (0.83). These numbers are calculated by comparing the model's predictions on the validation sample with the actual labels. Accuracy is the number of correct predictions divided by the total number of predictions; about 89% of the classifier's predictions match the actual labels. Precision is the number of positive predictions that are actually positive divided by the total number of positive predictions, i.e., true positives divided by the sum of true positives and false positives. Recall is the number of actual positive cases that are correctly predicted divided by the total number of actual positive cases, i.e., true positives divided by the sum of true positives and false negatives. Precision (0.73) being lower than recall (0.96) indicates that the classifier makes more type I errors, predicting non-green positions as green (false positives), than type II errors, predicting green positions as non-green (false negatives). The F1 score is the harmonic mean of precision and recall.
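For concreteness, the sketch below spells out these definitions in code; the confusion-matrix counts are placeholders chosen only to roughly reproduce the reported magnitudes, not the actual counts from the validation sample.

```python
# Illustrative confusion-matrix counts (placeholders, not the paper's actual numbers).
tp, fp, fn, tn = 96, 35, 4, 185

accuracy = (tp + tn) / (tp + tn + fp + fn)                 # correct predictions / all predictions
precision = tp / (tp + fp)                                 # of predicted greens, share truly green
recall = tp / (tp + fn)                                    # of true greens, share correctly caught
f1 = 2 * precision * recall / (precision + recall)         # harmonic mean of precision and recall

print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```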