Uganda Labor Market Interview Text Classification

This model is a RoBERTa base model fine-tuned on text transcripts of interviews between students of Vocational Training Institutes (VTIs) and successful VTI alumni in Uganda on the subject of the labor market.

Model description

The model classifies sentences into six topics. The task is multilabel, so a single sentence can be assigned to more than one topic. The classification criteria are as follows:

info: Pertinent details about the job market, working conditions, salaries, and expectations in the workplace, as well as the alumni's and students' current job market situations, career plans, and past experiences. If strategies are mentioned in this context, the sentence is also classified as a strategy.

tip: Advice on workplace behavior and self-improvement, primarily emphasizing discipline, humility, treating colleagues and clients well, and avoiding illegal activities. If these tips are associated with an increased likelihood of employment, the sentence is also classified as a strategy.

strategy: Guidance aimed at enhancing students' chances of securing employment or better job opportunities, covering aspects such as company research, application creation and submission, interview conduct, networking, and general advice for enhancing job-related skills. Additionally, this category includes tips for starting a business, such as capital accumulation, location scouting, business models, equipment procurement, and client attraction and retention.

motivation: General recommendations for maintaining confidence, patience, persistence, engagement, and optimism in the job market. If specific contexts are provided for these recommendations, the sentence may also be classified as a strategy or tip accordingly.

referral: Directing students to companies or individuals, or providing affirmative responses to students' requests for connections.

neutral: Introductions, contact exchanges, purely technical content, unrelated school or exam discussions, miscellaneous conversations that do not fit into the other five topics, and unclear content due to language deficiencies or translation issues.

How to use

You can use this model directly with a pipeline for text classification:

>>> from transformers import pipeline
>>> pipe = pipeline(
...     "text-classification",
...     model="wanghao2023/uganda-labor-market-interview-text-classification",
...     tokenizer="wanghao2023/uganda-labor-market-interview-text-classification",
...     return_all_scores=True,
... )
>>> pipe("if they think you know too much, they won't teach you.")
[[{'label': 'is_info', 'score': 0.18128268420696259},
  {'label': 'is_tip', 'score': 0.5684323310852051},
  {'label': 'is_strategy', 'score': 0.22818608582019806},
  {'label': 'is_motivation', 'score': 0.03250108286738396},
  {'label': 'is_neutral', 'score': 0.05972086638212204},
  {'label': 'is_referral', 'score': 0.013502764515578747}]]
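
The six scores in this example add up to roughly 1.08, not 1, which is consistent with each label being scored independently. To turn such scores into topic assignments, one option is to keep every label whose score reaches a cutoff. The sketch below is illustrative only: the helper name `labels_above_threshold` is hypothetical, the scores are copied (rounded) from the example output, and the cutoff of 0.3 is the threshold reported in the evaluation results.

```python
# Scores copied (rounded) from the example output for
# "if they think you know too much, they won't teach you."
scores = [
    {"label": "is_info", "score": 0.1813},
    {"label": "is_tip", "score": 0.5684},
    {"label": "is_strategy", "score": 0.2282},
    {"label": "is_motivation", "score": 0.0325},
    {"label": "is_neutral", "score": 0.0597},
    {"label": "is_referral", "score": 0.0135},
]

# Cutoff taken from the evaluation section; other values may suit other uses.
THRESHOLD = 0.3


def labels_above_threshold(scores, threshold=THRESHOLD):
    """Return every label whose score is at or above the threshold (hypothetical helper)."""
    return [item["label"] for item in scores if item["score"] >= threshold]


predicted = labels_above_threshold(scores)
print(predicted)  # ['is_tip']
```

With the pipeline call shown above, the per-sentence list of scores is the first element of the returned list, so `labels_above_threshold(pipe(text)[0])` should work in the same way. A sentence may end up with several labels or with none, depending on the cutoff.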

Limitations and bias

Sentence classification is heavily dependent on context. For instance, the phrase "be patient" could be categorized as a tip, a strategy, and/or motivation, depending on the context in which the alumnus advises patience: the advice may pertain to interviews, to workplace behavior, or to general motivation. Because the model classifies one sentence at a time, it does not see that surrounding context.

Evaluation results

This model achieves the following results on the validation dataset (multilabel, threshold = 0.3). There is considerable room for improvement, but the model performs clearly better than random guessing:

F1        ROC AUC   Accuracy
0.655779  0.799979  0.552670
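
The card does not say how these metrics were averaged. As a rough illustration of how multilabel scores at a threshold of 0.3 can be computed, the sketch below assumes micro-averaged F1 and exact-match (subset) accuracy; both choices are assumptions, and the toy labels and probabilities are hypothetical, not taken from the validation set.

```python
THRESHOLD = 0.3  # threshold stated in the evaluation results

# Hypothetical toy data: 3 sentences x 3 labels (not the real validation set).
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
y_prob = [[0.9, 0.2, 0.4], [0.1, 0.8, 0.35], [0.6, 0.2, 0.1]]

# Binarize each per-label probability at the threshold.
y_pred = [[int(p >= THRESHOLD) for p in row] for row in y_prob]

# Pool true positives, false positives and false negatives over all labels
# (micro averaging, an assumption about how the reported F1 was computed).
tp = fp = fn = 0
for true_row, pred_row in zip(y_true, y_pred):
    for t, p in zip(true_row, pred_row):
        tp += int(t == 1 and p == 1)
        fp += int(t == 0 and p == 1)
        fn += int(t == 1 and p == 0)

micro_f1 = 2 * tp / (2 * tp + fp + fn)

# Subset accuracy: a sentence counts as correct only if all labels match.
subset_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

print(f"micro F1 = {micro_f1:.3f}, subset accuracy = {subset_accuracy:.3f}")
```

On this toy data the micro F1 is 0.8 while the subset accuracy is only one in three. If the reported accuracy is an exact-match measure of this kind, that would explain why it sits below the F1 score.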