Description

The Turkish News Classification AI Model is a text classifier for Turkish news. It is a fine-tuned version of the BERTurk model and assigns each input text to one of seven categories: world, economy, culture, health, politics, sport, and technology.

The model was trained on a dataset of 27,000 news articles published between 2010 and 2020. The articles cover a wide range of topics and are labeled with the seven categories listed above, so the dataset serves as a resource for both training and testing the model.

The model's direct output is a news category for a given text. It can also serve as one component in broader text analysis work, such as topic modeling or pipelines that include sentiment analysis, although it predicts only the category itself. It is aimed at researchers, developers, and data scientists who need to organize Turkish text or extract information from it.

The model is designed with accuracy and efficiency in mind. Fine-tuning BERTurk on labeled news articles yields a classifier tailored to this task, which you can apply to your own Turkish text classification work.

Predicted Entities / Classes

world, economy, culture, health, politics, sport, technology

Model Performance Metrics

[Figure: model performance metrics]

How to use from the Transformers library

Use a pipeline as a high-level helper

from transformers import pipeline

pipe = pipeline("text-classification", model="aimped/nlp-classifier-news-tr")
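The pipeline above can be wrapped in a small helper to classify a batch of texts. This is a minimal sketch, not part of the model's official documentation: `build_pipeline` and `classify` are hypothetical helper names, the Turkish sentence is an invented example, and the exact label strings returned depend on the model's configuration. Passing `truncation=True` is an assumption made here so that long articles are cut to the model's maximum input length.

```python
from typing import Callable, Dict, List


def build_pipeline():
    """Create the text-classification pipeline (downloads the model on first use)."""
    # Imported lazily so that `classify` can be used and tested without transformers.
    from transformers import pipeline

    return pipeline("text-classification", model="aimped/nlp-classifier-news-tr")


def classify(texts: List[str], pipe: Callable) -> List[Dict[str, object]]:
    """Run the pipeline on a batch and pair each text with its top prediction."""
    # For a list of strings the pipeline returns one {"label", "score"} dict per text.
    predictions = pipe(texts, truncation=True)
    return [
        {"text": text, "label": pred["label"], "score": round(float(pred["score"]), 4)}
        for text, pred in zip(texts, predictions)
    ]


# Example usage (needs network access to download the model):
# pipe = build_pipeline()
# for row in classify(["Merkez Bankası faiz kararını açıkladı."], pipe):
#     print(row["label"], row["score"])
```

Building the pipeline once and reusing it for every batch avoids reloading the model weights on each call.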

Load model directly

from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("aimped/nlp-classifier-news-tr")
model = AutoModelForSequenceClassification.from_pretrained("aimped/nlp-classifier-news-tr")
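Once the tokenizer and model are loaded directly, inference has to be done by hand: tokenize, run a forward pass, and convert the logits to a label. The sketch below shows one way to do that, assuming PyTorch is installed. The helper names (`softmax`, `top_prediction`, `predict`) and the `max_length=512` limit are illustrative assumptions, and the label names come from whatever `id2label` mapping the model's config provides.

```python
import math
from typing import Dict, List, Tuple

MODEL_ID = "aimped/nlp-classifier-news-tr"


def softmax(logits: List[float]) -> List[float]:
    """Convert raw logits to probabilities (shifted by the max for numerical stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_prediction(logits: List[float], id2label: Dict[int, str]) -> Tuple[str, float]:
    """Return the most probable label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]


def predict(text: str) -> Tuple[str, float]:
    """Classify one text with the directly loaded model (downloads it on first use)."""
    # Imported lazily so the pure-Python helpers above work without these packages.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
    model.eval()  # disable dropout for inference

    # Assumed limit: truncate long articles to 512 tokens.
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    return top_prediction(logits, model.config.id2label)


# Example usage (needs network access to download the model):
# label, prob = predict("Yeni akıllı telefon modeli tanıtıldı.")
# print(label, round(prob, 4))
```

For repeated calls it would be better to load the tokenizer and model once outside `predict`; they are loaded inside the function here only to keep the sketch short.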

Dataset & Source

The Turkish News Category Dataset comprises 27,000 news articles in 7 categories, collected from print media and news websites between 2010 and 2020. The Kaggle Turkish benchmark dataset is used for testing.
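To check the model against a labeled test set of your own, you can compare predicted and true categories. The following is a minimal sketch in plain Python, not the evaluation procedure used by the model's authors: the function names are hypothetical and the labels in the usage comment are toy values, not results from this model.

```python
from collections import Counter
from typing import Dict, List


def accuracy(y_true: List[str], y_pred: List[str]) -> float:
    """Fraction of texts whose predicted category matches the true one."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_f1(y_true: List[str], y_pred: List[str]) -> Dict[str, float]:
    """F1 score per category, computed from true/false positives and false negatives."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for true, pred in zip(y_true, y_pred):
        if true == pred:
            tp[true] += 1
        else:
            fp[pred] += 1  # predicted this class, but it was wrong
            fn[true] += 1  # missed the true class
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true: List[str], y_pred: List[str]) -> float:
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Example usage with toy labels (not real model output):
# y_true = ["sport", "sport", "economy", "world"]
# y_pred = ["sport", "economy", "economy", "world"]
# print(accuracy(y_true, y_pred), per_class_f1(y_true, y_pred), macro_f1(y_true, y_pred))
```

Macro F1 weights all seven categories equally, which is useful when some categories have fewer test articles than others.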


license: MIT