cnn_dailymail_22457_3000_1500_train
This is a BERTopic model trained on 3,000 documents and containing 49 topics. BERTopic is a flexible, modular topic modeling framework that generates easily interpretable topics from large datasets.
Usage
To use this model, please install BERTopic:
pip install -U bertopic
You can use the model as follows:
from bertopic import BERTopic
topic_model = BERTopic.load("KingKazma/cnn_dailymail_22457_3000_1500_train")
topic_model.get_topic_info()
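
Beyond inspecting the topic table, a loaded model can assign topics to new text. The following is a minimal sketch, not part of the original card: `label_documents` and `load_model` are hypothetical helper names, and the sketch assumes `get_topic_info()` exposes `Topic` and `Name` columns, as it does in recent BERTopic releases. Depending on how the model was saved, `BERTopic.load` may also need an `embedding_model` argument before `transform` works.

```python
from typing import Any, List, Sequence, Tuple


def label_documents(topic_model: Any, docs: Sequence[str]) -> List[Tuple[str, int, str]]:
    """Assign each document a topic ID and the matching label from the topic table."""
    # transform() returns (topic IDs, probabilities); only the IDs are used here.
    topics, _ = topic_model.transform(list(docs))
    info = topic_model.get_topic_info()
    # Map topic ID -> label, e.g. 0 -> "0_league_player_club_game".
    names = {int(t): str(n) for t, n in zip(info["Topic"], info["Name"])}
    return [(doc, int(t), names.get(int(t), "unknown")) for doc, t in zip(docs, topics)]


def load_model(repo_id: str = "KingKazma/cnn_dailymail_22457_3000_1500_train") -> Any:
    """Load the model from the Hugging Face Hub (needs bertopic and network access)."""
    # Imported lazily so that label_documents has no hard dependency on bertopic.
    from bertopic import BERTopic

    return BERTopic.load(repo_id)
```

A call such as `label_documents(load_model(), ["Some news article text"])` would return one `(document, topic ID, label)` tuple per input document.
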
Topic overview
- Number of topics: 49
- Number of training documents: 3000
 
<details> <summary>Click here for an overview of all topics.</summary>
| Topic ID | Topic Keywords | Topic Frequency | Label | 
|---|---|---|---|
| -1 | said - one - year - people - police | 10 | -1_said_one_year_people | 
| 0 | league - player - club - game - cup | 1050 | 0_league_player_club_game | 
| 1 | said - syria - government - iraq - islamic | 317 | 1_said_syria_government_iraq | 
| 2 | obama - president - house - state - republican | 140 | 2_obama_president_house_state | 
| 3 | cancer - hospital - baby - treatment - child | 122 | 3_cancer_hospital_baby_treatment | 
| 4 | google - apple - tablet - car - device | 84 | 4_google_apple_tablet_car | 
| 5 | fashion - dress - hair - look - woman | 78 | 5_fashion_dress_hair_look | 
| 6 | police - officer - shooting - said - shot | 66 | 6_police_officer_shooting_said | 
| 7 | film - movie - show - actor - comedy | 65 | 7_film_movie_show_actor | 
| 8 | murder - death - said - home - police | 55 | 8_murder_death_said_home | 
| 9 | mr - labour - minister - mp - blair | 52 | 9_mr_labour_minister_mp | 
| 10 | storm - water - weather - ice - rain | 51 | 10_storm_water_weather_ice | 
| 11 | shark - bear - turtle - crocodile - bird | 50 | 11_shark_bear_turtle_crocodile | 
| 12 | flight - plane - passenger - airport - pilot | 49 | 12_flight_plane_passenger_airport | 
| 13 | house - property - home - per - room | 49 | 13_house_property_home_per | 
| 14 | drug - police - court - stealing - robbery | 40 | 14_drug_police_court_stealing | 
| 15 | police - murder - mr - court - clavell | 36 | 15_police_murder_mr_court | 
| 16 | games - gold - olympic - race - sport | 34 | 16_games_gold_olympic_race | 
| 17 | student - school - teacher - said - cardosa | 34 | 17_student_school_teacher_said | 
| 18 | country - minister - energy - cent - greece | 32 | 18_country_minister_energy_cent | 
| 19 | golf - mcilroy - course - round - ryder | 31 | 19_golf_mcilroy_course_round | 
| 20 | police - harris - abuse - allegation - officer | 30 | 20_police_harris_abuse_allegation | 
| 21 | ebola - virus - africa - health - liberia | 29 | 21_ebola_virus_africa_health | 
| 22 | chinese - china - cable - bo - beijing | 28 | 22_chinese_china_cable_bo | 
| 23 | federer - tennis - murray - wimbledon - match | 28 | 23_federer_tennis_murray_wimbledon | 
| 24 | dog - animal - dogs - owner - simmons | 26 | 24_dog_animal_dogs_owner | 
| 25 | cent - per - woman - men - pickens | 23 | 25_cent_per_woman_men | 
| 26 | ship - boat - rescue - water - sea | 23 | 26_ship_boat_rescue_water | 
| 27 | hamilton - race - rosberg - mercedes - formula | 22 | 27_hamilton_race_rosberg_mercedes | 
| 28 | galaxy - planet - universe - earth - telescope | 22 | 28_galaxy_planet_universe_earth | 
| 29 | russian - russia - putin - ukraine - moscow | 22 | 29_russian_russia_putin_ukraine | 
| 30 | pakistan - pakistani - karachi - taliban - anwar | 22 | 30_pakistan_pakistani_karachi_taliban | 
| 31 | korea - north - korean - south - kim | 21 | 31_korea_north_korean_south | 
| 32 | car - driver - train - accident - cope | 21 | 32_car_driver_train_accident | 
| 33 | food - fruit - taste - cake - cream | 20 | 33_food_fruit_taste_cake | 
| 34 | painting - art - auction - artist - gallery | 20 | 34_painting_art_auction_artist | 
| 35 | base - drone - soldier - afghan - us | 19 | 35_base_drone_soldier_afghan | 
| 36 | weight - fat - eating - healthy - size | 18 | 36_weight_fat_eating_healthy | 
| 37 | mafia - wine - money - fraud - court | 18 | 37_mafia_wine_money_fraud | 
| 38 | aguilar - bravo - brewer - rambold - court | 18 | 38_aguilar_bravo_brewer_rambold | 
| 39 | missing - search - found - family - disappeared | 17 | 39_missing_search_found_family | 
| 40 | juarez - quezada - mexico - mexican - cartel | 15 | 40_juarez_quezada_mexico_mexican | 
| 41 | knicks - lin - chicago - blackhawks - game | 15 | 41_knicks_lin_chicago_blackhawks | 
| 42 | duchess - prince - kate - royal - william | 15 | 42_duchess_prince_kate_royal | 
| 43 | price - supermarket - asda - shop - food | 14 | 43_price_supermarket_asda_shop | 
| 44 | school - child - pupil - teacher - xxx | 14 | 44_school_child_pupil_teacher | 
| 45 | nhs - patient - ae - hospital - staff | 13 | 45_nhs_patient_ae_hospital | 
| 46 | zsa - francesca - rhodes - vongtau - gabor | 12 | 46_zsa_francesca_rhodes_vongtau | 
| 47 | medal - war - bomb - graf - vc | 10 | 47_medal_war_bomb_graf | 
</details>
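
The labels in the table appear to follow the pattern of topic ID plus the first four keywords, joined by underscores, and topic -1 is the ID BERTopic uses for outlier documents. The sketch below splits a label back into its parts and converts a topic frequency into a share of the 3000 training documents; `parse_label` and `topic_share` are hypothetical helper names, not part of BERTopic.

```python
from typing import List, Tuple

TRAINING_DOCS = 3000  # "Number of training documents" from the topic overview


def parse_label(label: str) -> Tuple[int, List[str]]:
    """Split a label such as "0_league_player_club_game" into (topic ID, keywords)."""
    topic_id, *keywords = label.split("_")
    return int(topic_id), keywords


def topic_share(frequency: int, total: int = TRAINING_DOCS) -> float:
    """Return the fraction of training documents assigned to a topic."""
    return frequency / total


print(parse_label("0_league_player_club_game"))  # (0, ['league', 'player', 'club', 'game'])
print(f"{topic_share(1050):.1%}")  # 35.0%
```

Read this way, topic 0 (football) covers roughly a third of the training documents, while only 10 documents were left in the outlier topic.
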
Training hyperparameters
- calculate_probabilities: True
- language: english
- low_memory: False
- min_topic_size: 10
- n_gram_range: (1, 1)
- nr_topics: None
- seed_topic_list: None
- top_n_words: 10
- verbose: False
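
The hyperparameters above correspond to keyword arguments of the `BERTopic` constructor. The following sketch shows how they could be passed when training a comparable model; `HYPERPARAMETERS` and `build_model` are hypothetical names. The card does not list the embedding model or the UMAP and HDBSCAN settings, so this should not be expected to reproduce the published topics exactly.

```python
from typing import Any, Dict

# Values copied from the "Training hyperparameters" list.
HYPERPARAMETERS: Dict[str, Any] = {
    "calculate_probabilities": True,
    "language": "english",
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": None,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": False,
}


def build_model(**overrides: Any) -> Any:
    """Create an untrained BERTopic model with the listed settings (requires bertopic)."""
    # Imported lazily so the settings dictionary can be inspected without bertopic installed.
    from bertopic import BERTopic

    return BERTopic(**{**HYPERPARAMETERS, **overrides})
```

Calling `build_model().fit_transform(docs)` on a list of documents would then train a new model with these settings.
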
 
Framework versions
- Numpy: 1.22.4
- HDBSCAN: 0.8.33
- UMAP: 0.5.3
- Pandas: 1.5.3
- Scikit-Learn: 1.2.2
- Sentence-transformers: 2.2.2
- Transformers: 4.31.0
- Numba: 0.56.4
- Plotly: 5.13.1
- Python: 3.10.6
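
Loading a saved model can be sensitive to library versions, so it may help to compare the local environment against the list above. This is a minimal sketch using the standard library's `importlib.metadata`; `EXPECTED_VERSIONS` and `find_mismatches` are hypothetical names, and the keys are the PyPI distribution names assumed to correspond to the listed libraries (for example `umap-learn` for UMAP).

```python
from importlib import metadata
from typing import Callable, Dict, Optional, Tuple

# Versions copied from the "Framework versions" list, keyed by PyPI distribution name.
EXPECTED_VERSIONS: Dict[str, str] = {
    "numpy": "1.22.4",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.3",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.31.0",
    "numba": "0.56.4",
    "plotly": "5.13.1",
}


def find_mismatches(
    expected: Dict[str, str],
    lookup: Callable[[str], str] = metadata.version,
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {package: (expected, installed)} for every package whose version differs."""
    mismatches: Dict[str, Tuple[str, Optional[str]]] = {}
    for package, wanted in expected.items():
        try:
            installed: Optional[str] = lookup(package)
        except metadata.PackageNotFoundError:
            installed = None  # the package is not installed at all
        if installed != wanted:
            mismatches[package] = (wanted, installed)
    return mismatches
```

`find_mismatches(EXPECTED_VERSIONS)` returns an empty dictionary when every listed package is installed at exactly the listed version. A mismatch does not necessarily mean the model will fail to load.
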