audio - AI Model Zoo - BimAnt

Models retrieved with tag "audio": 1539

espnet/yoshiki_wsj_whisper_medium_finetuning

espnet/akreal_ls100_asr2_e_branchformer1_1gpu_raw_wavlm_large_21_km2k_bpe_rm6k_bpe_ts5k_sp

hchung1017/gigaspeech_streaming_conformer

espnet/chendali_librimix_asr_train_sot_asr_whisper_small_raw_en_whisper_multilingual

wwerkk/tiny-audio-diffusion-percussion-finetuned-triton

yangwang825/mert-base

facebook/multiband-diffusion

alphacep/vosk-model-ru

alphacep/vosk-model-small-ru

volodymyrs/small_dir_upload

espnet/yoshiki_wsj0_2mix_spatialized_enh_tfgridnet_waspaa2023_raw

espnet/yoshiki_wsj_asr_conformer_s3prlfrontend_wavlm_raw_en_char

carlosdanielhernandezmena/whisper-large-faroese-8k-steps-100h-ct2

carlosdanielhernandezmena/whisper-large-maltese-8k-steps-64h-ct2

meehl/whisper-medium-ko-ct2

spotify/basic-pitch

espnet/eason_chime4_asr2_e_branchformer12_conv1d1_raw_wavlm_large_21_km1k_bpe_rm2k_char_ts_sp

alphacep/vosk-model-small-streaming-ru

davidggphy/whisper-small-dv-ct2

espnet/msk_lrs3_train_avsr_avhubert_large_extracted_en_bpe1000