Vision & Multimedia Processing

530 models available

Explore image generation, video AI, speech recognition, and music synthesis models.


stable-diffusion-xl-base-1.0

by stabilityai

License: openrail++. Tags: text-to-image, stable-diffusion. SDXL consists of an ensemble-of-experts pipeline for latent diffusion: In a first step, the ba...

diffusers onnx
❤️ 7.2K
📥 2.2M

Kokoro-82M

by hexgrad

License: apache-2.0. Language: en. Base model: yl4579/StyleTTS2-LJSpeech. Pipeline tag: text-to-speech. **Kokoro** is an open-weight TTS model with 82 million parameters. D...

text-to-speech en
❤️ 5.4K
📥 4.0M

whisper-large-v3

by openai

Languages: en, zh, de, es, ru, ko, fr, ja, pt, tr, pl, ca, nl, ar, sv, it, id, hi, fi, vi, he, uk, el, ms, cs, ro, da, hu, ta, no, th, ur, hr, ...

transformers pytorch
❤️ 5.2K
📥 5.7M

XTTS-v2

by coqui

License: other. License name: coqui-public-model-license. License link: https://coqui.ai/cpml. Library: coqui. Pipeline tag: text-to-speech. Widget text: "Once when I was si...

coqui text-to-speech
❤️ 3.2K
📥 6.4M

whisper-large-v3-turbo

by openai

Languages: en, zh, de, es, ru, ko, fr, ja, pt, tr, pl, ca, nl, ar, sv, it, id, hi, fi, vi, he, uk, el, ms, cs, ro, da, hu, ta, no, th, ur, hr...

transformers safetensors
❤️ 2.7K
📥 4.4M

blip-image-captioning-large

by Salesforce

Pipeline tag: image-to-text. Tags: image-captioning. Languages: en. License: bsd-3-clause. Model card for image captioning pretrained on COCO dataset - base architecture (w...

transformers pytorch
❤️ 1.4K
📥 1.2M

speaker-diarization-3.1

by pyannote

No description available.

pyannote-audio pyannote
❤️ 1.4K
📥 16.0M

stable-diffusion-v1-5

by stable-diffusion-v1-5

License: creativeml-openrail-m. Tags: stable-diffusion, stable-diffusion-diffusers, text-to-image. Inference: true. Modifications to the original model card are in red or ...

diffusers safetensors
❤️ 939
📥 2.0M

vit-gpt2-image-captioning

by nlpconnect

Tags: image-to-text, image-captioning. License: apache-2.0. Widget example: https://huggingface.co/datasets/mishig/sample_images/resolve/main/savanna.jpg, example title: Savanna...

transformers pytorch
❤️ 921
📥 1.4M

detr-resnet-50

by facebook

License: apache-2.0. Tags: object-detection, vision. Datasets: coco. Widget example: https://huggingface.co/datasets/mishig/sample_images/resolve/main/savanna.jpg, example_titl...

transformers pytorch
❤️ 913
📥 1.9M

vit-base-patch16-224

by google

License: apache-2.0. Tags: vision, image-classification. Datasets: imagenet-1k, imagenet-21k. Widget example: https://huggingface.co/datasets/mishig/sample_images/resolve/mai...

transformers pytorch
❤️ 912
📥 4.8M

blip-image-captioning-base

by Salesforce

Pipeline tag: image-to-text. Tags: image-captioning. Languages: en. License: bsd-3-clause. Model card for image captioning pretrained on COCO dataset - base architecture (w...

transformers pytorch
❤️ 823
📥 2.0M

whisper-small

by openai

Languages: en, zh, de, es, ru, ko, fr, ja, pt, tr, pl, ca, nl, ar, sv, it, id, hi, fi, vi, he, uk, el, ms, cs, ro, da, hu, ta, no, th, ur, hr, ...

transformers pytorch
❤️ 497
📥 4.5M

whisper-tiny

by openai

Languages: en, zh, de, es, ru, ko, fr, ja, pt, tr, pl, ca, nl, ar, sv, it, id, hi, fi, vi, he, uk, el, ms, cs, ro, da, hu, ta, no, th, ur, hr, ...

transformers pytorch
❤️ 392
📥 1.1M

table-transformer-detection

by microsoft

License: mit. Widget example: https://www.invoicesimple.com/wp-content/uploads/2018/06/Sample-Invoice-printable.png, example title: Invoice. Table Transformer (DETR) model trai...

transformers pytorch
❤️ 384
📥 2.2M

wav2vec2-base-960h

by facebook

Language: en. Datasets: librispeech_asr. Tags: audio, automatic-speech-recognition, hf-asr-leaderboard. License: apache-2.0. Widget example title: Librispeech sample 1, src...

transformers pytorch
❤️ 383
📥 1.9M

distil-large-v3

by distil-whisper

Language: en. License: mit. Library: transformers. Tags: audio, automatic-speech-recognition, transformers.js. Widget example title: LibriSpeech sample 1, src: https:/...

transformers jax
❤️ 356
📥 1.2M

FLUX.1-dev

by black-forest-labs

No description available.

diffusers safetensors
❤️ 12.0K
📥 938.2K

speaker-diarization

by pyannote

No description available.

pyannote-audio pyannote
❤️ 1.2K
📥 845.7K

FLUX.1-schnell

by black-forest-labs

No description available.

diffusers safetensors
❤️ 4.5K
📥 748.0K

F5-TTS

by SWivid

License: cc-by-nc-4.0. Pipeline tag: text-to-speech. Library: f5-tts. Datasets: amphion/Emilia-Dataset. Download F5-TTS or E2 TTS and place under ckpts/. GitHub: https://...

f5-tts text-to-speech
❤️ 1.1K
📥 780.2K

BiRefNet

by ZhengPeng7

Library: birefnet. Tags: background-removal, mask-generation, Dichotomous Image Segmentation, Camouflaged Object Detection, Salient Object Detection, pytorch_model_h...

birefnet safetensors
❤️ 496
📥 731.2K

chatterbox

by ResembleAI

License: mit. Languages: ar, da, de, el, en, es, fi, fr, he, hi, it, ja, ko, ms, nl, no, pl, pt, ru, sv, sw, tr, zh. Pipeline tag: text-to-speech. Tags: t...

chatterbox text-to-speech
❤️ 1.3K
📥 637.3K

sd-turbo

by stabilityai

Pipeline tag: text-to-image. Inference: false. SD-Turbo is a fast generative text-to-image model that can synthesize photorealistic images from a text prompt in a sing...

diffusers safetensors
❤️ 429
📥 1.2M