
You can now clone a human voice in real time without discrete speech tokens.

It's called VoxCPM.

Most TTS models convert speech to discrete tokens.

Tokens lose information. Create artifacts. Cause unnatural pauses.

VoxCPM generates speech in continuous space.

End-to-end diffusion autoregressive architecture. No speech tokenizer. No quantization loss.

What you get:

1️⃣ Context-aware generation

→ Model reads text, infers appropriate prosody

→ Adapts style based on content automatically

2️⃣ Zero-shot voice cloning

→ Short reference clip is all you need

→ Captures timbre, accent, emotion, rhythm, pacing

3️⃣ Real-time synthesis

→ Real-time factor (RTF) of 0.15 on an RTX 4090

→ Streaming supported
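
For anyone unsure what the RTF number means: real-time factor is synthesis time divided by the duration of the audio produced, so anything below 1.0 is faster than real time. At 0.15, 10 seconds of speech would take roughly 1.5 seconds to generate. Below is a minimal sketch of that arithmetic. `real_time_factor` and `measure_rtf` are hypothetical helper names, not part of VoxCPM, and `synthesize` stands in for any function that returns a sequence of audio samples.

```python
import time


def real_time_factor(synthesis_seconds, audio_seconds):
    """RTF = time spent generating / duration of the generated audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def measure_rtf(synthesize, text, sample_rate):
    """Time one call to `synthesize` (hypothetical callable) and return its RTF.

    Assumes `synthesize(text)` returns a 1-D sequence of mono samples.
    """
    start = time.perf_counter()
    samples = synthesize(text)
    elapsed = time.perf_counter() - start
    audio_seconds = len(samples) / sample_rate
    return real_time_factor(elapsed, audio_seconds)


# The figure quoted in the post: 1.5 s of compute for 10 s of audio.
rtf = real_time_factor(1.5, 10.0)
print(f"RTF = {rtf:.2f}")
```

Measured RTF depends on hardware, text length, and inference settings, so treat a single run as a rough indication only.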

The specs:

→ 800M parameters

→ 44.1kHz output

→ 1.8M hours of training data

→ Supports LoRA fine-tuning
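
To make the 44.1kHz figure concrete: one second of mono output is 44,100 samples. Here is a rough sketch of writing such samples to a WAV file using only the Python standard library. It assumes the audio is a 1-D sequence of floats in the range -1.0 to 1.0, which is an assumption about the output format rather than something the post states. `save_wav` is a hypothetical helper, not a VoxCPM function.

```python
import sys
import wave
from array import array


def save_wav(samples, path, sample_rate=44100):
    """Write mono float samples to a 16-bit PCM WAV file.

    Assumes samples are floats in [-1.0, 1.0]; values outside are clipped.
    Returns the duration of the written audio in seconds.
    """
    pcm = array(
        "h",
        (int(max(-1.0, min(1.0, float(s))) * 32767) for s in samples),
    )
    # WAV stores little-endian PCM; swap bytes on big-endian machines.
    if sys.byteorder == "big":
        pcm.byteswap()
    with wave.open(path, "wb") as f:
        f.setnchannels(1)           # mono
        f.setsampwidth(2)           # 16-bit samples
        f.setframerate(sample_rate)
        f.writeframes(pcm.tobytes())
    return len(pcm) / sample_rate


# Usage (hypothetical): save_wav(generated_samples, "output.wav")
```

If NumPy and `soundfile` are already installed, they are likely the more convenient route. The stdlib version just avoids extra dependencies.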

pip install voxcpm

from voxcpm import VoxCPM

model = VoxCPM.from_pretrained("openbmb/VoxCPM1.5")

wav = model.generate(text="Your text here")
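
The snippet above only passes text. For the zero-shot cloning mentioned earlier, a reference clip has to be supplied too. The sketch below shows one way that call might look. The keyword names `prompt_wav_path` and `prompt_text` are assumptions about the `generate` signature and should be checked against the project's README before use. `clone_voice` is a hypothetical wrapper, not part of the library.

```python
def clone_voice(model, text, reference_wav, reference_transcript):
    """Generate `text` in the voice of a short reference clip.

    `model` is expected to be a loaded VoxCPM instance.
    ASSUMPTION: `generate` accepts `prompt_wav_path` and `prompt_text`;
    verify these names against the installed voxcpm version.
    """
    return model.generate(
        text=text,
        prompt_wav_path=reference_wav,     # path to the reference audio clip
        prompt_text=reference_transcript,  # what is said in that clip
    )


# Usage (hypothetical paths and text):
# from voxcpm import VoxCPM
# model = VoxCPM.from_pretrained("openbmb/VoxCPM1.5")
# wav = clone_voice(model, "Your text here", "reference.wav", "Words spoken in the clip")
```

Keeping the model as a parameter means the wrapper can be exercised with a stub, without downloading weights.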

5.5k GitHub stars. Apache 2.0.

💾 Save for when you need TTS that doesn't sound like TTS

♻️ Repost if you've been burned by robotic voice cloning

Feb 12 at 2:03 AM