Dataset requirements
Quality in = quality out
You need 30–60 minutes of vocals that are clean (no effects or noise) and monophonic (one voice at a time).
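To check a folder of recordings against the 30–60 minute target, a short script can total the file lengths. The sketch below is one possible approach, not part of any official tooling: it assumes the dataset is stored as uncompressed PCM .wav files in a single folder, and the function names (wav_duration_seconds, dataset_minutes, check_dataset_length) are hypothetical.

```python
import wave
from pathlib import Path

# Target range taken from the guideline above (30-60 minutes).
MIN_MINUTES = 30
MAX_MINUTES = 60


def wav_duration_seconds(path):
    """Return the length of one PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def dataset_minutes(folder):
    """Sum the duration of every .wav file directly inside `folder`, in minutes."""
    files = sorted(Path(folder).glob("*.wav"))
    total_seconds = sum(wav_duration_seconds(p) for p in files)
    return total_seconds / 60


def check_dataset_length(folder):
    """Return a one-line verdict on the folder's total duration."""
    minutes = dataset_minutes(folder)
    if minutes < MIN_MINUTES:
        return f"{minutes:.1f} min: too short, record more material"
    if minutes > MAX_MINUTES:
        return f"{minutes:.1f} min: above the suggested range"
    return f"{minutes:.1f} min: within the suggested range"
```

Calling check_dataset_length("my_dataset") returns a verdict for the folder my_dataset, which is a placeholder name. The check covers length only and says nothing about how clean the audio is.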
Keep Vocals Clean
Effects like reverb, delay and chorus disrupt the cloning process.
Audio examples: Dry Vocals (good) vs. Reverb & Delay (avoid)
Only Include One Voice
Don’t include any harmonies, stacked voices, or instrumentals.
Audio examples: One Voice (good) vs. Harmonies (avoid)
Creating datasets for high-fidelity voice cloning
Your AI voice learns everything it can from your dataset, so create a dataset that sounds exactly how you want your AI voice to sound. Read the tips below to learn how to record and prepare a high-quality dataset.

Dataset Examples
Avoid Room Sound
Hard surfaces (walls, floor, ceiling) can cause unwanted reverb.
Audio examples: Well-treated Room (good) vs. Hard, Reflective Room (avoid)
Avoid Background Noise
Appliances, fans, electrical hum, and street noise can reduce the accuracy of your AI voice.
Audio examples: Noiseless (good) vs. Background Noise (avoid)
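Background noise is easiest to judge in the pauses between phrases. As a rough sketch, assuming 16-bit PCM .wav input that contains some silent gaps, the code below estimates a noise floor from the quietest 10% of 50 ms frames. The function names, the frame length, the 10% fraction, and the -60 dBFS limit are all illustrative assumptions, not published requirements.

```python
import math
import sys
import wave
from array import array

# Hypothetical limit for illustration only; tune it by ear for your setup.
NOISE_LIMIT_DBFS = -60.0


def frame_levels_dbfs(path, frame_ms=50):
    """Return the RMS level (dBFS) of each short frame of a 16-bit PCM WAV."""
    with wave.open(str(path), "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        rate = wav.getframerate()
        channels = wav.getnchannels()
        samples = array("h")
        samples.frombytes(wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian

    # Interleaved channels are measured together within each frame.
    step = max(1, int(rate * frame_ms / 1000)) * channels
    levels = []
    for start in range(0, len(samples) - step + 1, step):
        frame = samples[start:start + step]
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        # Clamp digital silence so log10 never receives zero.
        levels.append(20 * math.log10(max(rms, 1.0) / 32768.0))
    return levels


def noise_floor_dbfs(path, quietest_fraction=0.1):
    """Estimate the noise floor as the mean level of the quietest frames."""
    levels = sorted(frame_levels_dbfs(path))
    if not levels:
        raise ValueError("file is shorter than one analysis frame")
    count = max(1, int(len(levels) * quietest_fraction))
    return sum(levels[:count]) / count


def is_quiet_enough(path):
    """True if the estimated noise floor is below the assumed limit."""
    return noise_floor_dbfs(path) < NOISE_LIMIT_DBFS
```

A take with no pauses will report the level of the quietest singing rather than the room, so treat the result as a prompt to listen more closely, not as a verdict.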
Vocals only. No reverb. No harmonies.
High quality. Recorded with the highest-quality mic available in a sound-treated room.
Diverse. As many vowels and pitches as possible.