🗣️ VoxMorph: Scalable Zero-Shot Voice Identity Morphing
University of North Texas | ICASSP Accepted Paper
This interface implements the VoxMorph framework, which performs high-fidelity voice morphing by interpolating disentangled prosody and timbre embeddings.
1. Source Input
2. Morphing Controls
3. Acoustic Output
Methodology: The framework disentangles vocal characteristics into Prosody (Style) and Timbre (Identity) embeddings. These embeddings are projected onto a hypersphere and interpolated using Spherical Linear Interpolation (Slerp), which moves along the arc between the two embeddings so that every intermediate point stays on the hypersphere. The fused embeddings then condition an autoregressive language model and a Conditional Flow Matching (CFM) network, which synthesize the final waveform.
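
The interpolation step can be illustrated with a minimal sketch of Slerp in plain Python. This is not the VoxMorph implementation: the function names `slerp` and `_normalize`, the morphing weight `alpha` in [0, 1], and the tolerance `eps` are assumptions chosen for illustration, and embeddings are represented as plain lists of floats.

```python
import math


def _normalize(v):
    """Project a vector onto the unit hypersphere (hypothetical helper)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def slerp(a, b, alpha, eps=1e-7):
    """Spherical linear interpolation between embeddings a and b.

    alpha = 0 returns normalized a, alpha = 1 returns normalized b.
    The names and the eps tolerance are illustrative assumptions.
    """
    a, b = _normalize(a), _normalize(b)
    # Clamp the cosine to [-1, 1] to guard against rounding error.
    dot = max(-1.0, min(1.0, sum(x * y for x, y in zip(a, b))))
    omega = math.acos(dot)  # angle between the two unit vectors

    if omega < eps:
        # Nearly identical directions: fall back to normalized linear blend.
        return _normalize([(1 - alpha) * x + alpha * y for x, y in zip(a, b)])
    if math.pi - omega < eps:
        # Opposite directions: the connecting arc is not unique.
        raise ValueError("slerp is undefined for antipodal vectors")

    s = math.sin(omega)
    w_a = math.sin((1 - alpha) * omega) / s
    w_b = math.sin(alpha * omega) / s
    return [w_a * x + w_b * y for x, y in zip(a, b)]


# Usage: blend two toy 2-D "embeddings" halfway.
midpoint = slerp([1.0, 0.0], [0.0, 1.0], 0.5)
```

Unlike a plain weighted average, the result keeps unit norm for any `alpha`, which is the property the hypersphere projection relies on.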