🗣️ VoxMorph: Scalable Zero-Shot Voice Identity Morphing

University of North Texas | ICASSP Accepted Paper

This interface implements the VoxMorph framework, allowing for high-fidelity voice morphing via disentangled prosody and timbre embeddings.

1. Source Input

2. Morphing Controls


3. Acoustic Output

Methodology: The framework disentangles vocal characteristics into Prosody (Style) and Timbre (Identity) embeddings. These embeddings are projected onto a hypersphere and interpolated with Spherical Linear Interpolation (Slerp). Slerp keeps the interpolated embedding on the hypersphere, whereas plain linear interpolation would shrink its norm. The fused embeddings condition an autoregressive language model and a Conditional Flow Matching (CFM) network to synthesize the final waveform.
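The interpolation step can be illustrated with a minimal Python sketch of Slerp between two embeddings after L2 normalization onto the unit hypersphere. It uses slerp(a, b; t) = sin((1 - t)Ω)/sin(Ω) · a + sin(tΩ)/sin(Ω) · b, with Ω = arccos(a · b). The sketch assumes the prosody and timbre embeddings are plain float vectors and that a single morph weight `alpha` in [0, 1] applies to both. The helper names (`normalize`, `slerp`, `morph_embeddings`) are hypothetical and are not taken from the VoxMorph code. The language model and CFM stages are not shown.

```python
import math
from typing import List, Sequence, Tuple


def normalize(v: Sequence[float]) -> List[float]:
    """Project a vector onto the unit hypersphere (L2 normalization)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def slerp(a: Sequence[float], b: Sequence[float], t: float, eps: float = 1e-7) -> List[float]:
    """Spherical linear interpolation between the normalized a and b at weight t in [0, 1]."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    a, b = normalize(a), normalize(b)
    dot = max(-1.0, min(1.0, sum(x * y for x, y in zip(a, b))))  # clamp for acos
    omega = math.acos(dot)  # angle between the two unit vectors
    if omega < eps:
        # Nearly parallel: linear interpolation is numerically safer; renormalize it.
        return normalize([(1.0 - t) * x + t * y for x, y in zip(a, b)])
    if math.pi - omega < eps:
        # Antipodal vectors: the great-circle path is not unique.
        raise ValueError("slerp is undefined for opposite vectors")
    sin_omega = math.sin(omega)
    wa = math.sin((1.0 - t) * omega) / sin_omega
    wb = math.sin(t * omega) / sin_omega
    return [wa * x + wb * y for x, y in zip(a, b)]


def morph_embeddings(
    prosody_a: Sequence[float], prosody_b: Sequence[float],
    timbre_a: Sequence[float], timbre_b: Sequence[float],
    alpha: float,
) -> Tuple[List[float], List[float]]:
    """Hypothetical helper: slerp prosody and timbre separately with one weight alpha."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return slerp(prosody_a, prosody_b, alpha), slerp(timbre_a, timbre_b, alpha)
```

For example, `slerp([1.0, 0.0], [0.0, 1.0], 0.5)` returns approximately `[0.7071, 0.7071]`, a unit vector. Plain linear interpolation of the same inputs gives `[0.5, 0.5]`, whose norm is only about 0.707. This norm preservation is the usual reason to prefer Slerp for normalized embeddings.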