How does neural diarization work?

Neural diarization uses deep learning to determine who spoke when in an audio recording. Unlike traditional clustering approaches, it models speaker identities and speech boundaries jointly in a single architecture. SoulX-Transcriber demonstrates this by combining speaker identification with transcription in one model, which removes the separate pipeline stages where errors accumulate.
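
To make "who spoke when" concrete, here is a minimal sketch of how frame-wise speaker activity, a common output shape for end-to-end neural diarizers, could be turned into speaker turns. The text above does not describe SoulX-Transcriber's actual output format, so the function name `activity_to_segments`, the `probs` layout, the frame length, and the 0.5 threshold are all illustrative assumptions.

```python
from typing import List, Tuple

# (speaker index, start time in seconds, end time in seconds)
Segment = Tuple[int, float, float]


def activity_to_segments(
    probs: List[List[float]],
    frame_sec: float = 0.1,   # assumed frame length, not from the source
    threshold: float = 0.5,   # assumed decision threshold, not from the source
) -> List[Segment]:
    """Convert frame-wise speaker activity into speaker turns.

    probs[t][s] is the (hypothetical) probability that speaker s
    is talking during frame t.
    """
    if not probs:
        return []

    segments: List[Segment] = []
    n_speakers = len(probs[0])

    for s in range(n_speakers):
        start = None  # frame index where the current turn began, if any
        for t, frame in enumerate(probs):
            active = frame[s] >= threshold
            if active and start is None:
                start = t  # a turn begins
            elif not active and start is not None:
                segments.append((s, start * frame_sec, t * frame_sec))
                start = None  # the turn ended at frame t
        if start is not None:
            # the speaker was still talking when the audio ended
            segments.append((s, start * frame_sec, len(probs) * frame_sec))

    # order turns by start time, then by speaker index
    return sorted(segments, key=lambda seg: (seg[1], seg[0]))


if __name__ == "__main__":
    # Four one-second frames, two speakers; both are active in frame 2.
    demo = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.9], [0.1, 0.8]]
    for speaker, start, end in activity_to_segments(demo, frame_sec=1.0):
        print(f"speaker {speaker}: {start:.1f}s to {end:.1f}s")
```

Because each speaker has their own activity track, two speakers can be marked active in the same frame, so overlapping turns come out of the same pass instead of a separate clustering stage.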