Diffusion- and flow-matching TTS models face a tension between discrete temporal structure and continuous spectral modeling. Two-stage models first predict durations and then diffuse over fixed alignments; they often collapse to mean prosody and stretch frames uniformly when adapting speaking rate. Single-stage attention-based models avoid explicit durations but suffer from alignment instability. We propose a jump-diffusion framework in which discrete jumps model temporal structure and continuous diffusion refines spectral content within a single probabilistic process. On LJSpeech, our method achieves 3.37% WER versus 4.38% for Grad-TTS, with better UTMOSv2. On out-of-distribution slow speech, our model autonomously inserts natural pauses rather than stretching uniformly, improving intelligibility over two-stage baselines.
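To make the core idea concrete, the following is a minimal toy sketch of a jump-diffusion sampler, not the paper's implementation: the drift, noise scale, jump rate, and frame-insertion rule are all invented placeholders. Each reverse step applies a continuous Euler-Maruyama denoising update to the spectral frames and, at a small per-step probability, a discrete jump that changes the sequence length (here, duplicating a frame, a crude stand-in for inserting a pause):

```python
import numpy as np

def reverse_sample(x, n_steps=100, jump_rate=0.02, seed=0):
    """Toy jump-diffusion sampler over a (frames, channels) array.

    Interleaves continuous denoising updates with discrete jump events
    that modify temporal structure. All dynamics are placeholders.
    """
    rng = np.random.default_rng(seed)
    dt = 1.0 / n_steps
    for _ in range(n_steps):
        # Continuous part: Euler-Maruyama step with a stand-in score
        # function (a learned score network in a real model).
        score = -x
        x = x + score * dt + np.sqrt(dt) * 0.1 * rng.standard_normal(x.shape)
        # Discrete part: with small probability, a jump event inserts a
        # duplicated frame at a random position, lengthening the sequence.
        if rng.random() < jump_rate:
            t = int(rng.integers(len(x)))
            x = np.insert(x, t, x[t], axis=0)
    return x

# Start from pure noise and sample; jumps may grow the frame count.
x0 = np.random.default_rng(1).standard_normal((50, 8))
out = reverse_sample(x0)
print(out.shape)
```

The point of the sketch is only that length changes happen inside the sampling process itself, rather than being fixed by a separate duration predictor before diffusion begins.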