“Scratch that.” Two words that should delete your last sentence — unless you were dictating a note about literally scratching a car, in which case they should stay. Telling those apart, live, is the entire problem.
Every voice editor hits the same trap: the words you use to control it are also words you might want to write. “New paragraph”, “delete that”, “make it a list” — all valid prose. Treat them as commands by default and you mangle real text; treat them as text and you ignore the user. No keyword is safe in every context.
Instead of matching phrases, Sona scores intent. A command candidate has to clear several bars at once: a short, self-contained utterance; a clear pause before and after; and a low probability that it fits the surrounding sentence as content. “…and then scratch that idea” mid-flow stays as text; a clipped “scratch that.” after a beat becomes an edit.
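As a rough illustration of "several bars at once", here is a minimal sketch of that gate. The `Utterance` fields, the function name, and every threshold are assumptions made up for this example, not Sona's actual features or values; the point is only that all the checks must pass together.

```python
from dataclasses import dataclass

# Hypothetical thresholds, chosen for illustration only.
MAX_COMMAND_WORDS = 4    # "short, self-contained utterance"
MIN_PAUSE_MS = 400       # "clear pause before and after"
MAX_CONTENT_FIT = 0.2    # "low probability that it fits ... as content"


@dataclass
class Utterance:
    text: str
    pause_before_ms: int
    pause_after_ms: int
    content_fit: float  # assumed 0..1 score: how well the words continue the sentence


def is_command_candidate(u: Utterance) -> bool:
    """Return True only when every bar is cleared at the same time."""
    is_short = len(u.text.split()) <= MAX_COMMAND_WORDS
    is_isolated = (
        u.pause_before_ms >= MIN_PAUSE_MS and u.pause_after_ms >= MIN_PAUSE_MS
    )
    unlikely_content = u.content_fit <= MAX_CONTENT_FIT
    return is_short and is_isolated and unlikely_content


mid_flow = Utterance("and then scratch that idea", 80, 60, 0.9)
clipped = Utterance("scratch that.", 700, 900, 0.05)

print(is_command_candidate(mid_flow))  # False: too long, no pauses, fits as content
print(is_command_candidate(clipped))   # True: short, isolated, unlikely to be content
```

A real scorer would more likely combine these signals into one confidence value than apply hard cutoffs, but the conjunction captures the idea: no single signal, and certainly no single keyword, is enough on its own.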
When confidence is borderline, we bias toward leaving your words alone. A missed command costs one keystroke to fix. A false one silently eats a sentence — much worse.
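One way to make that bias concrete is to derive the decision threshold from the two error costs. The sketch below assumes a single confidence value `p_command` and two illustrative cost numbers (1 for a missed command, 10 for a false one); both the function names and the numbers are hypothetical. Acting on a command is cheaper in expectation only when `(1 - p) * cost_false < p * cost_missed`, which rearranges to `p > cost_false / (cost_false + cost_missed)`.

```python
def command_threshold(cost_missed: float, cost_false: float) -> float:
    """Confidence above which acting has lower expected cost than leaving text alone."""
    return cost_false / (cost_false + cost_missed)


def decide(p_command: float, cost_missed: float = 1.0, cost_false: float = 10.0) -> str:
    """Return "command" or "text" for a given confidence, using assumed costs."""
    threshold = command_threshold(cost_missed, cost_false)
    return "command" if p_command > threshold else "text"


print(round(command_threshold(1.0, 10.0), 3))  # 0.909
print(decide(0.6))   # text: likely a command, but not likely enough to risk a sentence
print(decide(0.95))  # command
```

With equal costs the threshold would sit at 0.5. Making a false command ten times as expensive pushes it above 0.9, so borderline utterances stay in the text.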
Voice edits are applied as discrete, undoable steps, never destructive rewrites. Say “scratch that”, change your mind, and a single undo brings it back. The model can be wrong; the system just makes being wrong cheap.
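A minimal sketch of what "discrete, undoable steps" can look like, assuming a document modeled as a list of sentences. The `Document` class and its method names are hypothetical, and in this simplified version only voice edits (not dictation itself) are recorded for undo.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Document:
    sentences: list[str] = field(default_factory=list)
    # Each entry records one voice edit: (position, removed sentence).
    undo_stack: list[tuple[int, str]] = field(default_factory=list)

    def dictate(self, sentence: str) -> None:
        self.sentences.append(sentence)

    def scratch_that(self) -> None:
        """Remove the last sentence, keeping enough information to restore it."""
        if not self.sentences:
            return
        index = len(self.sentences) - 1
        self.undo_stack.append((index, self.sentences.pop()))

    def undo(self) -> None:
        """Reverse the most recent voice edit in a single step."""
        if not self.undo_stack:
            return
        index, sentence = self.undo_stack.pop()
        self.sentences.insert(index, sentence)


doc = Document()
doc.dictate("Call the garage.")
doc.dictate("Book it for Monday.")

doc.scratch_that()
print(doc.sentences)  # ['Call the garage.']

doc.undo()
print(doc.sentences)  # ['Call the garage.', 'Book it for Monday.']
```

Because the removed sentence is stored rather than discarded, a wrongly detected command costs the user one undo, not a retyped sentence.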