shubhamintech

The full-session evaluation framing is the right call - most teams don't realize the failure happened in turn 2 until they've spent 3 hours blaming the model. One thing worth thinking about as you grow: connecting caught regressions to production conversation data. When your simulation flags a new failure mode, being able to say "this pattern has already surfaced X times in prod this week" cuts the prioritization debate in half. Does Cekura currently let you correlate simulation failures back to real user sessions, or is that still a manual step?

show comments
FailMore

Any ideas how to solve the agent's don't have total common sense problem?

I have found when using agents to verify agents, that the agent might observe something that a human would immediately find off-putting and obviously wrong but does not raise any flags for the smart-but-dumb agent.

show comments
guerython

we treat each scenario as an explicit state machine. every conversation has checkpoints (ask for name, verify dob, gather phone) and the case only passes if each checkpoint flips true before the flow moves on. that means if the agent hallucinates, skips the verification step, or escalates to a human too early you get a session-level failure, not just a happily-green last turn. logging which checkpoint stayed false makes regressions obvious when you swap prompts/models.

chrismychen

How do you handle sessions where the correct outcome is an incomplete flow — e.g. the agent correctly refuses to move forwards because the caller failed verification, or correctly escalates to a human?

show comments
jamram82

Testing voice agents would require some kind of knowledge integration. Do you have any plans to support custom knowledge bases for test voice agents ?

show comments
niko-thomas

We've tried a few platforms for voice agent testing and Cekura has been the best by a long shot. Keep up the great work!

sidhantkabra

Was really fun building this - would love feedback from the HN community and get insights on your current process.

moinism

congrats on the launch! do you guys have anything planned to test chat agents directly in the ui? I have an agent, but no exposed api so can't really use your product even though I have a genuine need.

show comments
michaellee8

Interesting, I have built https://github.com/michaellee8/voice-agent-devkit-mcp exactly for this, launch a chromium instance with virtual devices powered by Pulsewire and then hook it up with tts and stt so that playwright can finally have mouth and ears. Any chance we can talk?

show comments