Cookiy Research · AI Voice Assessment · 12 Interviews · April 2026

Voice is present. Depth is missing.

Twelve participants tried an AI-voiced personality test. The voice was polite, sometimes welcome, rarely load-bearing — and never the reason a result felt true.

8 of 12 treat the voice as courtesy, not core value
Executive summary — three sentences
Participants reliably finished the test, but the AI voice was rarely what made results feel true. What landed was question design and nuance, not voice presence; when questions felt tailored, results resonated, and when they felt generic, the voice could not save them. A small but loud minority read constant voice presence as interruption or surveillance, requesting silence or refusing to continue.
8 of 12 · describe voice as a courtesy feature
6 of 12 · found results too generic or shallow
5 of 12 · flagged voice as a focus disruptor
2 of 12 · withdrew or set a hard privacy boundary
01

AI voice gets tolerated as courtesy, not as the thing that makes results feel true.

8 of 12 described the voice as pleasant-to-neutral but not a reason the results landed. What made results feel personal was question design and wording; the voice rode on top of whatever was underneath. No participant named voice as the decisive positive.

Counter: 2 of 12 (P05, P06) felt the voice made the test more interactive and 'easier'. Neither claimed the voice was the source of the result quality — only that it made completion smoother.

Implication. Push voice into an optional 'companion' role and let the question set carry the experience. Measure 'results resonated' against question-set versions, not voice variants. Voice is polish; questions are the product.

"It's very generic. It could apply to anyone."— P04 · skeptical analyst
02

Generic results read as hollow no matter how warm the voice is.

6 of 12 called results generic, shallow, or uncertain. Language included 'could apply to anyone', 'quite childish', and 'three answers is too few'. When the question set felt narrow, the voice did not recover the experience.

Counter: 2 of 12 (P05, P10) experienced results as 'tailored' and 'really good', describing specific passages that matched their self-image. The signal is about question design variance — the same voice product produced both reactions.

Implication. Measure result depth directly: ask participants to name a single phrase from their result that felt specifically theirs. Train on that signal, not on voice-satisfaction metrics that confound delivery with content.
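One crude way to operationalize that check, sketched in TypeScript below. The function name and normalization are assumptions for illustration, not a description of Cookiy's tooling; a production version would match more loosely than a substring test.

```ts
// Hypothetical "felt specifically theirs" probe: did the participant name a
// phrase that actually appears in their delivered result text?
function phraseFeltTheirs(resultText: string, namedPhrase: string): boolean {
  // Normalize whitespace and case before comparing.
  const norm = (s: string) => s.toLowerCase().replace(/\s+/g, " ").trim();
  const phrase = norm(namedPhrase);
  return phrase.length > 0 && norm(resultText).includes(phrase);
}
```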

"The survey is quite childish — there's not enough range on these questions. Three answers is too few."— P09 · skeptical analyst
03

Constant voice presence is focus friction — quiet beats empathetic.

5 of 12 described the voice as distracting mid-task. The friction was not tone; it was presence. Participants had to context-switch between answering on-screen and acknowledging the voice. 'Check in at the start / middle / end' was the explicit request, not 'be warmer'.

Counter: 2 of 12 (P05, P06) welcomed the continuous presence, finding it 'more interactive'. Tolerance for voice companionship varies sharply between participants — it is not a tuning knob, it is a toggle.

Implication. Ship a 'start / midpoint / end' voice cadence as the default. Offer 'always on' as an opt-in for participants who prefer companionship. Let participants switch mid-session without restarting.
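A minimal sketch of what that cadence toggle could look like, in TypeScript. Everything here is illustrative: the types, field names, and checkpoint logic are assumptions, not Cookiy's actual implementation.

```ts
// Hypothetical sketch: voice cadence as a session-level toggle, not a tuning knob.
type VoiceCadence = "checkpoints" | "always_on";

interface SessionState {
  cadence: VoiceCadence;
  questionIndex: number;   // 0-based position in the question set
  totalQuestions: number;
}

// Default cadence: speak only at start, midpoint, and end.
function shouldSpeak(s: SessionState): boolean {
  if (s.cadence === "always_on") return true;
  const midpoint = Math.floor(s.totalQuestions / 2);
  return (
    s.questionIndex === 0 ||
    s.questionIndex === midpoint ||
    s.questionIndex === s.totalQuestions - 1
  );
}

// Mid-session switch: change only the cadence flag, never the progress,
// so participants are never forced to restart.
function setCadence(s: SessionState, cadence: VoiceCadence): SessionState {
  return { ...s, cadence };
}
```

The design point the sketch encodes is the finding itself: cadence is a discrete toggle per participant, switchable at any question, rather than a continuous warmth dial.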

"I'm trying to answer you and interact with you, although you're AI. I still need to switch my eye to refocus on the task."— P04 · skeptical analyst
04

Open-ended answers are the unmet demand — radio buttons feel childish.

3 of 12 (P04, P09 explicitly; P07 implicitly) pushed back against multiple-choice formats. P09 made the strongest case: an LLM interpreting an open-ended answer would capture 'the nuance of your unique answers' — a capability the current three-button flow cannot touch.

Counter: 9 of 12 completed the multiple-choice format without complaint. Most did not voice a format preference at all; the demand for open-ended input is a minority signal, but it matches the 'results felt generic' complaint from a different angle.

Implication. Add an 'elaborate' micro-prompt after each multiple-choice answer: 'one sentence on why'. Feed both the choice and the prose to the interpretation model. Test whether resulting summaries move the 'felt specifically theirs' metric.
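One way the combined payload could be structured, sketched in TypeScript. The schema and field names are assumptions for illustration; the study does not describe the real interpretation pipeline.

```ts
// Hypothetical "elaborate" micro-prompt payload: the interpretation model
// sees both the structured choice and the free-text sentence.
interface Answer {
  questionId: string;
  choice: string;        // the multiple-choice selection
  elaboration?: string;  // "one sentence on why", optional
}

// Flatten answers into a prompt fragment for the interpretation model,
// keeping choice and elaboration side by side per question.
function toInterpretationPrompt(answers: Answer[]): string {
  return answers
    .map((a) => {
      const why = a.elaboration ? ` Why: "${a.elaboration}"` : "";
      return `Q${a.questionId}: chose "${a.choice}".${why}`;
    })
    .join("\n");
}
```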

"If there's an AI LLM interpreting what the person says, that would be more personalized."— P09 · skeptical analyst
05

Hard boundaries show up immediately and cannot be coaxed.

2 of 12 (P02, P07) set explicit limits — P02 withdrew over data usage concerns ('I don't want to be used for social media'); P07 asked the voice to be quiet repeatedly. P03 exited mid-session. The pattern is small in count but decisive in behaviour: these participants leave rather than adapt.

Counter: 9 of 12 proceeded without friction, either because they accepted the consent terms outright or had lower privacy sensitivity. The boundary-setters were not persuadable by incentive.

Implication. Treat early-exit rate as a first-class funnel metric, not as failed recruitment. Give participants a visible 'pause voice' and 'end session with partial data' control on every screen.
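A sketch of how early exits could be recorded as funnel data rather than discarded, in TypeScript. The event taxonomy below is hypothetical; the study only establishes that boundary-setters leave rather than adapt.

```ts
// Hypothetical sketch: early exits as first-class funnel events,
// not failed recruitment. All names are illustrative.
type ExitReason = "completed" | "privacy_boundary" | "voice_friction" | "unknown";

interface SessionRecord {
  participantId: string;
  exitReason: ExitReason;
  answersCaptured: number; // partial data kept, with consent, on early exit
}

// Early-exit rate over a batch of sessions: a funnel metric, not noise.
function earlyExitRate(sessions: SessionRecord[]): number {
  if (sessions.length === 0) return 0;
  const exits = sessions.filter((s) => s.exitReason !== "completed").length;
  return exits / sessions.length;
}
```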

"I don't consent. I don't want to be used for social media purpose."— P02 · boundary protector

Four archetypes

The Skeptical Analyst · 2 of 12
"It's very generic. It could apply to anyone." — P04 · skeptical analyst

The Boundary Protector · 2 of 12
"Can you be quiet because you're distracting?" — P07 · boundary protector

The Flow-Seeker · 4 of 12
"They were tailored to my personality. It kinda resonated with me." — P05 · flow-seeker

The Pragmatic Consumer · 2 of 12
"I think so." — P01 · pragmatic consumer

Voice is polite. Questions do the work.

The path to results that feel personal runs through question design and open-ended prompts, not warmer voice delivery.

Powered by Cookiy AI

The consumers in this report. You could talk to them.

Cookiy AI runs AI-moderated consumer interviews at scale — in hours, not weeks.

Learn more →