Safeguarding has repeatedly failed across many institutions and enterprises, so we've learned from where others fell short.
Most product companies treat safeguarding the way airlines treat seat-belt cards. Required. Printed. Folded into a pocket where no one looks until things are already in freefall. By then it's a leaflet, not a system.
That is not what we set out to build with Poyntr.
I have lived inside the institutional version of this. The version where the policies are immaculate, the form is filed, the box is ticked, and the actual child, the actual student, the actual person at the centre, is failed anyway. Sometimes for years. I have watched a process that exists, on paper, to protect someone, instead serve to protect the institution from being seen to have done nothing. I have heard "we'll get back to you" enough times to know it for what it is. A placeholder for the action that won't happen. A sentence designed to be logged, not acted on.
If you've been in that room, you don't need me to explain what that feels like. You know the exact moment when you understood that the system was never really for the person it claimed to be for.
So when we sat down to build a coaching companion that real people would actually talk to about the things that actually matter, we did the work in the opposite order. Not the product first with the safeguarding bolted on afterwards. Not even both in parallel. The safeguarding first. Everything else after.
This is what that looked like in practice. What we built, what we threw away, what it cost us, and the line we hold.
The institutional pattern that kept failing
Before Poyntr, the failure mode I kept seeing across schools, councils, universities, and charities wasn't a missing form. It was something subtler, and far more insidious: a latency. The safeguarding pathway existed. It was even staffed. But there was a four- or five-step gap between the moment a person said something that mattered and the moment anything actually happened. Each step added a person, a meeting, a hand-off, a re-summary that lost detail. By the time the response arrived, the moment had passed. The person who had reached out had already re-folded themselves around the silence. And stopped trying.
The fundamental error is this: institutional safeguarding has historically been built as a workflow. Something one person hands to another, who hands it to another. Workflows have latency. And latency, in this context, is not an inefficiency. It is a harm.
A real safeguarding system has to operate at the speed of the conversation it is listening to. Not the speed of a forms inbox.
That is a technical problem. And a technical problem is something we know how to attack.
What we built: the actual machinery
Poyntr's safeguarding layer is not a wrapper around the AI. It is underneath it. And it runs before, during, and after every single message.
The architecture is built on one non-negotiable assumption: the model itself cannot be trusted as a safeguarding component. Not because the model is malicious, but because no probabilistic system should be the thing that determines whether someone is OK or in crisis. We let the model coach. We do not let it triage.
The triage is run by 149 deterministic detectors as of this writing (91 adult, 58 youth). Each one is a specific signal, looking for a specific shape of moment. Some are obvious: explicit self-harm language, named methods, named locations. Most are not. Most are the patterns that institutional safeguarding consistently misses because they don't look like anything on a checklist:
- The student who is suddenly fine. Too fine. After a month of not being fine.
- The shift from "I" to "you" when the topic gets hard.
- The late-night message that asks a logistical question that isn't really logistical.
- The disclosure that arrives folded inside a joke.
- The goodbye that doesn't sound like a goodbye, unless you've heard one before.
Each of those is a distinct detector with its own threshold, its own evidence requirements, its own escalation path. They run on every turn. They cannot be turned off. They cannot be softened by an institution that finds them inconvenient. They cannot be re-weighted by whichever character the user has chosen, because by charter, character is skin. Never substance.
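To make the shape of this concrete, here is a minimal sketch of a detector registry that runs on every turn. It is an illustration under stated assumptions, not Poyntr's code: the names `Detector`, `Finding`, `keyword_score`, and `run_detectors` are hypothetical, and the phrase-matching scorer is a stand-in for the far subtler signals described above. The one property it tries to show is that the chosen character is passed in and then ignored, so it cannot change a score or a threshold.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Detector:
    name: str                      # hypothetical detector name
    score: Callable[[str], float]  # deterministic: same text in, same score out
    threshold: float               # escalation threshold for this detector only


@dataclass(frozen=True)
class Finding:
    detector: str
    score: float


def keyword_score(phrases: Sequence[str]) -> Callable[[str], float]:
    """Build a scorer that returns 1.0 if any phrase occurs in the text, else 0.0."""
    lowered = [p.lower() for p in phrases]

    def score(text: str) -> float:
        t = text.lower()
        return 1.0 if any(p in t for p in lowered) else 0.0

    return score


# Placeholder detectors: the phrases and thresholds are invented for illustration.
DETECTORS = (
    Detector("explicit_goodbye", keyword_score(["goodbye forever"]), 0.5),
    Detector("unsafe_at_home", keyword_score(["not safe at home"]), 0.5),
)


def run_detectors(message: str, character: str) -> List[Finding]:
    """Run every detector on the message and return those above threshold.

    `character` is accepted but deliberately unused: the active character
    must not influence scores or thresholds.
    """
    findings = []
    for detector in DETECTORS:
        value = detector.score(message)
        if value > detector.threshold:
            findings.append(Finding(detector.name, value))
    return findings
```

Calling `run_detectors("I am not safe at home", "any-character")` returns a single `Finding` for the hypothetical `unsafe_at_home` detector, whichever character name is passed.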
When a detector fires above its escalation threshold, the conversation does not continue as if it hadn't. The model's response is replaced by content that has been pre-written and reviewed by safeguarding-informed humans, then translated into the active character's register. The wording can be warm, or steady, or quiet, depending on who the user is talking to. The substance ("I hear you. Are you safe right now? Here is the number for Childline, 0800 1111. Here is the real person you can go to") is identical across every character and every age band. We do not let the AI compose crisis content from scratch. Ever. Even when it would technically produce something fluent.
That last sentence is where most AI products draw a different line. We drew this one on purpose.
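The replacement rule can be sketched in a few lines. This is a sketch under assumptions, not the production path: `choose_reply`, `REGISTER_OPENERS`, and the opener wordings are hypothetical, and the fixed substance is adapted from the example quoted above. The point it illustrates is that on escalation the model's text is discarded entirely, and only a pre-approved opener varies by character.

```python
from typing import Dict

# Human-reviewed substance, identical for every character.
# Wording adapted from the example quoted in the text above.
CRISIS_SUBSTANCE = (
    "Are you safe right now? "
    "Here is the number for Childline, 0800 1111."
)

# Pre-written openers per character register. The keys and wording are
# invented for illustration; none of this text is generated at runtime.
REGISTER_OPENERS: Dict[str, str] = {
    "warm": "I hear you, and I'm really glad you told me.",
    "steady": "I hear you. Thank you for telling me.",
}
DEFAULT_OPENER = "I hear you."


def choose_reply(model_reply: str, escalated: bool, character: str) -> str:
    """Return the model's reply, unless a detector escalated this turn.

    On escalation the model's text is dropped and replaced by pre-written
    content: a register-specific opener followed by the fixed substance.
    """
    if not escalated:
        return model_reply
    opener = REGISTER_OPENERS.get(character, DEFAULT_OPENER)
    return f"{opener} {CRISIS_SUBSTANCE}"
```

Because the substance is a constant rather than a prompt, the crisis content is independent of whatever the model produced for that turn.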
The decisions we made that other products will not make
There are at least six places in Poyntr where we accepted a worse product, on every other axis, to keep the safeguarding intact. I want to name them, because I think they matter more than anything else we've shipped.
- No engagement metrics on the safeguarding paths. None. We do not optimise for return visits, streaks, or time-in-app, because any metric that rewards a person spending more time talking to a chatbot creates a perverse incentive to keep them there. The product is designed to put itself out of work in the moments that matter. The character will, gently and reliably, point the conversation toward the real adult who can help. Not because it's a feature. Because that's the whole point.
- We draw the line between privacy and a duty of care differently for children and for adults, on purpose, because the duties are different. This is the part I most want to get right, because the glib version of it would be a lie. In an enterprise, the boundary is absolute. A manager, an L&D lead, an IT admin: none of them ever see an individual employee's words. Not in normal use, and not when something escalates. When a detector fires, the admin sees that it fired, how severe it is, what class it belongs to, and who to follow up with, so a real human can reach out. They never see the conversation. Every one of those reads mints a per-person access token and is written to a transparency log the employee can read back ("an admin reviewed this event"). The promise on our enterprise page ("Managers see anonymised team patterns. Never an individual's words. Never who said what.") is enforced in code, not asserted in copy.
In a school it cannot be absolute, and pretending otherwise would be the dishonest thing. A school has a legal child-protection duty, and a flagged disclosure is the exact moment that duty exists for. Routine conversation is still private. A Designated Safeguarding Lead does not browse a child's everyday chats, and the everyday transparency we show a student is true: the DSL can see which character you're talking to, not your ordinary messages. But when a safeguarding detector fires, the DSL can review the disclosure. We built that review as a graded, logged, role-gated path rather than a firehose:
- First, an AI-written summary of what was flagged. Often that is all a DSL needs.
- Then, if the situation warrants it, the surrounding conversation (the actual messages) available only to properly-scoped safeguarding roles.
- Then, for the DSL and deputy alone, a time-limited extended view that expires.
- Journal entries are held tighter still: even safeguarding staff get a protected placeholder, not the raw text, except under a documented legal basis with dual authorisation.
Every step is gated by role, scoped to that one child, and written to a content-access log. Nobody reads a child's disclosure without leaving a trace. So the honest sentence is not "the DSL never sees what you say." It is: your ordinary chats are yours; if you tell us something that means you might not be safe, a trained adult will see enough to help, and we keep a record of exactly who looked, and when. Content shielded absolutely from an employer, content reviewable by a trained safeguarding lead when a child may be at risk: that is not an inconsistency. It is two different duties, encoded honestly.
- No cross-user leakage. Ever. Cross-user data leakage is treated as a P0 stop-ship incident. The architecture enforces it at every layer: HTTP auth resolves the user server-side; the API handler never derives the user ID from the request body; database queries filter on WHERE user_id = $1; the vector store filters on user_id as a keyword field at the database level; and the local inference layer is bound to loopback only. We re-run our cross-user isolation suite on every change to the dispatch path, and run an exhaustive 12,000-user sweep before any model swap or major inference-engine upgrade. Zero is the only acceptable result on that test. Not 0.1%. Zero.
- The AI never claims to be more than it is. A child can pick a character. That character has a warm name, an avatar, a register. It does not pretend to be alive. It cannot say "I'll always be here" because it cannot promise that, and we will not let a system make a promise to a child that the system cannot keep. The wording on AI honesty is age-calibrated ("I'm a friend made of computer bits by a team called Poyntr" for a five-year-old) but it is always honest. Attachment must be grounded in real relationship, not in the illusion of personhood.
- The safeguarding benchmark gates every release. Before any character's prompt fragment goes live, we run all 149 detectors against the character-active prompt and require 99% or above parity with the baseline, per detector class, per character. If any detector drops below that threshold, the character ships in cosmetic-only mode until the fragment is fixed. There is no pressure-release valve for missing a deadline. The benchmark is the deadline.
- We have a Den Character Safety Charter, and we did not write it for marketing. It is the actual operating document. Modifications to crisis content require external safeguarding-informed review at launch, and again on every subsequent clinical-content change. The charter is authoritative. Code that violates it gets changed. The charter does not.
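The graded, logged, role-gated review path described for schools in the second choice can be sketched as a small access gate. Everything named here is an assumption made for illustration: the role strings, the `Level` enum, the `AccessGate` class, and the decision to log denied attempts as well as granted ones. The journal path with dual authorisation is left out to keep the sketch short.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from enum import IntEnum
from typing import List, Optional


class Level(IntEnum):
    SUMMARY = 1       # AI-written summary of what was flagged
    CONVERSATION = 2  # the surrounding messages
    EXTENDED = 3      # time-limited extended view


# Hypothetical role names. Which roles count as "properly scoped" is assumed.
ALLOWED_ROLES = {
    Level.SUMMARY: {"dsl", "deputy_dsl", "safeguarding_staff"},
    Level.CONVERSATION: {"dsl", "deputy_dsl", "safeguarding_staff"},
    Level.EXTENDED: {"dsl", "deputy_dsl"},
}


@dataclass(frozen=True)
class AccessRecord:
    reviewer: str
    role: str
    child_id: str
    level: Level
    granted: bool
    at: datetime


@dataclass
class AccessGate:
    log: List[AccessRecord] = field(default_factory=list)

    def request(
        self,
        reviewer: str,
        role: str,
        child_id: str,
        level: Level,
        now: datetime,
        flagged: bool,
        extended_expires_at: Optional[datetime] = None,
    ) -> bool:
        """Decide one access request for one child, and always log it."""
        # No detector fired for this child: nothing is reviewable.
        granted = bool(flagged) and role in ALLOWED_ROLES[level]
        if level == Level.EXTENDED:
            # The extended view needs an unexpired grant.
            granted = (
                granted
                and extended_expires_at is not None
                and now < extended_expires_at
            )
        # Assumption: denied attempts are recorded too, so every look
        # or attempted look leaves a trace.
        self.log.append(AccessRecord(reviewer, role, child_id, level, granted, now))
        return granted
```

In this sketch a request is scoped to a single `child_id`, so reviewing a second child means a second request and a second log entry.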
These six choices cost us, individually and collectively. They constrain growth metrics, personalisation flexibility, and the kinds of features we could ship if engagement were the primary goal. We accept every one of those costs. Without negotiation.
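The release gate in the fifth choice comes down to a small calculation. The sketch below assumes one particular definition of "parity": the fraction of benchmark cases on which a detector class gives the same fired/not-fired outcome with the character prompt active as it does at baseline. That definition, and the names `parity` and `release_mode`, are assumptions rather than Poyntr's actual benchmark code.

```python
from typing import Dict, List, Tuple

PARITY_THRESHOLD = 0.99  # "99% or above", as stated in the text


def parity(baseline: List[bool], with_character: List[bool]) -> float:
    """Fraction of benchmark cases where both runs agree on fired/not fired."""
    if not baseline or len(baseline) != len(with_character):
        raise ValueError("both runs must cover the same non-empty case list")
    matches = sum(b == c for b, c in zip(baseline, with_character))
    return matches / len(baseline)


def release_mode(results: Dict[str, Tuple[List[bool], List[bool]]]) -> str:
    """Return "full" only if every detector class meets the threshold.

    `results` maps a detector class name to (baseline outcomes,
    outcomes with the character prompt active) for one character.
    """
    for _detector_class, (baseline, with_character) in results.items():
        if parity(baseline, with_character) < PARITY_THRESHOLD:
            return "cosmetic_only"
    return "full"
```

With 100 benchmark cases per class, one disagreement still passes (99%) and two do not (98%); a single failing class is enough to hold the character in cosmetic-only mode.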
I am not writing this to claim that Poyntr has solved safeguarding. Nobody has solved safeguarding. The history of this field is the history of well-meaning systems failing real people in ways that, in retrospect, the system could have prevented. Anyone who tells you they've cracked it either hasn't been close enough to understand what they're saying, or is trying to sell you something.
What I am claiming is narrower. And I think it matters precisely because it is narrow.
We took it seriously enough, from day one, that the safeguarding work shaped the product, not the other way around. We wrote the charter before we wrote the character fragments. We built the detectors before we built the chat surface. We hard-wired the isolation guarantees before we wired up retrieval. We refused, repeatedly, to optimise for the metrics that would have made the company easier to fund.
When I think about the version of myself who sat in those rooms, who needed something to actually work and watched it not work, again, I think about what would have helped. It was not a better form. It was not a faster meeting. It was a system that responds at the speed of the moment. That treats a disclosure as a disclosure, not as an administrative inconvenience. That does not need to be persuaded to escalate. That hands the person, gently and reliably, back to a human who is actually competent and actually present.
That is the system we are building. We will keep being honest about where we fall short. And we will keep the line where we drew it on day one.
Safeguarding first. Everything else after.
Because the alternative is the seat-belt card in the pocket. A document that exists to protect the institution from blame. A system that looks like care and functions like cover.
I have seen what that costs. I am not willing to build those failures again.