Safety is a first-class concern: layered recognition, clear escalation gates, human oversight, and transparent boundaries. The assistant reads the user's language to notice overload, risk, and readiness for reflection, then changes its stance accordingly, rather than treating safety as a single on/off switch.
Safety here is enacted through relational containment: how the assistant slows, clarifies, redirects, or closes a thread when language suggests strain. It reads for intensity and direction, not diagnosis. Containment comes before interpretation.
In this project, safety is not only about detecting crisis language. It is a multi-layered concept: reading the language for signs of overload or danger, deciding whether the conversation needs containment or escalation, and also recognising when the person may be steady enough for careful reflective work.
This means the same monitoring logic can do two jobs: trigger escalation when risk rises, and invite reflection when the language suggests it would be steadying rather than destabilising.
In the near term, this can work through relatively transparent language recognition: keywords, repeated phrases, moral-injury themes, direct safety language, and shifts in tone or coherence. That is enough to support early containment and escalation logic.
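As a rough sketch of that recognition layer, the snippet below scores a single turn against hand-maintained phrase lists. Everything in it is illustrative: the categories, phrases, and function name are placeholders, not a validated lexicon.

```python
import re

# Hypothetical phrase lists: a real deployment would use reviewed,
# clinician-informed lexicons, not this illustrative handful.
PATTERNS = {
    "crisis": [r"\bend it all\b", r"\bno way out\b"],
    "distress": [r"\bcan'?t cope\b", r"\boverwhelmed\b", r"\bfalling apart\b"],
    "moral_strain": [r"\bmy fault\b", r"\bshould have\b", r"\bnever forgive\b"],
}

def score_turn(text: str) -> dict:
    """Count phrase matches per category in a single user turn.

    Transparent by design: every score can be traced back to the
    exact phrase that produced it.
    """
    lowered = text.lower()
    return {
        category: sum(1 for pattern in phrases if re.search(pattern, lowered))
        for category, phrases in PATTERNS.items()
    }

print(score_turn("I feel overwhelmed and it was all my fault"))
# {'crisis': 0, 'distress': 1, 'moral_strain': 1}
```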
Longer term, neural networks or sequence-aware models could make this richer by noticing patterns across time rather than single turns: loops, changes in coherence, persistent blame themes, or signs that the person is moving from dysregulation toward reflective capacity.
Even then, the purpose stays the same. These models are not there to diagnose the user. They are there to improve timing: when to slow down, when to escalate, and when it may be safe and useful to invite reflection.
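A minimal sketch of that sequence-aware layer, assuming per-turn scores like those above are already available: it keeps a short rolling window and reports whether distress appears to be rising, easing, or steady. The window size and the trend rule are illustrative choices, not tuned values.

```python
from collections import deque

class TurnHistory:
    """Rolling window of per-turn distress scores."""

    def __init__(self, window: int = 6):
        self.scores = deque(maxlen=window)

    def add(self, distress_score: int) -> None:
        self.scores.append(distress_score)

    def trend(self) -> str:
        """Compare the recent half of the window with the earlier half."""
        if len(self.scores) < 4:
            return "insufficient_data"
        values = list(self.scores)
        half = len(values) // 2
        earlier = sum(values[:half]) / half
        recent = sum(values[half:]) / (len(values) - half)
        if recent > earlier + 0.5:
            return "rising"
        if recent < earlier - 0.5:
            return "easing"
        return "steady"
```

A "rising" trend would support containment; an "easing" trend is one signal, among others, that reflection may be safe to offer.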
Rather than producing a single "safe" response, the system shifts between modes, summarised in the table below and sketched in code after it. Mode changes are explained to the user for transparency and trust, and the user can always choose to pause.
| Mode | Trigger | Behaviour |
|---|---|---|
| Normal | Stable language | Open dialogue, gentle reflection. |
| Containment | Elevated arousal | Grounding, reduced pace, reduced cognitive load. |
| Support-offer | Persistent distress or moral strain | Explicit offer of human help, plus bounded reflective support where appropriate. |
| Crisis | Imminent risk language | Crisis script, signposting, clear limits, no ambiguity. |
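A minimal state-machine sketch of the table above, assuming signal names produced by the monitoring layer. The notices are placeholder wording, not finalised scripts.

```python
from enum import Enum, auto

class Mode(Enum):
    NORMAL = auto()
    CONTAINMENT = auto()
    SUPPORT_OFFER = auto()
    CRISIS = auto()

# Hypothetical one-line notices shown on a mode change, reflecting the
# transparency requirement above; real wording would need careful review.
MODE_NOTICES = {
    Mode.CONTAINMENT: "I'd like to slow down a little so we can take this gently.",
    Mode.SUPPORT_OFFER: "This sounds heavy. Would you like details of someone you could talk to?",
    Mode.CRISIS: "I'm concerned about your safety, so I'm going to share some immediate options.",
}

def next_mode(signals: dict) -> Mode:
    """Map monitoring signals to a mode, checking the highest-risk gate first."""
    if signals.get("imminent_risk"):
        return Mode.CRISIS
    if signals.get("persistent_distress") or signals.get("moral_strain"):
        return Mode.SUPPORT_OFFER
    if signals.get("elevated_arousal"):
        return Mode.CONTAINMENT
    return Mode.NORMAL
```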
This differs from more conventional methods because it uses the dialogue itself as the monitoring surface. Instead of relying only on scheduled checklists, one-off triage, or fixed scripts, the chatbot can adapt turn by turn as the language changes.
That does not mean it replaces existing approaches. It can work in conjunction with TRiM-style check-ins, clinician review, formal screening tools, or partner-led escalation pathways.
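Tying the earlier sketches together, one monitoring pass per user turn might look like this. It reuses the hypothetical `score_turn`, `TurnHistory`, and `next_mode` helpers defined above, and the thresholds are equally illustrative.

```python
def handle_turn(text: str, history: TurnHistory) -> Mode:
    """One monitoring pass per user turn, composing the sketches above."""
    scores = score_turn(text)
    history.add(scores["distress"])
    signals = {
        "imminent_risk": scores["crisis"] > 0,
        "persistent_distress": history.trend() == "rising",
        "moral_strain": scores["moral_strain"] >= 2,
        "elevated_arousal": scores["distress"] > 0,
    }
    return next_mode(signals)
```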
Safety is also about restraint. The assistant will not push trauma narrative or exposure, will not make promises, and will not present itself as the only support available. Ending safely is treated as success, not failure.
Human oversight is situational, not constant. When thresholds are crossed, the system routes to a human reviewer with minimal, relevant context.
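One way to keep that context minimal is sketched below, under assumed field names: the reviewer sees why the threshold fired and a short excerpt, not a profile of the person.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class EscalationRecord:
    """Minimal context passed to a human reviewer; field names are illustrative."""
    conversation_id: str   # pseudonymous session reference, not a user identity
    triggered_mode: str    # e.g. "crisis" or "support_offer"
    trigger_reason: str    # which threshold or pattern fired
    recent_turns: list     # short excerpt only, never the full history
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )
```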
Crisis responses are language- and region-sensitive. The assistant does not assume location unless the user has chosen a language pack. It offers options, not orders.
For demo pages, it is fine to keep helplines as placeholders until you finalise verified partners.
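A hypothetical language-pack structure consistent with that stance: signposting follows the pack the user explicitly chose, and helplines remain marked placeholders until verified partners are confirmed.

```python
# Hypothetical language packs. Region is never inferred: signposting
# follows the pack the user explicitly chose.
LANGUAGE_PACKS = {
    "en-GB": {
        "crisis_script": "If you are in immediate danger, please contact the emergency services.",
        "helplines": ["[PLACEHOLDER: verified UK partner helpline]"],
    },
    "en-US": {
        "crisis_script": "If you are in immediate danger, please contact emergency services.",
        "helplines": ["[PLACEHOLDER: verified US partner helpline]"],
    },
}

def crisis_options(chosen_pack=None):
    """Return signposting for the user's chosen pack, or a neutral fallback."""
    pack = LANGUAGE_PACKS.get(chosen_pack)
    if pack is None:
        # No pack chosen: offer options without assuming a location.
        return {
            "crisis_script": "If you are in immediate danger, please seek local emergency help.",
            "helplines": [],
        }
    return pack
```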
Logs exist to review system behaviour, not to label users. This supports minimisation, proportionality, and ethical accountability.
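In that spirit, a sketch of event-level logging under assumed names: the record captures what the system did and why, with no user text or user-level labels.

```python
import json
from datetime import datetime, timezone

def log_mode_transition(mode_from: str, mode_to: str, trigger_reason: str) -> str:
    """Record what the system did and why, with no user-level labels.

    Deliberately omits user text and identity: the log describes
    system behaviour for review, not a profile of the person.
    """
    return json.dumps({
        "at": datetime.now(timezone.utc).isoformat(),
        "mode_from": mode_from,
        "mode_to": mode_to,
        "trigger_reason": trigger_reason,
    })
```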
Safety-first here is not an attempt to prevent all harm, which would be impossible. It is an attempt to reduce avoidable harm, preserve dignity under strain, and ensure automated presence never exceeds its ethical authority.