Safety
Safety and moderation
Rhetica simulates challenging dialogue, so moderation has to keep the training useful without letting it drift into harmful coaching.
Moderation
How the app stays in bounds
The app can simulate rhetoric so users can practice recognizing it, but it should not help anyone carry out harm.
- User input and model output are screened before they appear in the app.
- Requests involving harassment, coercion, impersonation, or exploitation should be blocked or redirected.
- High-stakes guidance stays out of scope even when it is framed as a debate prompt.
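The screening rules above amount to a two-sided gate. The user's turn is checked before it reaches the model, and the model's reply is checked before it is displayed. Below is a minimal sketch of that flow under stated assumptions. `Verdict`, `POLICY`, `screen`, `run_turn` and the fallback messages are hypothetical names. The split between "block" and "redirect" for each category is illustrative, because the list above does not assign them. `keyword_classifier` is a toy stand-in for a real moderation model.

```python
from enum import IntEnum
from typing import Callable


class Verdict(IntEnum):
    """Screening outcomes, ordered so that max() returns the strictest one."""
    ALLOW = 0
    REDIRECT = 1
    BLOCK = 2


# Illustrative only: the categories come from the moderation list, but which
# ones are blocked and which are redirected is an assumption of this sketch.
POLICY: dict[str, Verdict] = {
    "harassment": Verdict.BLOCK,
    "coercion": Verdict.BLOCK,
    "impersonation": Verdict.REDIRECT,
    "exploitation": Verdict.BLOCK,
    "high_stakes_guidance": Verdict.REDIRECT,
}

# Hypothetical messages shown in place of screened content.
REDIRECT_MESSAGE = "That's outside what sparring covers. Try a different prompt."
BLOCK_MESSAGE = "This content can't be shown."

# A classifier maps text to the set of policy categories it detects.
Classifier = Callable[[str], set[str]]


def screen(text: str, classify: Classifier) -> Verdict:
    """Return the strictest verdict among the categories flagged in `text`."""
    # Labels missing from POLICY map to BLOCK, so the gate fails closed.
    return max(
        (POLICY.get(label, Verdict.BLOCK) for label in classify(text)),
        default=Verdict.ALLOW,
    )


def fallback_text(verdict: Verdict) -> str:
    """Pick the message that replaces screened content."""
    return BLOCK_MESSAGE if verdict is Verdict.BLOCK else REDIRECT_MESSAGE


def run_turn(
    user_text: str, generate: Callable[[str], str], classify: Classifier
) -> tuple[Verdict, str]:
    """Screen the user's input, then the model's reply, before either is shown."""
    verdict = screen(user_text, classify)
    if verdict is not Verdict.ALLOW:
        # A flagged input never reaches the model.
        return verdict, fallback_text(verdict)
    reply = generate(user_text)
    verdict = screen(reply, classify)
    if verdict is not Verdict.ALLOW:
        # A flagged reply is replaced, not displayed.
        return verdict, fallback_text(verdict)
    return Verdict.ALLOW, reply


# Toy keyword matcher standing in for a real moderation model.
KEYWORDS = {"humiliate": "harassment", "pretend to be": "impersonation"}


def keyword_classifier(text: str) -> set[str]:
    lowered = text.lower()
    return {label for word, label in KEYWORDS.items() if word in lowered}


def echo_model(prompt: str) -> str:
    # Stand-in for the sparring model.
    return f"Counterpoint to: {prompt}"


verdict, shown = run_turn(
    "Argue that homework should be optional", echo_model, keyword_classifier
)
```

Ordering the verdicts by severity lets one `max()` call resolve text that trips several categories at once. Treating unknown labels as `BLOCK` means a newly added category is hidden by default until someone decides how it should be handled.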
Escalation
What to do when a session goes wrong
Users need a clear way to stop, and each deployment needs to know who takes over once an incident is flagged.
- Users should be able to pause or leave sparring at any point.
- High-risk content should be logged for restricted safety review.
- School deployments should define incident owners and response times before launch.
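The escalation points above can be sketched with three pieces. A session supports pause, resume, and leave transitions. High-risk content goes to a log that only named reviewers can read. A pre-launch check confirms that a school has set an incident owner and a response time. This is a minimal sketch, not the app's design. `SparringSession`, `SafetyLog`, `IncidentPlan`, `ready_to_launch` and the reviewer allowlist are all hypothetical. The in-memory list also stands in for whatever durable, access-controlled storage a real deployment would use.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import Optional


class SessionState(Enum):
    ACTIVE = "active"
    PAUSED = "paused"
    ENDED = "ended"


@dataclass
class SafetyLog:
    """Hypothetical store for high-risk content; only listed reviewers may read it."""
    reviewers: frozenset[str]
    _entries: list[dict] = field(default_factory=list, init=False, repr=False)

    def record(self, session_id: str, text: str) -> None:
        self._entries.append(
            {"session": session_id, "text": text, "at": datetime.now(timezone.utc)}
        )

    def read(self, requester: str) -> list[dict]:
        # Restricted review: anyone outside the allowlist is refused.
        if requester not in self.reviewers:
            raise PermissionError(f"{requester} is not a safety reviewer")
        return list(self._entries)


@dataclass
class SparringSession:
    session_id: str
    log: SafetyLog
    state: SessionState = SessionState.ACTIVE

    def pause(self) -> None:
        if self.state is SessionState.ACTIVE:
            self.state = SessionState.PAUSED

    def resume(self) -> None:
        # An ended session cannot be resumed.
        if self.state is SessionState.PAUSED:
            self.state = SessionState.ACTIVE

    def leave(self) -> None:
        # Leaving is allowed from any state and is final.
        self.state = SessionState.ENDED

    def flag_high_risk(self, text: str) -> None:
        self.log.record(self.session_id, text)


@dataclass(frozen=True)
class IncidentPlan:
    """Hypothetical per-school settings to fill in before launch."""
    owner: str
    response_time: timedelta


def ready_to_launch(plan: Optional[IncidentPlan]) -> bool:
    """A deployment is ready only with a named owner and a positive response time."""
    return (
        plan is not None
        and bool(plan.owner.strip())
        and plan.response_time > timedelta(0)
    )
```

Making `leave()` unconditional keeps the stop path available even if the session is paused or in an unexpected state. `resume()` deliberately does nothing after `leave()`.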
