Why we're building Stridel
Content moderation has drifted in two very different directions.
One is keyword filters. They’re fast and inexpensive, but they don’t understand language. They’re easy to bypass with misspellings, slang or creative wording, and they regularly flag messages simply because they contain the wrong words.
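Both failure modes are easy to reproduce. The sketch below is a toy filter, not any real product's implementation; the blocklist and the sample messages are made up purely for illustration.

```python
import re

# Hypothetical blocklist, chosen only for illustration.
BLOCKLIST = {"kill", "scam"}


def keyword_flag(message: str) -> bool:
    """Flag a message if any of its words appears in the blocklist."""
    words = re.findall(r"[a-z0-9]+", message.lower())
    return any(word in BLOCKLIST for word in words)


# A misspelling slips through even though the intent is unchanged.
bypassed = keyword_flag("this is a total sc4m, send me your password")

# A harmless message is flagged because it contains the wrong word.
false_positive = keyword_flag("that boss fight was hard, how do I kill it?")

print(bypassed, false_positive)
```

The first message is missed and the second is flagged, which is the opposite of what a moderator would want in both cases.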
The other is LLMs. They understand language remarkably well, but they’re also asked to reason about every single message they moderate. That comes with latency, cost and a level of unpredictability that’s difficult to justify for what is, at its core, a classification task.
We’re building Stridel to combine the best of both: the speed and low cost of a filter with a model’s understanding of language.
Stridel is built around discriminative AI. Moderation isn’t about generating text. It’s about deciding whether a message violates a policy.
Communities should define the boundary
One of the biggest inspirations behind Stridel was Perspective API from Google. It proved that semantic moderation is dramatically more capable than keyword matching. But what it didn’t solve was ownership.
A gaming community, a classroom and a mental health support group don’t share the same definition of acceptable content. They shouldn’t have to.
Most moderation systems expose a fixed set of categories such as toxicity, sexual content or identity attacks. Communities then have to adapt their rules to the model.
We think it should work the other way around. In Stridel, a policy is simply written in natural language. An example policy might look like this:
“Do not sell or advertise anything.”
The model learns what that policy means instead of looking for words that are commonly associated with it. Messages are classified based on meaning, not vocabulary.
If moderators disagree with a decision, they can nudge the decision boundary around similar cases. The adjustment takes effect almost instantly, allowing the system to adapt to the community instead of asking the community to adapt to the system.
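To make the idea of nudging concrete, here is a minimal sketch of one way such an adjustment could work. It is an assumption for illustration, not a description of Stridel's internals: toy 2-D vectors stand in for whatever representation the real model produces, and `NudgeableClassifier`, `threshold` and `radius` are hypothetical names and values.

```python
import math
from dataclasses import dataclass, field


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class NudgeableClassifier:
    policy_vector: list[float]   # stand-in for the learned policy
    threshold: float = 0.8       # hypothetical base decision boundary
    radius: float = 0.98         # how similar a message must be to a correction
    corrections: list[tuple[list[float], bool]] = field(default_factory=list)

    def nudge(self, vector: list[float], violates: bool) -> None:
        """Record a moderator's decision; no retraining step is involved."""
        self.corrections.append((vector, violates))

    def violates(self, vector: list[float]) -> bool:
        # If a moderator correction is close enough, it decides the outcome.
        best = max(self.corrections, key=lambda c: cosine(vector, c[0]), default=None)
        if best is not None and cosine(vector, best[0]) >= self.radius:
            return best[1]
        # Otherwise fall back to the base boundary around the policy.
        return cosine(vector, self.policy_vector) >= self.threshold


clf = NudgeableClassifier(policy_vector=[1.0, 0.0])
borderline = [0.8, 0.5]                    # e.g. a member mentioning their own shop
before = clf.violates(borderline)          # flagged by the base boundary
clf.nudge(borderline, violates=False)      # a moderator disagrees
after = clf.violates([0.78, 0.52])         # a similar message is now allowed
clear_ad = clf.violates([0.9, 0.1])        # an obvious ad is still flagged
print(before, after, clear_ad)
```

Because the correction is just a stored example consulted at decision time, it applies to the next message immediately and only affects messages close to the one the moderator corrected.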
Why discriminative AI?
LLMs are exceptional general-purpose models, but general-purpose isn’t always the right tool.
For content moderation, we care about throughput, consistency, explainability and cost just as much as raw accuracy. A moderation system may classify millions of messages every day. At that volume, a few hundred milliseconds per message quickly becomes significant, and every generated token has a cost.
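A back-of-the-envelope calculation shows how per-message latency adds up. The volume and latencies below are arbitrary placeholders chosen for the arithmetic; they are not measurements of Stridel or of any LLM.

```python
def daily_compute_hours(messages_per_day: int, seconds_per_message: float) -> float:
    """Total sequential processing time per day, in hours."""
    return messages_per_day * seconds_per_message / 3600


# Placeholder inputs: 2 million messages a day at two illustrative latencies.
slow_hours = daily_compute_hours(2_000_000, 0.3)    # 300 ms per message
fast_hours = daily_compute_hours(2_000_000, 0.01)   # 10 ms per message

print(f"{slow_hours:.1f} h vs {fast_hours:.1f} h of compute per day")
```

Under these assumed numbers, 300 ms per message works out to roughly 167 hours of processing for a single day of traffic, so the work has to be spread across many parallel workers just to keep up.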
Discriminative models are purpose-built for this kind of work. They classify instead of reason, making them faster, more predictable and substantially cheaper to operate.
They’re also easier to explain. Stridel includes an evidence-gathering algorithm based on nearest-neighbor search that surfaces similar examples, helping moderators understand why a decision was made instead of treating the model as a black box.
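As a rough illustration of evidence gathering by nearest-neighbor search, the sketch below ranks previously labelled messages by similarity to a new one. It is a simplified stand-in, not Stridel's actual algorithm: the vectors are hand-written 2-D toys, and `nearest_examples` and the sample data are hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest_examples(query, labelled, k=2):
    """Return the k labelled examples most similar to the query vector."""
    scored = [(cosine(query, vec), text, label) for text, vec, label in labelled]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


# Hypothetical labelled history for the policy "Do not sell or advertise anything."
EXAMPLES = [
    ("Selling my old GPU, DM me", [1.0, 0.0], "violation"),
    ("Use code SAVE20 at my store", [0.8, 0.4], "violation"),
    ("What GPU should I buy?", [0.3, 0.9], "allowed"),
    ("Anyone up for a match tonight?", [0.0, 1.0], "allowed"),
]

# Toy vector for a new message that was flagged.
evidence = nearest_examples([0.9, 0.1], EXAMPLES, k=2)
for score, text, label in evidence:
    print(f"{score:.2f}  {label:9}  {text}")
```

Showing a moderator the closest labelled messages alongside a decision gives them something concrete to agree or disagree with.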
The goal
We’re not trying to replace moderators, and we’re not trying to build another LLM wrapper.
We’re building a moderation engine that understands meaning instead of keywords, lets communities define their own standards, and is efficient enough to run wherever conversations happen.
Moderation shouldn’t be dictated by whoever trained the model. It should be defined by the communities that use it.