OpenAI just quietly dropped something worth paying close attention to. Released on Hugging Face under an Apache 2.0 license, Privacy Filter is an open, bidirectional token-classification model purpose-built for detecting and redacting personally identifiable information (PII) in text. It is small enough to run in a web browser or on a laptop, and fast enough for high-throughput data sanitization pipelines.
What It Does
Privacy Filter is a Named Entity Recognition (NER) model, but one tuned specifically for the privacy use case. It detects eight classes of sensitive spans: account_number, private_address, private_email, private_person, private_phone, private_url, private_date, and secret. The secret class covers credential formats, project-specific token patterns, and high-entropy strings; the model card explicitly calls out missed detection of 'novel credential formats' and 'secrets split across surrounding syntax' as known failure modes, which signals what the class is trained to target.
The intended use case is clear: dev teams that need to scrub datasets, scrub logs, or pre-process user-generated content before it enters a training pipeline or gets stored in a data warehouse. Because it runs on-premises and on commodity hardware, it fits squarely into the growing set of edge-deployable AI tools that organizations can adopt without routing sensitive data to a third-party API.
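To make that concrete, here is a minimal sketch of how such a model could slot into a log-scrubbing step using the Hugging Face transformers token-classification pipeline. The model id "openai/privacy-filter", the example text, and the returned label names are placeholders for illustration, not confirmed values from the release.

```python
# Hedged sketch: redact detected PII spans using a token-classification pipeline.
from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="openai/privacy-filter",        # hypothetical model id, not confirmed
    aggregation_strategy="simple",        # merge subword tokens into whole spans
)

def redact(text: str) -> str:
    """Replace every detected sensitive span with a bracketed category tag."""
    spans = sorted(ner(text), key=lambda s: s["start"], reverse=True)
    for s in spans:                       # replace from the end so offsets stay valid
        text = text[:s["start"]] + f"[{s['entity_group']}]" + text[s["end"]:]
    return text

print(redact("Contact Alice Smith at alice@example.com before 2024-05-01."))
```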
The Architecture is the Real Story
Privacy Filter has 1.5 billion total parameters but only 50 million active parameters at inference time. That gap, roughly 30x, is explained entirely by the model's sparse mixture-of-experts (MoE) feed-forward design.
Architecturally, the model is 'similar to gpt-oss, albeit of a smaller size.' It is built on 8 pre-norm transformer blocks with a residual stream width (d_model) of 640. Attention uses grouped-query attention (GQA) with rotary positional embeddings (RoPE): 14 query heads over 2 KV heads, meaning 7 query heads share each KV head, which significantly shrinks the key-value cache footprint compared to standard multi-head attention. RoPE is also what enables the model's 128,000-token context window. The feed-forward layers use sparse MoE with 128 total experts and top-4 routing per token: for each token, 4 of the 128 experts are activated, and all other expert parameters remain dormant. This is exactly the mechanism that produces the 30x gap between total and active parameter counts.
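The routing mechanism is easy to see in code. Below is a minimal, illustrative PyTorch sketch of a top-4-of-128 sparse MoE feed-forward layer; the expert hidden width (d_hidden) and the gating details are assumptions, since only d_model, the expert count, and the top-k value are stated above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoEFFN(nn.Module):
    """Sparse mixture-of-experts feed-forward layer with top-k routing per token."""

    def __init__(self, d_model=640, d_hidden=1024, n_experts=128, top_k=4):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)     # token -> expert affinity scores
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                               # x: [num_tokens, d_model]
        weights, idx = self.router(x).topk(self.top_k, dim=-1)  # keep 4 experts per token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):                  # only the selected experts run;
            for e in idx[:, slot].unique():             # the other 124 stay dormant
                mask = idx[:, slot] == e
                out[mask] += weights[mask, slot].unsqueeze(1) * self.experts[int(e)](x[mask])
        return out

# Roughly 4/128 of the expert parameters participate per token, which is how a
# 1.5B-parameter model can end up with only ~50M active parameters at inference.
tokens = torch.randn(16, 640)
print(SparseMoEFFN()(tokens).shape)                     # torch.Size([16, 640])
```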
A Three-Phase Training Pipeline
What makes this model architecturally unusual is not just its size, but how it was built. Privacy Filter was produced in three distinct phases.
First, it was pretrained autoregressively as a standard next-token prediction language model, in the tradition of GPT-style decoders. Second, that checkpoint was architecturally converted: the language-model head was replaced with a token-classification head over the privacy label taxonomy, and the attention mechanism was switched from causal (unidirectional) to bidirectional banded attention with a band size of 128, giving each token an effective context window of 257 tokens (the token itself plus 128 on either side). Third, the converted model was post-trained with a supervised classification loss, a distinct fine-tuning phase on labeled PII data, separate from the architectural conversion step.
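A minimal sketch of the second phase, the architectural conversion, might look like the following. The class names are hypothetical; what it shows are the two concrete changes described above: a 33-way token-classification head in place of the LM head, and a bidirectional banded attention mask with band size 128.

```python
import torch
import torch.nn as nn

NUM_LABELS = 33        # 8 PII categories x {B-, I-, E-, S-} + the background class O
BAND = 128             # tokens visible on each side: 128 + 1 + 128 = 257-token window

def banded_bidirectional_mask(seq_len: int) -> torch.Tensor:
    """Attention is allowed wherever |i - j| <= BAND, with no causal restriction."""
    idx = torch.arange(seq_len)
    return (idx[None, :] - idx[:, None]).abs() <= BAND   # [seq_len, seq_len] bool

class TokenClassificationHead(nn.Module):
    """Replaces the pretrained next-token LM head with per-token label logits."""
    def __init__(self, d_model: int = 640):
        super().__init__()
        self.proj = nn.Linear(d_model, NUM_LABELS)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        return self.proj(hidden_states)                  # [T, d_model] -> [T, 33]

# In the conversion phase, the pretrained transformer blocks keep their weights;
# only the attention mask logic and the output head change.
```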
The autoregressive pretraining gives the model rich language representations learned from far more data and compute than any task-specific budget would support. The architectural conversion enables bidirectional context, which is critical for NER: a name like 'Alice' in 'Alice Smith called' is unambiguous with context on both sides, but with only left context it could be missed. The supervised post-training then specializes those representations for the privacy detection task.
Compared to classical masked-language-model approaches like BERT, this is a post-training conversion of an autoregressive model rather than a native masked-LM setup, a meaningful difference in how the base representations were formed.
Constrained Viterbi Decoding Instead of Argmax
The label scheme Privacy Filter uses is BIOES: Begin, Inside, Outside, End, Single. Each of the 8 privacy categories gets 4 boundary-tagged token classes (B-, I-, E-, S-) plus the shared background class O, yielding 33 total output classes per token. For a sequence of length T, the output logits have shape [T, 33].
Rather than taking a per-token argmax over these 33 logits, which can produce incoherent label sequences such as a B- tag followed immediately by an S- tag, the model runs a constrained Viterbi decoder at inference time. The decoder uses linear-chain transition scoring and enforces valid BIOES boundary transitions. It scores full label paths using start, transition, and end terms, together with six transition-bias parameters that specifically control background persistence, span entry, span continuation, span closure, and boundary-to-boundary handoff. This global path optimization improves span coherence and boundary stability by making each token decision depend on sequence-level structure rather than local logits alone, which is particularly valuable in noisy or mixed-format text.
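The decoding step can be sketched as a standard constrained Viterbi pass over the [T, 33] logits. The label index layout, how each named bias maps onto particular transitions, and the score values are assumptions of this sketch; only the BIOES validity rules and the idea of tunable transition biases come from the description above.

```python
import numpy as np

CATEGORIES = ["account_number", "private_address", "private_email", "private_person",
              "private_phone", "private_url", "private_date", "secret"]
O = 0                                                  # background label index

def tag(kind: str, category: int) -> int:
    """Index of a boundary-tagged label, e.g. tag('B', 3) = B-private_person (assumed layout)."""
    return 1 + 4 * category + "BIES".index(kind)

NUM_LABELS = 1 + 4 * len(CATEGORIES)                   # 33 output classes per token

def build_transition_matrix(background=0.0, entry=0.0, continuation=0.0,
                            closure=0.0, handoff=0.0):
    """Valid BIOES transitions get a tunable additive bias; invalid ones get -inf."""
    trans = np.full((NUM_LABELS, NUM_LABELS), -np.inf)
    trans[O, O] = background                           # background persistence
    for c in range(len(CATEGORIES)):
        B, I, E, S = (tag(k, c) for k in "BIES")
        trans[O, [B, S]] = entry                       # span entry from background
        trans[[B, I], I] = continuation                # span continuation
        trans[[B, I], E] = closure                     # span closure
        trans[[E, S], O] = background                  # return to background
        for c2 in range(len(CATEGORIES)):              # boundary-to-boundary handoff
            trans[[E, S], tag("B", c2)] = handoff
            trans[[E, S], tag("S", c2)] = handoff
    return trans

def viterbi(logits: np.ndarray, trans: np.ndarray) -> list[int]:
    """Best valid label path for per-token logits of shape [T, NUM_LABELS]."""
    T, n = logits.shape
    start = np.full(n, -np.inf)                        # a path may only start on O, B- or S-
    start[[O] + [tag(k, c) for c in range(len(CATEGORIES)) for k in "BS"]] = 0.0
    score = logits[0] + start
    back = np.zeros((T, n), dtype=int)
    for t in range(1, T):                              # dynamic program over label paths
        cand = score[:, None] + trans + logits[t][None, :]
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    end_ok = np.full(n, -np.inf)                       # a path may only end on O, E- or S-
    end_ok[[O] + [tag(k, c) for c in range(len(CATEGORIES)) for k in "ES"]] = 0.0
    path = [int((score + end_ok).argmax())]
    for t in range(T - 1, 0, -1):                      # backtrace the best path
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```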
These six transition-bias parameters are also user-tunable at runtime. That lets developers push toward broader, more contiguous masking for improved recall, or tighten boundaries for improved precision, without retraining the model.
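With the hypothetical build_transition_matrix from the sketch above, that runtime tuning might look like this; the specific bias values are illustrative only.

```python
# Positive entry/continuation biases favor broader, more contiguous masking (recall);
# a negative entry bias plus a positive closure bias tightens span boundaries (precision).
recall_leaning_trans    = build_transition_matrix(entry=0.5, continuation=0.5)
precision_leaning_trans = build_transition_matrix(entry=-0.5, closure=0.5)
```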
Key Takeaways
- OpenAI released Privacy Filter, an open-source PII redaction model under Apache 2.0, capable of detecting eight sensitive span categories including account_number, private_person, and secret, and deployable on-premises without routing data to an external API.
- The model has 1.5B total parameters but only 50M active at inference, thanks to a sparse MoE feed-forward design with 128 experts and top-4 routing per token, making it lightweight enough to run in a browser or on a laptop.
- The backbone is architecturally similar to gpt-oss: 8 pre-norm transformer blocks, d_model=640, grouped-query attention with RoPE, and a sparse MoE FFN. It was first pretrained autoregressively, then converted to a bidirectional banded-attention encoder, then post-trained with a supervised classification loss.
- At inference, it runs constrained Viterbi decoding over a BIOES label scheme rather than per-token argmax, producing coherent span boundaries, with six tunable transition-bias parameters that let engineers adjust the precision/recall tradeoff at runtime without retraining.
Check out the model weights on Hugging Face.

