@robert_douglass No, a dictionary would be too brittle. It uses a mix of tools.
Presidio handles regex-based PII (IBAN, email, tax IDs). For names, we use three NER models, all running locally:
- The NER component of spaCy's de_core_news_lg pipeline (called via Presidio)
- Flair's de-ner-large (a dedicated NER model run as a separate pass; it catches the "Schmidt, Lisa" comma form and lowercase legal text)
- GLiNER (zero-shot, so custom entity types can be added at runtime without retraining)
Each NER model fails differently, so the three vote together: the union of their detections has higher recall than any single model.
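To illustrate the union idea, here is a minimal sketch, not our actual pipeline code. It assumes each detector has already returned character-offset spans. The three input lists are hypothetical stand-ins for the spaCy, Flair, and GLiNER outputs, and `union_spans` and `redact` are illustrative helper names.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def union_spans(*detector_outputs: List[Span]) -> List[Span]:
    """Combine spans from any number of detectors, fusing overlaps."""
    all_spans = sorted(s for spans in detector_outputs for s in spans)
    merged: List[Span] = []
    for start, end in all_spans:
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous span: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def redact(text: str, spans: List[Span], placeholder: str = "<PERSON>") -> str:
    """Replace each span (sorted, non-overlapping) with a placeholder."""
    parts = []
    cursor = 0
    for start, end in spans:
        parts.append(text[cursor:start])
        parts.append(placeholder)
        cursor = end
    parts.append(text[cursor:])
    return "".join(parts)


text = "Schmidt, Lisa met Herr Weber."
# Hypothetical detector outputs (assumed for illustration only):
spacy_spans = [(9, 13), (23, 28)]  # finds "Lisa" and "Weber"
flair_spans = [(0, 13)]            # finds the comma form "Schmidt, Lisa"
gliner_spans = [(23, 28)]          # finds "Weber"

combined = union_spans(spacy_spans, flair_spans, gliner_spans)
print(redact(text, combined))
```

A span is kept if any one model flagged it, which is where the recall gain comes from. The trade-off is that one model's false positive also ends up in the result.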



