Detect whether a name input is a real human name or fake / junk. Fast (microseconds per call), self-contained (embedded model), runs entirely offline. Targets net8.0 and net9.0.
dotnet add package NameGuard.ML.Core

using NameGuard.ML.Core;
var guard = new NameGuard();
var result = guard.Check("Mary Johnson");
result.IsReal // true
result.Score // 1.00 (0..1, higher = more likely real)
result.Reason // "ML model"

Drop-in [RealName] attribute:
using System.ComponentModel.DataAnnotations;
using NameGuard.ML.Core;
public sealed class RealNameAttribute : ValidationAttribute
{
    private static readonly NameGuard Guard = new();

    public float MinScore { get; set; } = 0.5f;

    public override bool IsValid(object? value)
    {
        if (value is not string name) return true;
        var result = Guard.Check(name);
        return result.IsReal && result.Score >= MinScore;
    }
}

public sealed class SignupRequest
{
    [Required, RealName(MinScore = 0.7f, ErrorMessage = "Please enter a valid name.")]
    public string FullName { get; set; } = "";
}

ModelState.IsValid is now false for asdfgh, qwerty, aaaa, etc.
Two-stage pipeline:
- Heuristic fast-path (~0.2 µs): catches obvious junk such as keyboard rolls (qwerty), no vowels (xkqzpw), repeating chars (aaaa), all-digits, and length bounds.
- ML.NET FastTree classifier (~22 µs): character n-grams (1–4, TF-IDF) trained on 175 countries × 40 tokens = 17,500 real samples.
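
To make "character n-grams (1–4)" concrete, here is a minimal sketch of the extraction step alone. `CharNgrams` and `Extract` are hypothetical names used for illustration. The package's real featurization happens inside its ML.NET pipeline, and this sketch does not reproduce the TF-IDF weighting or the exact normalization used there.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of NameGuard.ML.Core.
public static class CharNgrams
{
    // Counts every substring of length minN..maxN in the lowercased input.
    public static Dictionary<string, int> Extract(string text, int minN = 1, int maxN = 4)
    {
        var counts = new Dictionary<string, int>(StringComparer.Ordinal);
        var s = text.ToLowerInvariant();

        for (var n = minN; n <= maxN; n++)
        {
            // Slide a window of width n across the string.
            for (var i = 0; i + n <= s.Length; i++)
            {
                var gram = s.Substring(i, n);
                counts[gram] = counts.TryGetValue(gram, out var c) ? c + 1 : 1;
            }
        }

        return counts;
    }
}
```

For the input `anna`, this produces 8 distinct n-grams, with `a` and `n` each counted twice. A classifier would then weight such counts (for example with TF-IDF) before training.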
The trained model (~1 MB) is embedded in the assembly. No runtime downloads, no Python, no external services.
Mary Johnson -> REAL (1.00) ML model
Khaled Hossain -> REAL (1.00) ML model
Yuki Tanaka -> REAL (1.00) ML model
Nikolai Lobachevsky -> REAL (1.00) ML model
asdfgh -> FAKE (0.00) Keyboard roll detected
xkqzpw -> FAKE (0.00) No vowels
aaaaaaa -> FAKE (0.00) Repeating character
12345 -> FAKE (0.00) No letters
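
The heuristic verdicts in the sample output above could come from checks along the following lines. This is a rough sketch, not the package's implementation: `JunkHeuristics`, `Reject`, the length bounds, the minimum roll length of 5, treating `y` as a vowel, and the "Length out of bounds" reason string are all assumptions. Only the other four reason strings are taken from the output above.

```csharp
using System;
using System.Linq;

// Hypothetical sketch; the rules, thresholds, and ordering inside NameGuard.ML.Core may differ.
public static class JunkHeuristics
{
    // Keyboard rows in both directions, so "asdfgh" and "hgfdsa" both match.
    private static readonly string[] KeyboardRolls =
    {
        "qwertyuiop", "asdfghjkl", "zxcvbnm",
        "poiuytrewq", "lkjhgfdsa", "mnbvcxz"
    };

    private const string Vowels = "aeiouy";

    // Returns a rejection reason, or null when no rule fires and the input would go on to the model.
    public static string? Reject(string name, int minLength = 2, int maxLength = 100)
    {
        var s = name.Trim().ToLowerInvariant();
        if (s.Length < minLength || s.Length > maxLength) return "Length out of bounds";

        // Keep letters only, so spaces and punctuation do not hide a pattern.
        var letters = new string(s.Where(c => char.IsLetter(c)).ToArray());
        if (letters.Length == 0) return "No letters";
        if (letters.Length >= 3 && letters.Distinct().Count() == 1) return "Repeating character";
        if (letters.Length >= 5 && KeyboardRolls.Any(roll => roll.Contains(letters))) return "Keyboard roll detected";
        if (!letters.Any(c => Vowels.Contains(c))) return "No vowels";

        return null;
    }
}
```

Under these assumptions, `Reject("asdfgh")` returns `"Keyboard roll detected"` and `Reject("Mary Johnson")` returns `null`. Rules this simple will misfire on some legitimate names, which is one reason to keep them limited to obvious junk and leave the rest to the classifier.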
| Metric | Holdout (20%) | 5-fold CV |
|---|---|---|
| AUC | 0.9997 | 0.9996 |
| Accuracy | 0.9942 | 0.9919 |
| F1 | 0.9942 | 0.9919 |
Verified on 197 names from every UN member state + observers: 197/197 REAL at score ≥ 0.98.
namespace NameGuard.ML.Core;
public interface INameGuard
{
    NamePrediction Check(string name);
}

public sealed class NameGuard : INameGuard, IDisposable
{
    public NameGuard(float threshold = 0.5f); // threshold must be in [0, 1]
    public NameGuard(Stream modelStream, float threshold = 0.5f);
    public NamePrediction Check(string name); // thread-safe
    public void Dispose();
}

public sealed class NamePrediction
{
    public bool IsReal { get; } // Score >= threshold
    public float Score { get; } // 0..1
    public string Reason { get; } // Why this verdict was returned
}

- Single-token names (Akihito, Pyotr) score lower, so pass full names where possible. Multi-token inputs are scored both as a whole and per token, so rare components don't drag down strong ones.
- Dictionary-word combos (Lorem Ipsum, Test Test) pass; layer a stop-word check above if needed.
- Latin-script only: Cyrillic / CJK / Arabic / Greek / Hebrew etc. are rejected with reason "Non-Latin script". Romanize before calling Check().
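
One possible shape for the stop-word layer mentioned above is a small pre-filter that runs before the model check. Everything here is an assumption made for illustration: the `StopWordNameFilter` class, its tiny word list, and the choice to reject when any token is a stop word. The downstream check is passed in as a delegate so the sketch stays independent of the package.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical pre-filter; not part of NameGuard.ML.Core. The word list is a small illustrative sample.
public sealed class StopWordNameFilter
{
    private static readonly HashSet<string> StopWords = new(StringComparer.OrdinalIgnoreCase)
    {
        "test", "lorem", "ipsum", "admin", "user"
    };

    private readonly Func<string, bool> _isRealName;

    // isRealName is the downstream check, for example: name => guard.Check(name).IsReal
    public StopWordNameFilter(Func<string, bool> isRealName) => _isRealName = isRealName;

    public bool IsAcceptable(string name)
    {
        var tokens = name.Split(' ', StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries);
        if (tokens.Length == 0) return false;

        // Reject as soon as any token is a known placeholder word.
        if (tokens.Any(t => StopWords.Contains(t))) return false;

        // Otherwise defer to the wrapped check.
        return _isRealName(name);
    }
}
```

Rejecting on any matching token is the stricter choice. Requiring every token to match would let through inputs like `Mary Test`, so pick whichever fits your tolerance for false rejections, and build the word list from the junk you actually see.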
Check() is thread-safe — a single NameGuard instance can be shared across requests / threads. It pools PredictionEngine instances internally and grows the pool on contention. Call Dispose() on shutdown.
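
For readers curious about what "grows the pool on contention" can look like, here is a generic sketch of the pattern. ML.NET's `PredictionEngine` is not thread-safe on its own, which is why some form of pooling is needed. `GrowingPool<T>` is a hypothetical class written for this illustration, and NameGuard's internal pool may be built quite differently.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical sketch of a grow-on-contention pool; not NameGuard's actual internals.
public sealed class GrowingPool<T> where T : class
{
    private readonly ConcurrentBag<T> _idle = new();
    private readonly Func<T> _factory;

    public GrowingPool(Func<T> factory) => _factory = factory;

    // Number of instances currently sitting idle in the pool.
    public int IdleCount => _idle.Count;

    // Borrows an idle instance, or creates a new one when all are in use,
    // then returns it to the pool once the work completes or throws.
    public TResult Use<TResult>(Func<T, TResult> work)
    {
        var item = _idle.TryTake(out var pooled) ? pooled : _factory();
        try
        {
            return work(item);
        }
        finally
        {
            _idle.Add(item);
        }
    }
}
```

Sequential callers reuse one instance, while overlapping callers each get their own, so the pool ends up sized to the peak concurrency it has seen. This sketch never shrinks and does not dispose its items, which a production pool would need to handle.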
- Source & issues — https://github.com/encryptedtouhid/NameGuard.ML
- Changelog — CHANGELOG.md
- License — MIT