Levenshtein Distance
A character-level similarity algorithm that detects minor modifications in domain names.
What is Levenshtein Distance?
Levenshtein distance (also known as edit distance) measures the minimum number of single-character edits required to transform one string into another. These edits can be:
- Insertions: Adding a character
- Deletions: Removing a character
- Substitutions: Replacing one character with another
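The definition above can be sketched with the standard dynamic-programming approach. This is a minimal illustration, not necessarily the implementation used in production; the function name `levenshtein` is chosen here for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # the distance is symmetric; keep the shorter string as b
    # previous[j] = distance between the empty prefix of a and the first j chars of b
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # turning the first i chars of a into "" takes i deletions
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep a matching character
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

Only two rows of the table are kept at a time, so memory grows with the shorter string rather than with the product of both lengths.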
How We Use It
When analyzing domains, we calculate the Levenshtein distance between the domain and your monitored keywords. A smaller distance indicates higher similarity, which could signal a phishing attempt.
The distance is then converted to a percentage similarity score for easier interpretation.
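The exact conversion formula is not given here. One common normalization, shown below as an assumption, divides the distance by the length of the longer string; the function name `similarity_percent` is hypothetical.

```python
def similarity_percent(distance: int, len_a: int, len_b: int) -> float:
    """Convert an edit distance into a 0-100 similarity score.

    Assumed formula: the share of the longer string that did NOT need editing.
    """
    longest = max(len_a, len_b)
    if longest == 0:
        return 100.0  # two empty strings are identical
    return 100.0 * (longest - distance) / longest


# One edit in a 10-character keyword
print(similarity_percent(1, 10, 10))  # 90.0
```

Under this formula, a single edit costs a short keyword far more similarity than a long one, which is why the score is easier to compare across keywords than the raw distance.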
Real-World Examples
Character Substitution
Distance: 1. The letter 'l' is replaced with the digit '1'. This is a common phishing technique.
Character Deletion
Distance: 1. One 'o' is removed. Users might not notice the missing character when typing quickly.
Character Insertion
Distance: 1. An extra 'o' is inserted, which counts as a single edit. This exploits common typing mistakes.
Multiple Edits
Distance: 2. Two 'o' characters are replaced with '0'. Still highly similar despite multiple edits.
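The four edit patterns above can be made concrete as explicit edit scripts. The sketch below uses a hypothetical monitored keyword, `globalbook`, and invented lookalikes; it is for illustration only. The number of edits in a script is an upper bound on the Levenshtein distance, and for these simple cases it is also the minimum.

```python
def apply_edits(text, edits):
    """Apply single-character edits in order and return the resulting string."""
    chars = list(text)
    for op, index, *char in edits:
        if op == "substitute":
            chars[index] = char[0]
        elif op == "delete":
            del chars[index]
        elif op == "insert":
            chars.insert(index, char[0])
        else:
            raise ValueError(f"unknown edit: {op}")
    return "".join(chars)


KEYWORD = "globalbook"  # hypothetical keyword, not taken from a real customer

EXAMPLES = {
    "substitution": [("substitute", 1, "1")],                            # 'l' -> '1'
    "deletion": [("delete", 7)],                                         # drop one 'o'
    "insertion": [("insert", 7, "o")],                                   # add an extra 'o'
    "multiple edits": [("substitute", 7, "0"), ("substitute", 8, "0")],  # 'oo' -> '00'
}

for name, edits in EXAMPLES.items():
    print(f"{name}: {apply_edits(KEYWORD, edits)} ({len(edits)} edit(s))")
# substitution: g1obalbook (1 edit(s))
# deletion: globalbok (1 edit(s))
# insertion: globalboook (1 edit(s))
# multiple edits: globalb00k (2 edit(s))
```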
How Similarity Scoring Works
We convert the edit distance into a percentage that indicates how similar a domain is to your keyword. The exact thresholds adapt to your keyword's length, balancing catching threats against avoiding false alarms.
- Shorter keywords require stricter matching, since even a one-character change can produce a different word
- Longer keywords can tolerate more edits while still being recognizable
- Thresholds are tuned to minimize false positives while catching real threats
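A length-adaptive threshold of this kind might look like the sketch below. The actual thresholds are not published here, so every cut-off value, along with the names `min_similarity_for` and `is_flagged`, is an invented placeholder.

```python
def min_similarity_for(keyword_length: int) -> float:
    """Return the minimum similarity (in percent) needed to flag a domain.

    All cut-offs below are illustrative assumptions, not real product settings.
    """
    if keyword_length <= 5:
        return 90.0  # short keywords: stricter matching
    if keyword_length <= 10:
        return 80.0
    return 70.0      # long keywords: tolerate more edits


def is_flagged(keyword: str, similarity_percent: float) -> bool:
    """Flag a domain when its similarity meets the threshold for this keyword."""
    return similarity_percent >= min_similarity_for(len(keyword))


print(is_flagged("globalbook", 80.0))  # True: 10 characters, threshold 80
print(is_flagged("shop", 75.0))        # False: 4 characters, threshold 90
```

With these placeholder values, the same 75% score would be ignored for a 4-character keyword but flagged for one longer than 10 characters.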
Why We Combine Multiple Methods
Levenshtein distance is powerful but works best as part of a comprehensive detection suite:
- Homograph attacks: We use specialized Unicode analysis to detect lookalike characters
- Typosquatting patterns: Pattern matching identifies common typing mistakes
- Combosquatting detection: Semantic analysis catches brand name combinations
By layering multiple detection techniques, we provide comprehensive protection against the full spectrum of domain-based threats.