Levenshtein Distance

A character-level similarity algorithm that detects minor modifications in domain names.

What is Levenshtein Distance?

Levenshtein distance (also known as edit distance) measures the minimum number of single-character edits required to transform one string into another. These edits can be:

•Insertions: Adding a character
•Deletions: Removing a character
•Substitutions: Replacing one character with another
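
The definition above can be made concrete with a minimal sketch of the standard dynamic-programming approach. This is the textbook algorithm, not necessarily the implementation used in our detection pipeline; the function name `levenshtein` is chosen here for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions, and substitutions needed to turn `a` into `b`."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop to save memory

    # previous[j] holds the distance between the processed prefix of `a`
    # and the first j characters of `b`.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning i characters into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match when cost is 0)
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))        # 3
print(levenshtein("paypal.com", "paypa1.com")) # 1
```

Only two rows of the table are kept at a time, so memory grows with the length of the shorter string rather than with the product of both lengths.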

How We Use It

When analyzing a domain, we calculate the Levenshtein distance between the domain and each of your monitored keywords. A smaller distance indicates higher similarity, and a domain that is only one or two edits away from a keyword could signal a phishing attempt.

The distance is then converted to a percentage similarity score for easier interpretation.
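
One common way to turn an edit distance into a percentage is to normalize it by the length of the longer string. The sketch below assumes that formula for illustration; the exact conversion used in production is not specified here, and the names `levenshtein` and `similarity_percent` are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Standard dynamic-programming edit distance."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution or match
            ))
        previous = current
    return previous[-1]


def similarity_percent(keyword: str, candidate: str) -> float:
    """Assumed formula: 100 * (1 - distance / length of the longer string)."""
    longest = max(len(keyword), len(candidate))
    if longest == 0:
        return 100.0  # two empty strings are identical
    return 100.0 * (1 - levenshtein(keyword, candidate) / longest)


print(round(similarity_percent("paypal", "paypa1"), 1))  # 83.3
```

Under this assumed formula, identical strings score 100% and strings with no characters in common position score 0%.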

Real-World Examples

Character Substitution

Distance: 1

paypal.com → paypa1.com

The final letter 'l' is replaced with the digit '1'. This is a common phishing technique because the two characters look alike in many fonts.

Character Deletion

Distance: 1

google.com → gogle.com

One 'o' is removed. Users might not notice the missing character when typing quickly.

Character Insertion

Distance: 1

amazon.com → amazoon.com

An extra 'o' is inserted. This exploits common typing mistakes.

Multiple Edits

Distance: 2

microsoft.com → micr0s0ft.com

Both 'o' characters are replaced with the digit '0'. The domain is still highly similar despite the two edits.

How Similarity Scoring Works

We convert the edit distance into a percentage that indicates how similar a domain is to your keyword. The exact thresholds adapt to your keyword length to balance catching threats against avoiding false alarms.

•Shorter keywords require stricter matching, since even a single-character change can produce a different word
•Longer keywords can tolerate more edits while still being recognizable
•Thresholds are tuned to minimize false positives while catching real threats
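
The idea of length-dependent thresholds can be sketched as follows. The cut-off values (0, 1, and 2 allowed edits) and the names `max_allowed_edits` and `is_suspicious` are purely illustrative assumptions, not the tuned thresholds used by the product.

```python
def levenshtein(a: str, b: str) -> int:
    """Standard dynamic-programming edit distance."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution or match
            ))
        previous = current
    return previous[-1]


def max_allowed_edits(keyword_length: int) -> int:
    """Hypothetical thresholds: longer keywords tolerate more edits."""
    if keyword_length <= 4:
        return 0  # short keywords: near misses are usually unrelated words
    if keyword_length <= 8:
        return 1
    return 2


def is_suspicious(keyword: str, candidate: str) -> bool:
    """Flag candidates that differ from the keyword, but only slightly."""
    distance = levenshtein(keyword, candidate)
    return 0 < distance <= max_allowed_edits(len(keyword))


print(is_suspicious("microsoft", "micr0s0ft"))  # True: 2 edits, 9-character keyword
print(is_suspicious("ups", "usp"))              # False: short keyword, no edits allowed
```

In this sketch an exact match is not flagged, on the assumption that exact keyword matches are handled by a separate check.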

Why We Combine Multiple Methods

Levenshtein distance is powerful, but it does not catch every kind of lookalike domain. It works best as part of a broader detection suite:

•Homograph attacks: We use specialized Unicode analysis to detect lookalike characters
•Typosquatting patterns: Pattern matching identifies common typing mistakes
•Combosquatting detection: Semantic analysis catches brand name combinations

By layering multiple detection techniques, we protect against the full spectrum of domain-based threats.

Learn More

Explore our other detection methods to understand how we provide comprehensive brand protection:

Homograph Attacks →
Typosquatting →
Combosquatting →