Levenshtein Distance

A character-level similarity algorithm that detects minor modifications in domain names.

What is Levenshtein Distance?

Levenshtein distance (also known as edit distance) measures the minimum number of single-character edits required to transform one string into another. These edits can be:

  • Insertions: Adding a character
  • Deletions: Removing a character
  • Substitutions: Replacing one character with another
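For readers who want to see the calculation itself, here is a minimal sketch of the standard dynamic-programming approach to edit distance. The function name `levenshtein` is illustrative, and this is not necessarily how our detection engine implements it.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # the distance is symmetric; keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning the first i characters of a into an empty string
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free when the characters match)
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

Only two rows of the table are kept at a time, so memory use grows with the shorter string rather than with the product of both lengths.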

How We Use It

When analyzing domains, we calculate the Levenshtein distance between the domain and each of your monitored keywords. A small but nonzero distance means the domain closely resembles a keyword without matching it exactly, which could signal a phishing attempt.

The distance is then converted to a percentage similarity score for easier interpretation.
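The exact conversion is not spelled out here. One common normalization, shown purely as an assumption, divides the distance by the length of the longer string. The function name `similarity_percent` and the formula are illustrative, not a description of our production scoring.

```python
def similarity_percent(distance: int, keyword: str, candidate: str) -> float:
    """Map an edit distance to a 0-100 similarity score (assumed normalization)."""
    longest = max(len(keyword), len(candidate))
    if longest == 0:
        return 100.0  # two empty strings are identical
    # Clamp at 0 in case the caller passes a distance larger than the longer string.
    return max(0.0, 1.0 - distance / longest) * 100.0


# One edit across a six-character keyword:
print(round(similarity_percent(1, "paypal", "paypa1"), 1))  # 83.3
```

Under this assumption, a single edit costs a short keyword far more similarity than a long one, which is one reason fixed distance cutoffs tend to be a poor fit across keyword lengths.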

Real-World Examples

Character Substitution

Distance: 1
paypal.com → paypa1.com

The letter 'l' is replaced with the number '1'. This is a common phishing technique.

Character Deletion

Distance: 1
google.com → gogle.com

One 'o' is removed. Users might not notice the missing character when typing quickly.

Character Insertion

Distance: 1
amazon.com → amazoon.com

An extra 'o' is inserted. This exploits common typing mistakes.

Multiple Edits

Distance: 2
microsoft.com → micr0s0ft.com

Both 'o' characters are replaced with the digit '0'. The domain remains highly similar despite multiple edits.

How Similarity Scoring Works

We convert the edit distance into a percentage that indicates how similar a domain is to your keyword. The exact thresholds adapt to your keyword's length, balancing threat detection against false alarms.

  • Shorter keywords require stricter matching since even one character change creates a different word
  • Longer keywords can tolerate more edits while still being recognizable
  • Thresholds are tuned to minimize false positives while catching real threats
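To make the idea of length-adaptive thresholds concrete, here is a small sketch. The tier boundaries (4 and 8 characters) and the allowed edit counts are hypothetical values chosen only for illustration; they are not the thresholds our service actually uses, and the names `max_allowed_edits` and `is_flagged` are likewise made up.

```python
def max_allowed_edits(keyword: str) -> int:
    """Hypothetical tiers: longer keywords tolerate more edits."""
    n = len(keyword)
    if n <= 4:
        return 0  # short keywords: exact matches only (illustrative value)
    if n <= 8:
        return 1  # medium keywords: a single edit (illustrative value)
    return 2      # long keywords: up to two edits (illustrative value)


def is_flagged(distance: int, keyword: str) -> bool:
    """Flag a domain whose edit distance falls within the keyword's allowance."""
    return distance <= max_allowed_edits(keyword)


print(is_flagged(2, "microsoft"))  # True: nine characters tolerate two edits
print(is_flagged(1, "ibm"))        # False: a three-character keyword tolerates none
```

A tiered lookup like this is easy to reason about. A percentage-based cutoff would be an alternative way to get the same length sensitivity.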

Why We Combine Multiple Methods

Levenshtein distance is powerful, but it does not capture every attack type, so we pair it with other detectors:

  • Homograph attacks: We use specialized Unicode analysis to detect lookalike characters
  • Typosquatting patterns: Pattern matching identifies common typing mistakes
  • Combosquatting detection: Semantic analysis catches brand name combinations

By layering multiple detection techniques, we provide comprehensive protection against the full spectrum of domain-based threats.
