Hamming Distance Calculator
Calculates the Hamming distance between two equal-length binary strings (or any two equal-length symbol sequences) by counting the number of positions where the corresponding symbols differ.
Formula
d_H(x, y) = Σ 1[x_i ≠ y_i], summed for i from 1 to n.
d_H(x, y) is the Hamming distance between strings x and y. n is the length of each string (both must be equal length). x_i and y_i are the symbols at position i in strings x and y respectively. The indicator function 1[x_i ≠ y_i] equals 1 when the symbols differ and 0 when they are the same. The result is the total count of positions where x and y disagree.
Source: Hamming, R.W. (1950). Error detecting and error correcting codes. Bell System Technical Journal, 29(2), 147–160.
How it works
The Hamming distance between two strings of equal length is computed by scanning both strings symbol by symbol and incrementing a counter each time the characters at the same position are different. For binary strings this is equivalent to counting the number of bit flips required to transform one codeword into another, which directly corresponds to the number of single-bit transmission errors that have occurred. The metric satisfies all the axioms of a mathematical distance: it is non-negative, equals zero only when the strings are identical, is symmetric, and obeys the triangle inequality.
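The scan-and-count procedure described above is a one-liner in most languages. A minimal Python sketch (the function name is illustrative):

```python
def hamming_distance(x: str, y: str) -> int:
    """Count positions where two equal-length strings differ."""
    if len(x) != len(y):
        raise ValueError("strings must be equal length")
    # zip pairs up symbols at the same position; sum counts mismatches
    return sum(a != b for a, b in zip(x, y))

print(hamming_distance("10110100", "11010110"))  # -> 3
```

The length check matters: without it, `zip` would silently truncate to the shorter string and return a misleading count.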
Formally, given two strings x and y of length n, the Hamming distance is d_H(x, y) = Σ 1[x_i ≠ y_i] for i from 1 to n. The normalized Hamming distance divides the raw count by the string length n, producing a value between 0 and 1 that is comparable across strings of different lengths. A normalized distance of 0 means the strings are identical, while a value of 1 means every symbol differs. The complementary similarity score, expressed as a percentage, equals (1 − normalized distance) × 100.
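The normalized distance and the similarity percentage follow directly from the raw count, as in this sketch:

```python
def normalized_hamming(x: str, y: str) -> float:
    """Raw Hamming distance divided by the common length n."""
    if len(x) != len(y) or not x:
        raise ValueError("strings must be equal, nonzero length")
    return sum(a != b for a, b in zip(x, y)) / len(x)

d = normalized_hamming("10110100", "11010110")
print(d)              # -> 0.375
print((1 - d) * 100)  # -> 62.5 (similarity as a percentage)
```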
In coding theory, the minimum Hamming distance of a code (the smallest distance between any two distinct codewords) determines its error-detecting and error-correcting capabilities: a code with minimum distance d can detect up to d − 1 errors and correct up to ⌊(d − 1)/2⌋ errors. In machine learning, Hamming distance serves as a proximity measure for binary feature vectors in k-nearest-neighbor classifiers, locality-sensitive hashing, and the evaluation of multi-label classification models. In cryptography, a large Hamming distance between plaintext and ciphertext blocks is a key indicator of the diffusion property in block ciphers.
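The detect/correct capabilities stated above are simple arithmetic on the minimum distance; a small helper (hypothetical name) makes the relationship explicit:

```python
def code_capabilities(d_min: int) -> tuple[int, int]:
    """Errors a code with minimum distance d_min can (detect, correct)."""
    return d_min - 1, (d_min - 1) // 2

detect, correct = code_capabilities(3)
print(detect, correct)  # prints "2 1": d_min = 3 detects 2 errors, corrects 1
```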
Worked example
Suppose you want to find the Hamming distance between the two 8-bit binary strings 10110100 and 11010110.
Align the strings position by position:
Position 1: 1 vs 1 → match
Position 2: 0 vs 1 → differ (+1)
Position 3: 1 vs 0 → differ (+1)
Position 4: 1 vs 1 → match
Position 5: 0 vs 0 → match
Position 6: 1 vs 1 → match
Position 7: 0 vs 1 → differ (+1)
Position 8: 0 vs 0 → match
Total differing positions: 3. The Hamming distance is therefore d_H = 3.
The normalized Hamming distance is 3 / 8 = 0.375, and the similarity is (1 − 0.375) × 100 = 62.50%.
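The position-by-position tally above can be reproduced directly:

```python
x, y = "10110100", "11010110"
# 1-based positions where the two strings disagree
mismatches = [i + 1 for i, (a, b) in enumerate(zip(x, y)) if a != b]
print(mismatches)  # -> [2, 3, 7]

d = len(mismatches)
print(d, d / len(x), (1 - d / len(x)) * 100)  # -> 3 0.375 62.5
```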
In a coding-theory context, if these were the two closest codewords in a binary block code, a minimum distance of 3 would mean the code can detect up to 2 bit errors and correct any single-bit error, exactly the property of the classic Hamming(7,4) code.
Limitations & notes
The Hamming distance is only defined for strings of equal length; it cannot be applied directly to sequences that differ in size (use Levenshtein distance for variable-length strings). It treats all symbol mismatches as equally costly, which may not be appropriate for non-binary alphabets where some substitutions are more meaningful than others. For continuous-valued feature vectors, Euclidean or cosine distance is generally more informative. When comparing DNA sequences or protein chains, biologically motivated substitution matrices (e.g., BLOSUM) often provide better results than raw Hamming counts. Finally, Hamming distance does not capture positional context or runs of errors, so it may underestimate the impact of burst errors in some communication applications.
Frequently asked questions
What is the difference between Hamming distance and Levenshtein distance?
Hamming distance counts positions where two equal-length strings differ and does not allow insertions or deletions. Levenshtein (edit) distance counts the minimum number of single-character insertions, deletions, or substitutions needed to transform one string into another and works on strings of unequal length. Use Hamming for fixed-length codes and Levenshtein for general text comparison.
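The contrast is easy to see in code. A standard dynamic-programming sketch of Levenshtein distance handles the unequal lengths that Hamming distance cannot:

```python
def levenshtein(s: str, t: str) -> int:
    """Minimum insertions, deletions, and substitutions to turn s into t."""
    prev = list(range(len(t) + 1))  # distances from "" to each prefix of t
    for i, a in enumerate(s, 1):
        curr = [i]
        for j, b in enumerate(t, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (a != b)))  # substitution (free on match)
        prev = curr
    return prev[-1]

print(levenshtein("flaw", "lawn"))  # -> 2 (delete 'f', append 'n')
```

Hamming distance is undefined for this pair, since the strings have different lengths only in general; here they happen to be equal length, yet the optimal edit script still uses an insertion and a deletion rather than aligned substitutions.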
Can Hamming distance be used for non-binary strings?
Yes. The definition applies to any alphabet — you simply count positions where the two strings have different symbols. For example, comparing DNA sequences over the alphabet {A, T, G, C} or comparing decimal digit strings are both valid applications. The formula is unchanged; only the symbol set differs.
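Because only the symbol set changes, the same counting code works unchanged over the DNA alphabet (sequences below are made up for illustration):

```python
def hamming(x: str, y: str) -> int:
    if len(x) != len(y):
        raise ValueError("equal length required")
    return sum(a != b for a, b in zip(x, y))

# Same formula, alphabet {A, T, G, C} instead of {0, 1}
print(hamming("GATTACA", "GACTATA"))  # -> 2 (positions 3 and 6 differ)
```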
How is Hamming distance used in error correction?
A binary code with minimum Hamming distance d_min can detect up to d_min − 1 bit errors and correct up to ⌊(d_min − 1) / 2⌋ bit errors. For example, a code with d_min = 3 (like the classic Hamming(7,4) code) can detect 2-bit errors and correct any single-bit error by mapping a received word to the nearest valid codeword.
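Nearest-codeword decoding can be sketched with the 3-bit repetition code {000, 111}, which also has d_min = 3 but is small enough to list in full (the Hamming(7,4) code works the same way with 16 codewords):

```python
def hamming(x: str, y: str) -> int:
    return sum(a != b for a, b in zip(x, y))

codebook = ["000", "111"]  # repetition code, d_min = 3

def decode(received: str) -> str:
    """Map a received word to the nearest valid codeword."""
    return min(codebook, key=lambda c: hamming(c, received))

print(decode("010"))  # -> "000": the single flipped bit is corrected
```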
What does a normalized Hamming distance of 0.5 mean?
A normalized Hamming distance of 0.5 means exactly half of the positions in the two strings differ. For a random pair of binary strings, the expected normalized Hamming distance is 0.5, so values near 0.5 suggest the strings share no more similarity than random chance would produce.
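The random-baseline claim is easy to check empirically; a quick Monte Carlo sketch over random 64-bit words:

```python
import random

random.seed(0)  # fixed seed for reproducibility
n, trials = 64, 2000
total = 0.0
for _ in range(trials):
    x = random.getrandbits(n)
    y = random.getrandbits(n)
    # XOR sets exactly the differing bit positions; popcount counts them
    total += bin(x ^ y).count("1") / n

print(total / trials)  # prints a value very close to 0.5
```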
How is Hamming distance applied in machine learning?
In machine learning, Hamming distance is used as a similarity metric for binary feature vectors in k-nearest-neighbor classification, locality-sensitive hashing for approximate nearest-neighbor search, and evaluating multi-label classifiers where each label is a binary variable. It is also used in genetic algorithms to measure the diversity of binary-encoded candidate solutions in a population.
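The multi-label evaluation use is usually called Hamming loss: the fraction of all label slots, across all samples, where prediction and ground truth disagree. A sketch on toy data (the label matrices below are made up):

```python
# Rows are samples, columns are binary labels
y_true = [[1, 0, 1, 0], [0, 1, 0, 0], [1, 1, 0, 1]]
y_pred = [[1, 0, 0, 0], [0, 1, 0, 1], [1, 1, 0, 1]]

errors = sum(t != p for row_t, row_p in zip(y_true, y_pred)
             for t, p in zip(row_t, row_p))
total = sum(len(row) for row in y_true)
print(round(errors / total, 3))  # -> 0.167 (2 wrong slots out of 12)
```

This is the mean per-sample normalized Hamming distance between the predicted and true label vectors, the same quantity scikit-learn reports as `hamming_loss`.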
Last updated: 2025-01-15 · Formula verified against primary sources.