Understanding Fuzzy Searches in the JRI-Poland Legacy Database
Many users find the way “Fuzzy” searches work in the JRI-Poland Legacy database a bit mysterious. Here’s a simple explanation of what’s happening behind the scenes and why it matters for your research.
What is a Fuzzy Search?
Fuzzy searches use a mathematical approach called the Damerau–Levenshtein distance. You’ve probably seen this in action in spell-checkers on your computer or phone—when you type a word incorrectly, it suggests alternatives that are close to what you typed.
The Damerau–Levenshtein distance is the smallest number of single-letter changes needed to turn one word into another. Each change can be:
- Adding a letter
- Removing a letter
- Replacing one letter with another
- Swapping two letters next to each other (a common typing mistake)
For example, changing “KUROPAWTA” to “KUROPATWA” involves swapping two adjacent letters (W and T). The algorithm counts that as just one change, whereas counting only additions, removals, and replacements would make it two.
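To make the counting concrete, here is a minimal Python sketch of the distance calculation. It is an illustration under assumptions, not the database's actual code: it implements the restricted form of Damerau–Levenshtein (often called "optimal string alignment"), and the database may use the unrestricted form, which occasionally gives a smaller number. The function name edit_distance and its transpositions flag are hypothetical; turning the flag off gives plain Levenshtein distance, which has no swap operation.

```python
def edit_distance(a: str, b: str, transpositions: bool = True) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance.

    Hypothetical helper for illustration. With transpositions=False this is
    plain Levenshtein distance (add, remove, replace only).
    """
    rows, cols = len(a) + 1, len(b) + 1
    # d[i][j] = fewest changes turning the first i letters of a into the first j of b
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # remove all i letters
    for j in range(cols):
        d[0][j] = j  # add all j letters
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # remove a letter
                d[i][j - 1] + 1,         # add a letter
                d[i - 1][j - 1] + cost,  # replace a letter (or keep it if equal)
            )
            if (transpositions and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # swap two adjacent letters
    return d[rows - 1][cols - 1]


print(edit_distance("KUROPAWTA", "KUROPATWA"))                        # one swap
print(edit_distance("KUROPAWTA", "KUROPATWA", transpositions=False))  # two replacements
```

Run as written, the first call prints 1 and the second prints 2, which is exactly the difference the swap operation makes for a name like “KUROPATWA”.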
Why This Matters for Genealogy Searches
When you search for names or towns in historical records, small errors can prevent important records from showing up in your results. These errors can come from:
- Original scribes writing something down incorrectly
- Volunteers or professionals misreading old handwriting or poor-quality scans of registers, or mistyping what they read
- Uncertainty about the family name itself: the spelling passed down in your family may be uncertain or possibly corrupted, perhaps after passing through several languages
Traditional phonetic and Soundex searches handle differences in vowels and similar-sounding names well, but they often fail when a consonant is wrong or letters are swapped. Fuzzy searches fill that gap: they find matches that would otherwise be missed.
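The short Python sketch below shows how a sound-based code can miss a single wrong consonant. It implements classic American Soundex only; the database's soundex option may use a different variant (such as Daitch–Mokotoff Soundex), which codes letters differently and could behave differently on this pair. “KUROPACWA” is a hypothetical misreading of “KUROPATWA” with T replaced by C, and the function name soundex is our own.

```python
import string

# Classic American Soundex digit for each consonant; vowels and Y get no digit.
CODES = {
    letter: digit
    for letters, digit in [("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                           ("L", "4"), ("MN", "5"), ("R", "6")]
    for letter in letters
}


def soundex(name: str) -> str:
    """Hypothetical helper: four-character American Soundex code."""
    # Keep only plain A-Z letters; Polish letters such as Ł are simply dropped here.
    letters = [c for c in name.upper() if c in string.ascii_uppercase]
    if not letters:
        return ""
    code = letters[0]                  # the first letter is kept as a letter
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue                   # H and W do not separate letters with the same digit
        digit = CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
            if len(code) == 4:
                break
        prev = digit                   # a vowel resets prev, so a repeated digit after it counts again
    return code.ljust(4, "0")          # pad short codes with zeros


for name in ["KUROPATWA", "KUROPACWA"]:
    print(name, soundex(name))
```

The two spellings come out as K613 and K612, so a Soundex search for one would not return the other, even though they differ by a single replaced letter and would be caught by the plain “Fuzzy” option.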
What the Different Fuzzy Options Mean
The database offers three levels of Fuzzy searching:
- Fuzzy: Finds names or towns that differ from your search term by a single change: one letter added, removed, or replaced, or one pair of adjacent letters swapped.
- Fuzzier: Allows up to two differences (letters added, removed, replaced, or swapped).
- Fuzziest: Allows a number of differences up to one-third of the length of the name.
  - Example: For “KONOPIATY” (9 letters), up to 3 differences are allowed.
  - Example: For “FRISCHWASSER” (12 letters), up to 4 differences are allowed.
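The three levels boil down to a maximum allowed distance. The Python sketch below is a minimal illustration, not the database's actual rule set. Two points are assumptions: that “one-third the length” is measured on the name you typed, and that it is rounded down (the examples above only use lengths divisible by 3, so the rounding for other lengths is not stated). The helper names max_differences and is_match are hypothetical.

```python
def max_differences(level: str, search_term: str) -> int:
    """Hypothetical helper: the largest distance each Fuzzy level accepts."""
    level = level.lower()
    if level == "fuzzy":
        return 1
    if level == "fuzzier":
        return 2
    if level == "fuzziest":
        # ASSUMPTION: one third of the search term's length, rounded down.
        return len(search_term) // 3
    raise ValueError(f"unknown Fuzzy level: {level!r}")


def is_match(level: str, search_term: str, distance: int) -> bool:
    """True if a name that is `distance` changes away would be returned."""
    return distance <= max_differences(level, search_term)


for term in ["KONOPIATY", "FRISCHWASSER"]:
    print(term, [max_differences(lvl, term) for lvl in ("Fuzzy", "Fuzzier", "Fuzziest")])
```

This prints [1, 2, 3] for “KONOPIATY” and [1, 2, 4] for “FRISCHWASSER”, matching the examples above. In practice the distance passed to is_match would come from a Damerau–Levenshtein calculation between the search term and each indexed name.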
Tips for Best Results
Because historical records are often messy, we recommend trying several search types: phonetic, Soundex, “starts with,” and the different Fuzzy options. Searching only for an exact spelling is often unrealistic, because surnames and place names were rarely written consistently over time.
Fuzzy searches give you a safety net, helping you uncover records you might otherwise miss because of simple typos or transcription errors. If you find such errors, please let us know so that we can check the original documents and make the necessary corrections in our databases.
Michael Tobias