Transliteration Latin -> Hebrew

Posted: Tue Feb 17, 2015 9:23 am
by logan
There is a new way to search Hebrew and Yiddish sources, such as yizkor books. It has always been possible to enter Hebrew letters directly, either via your own keyboard or by clicking the keyboard icon to the left of the search box. However, that required knowing how to spell your search term in Hebrew or Yiddish. If you do not, there is now an option for automated transliteration, so you can type a search term in Latin letters and find matches in Hebrew and Yiddish. To enable this option, change "Add Latin -> Cyrillic" below the search box to "Add Latin -> Cyrillic + Hebrew." Or, if you only want to see Hebrew and Yiddish matches (none in Latin or Cyrillic letters), try "Only Latin -> Hebrew."

Note that transliteration only works with single-word search terms and the Regular Match option (not D-M Soundex or OCR-Adjusted). It is also still limited by the accuracy of the OCR software used to convert scanned documents to Hebrew or Yiddish text.

This system is similar to the automated Latin -> Cyrillic transliteration I added several years ago when I started indexing Russian Empire sources. However, it is not enabled by default, at least for the time being. Other differences are that it does not yet handle diacritics and does not have as many language-specific transliteration rules. For example, if you search for "Debica," the Cyrillic transliteration will consider the possibility that you meant Dębica, but the Hebrew transliteration will not. Likewise, the Hebrew transliteration has some specialized rules for search terms of Polish and German linguistic origin, but none for Romanian, for example.
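
To make the "Debica" example concrete, here is a minimal Python sketch of how a search term typed without diacritics could be expanded into its possible Polish spellings. This is only an illustration, not the site's actual code: the `POLISH_VARIANTS` table and the `diacritic_variants` function are hypothetical names invented for this example.

```python
from itertools import product

# Hypothetical table: plain Latin letter -> Polish letters it may stand for.
# This is an illustrative assumption, not the site's real rule set.
POLISH_VARIANTS = {
    "a": "aą", "c": "cć", "e": "eę", "l": "lł",
    "n": "nń", "o": "oó", "s": "sś", "z": "zźż",
}

def diacritic_variants(term):
    """Return every spelling obtained by optionally restoring Polish diacritics."""
    # For each letter, list the characters it might represent (itself by default).
    options = [POLISH_VARIANTS.get(ch, ch) for ch in term.lower()]
    # The Cartesian product of the per-letter options gives all candidate spellings.
    return {"".join(combo) for combo in product(*options)}

variants = diacritic_variants("Debica")
print("dębica" in variants)  # True
print(len(variants))         # 8
```

In this sketch, each candidate spelling would then be transliterated separately, which is why handling diacritics multiplies the number of spellings to check.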

The main difference you are likely to encounter, though, is the number of false positives. Because Hebrew is usually written without vowels, different names can end up with the same spelling, so you are more likely to encounter false positives than with the Cyrillic transliteration. I have tried to balance capturing the maximum number of spelling variations against minimizing false positives, and I will likely make additional adjustments.
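
As a rough illustration of why missing vowels produce false positives, the Python sketch below reduces Latin-letter names to their consonants. It is a deliberate oversimplification (Hebrew does mark some vowels with letters, and Yiddish spells out most of them), and `consonant_skeleton` is a hypothetical helper, not part of the search engine.

```python
def consonant_skeleton(term):
    """Drop the Latin vowels a, e, i, o, u and keep the remaining letters."""
    return "".join(ch for ch in term.lower() if ch.isalpha() and ch not in "aeiou")

# Two different surnames collapse to the same consonant skeleton,
# so a vowel-less spelling cannot tell them apart.
print(consonant_skeleton("Kaplan"))  # kpln
print(consonant_skeleton("Koplin"))  # kpln
```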

You might also find some transliterated searches to be much slower, depending on the search term. There is no need to notify me of these slow searches, as I know the cause and will try to improve their speed. (Essentially, it happens for search terms with a very large number of possible transliterated spelling variations.)
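
To show how quickly the number of spelling variations can grow, here is a small Python sketch that counts them. The `CANDIDATES` table is a made-up assumption for illustration only (the real transliteration rules are not published in this post), and `variant_count` is a hypothetical helper. The point is simply that the total is the product of the choices available for each letter.

```python
import math

# Hypothetical per-letter Hebrew candidates; NOT the site's actual rule table.
# An empty string means the vowel may be left unwritten.
CANDIDATES = {
    "a": ["", "א", "ע"],
    "e": ["", "ע", "א"],
    "i": ["", "י"],
    "o": ["", "ו", "א"],
    "u": ["", "ו"],
    "k": ["ק", "כ"],
    "s": ["ס", "ש"],
    "t": ["ט", "ת"],
    "v": ["ב", "ו"],
}

def variant_count(term):
    """Count candidate spellings: multiply the number of options per letter."""
    # Letters missing from the table are assumed to have exactly one rendering.
    return math.prod(len(CANDIDATES.get(ch, [ch])) for ch in term.lower())

print(variant_count("kats"))             # 24
print(variant_count("katsenelenbogen"))  # 5832
```

Under these assumed rules, a four-letter term yields a couple of dozen candidates, while a long surname yields thousands, each of which has to be looked up.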

Please do, however, let me know if you see any strange results from transliterated searches, such as false positives that are very different from correct transliterations, or common surnames that do not return any transliterated matches. Thanks to Israel Pickholtz for testing so far.

I will very likely modify the transliteration system based on further testing, but I hope it will already benefit many researchers.

Logan