transliterate

Transliteration engine
git clone git://lumidify.org/transliterate.git

commit f5414cb7c164023b787fa56d4ab59313df8961e1
parent 3c4e5e3eeec87344d77e63d0eeaff048f58f499f
Author: lumidify <nobody@lumidify.org>
Date:   Wed,  1 Apr 2020 20:09:22 +0200

Add note about canonical decomposition

Diffstat:
M transliterate.pl | 3 +++
1 file changed, 3 insertions(+), 0 deletions(-)

diff --git a/transliterate.pl b/transliterate.pl
@@ -1851,6 +1851,9 @@
 Adds the given list of diacritics to the list of diacritics that will be removed
 from a word when "Retry without diacritics" is pressed in the L<unknown word window|/"UNKNOWN WORD WINDOW">.
 
+Note that all input text is first normalized to the unicode canonical decomposition
+form so that diacritics can be removed individually.
+
 There are quite advanced Unicode algorithms that could be used to compare words
 while ignoring diacritics, but I do not know if it would be possible to use
 any of those with the current way this engine works.
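
The note added in this commit can be illustrated with a minimal sketch. It is not taken from transliterate.pl: the helper name `strip_diacritics` and its arguments are hypothetical, and the sketch only assumes the core `Unicode::Normalize` module and its `NFD` function. After canonical decomposition a precomposed letter such as U+00E9 becomes a base letter followed by a separate combining mark, so each listed mark can be deleted on its own.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD);

# Hypothetical helper (not from transliterate.pl): decompose the word,
# then drop only the characters listed in $diacritics.
sub strip_diacritics {
    my ($word, $diacritics) = @_;
    # Canonical decomposition: U+00E9 becomes "e" followed by U+0301.
    my $decomposed = NFD($word);
    my %remove = map { $_ => 1 } @$diacritics;
    return join '', grep { !$remove{$_} } split //, $decomposed;
}

# Precomposed e-acute (U+00E9) on both sides of "t".
my $word  = "\x{E9}t\x{E9}";
# Remove only U+0301 COMBINING ACUTE ACCENT.
my $plain = strip_diacritics($word, ["\x{301}"]);
print "$plain\n";
```

Marks that are not in the list survive: with the same list, U+00F1 (n with tilde) comes back as "n" followed by U+0303, because only U+0301 was asked to be removed.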