commit f5414cb7c164023b787fa56d4ab59313df8961e1
parent 3c4e5e3eeec87344d77e63d0eeaff048f58f499f
Author: lumidify <nobody@lumidify.org>
Date: Wed, 1 Apr 2020 20:09:22 +0200
Add note about canonical decomposition
Diffstat:
1 file changed, 3 insertions(+), 0 deletions(-)
diff --git a/transliterate.pl b/transliterate.pl
@@ -1851,6 +1851,9 @@ Adds the given list of diacritics to the list of diacritics that will be removed
from a word when "Retry without diacritics" is pressed in the
L<unknown word window|/"UNKNOWN WORD WINDOW">.
+Note that all input text is first normalized to the Unicode canonical decomposition
+form (NFD) so that diacritics can be removed individually.
+
There are quite advanced Unicode algorithms that could be used to compare words
while ignoring diacritics, but I do not know if it would be possible to use any
of those with the current way this engine works.
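
As a rough illustration of why the added note matters, the sketch below decomposes a word with `NFD` from the core `Unicode::Normalize` module and then deletes only the listed combining marks. The helper name `strip_diacritics` and its arguments are hypothetical and are not taken from transliterate.pl, so treat this as one possible approach under those assumptions rather than the script's actual implementation.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD);

# Hypothetical helper (not from transliterate.pl): decompose the word into
# base characters plus combining marks, then drop only the marks that
# appear in the given list of diacritics.
sub strip_diacritics {
    my ($word, $diacritics) = @_;
    my $decomposed = NFD($word);
    my %remove = map { $_ => 1 } @$diacritics;
    return join '', grep { !$remove{$_} } split //, $decomposed;
}

# U+00E9 (e with acute) decomposes to "e" followed by U+0301 COMBINING ACUTE ACCENT.
my $stripped = strip_diacritics("\x{E9}t\x{E9}", ["\x{301}"]);

# U+00F1 (n with tilde) decomposes to "n" followed by U+0303, which is not
# in the list, so the tilde is kept.
my $kept = strip_diacritics("\x{F1}", ["\x{301}"]);
```

Without the decomposition step, a precomposed character such as U+00E9 is a single code point, so there would be no separate combining mark to remove.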