transliterate_data

Data for Urdu<->Hindi transliteration
git clone git://lumidify.org/transliterate_data.git

ExplanationForAdditionalFilesInHindiUrduTransliteration (1128B)


In the HindiToUrdu transliteration, the order of the tables in the list has been rearranged, and a new table, 'pairs.hi_ur', has been added.

One problem is بے, which is converted to बे. When converting back, the program cannot tell whether बे stands for بے as in بےشک (बेशक) or for بی as in بیٹا (बेटा).

Therefore misc_beginword.hi_ur.txt, which contains the بے replacement, has been moved so that it is applied after the group of tables comprising adjective_nouns and verbs.

However, the program is now unable to find words such as बेशक: although शक is in the nouns_adjectives/cmasc.txt file, it is not recognized because the word begins with बे.

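The ordering trade-off described above can be illustrated with a small toy model. This is only a sketch and not the actual transliteration program: the names NOUNS, BEGINWORD, apply_whole_word, apply_prefix and transliterate are hypothetical, and it assumes that each table is applied exactly once, in list order, to whatever part of the word has not been converted yet.

```python
# Toy model only: hypothetical names, not the real program's data or API.
# Assumption: each table runs once, in order, on the unconverted remainder.

NOUNS = {"बेटा": "بیٹا", "शक": "شک"}   # stands in for the noun/adjective tables
BEGINWORD = {"बे": "بے"}               # stands in for misc_beginword.hi_ur.txt


def apply_whole_word(table, done, rest):
    """Convert the remainder only if it matches a table entry as a whole."""
    if rest in table:
        return done + table[rest], ""
    return done, rest


def apply_prefix(table, done, rest):
    """Convert a prefix, but only at the very beginning of the word."""
    if done == "":
        for src, dst in table.items():
            if rest.startswith(src):
                return done + dst, rest[len(src):]
    return done, rest


def transliterate(word, steps):
    """Run every (function, table) step once, in the given order."""
    done, rest = "", word
    for apply, table in steps:
        done, rest = apply(table, done, rest)
    return done + rest  # anything left in `rest` stays in Devanagari


prefix_first = [(apply_prefix, BEGINWORD), (apply_whole_word, NOUNS)]
nouns_first = [(apply_whole_word, NOUNS), (apply_prefix, BEGINWORD)]

for word in ("बेटा", "बेशक"):
    print(word, transliterate(word, prefix_first), transliterate(word, nouns_first))
```

In this model, applying the prefix table first converts बेशक completely but splits बेटा into بے plus an unconverted rest. Applying the noun tables first fixes बेटा, but the शक left over from बेशक is never looked up, because the noun tables have already run by the time the prefix is removed.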
Another problematic rule is the Persian genitive े- (as in मुल्के-मिसर), which conflicts with word pairs that contain the same sequence, such as नवासे-नवासियाँ. These word pairs are regular inflections and do not contain a Persian genitive, so in Urdu script the first word of the pair ends in ے + space and not in ِ + space.

Therefore word pairs that conflict with the Persian genitive have been put into the new file 'pairs.hi_ur'.
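
The effect of the exception table can be sketched as follows. This is a minimal illustration under stated assumptions, not the real file format or program logic: PAIRS stands in for the contents of 'pairs.hi_ur', and joint_ending is a hypothetical helper that only decides what replaces the े- joint, without transliterating the rest of the word.

```python
# Toy sketch: hypothetical names; the real format of pairs.hi_ur is not shown here.

PERSIAN_GENITIVE = "\u0947-"   # Devanagari vowel sign e followed by a hyphen
KASRA_SPACE = "\u0650 "        # Arabic kasra + space (Persian genitive)
YE_SPACE = "\u06D2 "           # Urdu bari ye + space (regular inflection)

genitive_example = "मुल्के-मिसर"
pair_example = "नवासे-नवासियाँ"

# Assumption: the exception table can be treated as a set of whole word pairs.
PAIRS = {pair_example}


def joint_ending(word, pairs=PAIRS):
    """Return the Urdu replacement for the joint, or None if there is no joint."""
    if PERSIAN_GENITIVE not in word:
        return None
    if word in pairs:
        # Listed pairs are regular inflections, so the first word ends in bari ye.
        return YE_SPACE
    # Everything else falls through to the general Persian genitive rule.
    return KASRA_SPACE


print(repr(joint_ending(genitive_example)))
print(repr(joint_ending(pair_example)))
```

Checking the exception table before the general rule keeps the Persian genitive rule itself unchanged, at the cost of having to list every conflicting pair explicitly.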