NOTE REGARDING THE TABLES

The word tables are divided into nouns_adjectives and verbs, and within each group according to the way in which the stems are inflected. The two 'irregular.txt' files are for any word that is not to be expanded/inflected.

Note: When adding new words to the tables, it is important to understand WHAT to add. In the irregular.txt tables, the whole word is added. In all other tables, only a stem is added; the inflections are then added by the program.

An example from each table is given below. On the left is the stem, on the right one of the inflections/expansions.

VERBS

irregular                   سیوں گا सियूँगा > [no expansion]
regular_consonant_ending    ابال उबाल > ابالنا उबालना
regular_ending_in_a_o       آزما आज़मा > آزمانا आज़माना

NOUNS/ADJECTIVES

adjectiveregular_a_i        آدھ आध > آدھا आधा
irregular                   آئین आईन > [no expansion]
ahmasc                      آلود आलूद > آلودہ आलूदा
aishortmasc                 افع अफ़ > افعی अफ़इ
amasc                       آٹ आट > آٹا आटा
an                          آٹھو आठव > آٹھواں आठवाँ
cfem                        آتش आतिश > آتشیں आतिशें
cmasc                       آبشار आबशार > آبشاروں आबशारों
ifem                        آباد आबाद > آبادی आबादी
ifemshort                   مورت मूर्त > مورتی मूर्ति
imasc                       آدم आदम > آدمی आदमी
o_a_staysfem                ابتدا इब्तिदा > ابتداؤں इब्तिदाओं
u_staysfem                  آرز आरज़ > آرزو आरज़ू
o_a_staysmasc               دانا दाना > داناؤں दानाओं
u_staysmasc                 آنس आँस > آنسو आँसू
ui_oi_ai_mascfem            ابتدا इब्तिदा > ابتدائی इब्तिदाई

TABLES IN DATA FOLDER

The data folder contains a number of further tables for handling punctuation, exceptions and special cases:

ignore: words that are ignored permanently.
punctuation: for conversion of punctuation.
misc_beginword.ur_hi: word parts ("prefixes") at the beginning of word compounds
misc_endword: word parts ("suffixes") at the end of word compounds
special: special cases (no beginword endword)
exceptions_beginword_endword.ur_hi: override multiple choices for common words found in the preceding tables.
exceptions_beginword.hi_ur: exceptions which need to be replaced before the following match statements.
exceptions_beginword_endword.hi_ur: override multiple choices for common words found in the preceding tables.
pairs_middle_e_o: The Persian Genitive े- (eg मुल्के-मिसर) conflicts with word pairs that contain the same े-, such as नवासे-नवासियाँ. These word pairs are regular inflections and do not contain a Persian Genitive, so in Urdu script the first word of the pair ends in ے + space and not ِ + space. Word pairs conflicting with the Persian Genitive have been put into the new file 'pairs.middle_e_o'. Word pairs with و at the end of the first word have also been placed here, eg دو ایک दो-एक, as these conflict with the rule regarding the copula و linking words in Urdu.

CAREFUL: If you add the wrong words to these tables, you can mess up the conversion process!

THE CONFIG FILES
There are two config files.

config.hi_ur: the config to use when converting Hindi to Urdu.
config.ur_hi: the config to use when converting Urdu to Hindi.

NOTE: The tables in the data folder relating only to one of these two configs are labelled accordingly, ie xxxxx.hi_ur.txt or xxxxx.ur_hi.txt

Tables which are not labelled in either way relate to both config files.

!!!THINGS TO KEEP IN MIND!!!!

* -से needs to be handled manually, as in most cases this is the postposition से and not the 'adjective' से. के-से can be done through search/replace. It is better to find the rest of the cases by reading through the text.

* Also make sure you have gtk2-perl installed!
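
To help with the manual pass over -से mentioned under THINGS TO KEEP IN MIND, the cases can first be listed with a small script. The Python sketch below is not part of the converter: the function name find_se_cases and the regular expression are assumptions made purely for illustration. It lists every word followed by -से, leaving out के-से (which is handled by search/replace), so that each remaining case can be checked by reading the line it occurs in.

```python
import re

# Hypothetical helper, not part of the converter itself.
# Matches a word followed by "-से", where "से" is not the start of a longer
# Devanagari word (so "जन-सेवा" is not reported). The danda "।" (U+0964) and
# double danda (U+0965) are deliberately left out of the lookahead ranges,
# so "-से" directly before sentence punctuation is still reported.
SE_PATTERN = re.compile(r"([^\s-]+)-से(?![\u0900-\u0963\u0971-\u097F])")


def find_se_cases(text):
    """Return (line_number, preceding_word, line) for each '-से' to review.

    'के-से' is skipped because it is handled by search/replace.
    """
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in SE_PATTERN.finditer(line):
            word = match.group(1)
            if word == "के":
                continue  # handled separately through search/replace
            hits.append((lineno, word, line.strip()))
    return hits


if __name__ == "__main__":
    sample = "वह घर-से आया।\nफूल के-से रंग"
    for lineno, word, line in find_se_cases(sample):
        print(f"line {lineno}: {word}-से in: {line}")
```

The script only reports candidates; deciding whether each one is the postposition or the 'adjective' still has to be done by reading the text.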