Fix style of documentation - transliterate

commit 938f18f29430df37213508816b5d6a7efde23ed5
parent df15f30a4f0f70756b5fb4dde9b188087105e973
Author: lumidify <nobody@lumidify.org>
Date:   Thu, 26 Mar 2020 14:49:29 +0100

Fix style of documentation

Diffstat:
D .gitignore  | 1 -
A tests/data/endings.txt  | 3 +++
A tests/data/endings_choices.txt  | 2 ++
A tests/data/ignore.txt  | 1 +
A tests/data/words.txt  | 10 ++++++++++
A tests/data/words1.txt  | 10 ++++++++++
M transliterate.pl  | 144 ++++++++++++++++++++++++++++++++++++++++---------------------------------------

7 files changed, 99 insertions(+), 72 deletions(-)
diff --git a/.gitignore b/.gitignore
@@ -1 +0,0 @@
-data
diff --git a/tests/data/endings.txt b/tests/data/endings.txt
@@ -0,0 +1,3 @@
+end1	end1_replaced
+end2	end2_replaced
+end3	end3_replaced
diff --git a/tests/data/endings_choices.txt b/tests/data/endings_choices.txt
@@ -0,0 +1,2 @@
+end1,end1r1|end1r2
+end2,end2r
diff --git a/tests/data/ignore.txt b/tests/data/ignore.txt
@@ -0,0 +1 @@
+ignore
diff --git a/tests/data/words.txt b/tests/data/words.txt
@@ -0,0 +1,10 @@
+word0	word0_replaced$word0_replaced2
+word1	word1_replaced
+word2	word2_replaced
+word3	word3_replaced
+word4	word4_replaced
+word5	word5_replaced
+word6	word6_replaced
+word7	word7_replaced
+word8	word8_replaced
+word9	word9_replaced
diff --git a/tests/data/words1.txt b/tests/data/words1.txt
@@ -0,0 +1,10 @@
+word0,word0_replaced|word0_replaced2
+word1,word1_replaced
+word2,word2_replaced
+word3,word3_replaced
+word4,word4_replaced
+word5,word5_replaced
+word6,word6_replaced
+word7,word7_replaced
+word8,word8_replaced
+word9,word9_replaced
diff --git a/transliterate.pl b/transliterate.pl
@@ -1286,20 +1286,20 @@ Start the transliteration engine with the given file as input.
 
 =over 8
 
-=item B<< --output <filename> >>
+=item B<--output> <filename>
 
 Sets the output file to print to.
 
-If the file exists already and C<--force> is not set, the user is asked
+If the file exists already and B<--force> is not set, the user is asked
 if the file should be overwritten or appended to.
 
-B<Default:> STDOUT (print to terminal)
+B<Default:> C<STDOUT> (print to terminal)
 
-=item B<< --config <filename> >>
+=item B<--config> <filename>
 
 Sets the configuration file to use.
 
-B<Default:> "config"
+B<Default:> C<config>
 
 =item B<--checkduplicates>
 
@@ -1331,18 +1331,18 @@ prompts.
 
 This option is only useful for automatic testing of the transliteration engine.
 
-If C<--nochoices> is enabled, each word in the input with multiple choices will
+If B<--nochoices> is enabled, each word in the input with multiple choices will
 be output, along with the number of choices (can be used to test the proper
-functioning of C<choicesep> in the config file).
+functioning of B<choicesep> in the config file).
 
-If C<--nounknowns> is enabled, each unknown word in the input is printed
-(can be used to test that the C<ignore> options are working correctly).
+If B<--nounknowns> is enabled, each unknown word in the input is printed
+(can be used to test that the B<ignore> options are working correctly).
 
 =item B<--force>
 
 Always overwrites the output and error file without asking.
 
-=item B<< --start <line number> >>
+=item B<--start> <line number>
 
 Starts at the given line number instead of the beginning of the file.
 
@@ -1351,19 +1351,19 @@ printed out. This is the current line that was being processed, so it
 has not been printed to the output file yet and thus the program must
 be resumed at that line, not the one afterwards.
 
-=item B<< --errors <filename> >>
+=item B<--errors> <filename>
 
 Specifies a file to write errors in. Note that this does not refer to
 actual errors, but to any words that were temporarily ignored
 (i.e. words for which "Ignore: This run" was clicked).
 
 If no file is specified, nothing is written. If a file is specified
-that already exists and C<--force> is not set, the user is prompted
+that already exists and B<--force> is not set, the user is prompted
 for action.
 
 =item B<--help>
 
-Display the full documentation.
+Displays the full documentation.
 
 =back
 
@@ -1427,9 +1427,9 @@ pressed.
 
 The filtering for which table files are actually shown here is
 currently a bit rudimentary. First, all paths that are used in the
-C<table> statements in the config are put into a list. Then, the
+B<table> statements in the config are put into a list. Then, the
 paths corresponding to any tables used for word endings in the
-C<expand> statements are removed from the list, and that is what
+B<expand> statements are removed from the list, and that is what
 is shown in this window. The reason for removing those paths from
 the list is that it gets somewhat confusing when all the tables
 that are only used for word endings are also in the list, and it
@@ -1438,7 +1438,7 @@ one of those files. If necessary, the word can always be added
 manually and the config reloaded. Note also that only actual
 table paths are shown here, not the tables themselves - this is
 not necessarily a one-to-one mapping since new tables can be
-generated with the C<expand> statements.
+generated with the B<expand> statements.
 
 =item Reload config
 
@@ -1451,7 +1451,7 @@ while since the entire word database has to be reloaded.
 Prints the current line number to the terminal and exits the program.
 
 The program can always be started again at this line number using
-the C<--start> option if needed.
+the B<--start> option if needed.
 
 =back
 
@@ -1470,39 +1470,39 @@ untransliterated. Then, all B<match>, B<matchignore>, and B<replace>
 appear in the config file. Whenever a word/match is replaced, it
 is split off into a separate chunk which is marked as transliterated.
 A chunk marked as transliterated I<is entirely ignored by any
-replacement statements that come afterwards>. Note that C<beginword>
-and C<endword> can always match at the boundary between an
+replacement statements that come afterwards>. Note that B<beginword>
+and B<endword> can always match at the boundary between an
 untransliterated and transliterated chunk. This is to facilitate
 automated replacement of certain grammatical constructions. For instance:
 
-If the string C<a-> could be attached as a prefix to any word and needed
-to be replaced as C<b-> everywhere, it would be quite trivial to add
-a match statement 'match "a-" "b-" beginword'. If run on the text
-C<a-word>, where C<word> is some word that should be transliterated
-as C<word_replaced>, and the group replace statement for the word comes
+If the string "a-" could be attached as a prefix to any word and needed
+to be replaced as "b-" everywhere, it would be quite trivial to add
+a match statement C<'match "a-" "b-" beginword'>. If run on the text
+"a-word", where "word" is some word that should be transliterated
+as "word_replaced", and the group replace statement for the word comes
 after the match statement given above, the following would happen:
-First, the match statement would replace C<a-> and split the text into
-the two chunks C<b-> and C<word>, where C<b-> is already marked as
-transliterated. Since C<word> is now separate, it will be matched
-by the group replace statement later, even if it has C<beginword> set
-and would normally not match if C<a-> came before it. Thus, the final
-output will be C<b-word_replaced>, allowing for the uniform replacement
+First, the match statement would replace "a-" and split the text into
+the two chunks "b-" and "word", where "b-" is already marked as
+transliterated. Since "word" is now separate, it will be matched
+by the group replace statement later, even if it has B<beginword> set
+and would normally not match if "a-" came before it. Thus, the final
+output will be "b-word_replaced", allowing for the uniform replacement
 of the prefix instead of having to add each word twice, once with and
 once without the prefix.
 
 In certain cases, this behavior may not be desired. Consider, for
-instance, a prefix C<c-> which cannot be replaced uniformly as in the
+instance, a prefix "c-" which cannot be replaced uniformly as in the
 example above due to differences in the source and destination script.
-Since it cannot be replaced uniformly, two words C<word1> and C<word2>
+Since it cannot be replaced uniformly, two words "word1" and "word2"
 would both need to be specified separately with replacements for
-C<c-word1> and C<c-word2>. If, however, the prefix C<c-> has an
-alternate spelling C<c > (without the hyphen), it would be very useful
+"c-word1" and "c-word2". If, however, the prefix "c-" has an
+alternate spelling "c " (without the hyphen), it would be very useful
 to be able to automatically recognize that as well. This is where the
-C<nofinal> attribute for the B<match> statements comes in. If there is
-a match statement 'match "c " "c-" beginword nofinal', the replaced
+B<nofinal> attribute for the B<match> statements comes in. If there is
+a match statement C<'match "c " "c-" beginword nofinal'>, the replaced
 chunk is B<not> marked as transliterated, so after executing this
-statement on the text C<c word1>, there will still only be one chunk,
-C<c-word1>, allowing for the regular word replacements to function
+statement on the text "c word1", there will still only be one chunk,
+"c-word1", allowing for the regular word replacements to function
 properly.
 
 Once all the replacement statements have been processed, each chunk
@@ -1525,8 +1525,8 @@ the next line.
 These are the commands accepted in the configuration file.
 Any parameters in square brackets are optional.
 
-The C<match>, C<matchignore>, and C<replace> commands are executed in
-the order they are specified, except that all C<replace> commands within
+The B<match>, B<matchignore>, and B<replace> commands are executed in
+the order they are specified, except that all B<replace> commands within
 the same group are replaced together.
 
 The B<match> and B<matchignore> statements accept any RegEx strings and
@@ -1538,10 +1538,12 @@ Any duplicate words found will cause the user to be prompted to choose
 one option every time the word is replaced in the input text.
 
 Note that any regex strings specified in the config should B<not>
-contain capture groups, as that would break the C<endword> functionality
+contain capture groups, as that would break the B<endword> functionality
 since this is also implemented internally using capture groups. Capture
 groups are also entirely pointless in the config since they currently
 cannot be used as part of the replacement string in B<match> statements.
+Lookaheads and lookbehinds are fine, though, and could be useful in
+certain cases.
 
 =over 8
 
@@ -1551,50 +1553,50 @@ Sets the RegEx string to be used for splitting words. This is only used
 for splitting the words which couldn't be replaced after all replacement
 has been done, before prompting the user for unknown words.
 
-Note that C<split> should probably always contain at least C<\n>, since
+Note that B<split> should probably always contain at least C<\n>, since
 otherwise all of the newlines will be marked as unknown words. Usually,
 this will be included anyways through C<\s>.
 
-Note also that C<split> should probably include the C<+> RegEx-quantifier
+Note also that B<split> should probably include the C<+> RegEx-quantifier
 since that allows the splitting function in the end to ignore several
 splitting characters right after each other (e.g. several spaces) in one
 go instead of splitting the string again for every single one of them.
 This shouldn't actually make any difference functionality-wise, though.
 
-B<Default:> \s (all whitespace)
+B<Default:> C<\s> (all whitespace)
 
 =item B<beforeword> <regex string>
 
-Sets the RegEx string to be matched before a word if C<beginword> is set.
+Sets the RegEx string to be matched before a word if B<beginword> is set.
 
-B<Default:> \s
+B<Default:> C<\s>
 
 =item B<afterword> <regex string>
 
-Sets the RegEx string to be matched after a word if C<endword> is set.
+Sets the RegEx string to be matched after a word if B<endword> is set.
 
-Note that C<afterword> should probably always contain at least C<\n>,
-since otherwise words with C<endword> set will not be matched at the
+Note that B<afterword> should probably always contain at least C<\n>,
+since otherwise words with B<endword> set will not be matched at the
 end of a line.
 
-C<beforeword> and C<afterword> will often be exactly the same, but
+B<beforeword> and B<afterword> will often be exactly the same, but
 they are left as separate options in case more fine-tuning is needed.
 
-B<Default:> \s
+B<Default:> C<\s>
 
 =item B<tablesep> <string>
 
 Sets the separator used to split the lines in the table files into the
 original and replacement word.
 
-B<Default:> Tab
+B<Default:> C<Tab>
 
 =item B<choicesep> <string>
 
 Sets the separator used to split replacement words into multiple choices for
 prompting the user.
 
-B<Default:> $
+B<Default:> C<$>
 
 =item B<ignore> <filename>
 
@@ -1606,14 +1608,14 @@ add words to it from the unknown word window.
 =item B<table> <table identifier> <filename>
 
 Load the table from C<< <filename> >>, making it available for later use in the
-C<expand> and C<replace> commands using the identifier C<< <table identifier> >>.
+B<expand> and B<replace> commands using the identifier C<< <table identifier> >>.
 
 Note that if C<< <filename> >> is not an absolute path, it is taken to be relative
 to the location of the configuration file.
 
-The table files simply consist of C<tablesep>-separated values, with the word in the
+The table files simply consist of B<tablesep>-separated values, with the word in the
 original script first and the replacement word second. The replacement word
-can optionally have several parts separated by C<choicesep>, which will cause the
+can optionally have several parts separated by B<choicesep>, which will cause the
 user to be prompted to choose one of the options.
 
 =item B<expand> <table identifier> <new table identifier> <word ending table> [noroot]
@@ -1623,17 +1625,17 @@ the word endings in C<< <word ending table> >>, saving the result as a table wit
 identifier C<< <new table identifier> >>. If C<same> is specified as C<< <new table
 identifier> >>, the original C<< <table identifier> >> is used instead.
 
-If C<noroot> is set, the root forms of the words are not kept.
+If B<noroot> is set, the root forms of the words are not kept.
 
-If the replacement for a word ending contains C<choicesep>, it is split and each part
+If the replacement for a word ending contains B<choicesep>, it is split and each part
 is combined with the root form separately and the user is prompted to choose one of
 the options later. it is thus possible to allow multiple choices for the ending if
 there is a distinction in the replacement script but not in the source script.
 Note that each of the root words is also split into its choices (if necessary)
-during the expanding, so it is possible to use C<choicesep> in both the endings
+during the expanding, so it is possible to use B<choicesep> in both the endings
 and root words.
 
-Note that the paths of all tables used for <word ending table> are removed from the
+Note that the paths of all tables used for C<< <word ending table> >> are removed from the
 list of paths that is later shown in the unknown word window. See L</"UNKNOWN WORD
 WINDOW"> for details.
 
@@ -1641,32 +1643,32 @@ WINDOW"> for details.
 
 Perform a RegEx match using the given C<< <regex string> >>, replacing it with
 C<< <replacement string> >>. Note that the replacement cannot contain any RegEx
-(e.g. groups) in it. C<beginword> and C<endword> specify whether the match must
+(e.g. groups) in it. B<beginword> and B<endword> specify whether the match must
 be at the beginning or ending of a word, respectively, using the RegEx specified
-in C<beforeword> and C<afterword>. If C<nofinal> is set, the string is not marked
+in B<beforeword> and B<afterword>. If B<nofinal> is set, the string is not marked
 as transliterated after the replacement, allowing it to be modified by subsequent
-C<match> or C<replace> commands.
+B<match> or B<replace> commands.
 
 =item B<matchignore> <regex string> [beginword] [endword]
 
-Performs a RegEx match in the same manner as C<match>, except that the original
+Performs a RegEx match in the same manner as B<match>, except that the original
 match is used as the replacement instead of specifying a replacement string, i.e.
 whatever is matched is just marked as transliterated without changing it.
 
 =item B<group> [beginword] [endword]
 
-Begins a replacement group. All C<replace> commands must occur between C<group>
-and C<endgroup>, since they are then grouped together and replaced in one go.
-C<beginword> and C<endword> act in the same way as specified for C<match> and
-apply to all C<replace> statements in this group.
+Begins a replacement group. All B<replace> commands must occur between B<group>
+and B<endgroup>, since they are then grouped together and replaced in one go.
+B<beginword> and B<endword> act in the same way as specified for B<match> and
+apply to all B<replace> statements in this group.
 
 =item B<replace> <table identifier>
 
 Replace all words in the table with the identifier C<< <table identifier> >>,
-using the C<beginword> and C<endword> settings specified by the current group.
+using the B<beginword> and B<endword> settings specified by the current group.
 
-Note that a table must have been loaded (or generated using C<expand>)
-before being used in a C<replace> statement.
+Note that a table must have been loaded (or generated using B<expand>)
+before being used in a B<replace> statement.
 
 =item B<endgroup>
