commit 938f18f29430df37213508816b5d6a7efde23ed5
parent df15f30a4f0f70756b5fb4dde9b188087105e973
Author: lumidify <nobody@lumidify.org>
Date: Thu, 26 Mar 2020 14:49:29 +0100
Fix style of documentation
Diffstat:
7 files changed, 99 insertions(+), 72 deletions(-)
diff --git a/.gitignore b/.gitignore
@@ -1 +0,0 @@
-data
diff --git a/tests/data/endings.txt b/tests/data/endings.txt
@@ -0,0 +1,3 @@
+end1 end1_replaced
+end2 end2_replaced
+end3 end3_replaced
diff --git a/tests/data/endings_choices.txt b/tests/data/endings_choices.txt
@@ -0,0 +1,2 @@
+end1,end1r1|end1r2
+end2,end2r
diff --git a/tests/data/ignore.txt b/tests/data/ignore.txt
@@ -0,0 +1 @@
+ignore
diff --git a/tests/data/words.txt b/tests/data/words.txt
@@ -0,0 +1,10 @@
+word0 word0_replaced$word0_replaced2
+word1 word1_replaced
+word2 word2_replaced
+word3 word3_replaced
+word4 word4_replaced
+word5 word5_replaced
+word6 word6_replaced
+word7 word7_replaced
+word8 word8_replaced
+word9 word9_replaced
diff --git a/tests/data/words1.txt b/tests/data/words1.txt
@@ -0,0 +1,10 @@
+word0,word0_replaced|word0_replaced2
+word1,word1_replaced
+word2,word2_replaced
+word3,word3_replaced
+word4,word4_replaced
+word5,word5_replaced
+word6,word6_replaced
+word7,word7_replaced
+word8,word8_replaced
+word9,word9_replaced
diff --git a/transliterate.pl b/transliterate.pl
@@ -1286,20 +1286,20 @@ Start the transliteration engine with the given file as input.
=over 8
-=item B<< --output <filename> >>
+=item B<--output> <filename>
Sets the output file to print to.
-If the file exists already and C<--force> is not set, the user is asked
+If the file exists already and B<--force> is not set, the user is asked
if the file should be overwritten or appended to.
-B<Default:> STDOUT (print to terminal)
+B<Default:> C<STDOUT> (print to terminal)
-=item B<< --config <filename> >>
+=item B<--config> <filename>
Sets the configuration file to use.
-B<Default:> "config"
+B<Default:> C<config>
=item B<--checkduplicates>
@@ -1331,18 +1331,18 @@ prompts.
This option is only useful for automatic testing of the transliteration engine.
-If C<--nochoices> is enabled, each word in the input with multiple choices will
+If B<--nochoices> is enabled, each word in the input with multiple choices will
be output, along with the number of choices (can be used to test the proper
-functioning of C<choicesep> in the config file).
+functioning of B<choicesep> in the config file).
-If C<--nounknowns> is enabled, each unknown word in the input is printed
-(can be used to test that the C<ignore> options are working correctly).
+If B<--nounknowns> is enabled, each unknown word in the input is printed
+(can be used to test that the B<ignore> options are working correctly).
=item B<--force>
Always overwrites the output and error file without asking.
-=item B<< --start <line number> >>
+=item B<--start> <line number>
Starts at the given line number instead of the beginning of the file.
@@ -1351,19 +1351,19 @@ printed out. This is the current line that was being processed, so it
has not been printed to the output file yet and thus the program must
be resumed at that line, not the one afterwards.
-=item B<< --errors <filename> >>
+=item B<--errors> <filename>
Specifies a file to write errors in. Note that this does not refer to
actual errors, but to any words that were temporarily ignored
(i.e. words for which "Ignore: This run" was clicked).
If no file is specified, nothing is written. If a file is specified
-that already exists and C<--force> is not set, the user is prompted
+that already exists and B<--force> is not set, the user is prompted
for action.
=item B<--help>
-Display the full documentation.
+Displays the full documentation.
=back
@@ -1427,9 +1427,9 @@ pressed.
The filtering for which table files are actually shown here is
currently a bit rudimentary. First, all paths that are used in the
-C<table> statements in the config are put into a list. Then, the
+B<table> statements in the config are put into a list. Then, the
paths corresponding to any tables used for word endings in the
-C<expand> statements are removed from the list, and that is what
+B<expand> statements are removed from the list, and that is what
is shown in this window. The reason for removing those paths from
the list is that it gets somewhat confusing when all the tables
that are only used for word endings are also in the list, and it
@@ -1438,7 +1438,7 @@ one of those files. If necessary, the word can always be added
manually and the config reloaded. Note also that only actual
table paths are shown here, not the tables themselves - this is
not necessarily a one-to-one mapping since new tables can be
-generated with the C<expand> statements.
+generated with the B<expand> statements.
=item Reload config
@@ -1451,7 +1451,7 @@ while since the entire word database has to be reloaded.
Prints the current line number to the terminal and exits the program.
The program can always be started again at this line number using
-the C<--start> option if needed.
+the B<--start> option if needed.
=back
@@ -1470,39 +1470,39 @@ untransliterated. Then, all B<match>, B<matchignore>, and B<replace>
appear in the config file. Whenever a word/match is replaced, it
is split off into a separate chunk which is marked as transliterated.
A chunk marked as transliterated I<is entirely ignored by any
-replacement statements that come afterwards>. Note that C<beginword>
-and C<endword> can always match at the boundary between an
+replacement statements that come afterwards>. Note that B<beginword>
+and B<endword> can always match at the boundary between an
untransliterated and transliterated chunk. This is to facilitate
automated replacement of certain grammatical constructions. For instance:
-If the string C<a-> could be attached as a prefix to any word and needed
-to be replaced as C<b-> everywhere, it would be quite trivial to add
-a match statement 'match "a-" "b-" beginword'. If run on the text
-C<a-word>, where C<word> is some word that should be transliterated
-as C<word_replaced>, and the group replace statement for the word comes
+If the string "a-" could be attached as a prefix to any word and needed
+to be replaced as "b-" everywhere, it would be quite trivial to add
+a match statement C<'match "a-" "b-" beginword'>. If run on the text
+"a-word", where "word" is some word that should be transliterated
+as "word_replaced", and the group replace statement for the word comes
after the match statement given above, the following would happen:
-First, the match statement would replace C<a-> and split the text into
-the two chunks C<b-> and C<word>, where C<b-> is already marked as
-transliterated. Since C<word> is now separate, it will be matched
-by the group replace statement later, even if it has C<beginword> set
-and would normally not match if C<a-> came before it. Thus, the final
-output will be C<b-word_replaced>, allowing for the uniform replacement
+First, the match statement would replace "a-" and split the text into
+the two chunks "b-" and "word", where "b-" is already marked as
+transliterated. Since "word" is now separate, it will be matched
+by the group replace statement later, even if it has B<beginword> set
+and would normally not match if "a-" came before it. Thus, the final
+output will be "b-word_replaced", allowing for the uniform replacement
of the prefix instead of having to add each word twice, once with and
once without the prefix.
In certain cases, this behavior may not be desired. Consider, for
-instance, a prefix C<c-> which cannot be replaced uniformly as in the
+instance, a prefix "c-" which cannot be replaced uniformly as in the
example above due to differences in the source and destination script.
-Since it cannot be replaced uniformly, two words C<word1> and C<word2>
+Since it cannot be replaced uniformly, two words "word1" and "word2"
would both need to be specified separately with replacements for
-C<c-word1> and C<c-word2>. If, however, the prefix C<c-> has an
-alternate spelling C<c > (without the hyphen), it would be very useful
+"c-word1" and "c-word2". If, however, the prefix "c-" has an
+alternate spelling "c " (without the hyphen), it would be very useful
to be able to automatically recognize that as well. This is where the
-C<nofinal> attribute for the B<match> statements comes in. If there is
-a match statement 'match "c " "c-" beginword nofinal', the replaced
+B<nofinal> attribute for the B<match> statements comes in. If there is
+a match statement C<'match "c " "c-" beginword nofinal'>, the replaced
chunk is B<not> marked as transliterated, so after executing this
-statement on the text C<c word1>, there will still only be one chunk,
-C<c-word1>, allowing for the regular word replacements to function
+statement on the text "c word1", there will still only be one chunk,
+"c-word1", allowing for the regular word replacements to function
properly.
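The chunk behavior described above can be sketched roughly as follows. This is a minimal illustration in Python, not the engine's actual Perl code: the function names C<apply_match> and C<render> are hypothetical, B<beforeword> is assumed to be the default C<\s>, and a chunk boundary is assumed to count as the beginning of a word.

```python
import re

def apply_match(chunks, pattern, replacement, beginword=False, nofinal=False):
    """Apply one match statement to a list of (text, transliterated) chunks."""
    # Assumption: beforeword is the default \s, and the start of a chunk
    # also counts as the beginning of a word.
    prefix = r"(?:^|(?<=\s))" if beginword else ""
    regex = re.compile(prefix + pattern)
    result = []
    for text, done in chunks:
        if done:
            # Transliterated chunks are ignored by later statements.
            result.append((text, True))
        elif nofinal:
            # Replace in place; the chunk stays untransliterated and unsplit.
            result.append((regex.sub(lambda m: replacement, text), False))
        else:
            # Split every replacement off into its own transliterated chunk.
            pos = 0
            for m in regex.finditer(text):
                if m.start() > pos:
                    result.append((text[pos:m.start()], False))
                result.append((replacement, True))
                pos = m.end()
            if pos < len(text):
                result.append((text[pos:], False))
    return result

def render(chunks):
    """Join the chunk texts back into the output string."""
    return "".join(text for text, _ in chunks)

# Prefix example: "a-" is split off, so "word" can match with beginword set.
chunks = apply_match([("a-word", False)], "a-", "b-", beginword=True)
chunks = apply_match(chunks, "word", "word_replaced", beginword=True)
print(render(chunks))

# nofinal example: "c " becomes "c-", but the text remains a single chunk.
print(apply_match([("c word1", False)], "c ", "c-", beginword=True, nofinal=True))
```

In this sketch, applying the word replacement directly to the unsplit chunk "a-word" with B<beginword> set leaves it unchanged, which is why the earlier split matters.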
Once all the replacement statements have been processed, each chunk
@@ -1525,8 +1525,8 @@ the next line.
These are the commands accepted in the configuration file.
Any parameters in square brackets are optional.
-The C<match>, C<matchignore>, and C<replace> commands are executed in
-the order they are specified, except that all C<replace> commands within
+The B<match>, B<matchignore>, and B<replace> commands are executed in
+the order they are specified, except that all B<replace> commands within
the same group are replaced together.
The B<match> and B<matchignore> statements accept any RegEx strings and
@@ -1538,10 +1538,12 @@ Any duplicate words found will cause the user to be prompted to choose
one option every time the word is replaced in the input text.
Note that any regex strings specified in the config should B<not>
-contain capture groups, as that would break the C<endword> functionality
+contain capture groups, as that would break the B<endword> functionality
since this is also implemented internally using capture groups. Capture
groups are also entirely pointless in the config since they currently
cannot be used as part of the replacement string in B<match> statements.
+Lookaheads and lookbehinds are fine, though, and could be useful in
+certain cases.
=over 8
@@ -1551,50 +1553,50 @@ Sets the RegEx string to be used for splitting words. This is only used
for splitting the words which couldn't be replaced after all replacement
has been done, before prompting the user for unknown words.
-Note that C<split> should probably always contain at least C<\n>, since
+Note that B<split> should probably always contain at least C<\n>, since
otherwise all of the newlines will be marked as unknown words. Usually,
this will be included anyway through C<\s>.
-Note also that C<split> should probably include the C<+> RegEx-quantifier
+Note also that B<split> should probably include the C<+> RegEx-quantifier
since that allows the splitting function in the end to ignore several
splitting characters right after each other (e.g. several spaces) in one
go instead of splitting the string again for every single one of them.
This shouldn't actually make any difference functionality-wise, though.
-B<Default:> \s (all whitespace)
+B<Default:> C<\s> (all whitespace)
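The effect of the C<+> quantifier can be illustrated with a short sketch. It uses Python's C<re.split> as a stand-in for the engine's own Perl splitting code, and the helper name C<split_words> is hypothetical. Dropping the empty pieces is an assumption about how the engine treats consecutive separators.

```python
import re

def split_words(text, split=r"\s+"):
    """Split text on the given RegEx and drop the empty pieces."""
    return [piece for piece in re.split(split, text) if piece]

text = "word1  word2\n\nword3"

# Without "+", every single whitespace character causes a split, which
# leaves empty strings between consecutive separators.
print(re.split(r"\s", text))

# With "+", a run of separators is consumed in one go.
print(re.split(r"\s+", text))

# Once empty pieces are dropped, both settings yield the same words.
print(split_words(text, r"\s") == split_words(text, r"\s+"))
```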
=item B<beforeword> <regex string>
-Sets the RegEx string to be matched before a word if C<beginword> is set.
+Sets the RegEx string to be matched before a word if B<beginword> is set.
-B<Default:> \s
+B<Default:> C<\s>
=item B<afterword> <regex string>
-Sets the RegEx string to be matched after a word if C<endword> is set.
+Sets the RegEx string to be matched after a word if B<endword> is set.
-Note that C<afterword> should probably always contain at least C<\n>,
-since otherwise words with C<endword> set will not be matched at the
+Note that B<afterword> should probably always contain at least C<\n>,
+since otherwise words with B<endword> set will not be matched at the
end of a line.
-C<beforeword> and C<afterword> will often be exactly the same, but
+B<beforeword> and B<afterword> will often be exactly the same, but
they are left as separate options in case more fine-tuning is needed.
-B<Default:> \s
+B<Default:> C<\s>
=item B<tablesep> <string>
Sets the separator used to split the lines in the table files into the
original and replacement word.
-B<Default:> Tab
+B<Default:> C<Tab>
=item B<choicesep> <string>
Sets the separator used to split replacement words into multiple choices for
prompting the user.
-B<Default:> $
+B<Default:> C<$>
=item B<ignore> <filename>
@@ -1606,14 +1608,14 @@ add words to it from the unknown word window.
=item B<table> <table identifier> <filename>
Load the table from C<< <filename> >>, making it available for later use in the
-C<expand> and C<replace> commands using the identifier C<< <table identifier> >>.
+B<expand> and B<replace> commands using the identifier C<< <table identifier> >>.
Note that if C<< <filename> >> is not an absolute path, it is taken to be relative
to the location of the configuration file.
-The table files simply consist of C<tablesep>-separated values, with the word in the
+The table files simply consist of B<tablesep>-separated values, with the word in the
original script first and the replacement word second. The replacement word
-can optionally have several parts separated by C<choicesep>, which will cause the
+can optionally have several parts separated by B<choicesep>, which will cause the
user to be prompted to choose one of the options.
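As a rough illustration of the table format, the following Python sketch parses lines like those in C<tests/data/words1.txt>, where B<tablesep> is C<,> and B<choicesep> is C<|>. The function name C<parse_table> is hypothetical, and skipping empty lines is an assumption rather than documented behavior.

```python
def parse_table(lines, tablesep="\t", choicesep="$"):
    """Map each original word to its list of replacement choices."""
    table = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # assumption: empty lines are skipped
        # The original word comes first, the replacement second.
        word, replacement = line.split(tablesep, 1)
        # A replacement containing choicesep offers several choices.
        table[word] = replacement.split(choicesep)
    return table

lines = [
    "word0,word0_replaced|word0_replaced2",
    "word1,word1_replaced",
]
table = parse_table(lines, tablesep=",", choicesep="|")
print(table["word0"])  # two choices, so the user would be prompted
print(table["word1"])  # one choice, replaced directly
```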
=item B<expand> <table identifier> <new table identifier> <word ending table> [noroot]
@@ -1623,17 +1625,17 @@ the word endings in C<< <word ending table> >>, saving the result as a table wit
identifier C<< <new table identifier> >>. If C<same> is specified as C<< <new table
identifier> >>, the original C<< <table identifier> >> is used instead.
-If C<noroot> is set, the root forms of the words are not kept.
+If B<noroot> is set, the root forms of the words are not kept.
-If the replacement for a word ending contains C<choicesep>, it is split and each part
+If the replacement for a word ending contains B<choicesep>, it is split and each part
is combined with the root form separately and the user is prompted to choose one of
the options later. It is thus possible to allow multiple choices for the ending if
there is a distinction in the replacement script but not in the source script.
Note that each of the root words is also split into its choices (if necessary)
-during the expanding, so it is possible to use C<choicesep> in both the endings
+during the expanding, so it is possible to use B<choicesep> in both the endings
and root words.
-Note that the paths of all tables used for <word ending table> are removed from the
+Note that the paths of all tables used for C<< <word ending table> >> are removed from the
list of paths that is later shown in the unknown word window. See L</"UNKNOWN WORD
WINDOW"> for details.
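The expansion can be sketched along the following lines. This Python version is only an illustration under stated assumptions: the function name C<expand> mirrors the config command but is not the engine's code, tables are modeled as dictionaries from a word to its list of choices, and roots and endings are assumed to be combined by plain concatenation.

```python
def expand(table, endings, noroot=False):
    """Combine every root word with every ending, keeping all choices."""
    # Unless noroot is set, the root forms stay in the resulting table.
    expanded = {} if noroot else {word: list(c) for word, c in table.items()}
    for root, root_choices in table.items():
        for ending, ending_choices in endings.items():
            # Each root choice is combined with each ending choice.
            expanded[root + ending] = [
                r + e for r in root_choices for e in ending_choices
            ]
    return expanded

# Values modeled on tests/data/words1.txt and tests/data/endings_choices.txt.
words = {"word0": ["word0_replaced", "word0_replaced2"]}
endings = {"end1": ["end1r1", "end1r2"], "end2": ["end2r"]}

result = expand(words, endings)
print(result["word0end2"])
print(len(result["word0end1"]))  # 2 root choices x 2 ending choices
print("word0" in expand(words, endings, noroot=True))
```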
@@ -1641,32 +1643,32 @@ WINDOW"> for details.
Perform a RegEx match using the given C<< <regex string> >>, replacing it with
C<< <replacement string> >>. Note that the replacement cannot contain any RegEx
-(e.g. groups) in it. C<beginword> and C<endword> specify whether the match must
+(e.g. groups) in it. B<beginword> and B<endword> specify whether the match must
be at the beginning or ending of a word, respectively, using the RegEx specified
-in C<beforeword> and C<afterword>. If C<nofinal> is set, the string is not marked
+in B<beforeword> and B<afterword>. If B<nofinal> is set, the string is not marked
as transliterated after the replacement, allowing it to be modified by subsequent
-C<match> or C<replace> commands.
+B<match> or B<replace> commands.
=item B<matchignore> <regex string> [beginword] [endword]
-Performs a RegEx match in the same manner as C<match>, except that the original
+Performs a RegEx match in the same manner as B<match>, except that the original
match is used as the replacement instead of specifying a replacement string, i.e.
whatever is matched is just marked as transliterated without changing it.
=item B<group> [beginword] [endword]
-Begins a replacement group. All C<replace> commands must occur between C<group>
-and C<endgroup>, since they are then grouped together and replaced in one go.
-C<beginword> and C<endword> act in the same way as specified for C<match> and
-apply to all C<replace> statements in this group.
+Begins a replacement group. All B<replace> commands must occur between B<group>
+and B<endgroup>, since they are then grouped together and replaced in one go.
+B<beginword> and B<endword> act in the same way as specified for B<match> and
+apply to all B<replace> statements in this group.
=item B<replace> <table identifier>
Replace all words in the table with the identifier C<< <table identifier> >>,
-using the C<beginword> and C<endword> settings specified by the current group.
+using the B<beginword> and B<endword> settings specified by the current group.
-Note that a table must have been loaded (or generated using C<expand>)
-before being used in a C<replace> statement.
+Note that a table must have been loaded (or generated using B<expand>)
+before being used in a B<replace> statement.
=item B<endgroup>