

<sect> Rules <label id="rules">
<p>

In Epos, nearly all of the TTS processing is controlled by a rule file;
there is one rule file per language and it usually has the <tt>.rul</tt>
suffix.  The rule file for the German language, for instance, resides
by default in <tt>lng/german/german.rul</tt>. The rules may also slightly
vary for the individual voices using the <ref id="soft-options"
name="soft options">.

<sect1> Text Structure Representation overview <label id="tsr">
<p>

The text being processed by Epos is internally stored in a format
suitable for the application of transformational rules.  Every phonetic
unit (or an approximation of one) is represented by a single node in the
structure. The nodes are organized into layers corresponding to linguistic
levels of description, such that a unit of level <tt/n/ can list its
immediate constituents, that is units of level <tt/n-1/. Every layer
also has a symbolic name, which is used to refer to it in the rules.

The number and symbolic names of individual levels can be specified
with the <tt/unit_levels/ option before the languages are defined.
An example can be given:

<label id="tsr-levels">

<table loc="h"> 
<tabular ca="|l|l|l|">
<hline>
Level name | written TSR semantics | spoken TSR semantics @
<hline>
<tt/text/ |	the whole text	| the whole text @
<tt/sent/ |	sentence construction	| terminated utterance @
<tt/colon/|	sentence/clause/colon	| intonational unit @
<tt/word/ |	word	| stress unit	@
<tt/syll/ |	word	| syllable	@
<tt/phone/|	letter	| sound		@
<tt/segment/|		| segment	@
<hline>
</tabular>
<caption>
	Available Text Structure Representation layers (an example)
</caption>
</table>

Every unit, whether at the segmental level or not, may contain a character.
The TSR, as generated by the text parser, contains the appropriate
punctuation at suprasegmental levels (that is, levels other than the phone
level): spaces at the word level, commas at the intonational unit level,
and periods, question marks and the like at the sentence (terminated
utterance) level.
Some suprasegmental units will have no content, because they have
been delimited only implicitly; for example, a colon-final word
is delimited by a comma, but the comma is actually a colon
level symbol, so the last word will have no content.  The content
may be modified by the rules, and it often is.  This allows
marking up a unit for later use: its content is changed into an
arbitrary character, such as a digit, and some rules are then applied
only within units having this content using a
<ref id="inside-rule" name="rule of type inside">.

<sect1> Rule file syntax overview
<p>

The rules are applied sequentially, unless stated otherwise.
Each rule operates on units of a certain level within a unit
of some other level; for instance, a rule may assimilate
phones within a word, another rule may change the syllabic
prosody within a colon.  The smaller units being manipulated
are called <em>target units</em>, the larger unit is referred
to as a <em>scope unit</em>; the respective levels are
called <em>target</em> and <em>scope</em>.  Each scope unit
is always processed separately (from any other scope units)
as if no other text ever existed.  For example, if the scope of some
assimilation happens to be "word", every word will have the rule 
applied in isolation and the assimilation will never apply across
the word boundary.  

Any line of the rules file may contain at most one rule and
possibly some comment.
The rule begins with an operation code specifier (what to do),
followed by the parameter (one word, opcode specific), and possibly
by scope and target specification, if the defaults (usually word
and phone, respectively) are not suitable.

The scope and the target can be one of the <ref id="tsr-levels"
name="available levels of linguistic description"> as defined
with the <tt/unit_levels/ option.  If target or even scope
for a rule is not specified, the <tt/default_target/ or
<tt/default_scope/ option value, respectively, will be used.
The typical defaults are <tt/phone/ and <tt/word/, respectively.

Every rule is evaluated within a certain unit, and the scope specifies
what kind of unit it should be.  
The meaning of the target is somewhat opcode specific,
but generally, this is the level which is affected by that rule.
See the individual rule descriptions in this section
in conjunction with the real world rule files for exact interpretation
of the target level.

The opcode, scope and target identifiers are not case sensitive, but the
parameter usually is.


<sect1> Using comments and the <tt>@include</tt> directive
<p>

Any text starting with a semicolon or <tt/&num;/ not in the middle of a word up to the
end of the line is a comment.  It will be properly ignored.  If a line
doesn't contain anything except whitespace and/or comment, it is also
ignored.  The <tt>@include</tt> directive can be used to nest the rule
files.  The same rules apply within <tt>.ini</tt> files; for more
details, see <ref id="include-directive" name="the @include directive in
configuration files">.

<sect1> Variables (macros)
<p>

A line which doesn't contain a rule may contain a <em/macro definition/ instead.
It is specified as <tt>identifier = replacement</tt>, for example,
<tscreen>
<verb>
$vowel = aeiouy
</verb>
</tscreen>
Alternatively, the keyword <tt>external</tt> may follow an identifier instead of
the equality sign and the replacement:
<tscreen>
<verb>
$some_pathname	external
</verb>
</tscreen>

Here the identifier is assigned the value of its corresponding
configuration parameter (for the current voice or current language if possible).

The macros get expanded wherever they occur, except at their own
point of definition.  Therefore, <tt>&dollar;vowel = &dollar;short&dollar;long</tt> will be a valid macro
definition, provided that <tt>&dollar;short</tt> and <tt>&dollar;long</tt> have already been defined.  The
expansion is performed at the definition time and it is not iterated, because
the replacement is not expected to contain the dollar sign.

Macros can later be redefined if you wish and they can be local to a block
of rules as described below.

If there is any uncertainty about where the identifier ends,
you can enclose it in braces: <tt>&dollar;{name}</tt> is usually equal to <tt>&dollar;name</tt>, but
<tt>&dollar;nameaeiou</tt> is not equal to <tt>&dollar;{name}aeiou</tt>.  It is also possible to use
a colon or an ampersand as a delimiter: <tt>&dollar;name&amp;aeiou</tt>.
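
The expansion behaviour described above can be illustrated with a small
Python sketch.  It is not the Epos implementation: the helper names
<tt>expand</tt> and <tt>define</tt> and the sample macro values are
hypothetical, and only the plain and the braced forms are handled
(the colon and ampersand delimiters are left out).

<tscreen><verb>
```python
import re

NAME = "[A-Za-z0-9_]+"
# Matches ${name} (group 1) or $name (group 2).
MACRO = re.compile("[$](?:[{](" + NAME + ")[}]|(" + NAME + "))")

def expand(text, macros):
    """Expand each macro reference once; the result is not rescanned."""
    def lookup(match):
        name = match.group(1) or match.group(2)
        return macros[name]
    return MACRO.sub(lookup, text)

def define(line, macros):
    """Process 'identifier = replacement', expanding the right side now."""
    name, replacement = line.split("=", 1)
    macros[name.strip().lstrip("$")] = expand(replacement.strip(), macros)

macros = {}
define("$short = aeiouy", macros)        # hypothetical sample values
define("$long = AEIOUY", macros)
define("$vowel = $short$long", macros)   # expanded at definition time
print(macros["vowel"])                   # aeiouyAEIOUY
print(expand("${short}aeiou", macros))   # aeiouyaeiou
```
</verb></tscreen>

In this sketch an unbraced reference takes the longest possible
identifier, so <tt>&dollar;shortaeiou</tt> would look up a macro named
<tt>shortaeiou</tt>, which is why the braces matter.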

For an abundance of examples see existing rule files.




<sect1> Dictionary-oriented rules <label id="dictionary">
<p>

The rule types described in this subsection operate in some way
on a list of words (or other strings), which can range from a few items
up to machine-generated megabytes of data.  These strings are usually listed
in a separate file, while the parameter of such a rule is the file name.
Alternatively, the strings can be quoted inside the rule file, especially
if only a few are listed.  Such a collection of strings
is called a <em/dictionary/ and obeys the same format for any rule type
which needs external data.

The dictionary consists of multiple lines, each of which contains a single
dictionary item.  An item consists of two whitespace separated words,
the former being the item itself, the latter being some string associated with
the item.  Often, the second string is used to replace every occurrence of the
first string in the text being processed.  That's why the strings are called
<em/replacee/ and <em/replacer/, respectively.  The order of dictionary items
is not significant. <footnote> We use adaptive hash tables -- and balanced
AVL trees for collisions 
-- for representation of the dictionary in memory to achieve instant lookups of
any item, even in a huge dictionary.</footnote>

The replacee cannot contain whitespace (unless escaped with a backslash),
but the replacer can.  That is, if more than two words are found on a line,
the first one becomes replacee and the rest of the line, except for
post-replacee and trailing whitespace, becomes the replacer.  However, some
rule types may not allow multiple word replacers.

In addition to or instead of a dictionary item, a dictionary line may
contain whitespace and comments.  The comment begins with a semicolon 
or hash mark, which is preceded with whitespace or located at the
beginning of line, and lasts up to the end of line.
A preceding backslash can be used to escape special characters (to interpret
them literally).
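
The line format can be summarized with the following Python sketch.
It is an illustration only: the function name
<tt>parse_dictionary_line</tt> is hypothetical, backslash escapes are
deliberately not handled, and a line with a single word is assumed to
have an empty replacer.

<tscreen><verb>
```python
def parse_dictionary_line(line):
    """Return (replacee, replacer), or None for blank or comment lines.

    Simplification: backslash escapes are not handled in this sketch.
    """
    # A comment starts with ';' or '#' at line start or after whitespace.
    for i, ch in enumerate(line):
        if ch in ";#" and (i == 0 or line[i - 1].isspace()):
            line = line[:i]
            break
    parts = line.split(None, 1)   # first word, then the rest of the line
    if not parts:
        return None
    replacee = parts[0]
    replacer = parts[1].strip() if len(parts) > 1 else ""
    return replacee, replacer

print(parse_dictionary_line("e.g.   for example   ; an abbreviation"))
```
</verb></tscreen>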

Instead of a file name reference, it is possible to quote the contents
of the dictionary directly; this is done by encapsulating the contents
in double quotes.  Dictionary items are whitespace-separated; within an
item, the replacee and the replacer are separated with a comma.

The dictionary may either be parsed and loaded into memory at Epos startup
or at the moment of the first use.  The former option's advantage is
early error reporting, while the latter can sometimes completely avoid
loading a huge unused dictionary.  Use the option <tt/paranoid/ to choose
your preference.

<sect2> Type <tt>subst</tt>
<p>

  Substring substitution.  The replacers replace every occurrence
  of their respective replacees; longer matches are matched first; the
  process is iterated until no replacee occurs in the string.  It is required
  either to have a <tt>phone</tt> target, or to keep all the replacers
  and replacees of the same length, because it is not obvious how to handle the
  children of the units affected.  Note also that to be considered a match
  with the <tt>phone</tt> target, all characters other than phones also
  have to match (must be found or not found on the same positions in both
  the replacee and the occurrence in question) except for the terminating
  scope-level separator (if any), which is invisible to this rule type.

  Any replacee may begin with a ^ or end with a &dollar;. That forces
  the substring being replaced to be at the beginning or the end
  of the scope unit, respectively.

  The replacer should not contain units of the scope level or higher.
  Unless the <tt/paranoid/ option is set, this is tolerated, but the
  replacer is truncated at the first such character.

  With the <tt>phone</tt> target, this rule type will drop the
  internal structure of the replaced text as soon as a match is found.
  In other words: an affected scope unit with a replacer is parsed as
  any other plain text.

  Infinitely looping substitutions are currently reported as an error
  condition.
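
  The iteration can be pictured with the Python sketch below.  It is one
  possible reading of the description above, not the Epos code: the
  function name <tt>subst</tt>, the <tt>max_passes</tt> limit and the
  sample dictionary are hypothetical, the ^ and &dollar; anchors are not
  handled, and the text is treated as a flat string of phones.

<tscreen><verb>
```python
def subst(text, dictionary, max_passes=100):
    """Iterated substring substitution, longer replacees tried first."""
    # Sort replacees so that longer matches take precedence.
    replacees = sorted(dictionary, key=len, reverse=True)
    for _ in range(max_passes):
        for replacee in replacees:
            if replacee in text:
                text = text.replace(replacee, dictionary[replacee])
                break             # rescan from the longest replacee again
        else:
            return text           # no replacee occurs any more
    raise RuntimeError("substitution does not terminate")

print(subst("schach", {"sch": "S", "ch": "x"}))   # Sax
```
</verb></tscreen>

  A pair of items such as <tt>a b</tt> and <tt>b a</tt> never settles;
  the sketch reports this after a fixed number of passes.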

<sect2> Type <tt>prep</tt>
<p>

  Preposition.  If the scope unit is identical to some replacee,
  it gets replaced with its respective replacer and merged to its right-hand
  neighbor.  If there is no such neighbor, nothing happens.  Again, the target
  must currently be <tt>phone</tt>, or all the replacers must be of the same
  length as their respective replacees.

<sect2> Type <tt>postp</tt>
<p>

  Postposition.  See type <tt>prep</tt>, but the resultant unit is merged to its
  left-hand neighbor.

<sect2> Type <tt>prosody</tt>
<p>
 
  This rule type is a prosody modeling rule which uses a dictionary
  of prosodic adjustments to be applied.  <ref id="prosody-rule"
  name="More details below">.


<sect2> Type <tt>segments</tt>
<p>

  Set up the segment layer below the phone layer.
  The parameter names a file which contains
  phone to segment mappings, again in the dictionary format.
  The replacees represent three character
  segment identifiers, the replacers are the respective segment
  numbers (decimal).
  It is possible, and indeed typical to include multiple identifiers
  for the same segment number.

  The middle character denotes the phone the resulting segment will
  be assigned to.  The left hand and right hand characters may either
  be a question mark, or they may require the left hand and/or right
  hand neighbor to be a specific character.  The question mark is
  therefore a kind of wildcard.

  If both fully specified and partly specified segments exist for
  a given triplet of phones, they will be placed from left to right
  in this order: <tt/lt?, ?t?, ?tr, ltr/.

A sentence may contain these segments with the Czech segment inventory
by Tomas Dubeda:

<tscreen><verb>
    p       l       o       u       t       e       f
  0p?   pl? ?lo   ?o? ou?  ?u?    ut? ?te  ?e?    ef? ?f0
</verb></tscreen>
  
or, with the traditional Czech segment inventory:

<tscreen><verb>
    p       l       o       u       t       e       f
 0p? ?pl  pl? ?lo   ?o? ou?  ?u?    ut? ?te  ?e?    ef? ?f0
</verb></tscreen>
 
(In this second example, for instance the diphones <tt/?pl/
and <tt/?pt/ would actually share the segment number and
would correspond to the <tt/p-any consonant/ diphone.)   
  
There are more possibilities for representing a segment
inventory; it is necessary to decide for the major diphone
types, whether they should live in their initial or
final sound.  That is unfortunate, but it is the way it is.
Punctuation never plays a role of segments in Epos, and
the same is true here.

It is possible to repeat a segment a few times.  This effect
can be controlled by adding 10000 times the number of extra
repetitions to the segment number.  Therefore, 
<tscreen><verb>
?e?	20241
</verb></tscreen>
generates three identical segments number 241 for the stationary
part of the specified vowel.
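
The arithmetic behind the repetition count can be checked with a short
Python sketch; the function name <tt>decode_segment_number</tt> is
hypothetical and only illustrates the encoding described above.

<tscreen><verb>
```python
def decode_segment_number(value):
    """Split a dictionary value into (segment number, total repetitions)."""
    extra_repetitions, segment = divmod(value, 10000)
    return segment, extra_repetitions + 1

segment, count = decode_segment_number(20241)
print([segment] * count)   # [241, 241, 241]
```
</verb></tscreen>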

<sect2> Type <tt>with</tt>
<p>

This is actually a conditional rule, though it also uses
a dictionary.  It applies an arbitrary rule upon the units
(words) listed in the dictionary.  <ref id="with-rule"
name="More details below">.

<sect1> Contentual rules
<p>

The contentual rules manipulate unit contents.  That is, they're suitable
for implementation of more regular letter-to-sound rules, character replacement
and other transformations.  They are an order of magnitude faster than e.g. the more
general <tt/subst/ rule, so they should be used whenever possible.

<sect2> Type <tt>regress</tt>
<p>

  Assimilation, elision or other mutation of phones or other units
  depending on their immediate environment.  The parameter is of the form
  <tt>o&gt;n(l&lowbar;r)</tt>, where <tt>o</tt>, <tt>n</tt>, <tt>l</tt> and <tt>r</tt> are arbitrary strings.  The semantics is "change tokens
  in <tt>o</tt> to their corresponding tokens in <tt>n</tt>
  whenever the left neighbor is in <tt>l</tt>
  and right one is in <tt>r</tt>".  The first two strings should therefore either be of
  equal length, or <tt>n</tt> should be a single character, with the obvious
  interpretations of "corresponding".

  The zero character (<tt>0</tt>) may be included in any of the strings; it means
  "no element", and it can be used to insert new units, delete the old ones,
  and to limit the change to the beginning or the end of the scope unit,
  respectively.  On the other hand, if the contents of some unit is literal <tt>0</tt>
  before the application of this rule, it should, in theory, stay untouched.  Use e.g.
  <tt>regex</tt> or <tt>subst</tt> rules to handle this case properly, as Epos presently
  confuses no-element zeroes and real zero characters.

  Examples:   
<tscreen><verb>
	regress  0>'(0_aeiou)  word  phone
</verb></tscreen>
  inserts the apostrophe before the vowels listed at the beginning of a word.
<tscreen><verb>
	regress  $voiceless>$voiced(!_$voiced)  word  phone
</verb></tscreen>
  assimilates voiceless consonants to their voiced counterparts (assuming
  <tt>&dollar;voiced</tt> and <tt>&dollar;voiceless</tt> have been defined previously), when they're followed
  by a voiced consonant.  The change proceeds from the right to the left,
  therefore <tt>ppb</tt> will change to <tt>bbb</tt>.  See <ref id="except" name="below">
  for the explanation of the exclamation mark (here: "everywhere").

<sect2> Type <tt>progress</tt>
<p>

  As above, but the change proceeds from left to right.  In the second
  example for the <tt>regress</tt> rule, the result would be <tt>pbb</tt>
  if <tt>progress</tt> were employed.
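
  The difference between the two directions can be reproduced with the
  Python sketch below.  It is a simplified model, not the Epos
  implementation: the names <tt>mutate</tt> and <tt>in_class</tt> and the
  sample consonant classes are hypothetical, units are single characters,
  a missing neighbor is represented by <tt>0</tt>, a lone exclamation mark
  is taken to mean "everywhere", and insertion or deletion via <tt>0</tt>
  is not modelled.

<tscreen><verb>
```python
def in_class(token, cls):
    # "!" alone is taken to mean "everywhere"; "0" stands for "no element".
    return cls == "!" or token in cls

def mutate(units, old, new, left, right, right_to_left):
    """Apply o>n(l_r) to a string of one-character units."""
    if len(new) == 1:
        new = new * len(old)          # a single target for all of o
    mapping = dict(zip(old, new))
    out = list(units)
    order = range(len(out))
    if right_to_left:
        order = reversed(order)
    for i in order:
        l = out[i - 1] if i > 0 else "0"
        r = out[i + 1] if i + 1 != len(out) else "0"
        if out[i] in mapping and in_class(l, left) and in_class(r, right):
            out[i] = mapping[out[i]]  # later steps see the changed unit
    return "".join(out)

voiceless, voiced = "ptk", "bdg"      # hypothetical sample classes
print(mutate("ppb", voiceless, voiced, "!", voiced, True))    # bbb
print(mutate("ppb", voiceless, voiced, "!", voiced, False))   # pbb
```
</verb></tscreen>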

<sect1>Structural rules
<p>

The structural rules can be used to restructure the text.  They usually interact
with multiple levels of description simultaneously.

<sect2> Type <tt>raise</tt>
<p>

  Move a unit to another level of description, e.g. when a segment
  level unit should directly affect the prosody.  The parameter is of the form
  <tt>from:to</tt> (<tt>from</tt> and <tt>to</tt> are arbitrary strings;
  they can employ the "except" operator, an exclamation mark).  The tokens
  in <tt>from</tt>, if found at the target
  level, are copied to the scope level, if the original scope token is listed
  in <tt>to</tt>.  It is also possible to omit the colon and the <tt>to</tt> string; the default
  interpretation is "everywhere".

<sect2> Type <tt>syll</tt>
<p>

  Roughly speaking, this rule type can be used to split words into
  syllables according to the theory of sonority, i.e. at the least sonorous
  phones.  More generally, it could be used to do any sort of inserting unit
  boundaries depending on local values of a simple metric.

  The parameter is an ordering of the target units (typically, phones).

Example:
<tscreen><verb>
	syll  0<ptkf<bdgv<mnN<lry<aeiou"  syll  phone
</verb></tscreen>
  inserts the following (and other) syllable boundaries:
<tscreen><verb>
  a|pa  ap|pa  ap|ppppa  arp|pa  ar|pra  a|pr|pa
</verb></tscreen>

  Tokens not listed are considered least sonorous, order of tokens within
  the same sonority group (see the example) is irrelevant.

  As you can see from the example, the syllable boundaries are inserted
  exactly once per every sequence of equivalent target units (e.g. equisonorous
  phones) such that both preceding and following target units of the group
  have higher sonority, and they're inserted either between the first and second
  element of the group, or, if the group consists of a single unit, before
  that unit.

  This semantics is suitable for the syllabification task in all languages
  known to us where syllabification is not primarily morphologically based,
  but this rule type can also be used for other tasks involving a unit split
  at some point defined by its contents, e.g. splitting a higher level
  prosodic unit before or after certain words.  The authors are eager to
  hear from you if you'd prefer an extension or simplification of this rule
  type or if you can comment on syllabification issues over a wide range
  of languages.


<sect1> Prosody modeling rules <label id="prosody">
<p>

The utterance prosody is modeled in Epos by assigning
values for the following prosodic quantities of individual text
structure units (possibly at multiple levels of description):

<itemize>
<item> pitch (fundamental frequency)
<item> volume (intensity) and
<item> duration (time factor)
</itemize>

Currently, these values are percentages, 100 being the neutral
value.  <footnote> Epos doesn't currently provide sets
of segment inventories for multiple pitch ranges, therefore
extreme values, such as 15 or 1500 may sound very unnatural.
</footnote>  The prosody adjustments at different levels
sum up for the actual values assigned to the generated
segments.  For example, a phone with the frequency (pitch)
value of 130 in a word with the value of 120 will contain
segments (after the <tt/segments/ rule is applied) with
frequency of 150.  Alternatively, it is possible to multiply
the values for pitch, volume and duration instead, by setting the
<tt/pros_eff_multiply_f/, <tt/pros_eff_multiply_i/ and
<tt/pros_eff_multiply_t/ options, respectively.
It is also possible to change the neutral value of 100
to a different base value with the <tt/f_neutral/, <tt/i_neutral/
and <tt/t_neutral/ options.
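
The example above can be reproduced with the following Python sketch.
The function name <tt>combine</tt> is hypothetical; the additive branch
follows the worked example, while the multiplicative branch is only one
plausible reading of the <tt/pros_eff_multiply_f/ family of options
(scaling by the ratio to the neutral value, with integer rounding
assumed).

<tscreen><verb>
```python
def combine(values, neutral=100, multiply=False):
    """Combine per-level percentages into the value given to a segment."""
    total = neutral
    for value in values:
        if multiply:
            total = total * value // neutral   # scale by value/neutral
        else:
            total += value - neutral           # add the offset from neutral
    return total

print(combine([130, 120]))                  # 150, as in the text
print(combine([130, 120], multiply=True))   # 156 under this reading
```
</verb></tscreen>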

<sect2> Type <tt>contour</tt> <label id="contour-rule">
<p>

This rule assigns a specified prosody contour to units at some level
of description within a unit which consists of them.  For example,
the rule can be used to assign pitch contours to stress units;
individual values will probably be assigned to syllables.

The parameter describes a single prosody contour.  The first letter
denotes the prosodic quantity (frequency, intensity or duration)
to be specified; the second is a slash; the adjustments follow
as colon-separated decimal integers.  For example,
<tscreen><verb>
	contour   f/+2:+0:-2   word   syll
</verb></tscreen>
assigns a falling pitch contour to a trisyllabic word.  The number
of syllables in a word, or, more generally, of the target units
in a scope unit, must match the number of adjustments specified
in a <tt/contour/ rule, otherwise an error occurs; consider
the <ref id="counters" name="length-based selection of rules">
to ensure that.  As an exception to that, it is possible
to specify padding in the contour.  At most one
adjustment may be immediately followed by an asterisk.  This
adjustment will be used for zero or more consecutive target
units as necessary to stretch the contour over the scope unit.
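
The stretching of a padded contour can be sketched in Python as follows.
The function name <tt>expand_contour</tt> is hypothetical, and the
sketch takes only the list of adjustments, without the leading quantity
letter and slash.

<tscreen><verb>
```python
def expand_contour(spec, length):
    """Stretch a contour such as '+2:+0*:-2' over `length` target units."""
    items = spec.split(":")
    padded = [item for item in items if item.endswith("*")]
    if len(padded) > 1:
        raise ValueError("at most one adjustment may carry an asterisk")
    if not padded:
        if len(items) != length:
            raise ValueError("contour length does not match the scope unit")
        return [int(item) for item in items]
    repeat = length - (len(items) - 1)   # zero or more copies of the padding
    if repeat < 0:
        raise ValueError("scope unit too short for this contour")
    result = []
    for item in items:
        if item.endswith("*"):
            result.extend([int(item[:-1])] * repeat)
        else:
            result.append(int(item))
    return result

print(expand_contour("+2:+0:-2", 3))    # [2, 0, -2]
print(expand_contour("+2:+0*:-2", 5))   # [2, 0, 0, 0, -2]
```
</verb></tscreen>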


<sect2> Type <tt>prosody</tt> <label id="prosody-rule">
<p>

  Individual prosodic feature generation.  (See also
  <ref id="contour-rule" name="the contour rule"> for assigning
  whole contours more conveniently.)

   Typically, there will be many instances of this rule in the rules
   file, each of which using a different configuration file for
   different purpose (e.g. one may handle word stress, another
   one sentence-final melody of wh- questions, another one semantic
   emphasis corresponding to an exclamation mark).  The parameter
   of a <tt>prosody</tt> rule is the name of a file formatted as a dictionary
   (see <ref id="dictionary" name="dictionary-oriented rules">)
   and is further specified here.

   Each prosodic adjustment occupies one line; it affects exactly one
   of frequency, intensity and duration (F, I, or T, respectively)
   of units positioned among others as specified.  The ordering of the
   adjustments is insignificant, because each of them affects different
   units or a different quantity of them.

   The structure of an adjustment is very simple, so let's just
   pick an example: <tt>i/3:4   -20</tt>.  The first letter must be one
   of T, I, F and specifies the quantity that may be adjusted;
   the first number specified denotes the position within a unit 
   whose length is to be equal to the second number: here, the
   rule applies at every third syllable of every tetrasyllable, 
   provided that the target of the rule is syllable, while
   the scope is word (this is specified in the rules file as 
   usual, not in the prosody file).  The last number, separated 
   by whitespace, is the intensity adjustment to be added 
   everywhere this specification applies.  It is an integer value.

   It is also possible to have an adjustment applied for any
   length of the scope unit (in the example above, for words
   of any number of syllables).  To do this, use "*" as the
   second number of the adjustment.  Also, it may make sense
   to count the target unit starting at the end of the scope
   unit; in this case append the word "last" to the first number.
   An example could be <tt>f/1last:*  -30</tt>, or "drop the pitch by 30
   for last syllables of every word".  Consequently, at most three
   distinct adjustments may affect a unit; if that happens, only one is
   chosen: the most specific one, or, if the candidates all contain the
   asterisk, the one counting from the beginning.
   An example, in order of decreasing precedence:

<tscreen><verb>
   t/1:2     +30
   t/1:*     +20
   t/2last:*  +5
</verb></tscreen>

   You can therefore override general adjustments with exceptions
   for some lengths which have to be handled separately.
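
   The precedence can be expressed as a lookup in Python.  This is a
   sketch under stated assumptions: the function name
   <tt>select_adjustment</tt> is hypothetical, the entries are assumed to
   be already restricted to a single quantity, keys of the form
   <tt>nlast:len</tt> are not considered, and a unit with no matching
   entry is assumed to get no adjustment.

<tscreen><verb>
```python
def select_adjustment(entries, position, length):
    """Pick the single adjustment for the unit at 1-based `position`.

    Precedence: exact length, then any length counted from the start,
    then any length counted from the end.
    """
    from_end = length - position + 1
    candidates = [
        "{}:{}".format(position, length),
        "{}:*".format(position),
        "{}last:*".format(from_end),
    ]
    for key in candidates:
        if key in entries:
            return entries[key]
    return 0

entries = {"1:2": 30, "1:*": 20, "2last:*": 5}
print(select_adjustment(entries, 1, 2))   # 30
print(select_adjustment(entries, 1, 3))   # 20
print(select_adjustment(entries, 2, 3))   # 5
```
</verb></tscreen>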

   If multiple prosodic rules (using their own files) supply
   adjustments for a certain unit, the adjustments are summed.

   It is important to understand the difference between
   e.g. a syllable and its phones: the syllable can have an entirely
   different prosodic value than its phones; for every given segment,
   the value for any prosodic quantity is obtained by totalling
   the values for all of higher levels units it is contained in.
   This independence of levels of description might theoretically
   be useful for modeling tone languages.

<sect2> Type <tt>smooth</tt>
<p>

  Smoothing out of one of the F,I,T quantities.  The parameter is
<tscreen><verb>
  quantity/left&lowbar;weights/base&lowbar;weight&bsol;right&lowbar;weights
</verb></tscreen>
  where the <tt>left&lowbar;weights</tt>,
  if there are multiple ones, shall be slash separated, the <tt>right&lowbar;weights</tt> shall
  be backslash separated.  The new value of the quantity specified for any
  target is computed as a weighted average of the values for the surrounding
  units at the same level.  If the target is too near to the scope boundary
  to have enough neighbors in some direction, the value for the last unit
  in that direction is used instead.

  Example:
<tscreen><verb>
	smooth  i/10/20/40&bsol;20&bsol;10  word  syll
</verb></tscreen>
  applied to the word <tt>un-ne-ce-ssa-ry</tt> will adjust intensity values
  for all of the syllables.  E.g. the second syllable will be computed as
  0.3 x i("<tt>un</tt>") + 0.4 x i("<tt>ne</tt>") + 0.2 x i("<tt>ce</tt>") + 0.1 x i("<tt>ssa</tt>")

  The computations for different units do not interfere.  The weights can
  also be specified as negative quantities and/or as sums of more values.
  This permits linear parameterization of the rules.
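
  The weighted average from the example can be checked with the Python
  sketch below.  The function name <tt>smooth</tt> and the sample
  intensity values are hypothetical; the weights are passed as lists
  instead of being parsed from the rule parameter, and the sketch assumes
  that the result is normalized by the sum of all weights (100 in the
  example).

<tscreen><verb>
```python
def smooth(values, left_weights, base_weight, right_weights):
    """Weighted average over neighbours; weights are listed as in the rule,
    i.e. left weights farthest first and right weights nearest first."""
    total = sum(left_weights) + base_weight + sum(right_weights)
    last = len(values) - 1
    result = []
    for i in range(len(values)):
        acc = base_weight * values[i]
        for distance, weight in enumerate(reversed(left_weights), start=1):
            acc += weight * values[max(i - distance, 0)]     # clamp at start
        for distance, weight in enumerate(right_weights, start=1):
            acc += weight * values[min(i + distance, last)]  # clamp at end
        result.append(acc / total)   # every result uses the old values only
    return result

# Hypothetical intensities for un-ne-ce-ssa-ry
syllables = [100, 110, 120, 130, 140]
print(smooth(syllables, [10, 20], 40, [20, 10])[1])   # 111.0
```
</verb></tscreen>

  For the second syllable this gives 0.3 x 100 + 0.4 x 110 + 0.2 x 120 +
  0.1 x 130 = 111, matching the formula above.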
  
  The <tt/smooth/ rule also has an unavoidable side effect. If (some of)
  the prosodic adjustments are assigned at the word level, for example, and smoothing
  should take place at the syllable level, it is first necessary to move
  the prosodic information down to the syllable level. It is done by adding
  the quantity found at the word level to every contained syllable and by
  removing it from the word level entirely.  The <tt>unit::project</tt> method
  is responsible for that; it is called before the actual smoothing.
  Prosodic adjustments existing at levels lower than the one being smoothed
  are ignored by the <tt/smooth/ rule.

<sect1> Composite rules <label id="composites">
<p>

Multiple rules are occasionally necessary where there are syntactical
placeholders for a single rule only.  Or, several rules have to be
grouped in a certain way -- for example, when one rule has to be chosen
nondeterministically out of a set of rules.  To satisfy these needs, Epos
rules include three types of composite rules with different semantics.
A composite rule is syntactically treated as a single rule for any purpose.

<sect2> Blocks of rules
<p>

A block is a sequence of rules enclosed within braces ("<tt>{</tt>" and "<tt>}</tt>").
Both the opening and the closing brace follow the rule syntax, but
they take no parameters except for an optional scope specification.
The block is treated as a single rule, which is useful with conditional
rules like

<tscreen><verb>
if   condition
{
	do   this
	do   that
}
</verb></tscreen>

The rules are applied sequentially, as you would expect, for every
unit of the proper size as given by the scope of the opening brace.
This means that every word (if the scope is <tt>word</tt>) is processed
separately throughout all the rules in the block.  This involves
some splitting of execution on entering the block.  By default, no
such splitting is done and the block inherits its scope from its
master rule (a conditional rule, a block it is encapsulated in,
or the global implicit block which covers all the rules altogether).
Consequently, the scope of any enclosed rule may not be larger
than the scope of the block.

Any macros defined in the block are local to the block.  The semantic
details are C-like and are by no means important.


<sect2> Choices of rules
<p>

A choice is a sequence of rules enclosed within brackets ("<tt>[</tt>" and "<tt>]</tt>").
Both the opening and the closing bracket follow the rule syntax, but
they take no parameters except for possible scope specification.
The choice is treated as a single rule.

Whenever the choice is applied, one of its subordinate rules is chosen
nondeterministically for every unit of the proper size as given by the
scope of the opening bracket, and only this rule is applied.

Generally, choices behave like blocks; the main difference is that with
blocks, all of the rules are applied, whereas with choices, exactly one
of them gets applied (possibly different rules for different pieces of
the text processed).

Empty choices (with no rules within) are not tolerated, contrary
to empty blocks.


<sect2> Length-based selection of rules <label id="counters">
<p>

A (length-based) switch is a sequence of rules enclosed within angle
brackets ("<tt>&lt;</tt>" and "<tt>&gt;</tt>").  Both the opening and the closing bracket follow
the rule syntax, but they take no parameters except for possible scope
and target specification.  The switch is treated as a single rule.

Whenever the switch is applied to a scope unit, target units contained
within are counted.  If <tt>n</tt> units are found, the <tt>n</tt>-th rule in the sequence of
subordinate rules is applied. 

If there are fewer than <tt>n</tt> rules available, the last one will be used.
You can avoid this behavior by specifying "nothing" after the last rule.
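
The selection can be sketched in Python as follows.  The function name
<tt>select_rule</tt> and the placeholder rule strings are hypothetical,
and the handling of a scope unit with no target units is an assumption
of the sketch, since the text above does not specify it.

<tscreen><verb>
```python
def select_rule(rules, count):
    """Return the rule for a scope unit containing `count` target units."""
    if not rules or count < 1:
        return None                       # unspecified case, an assumption
    index = min(count, len(rules)) - 1    # fall back to the last rule
    return rules[index]

rules = ["rule for 1", "rule for 2", "rule for 3 or more"]
print(select_rule(rules, 2))   # rule for 2
print(select_rule(rules, 7))   # rule for 3 or more
```
</verb></tscreen>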


<sect2> Repeated rules and choice probabilities
<p>

Write "<tt>3x</tt>" before a rule to repeat it three times (in a block)
or to make it three times more probable (in a choice):

<tscreen><verb>
[
	3x prosody		typical.dic
	   prosody		variant.dic
]
</verb></tscreen>

(The first alternative now has a 75% chance of being chosen, while
the other one is left for the remaining 25%.)
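
The probabilities can be checked with a small Python simulation.  It
models the repeat count as a selection weight, which matches the
percentages above but is not necessarily how Epos implements it; the
function name <tt>choose</tt> is hypothetical.

<tscreen><verb>
```python
import random

def choose(weighted_rules, rng=random):
    """Pick one rule; an 'Nx' prefix is modelled as a weight of N."""
    rules = [rule for _, rule in weighted_rules]
    weights = [weight for weight, _ in weighted_rules]
    return rng.choices(rules, weights=weights, k=1)[0]

choice = [(3, "prosody typical.dic"), (1, "prosody variant.dic")]
rng = random.Random(0)                    # fixed seed for repeatability
picks = [choose(choice, rng) for _ in range(10000)]
print(picks.count("prosody typical.dic") / len(picks))   # close to 0.75
```
</verb></tscreen>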

The repeat count must be a positive integer.  You cannot use
this feature just after conditional rules, because repeated
rules are not counted as a single rule for syntactic purposes:
	
<tscreen><verb>
	if  $something
		2x   regress   0>x(!_!)	  #...wrong!
</verb></tscreen>
		
You should rewrite this to

<tscreen><verb>
	if  $something
	{
		2x   regress   0>x(!_!)
	}
</verb></tscreen>

Huge integers (like one million) are disallowed.  This is because
the current implementation needs a few bytes of memory (one pointer)
per every repetition.



<sect1> Conditional rules
<p>

The conditional rules execute the following rule if and only if
a condition is met.  The condition is specified as the parameter,
the following (conditioned) rule is given on a separate line
(or lines, if a <ref id="composites" name="composite rule">
follows).  (Comments, whitespace and empty lines may intervene as
usual.) It is not syntactically necessary to indent the conditioned
rules with whitespace, but it is strongly recommended for readability.

<sect2> Type <tt>inside</tt> <label id="inside-rule">
<p>

  Apply a rule or a block of rules within certain units only.
  The parameter is a list of values at the scope level, wherein the
  following rule should be applied; the "except" operator may be used.

  Every unit (a sentence, for example), which fulfills the criterion,
  is processed separately, therefore the scope of the following rule may
  be at most that of the <tt>inside</tt> rule itself.

<sect2> Type <tt>near</tt>
<p>

  Apply a rule or a block of rules within units that contain
  at least one of the specified units.  The parameter is a list
  of values at the target level, which are looked up in a unit
  of the scope level; the <ref id="except" name="except">
  operator may be used.  If an occurrence is found, the following
  rule gets applied to the scope level unit.

  If the parameter begins with an asterisk, the asterisk is treated
  as an except operator and the test is negated.  In other words,
  the following rule gets applied if every target level unit contained
  in the scope level unit meets the set description with the leading
  asterisk ignored.  You can combine the asterisk with an extra except
  operator to get tests of the "contains no characters of this class" type.
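  The two forms of the test can be modelled in a few lines of Python. This is only a sketch of the logic as described here, not Epos code; the function name, its arguments and the sample character sets are illustrative assumptions.

```python
def near_applies(target_units, allowed, starred=False):
    """Decide whether the conditioned rule applies to one scope level unit.

    target_units: the target level characters found inside the unit
    allowed:      the set described by the parameter (asterisk stripped)
    starred:      True if the parameter began with an asterisk
    (Hypothetical helper, not part of Epos.)
    """
    if starred:
        # Negated test on the complemented set: every unit must be in `allowed`.
        return all(unit in allowed for unit in target_units)
    # Plain test: at least one unit must be in `allowed`.
    return any(unit in allowed for unit in target_units)

# Sample data, assumed for illustration only.
alphabet = set("abcdefghijklmnopqrstuvwxyz")
vowels = set("aeiou")

plain_hit = near_applies("star", vowels)    # the unit contains a vowel
plain_miss = near_applies("str", vowels)    # the unit contains no vowel
# Asterisk plus an extra except operator: "contains no vowels".
no_vowels_hit = near_applies("str", alphabet - vowels, starred=True)
no_vowels_miss = near_applies("star", alphabet - vowels, starred=True)
```

  The starred form with a complemented set is what yields the "contains no characters of this class" test.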

<sect2> Type <tt>with</tt> <label id="with-rule">
<p>

  Apply a rule or a block of rules for a list of units.
  In contrast with the preceding rule type, this refers not only
  to the token at the scope level (such as space), but to the whole
  structure (such as the string of phones delimited by the space).
  
  The parameter is a filename; the file should list the strings
  subject to the following rule, such as special words.

<sect2> Type <tt>if</tt>
<p>

  Apply a rule or a block of rules only if a condition (given
  by the parameter) is met.  The condition must currently be specified
  as a boolean voice configuration option (possibly a soft option)
  or its negation (i.e. prefixed with an exclamation mark).

  Example:
<tscreen><verb>
if   !colloquial
{
	...
}
</verb></tscreen>

  The rules within the block will be applied only if the colloquial
  option is <em>not</em> set.

  This <tt>if</tt> rule inherits its scope from its parent rule
  if not specified explicitly.

  Again, the scope of a subordinate rule may not be larger than that of
  the <tt>if</tt> rule itself.


<sect1> Special rules

<sect2> Type <tt>regex</tt>
<p>

  Regular expression substitution.  The parameter is of the form
  <tt>/regular&lowbar;expression/replacement/</tt>.  This rule type is similar to <tt>subst</tt>
  with only one dictionary item, but it is far more powerful and more arcane;
  it is intended neither for end wizards nor for trivial tasks.
  For an overview of regular expressions, UNIX users can consult e.g. the <tt>grep</tt>
  manual page, whereas Windows users can telnet to a nearby UNIX machine and
  type <tt>man grep</tt> there.

  Epos uses the extended regular expression syntax with the following difference:
  in "regular" regular expressions, parentheses match themselves, while
  the open group and close group operators are <tt>&bsol;(</tt> and <tt>&bsol;)</tt>, respectively.
  As we use groups heavily and next to no real parentheses, we decided
  to do it the other way round.  Also, <tt>sed</tt> users may be surprised
  by the iterative behavior of the <tt/regex/ rule type in Epos.
  
  The replacement may contain escape sequences referring to the match of
  the <tt>n</tt>-th group within the regular expression: <tt>&bsol;1</tt> to <tt>&bsol;9</tt>.
  <tt>&bsol;0</tt> represents the entire match, but this is probably unusable under the
  current design, as this would cause an infinite substitution loop.
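  The iterative behavior, and the reason a replacement containing the entire match never terminates, can be illustrated with a short Python sketch. It assumes that the rule keeps substituting for as long as the expression still matches; this is an interpretation of the description above, not a statement about the Epos source. The function name and the round limit are hypothetical. Python's <tt>re</tt> module happens to share the convention that bare parentheses form groups; it writes the entire match as <tt>&bsol;g&lt;0&gt;</tt>.

```python
import re

def iterative_sub(pattern, replacement, text, max_rounds=1000):
    """Replace the first match repeatedly until the pattern stops matching.

    Hypothetical model of an iterative substitution; `max_rounds` is a
    safety limit added for this sketch only.
    """
    regex = re.compile(pattern)
    for _ in range(max_rounds):
        text, count = regex.subn(replacement, text, count=1)
        if count == 0:
            return text
    raise RuntimeError("substitution did not terminate")

# A single sed-like pass rewrites "aab" once; iterating keeps going.
single_pass = re.sub(r"a(b)", r"\1", "aab")
iterated = iterative_sub(r"a(b)", r"\1", "aab")

# Re-inserting the entire match leaves something to match forever.
try:
    iterative_sub(r"(a)", r"\g<0>", "a", max_rounds=10)
    looped = False
except RuntimeError:
    looped = True
```

  In this model a rule terminates only if every substitution eventually destroys all matches of the expression.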
  
  In order to use this type of rule, you need to have the <tt>rx</tt> or <tt>regex</tt>
  library already installed and have <tt>WANT&lowbar;REGEX</tt> enabled in <tt>common.h</tt>.
  This is because we don't actually implement the regex parsing stuff; we leave it
  to your OS libraries.  In case you don't have such libraries installed, we use
  the glibc implementation (<tt>rx.c</tt> in the Epos distribution).

  Note that if your system neither supports locale setting nor provides
  a usable regex library, you can't use named character classes such
  as <tt>[:upper:]</tt> in your regular expressions.  This is the case
  on Windows CE.

<sect2> Type <tt>debug</tt> <label id="debug-rule">
<p>

  Provides debugging information during the application of the rules.
  Scope and target are ignored; the parameter is parsed lazily.

  Parameter "<tt>elem</tt>": dump the current state of the text being processed.

  Parameter "<tt>pause</tt>": wait for a keypress.



<sect1> The "except" (<tt>!</tt>) operator <label id="except">
<p>

Whenever an unordered list of tokens should be specified within the parameter
to some rule (use common sense and/or individual rule descriptions above),
you can also make negative specifications, such as "all consonants except
l and r".  To do this, use the exclamation mark as an "except" operator:
<tt>&dollar;consonants!lr</tt>.  (The right operand is subtracted from the left one.)
If there is no left operand, as in <tt>!x</tt>, the meaning is "all but x".
It follows that <tt>!</tt> alone means "everything".

The operator is right-associative; <tt>!&dollar;vowels!ou</tt> means "all excluding vowels,
but <tt>o</tt> and <tt>u</tt> don't count as vowels just now".
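The right-associative reading can be checked with ordinary set arithmetic. The Python sketch below is an illustration only: the function name, the lowercase alphabet standing in for "everything", and the vowel set are assumptions rather than anything defined by Epos. A missing left operand is modelled as the whole alphabet and a missing right operand as the empty set.

```python
def evaluate_except(operands):
    """Evaluate A!B!C right-associatively, that is, as A - (B - C).

    `operands` is a non-empty list of character sets, left to right.
    (Hypothetical helper, not part of Epos.)
    """
    *rest, last = operands
    result = set(last)
    for operand in reversed(rest):
        result = set(operand) - result
    return result

# Sample data, assumed for illustration only.
alphabet = set("abcdefghijklmnopqrstuvwxyz")
vowels = set("aeiou")
consonants = alphabet - vowels

no_liquids = evaluate_except([consonants, set("lr")])       # $consonants!lr
all_but_x = evaluate_except([alphabet, set("x")])           # !x
everything = evaluate_except([alphabet, set()])             # !
odd_vowels = evaluate_except([alphabet, vowels, set("ou")]) # !$vowels!ou
```

In the last case the result still contains <tt>o</tt> and <tt>u</tt>, because they were removed from the vowels before the vowels were subtracted.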

Let us repeat that this operator will never work for ordered lists, not even
for the <tt>syll</tt> rule sonority groups.


<sect1> Escaping special characters
<p>

You can use the backslash to escape any special character, including itself,
anywhere in the rules or in <tt>.ini</tt> file strings.
<tt/&bsol;n/, <tt/&bsol;t/, and <tt/&bsol;&lsqb;/ may be used
to insert a newline, a tab, or an escape character, respectively.

