Rules

In Epos, nearly all of the TTS processing is controlled by a rule file. There is one rule file per language, and it usually has the .rul suffix. The rule file for the German language, for instance, resides by default in lng/german/german.rul. The rules may also vary slightly for individual voices using the .

Text Structure Representation overview

The text being processed by Epos is stored internally in a format suitable for the application of transformational rules. Every phonetic unit (or an approximation of one) is represented by a single node in the structure. The nodes are organized into layers corresponding to linguistic levels of description, such that a unit of level

Table: Available Text Structure Representation layers (an example); columns: Level name, written TSR semantics, spoken TSR semantics.

Every unit, whether at the segmental level or not, may contain a character. The TSR, as generated by the text parser, contains the appropriate punctuation at the suprasegmental levels (that is, all levels except the phone level): spaces at the word level, commas at the intonational unit level, while periods, question marks and the like become the contents of sentence-level (terminated utterance) units. Some suprasegmental units have no content because they have been delimited only implicitly. For example, a colon-final word is delimited by a comma, but the comma is actually a colon-level symbol, so the last word has no content. This content may be modified by the rules, and in fact it often is. This allows marking up a unit for later use: change its content into an arbitrary character, such as a digit, and then apply some rules only within units having this content using a .

Rule file syntax overview

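For the rule-line layout described in this section (opcode, one-word parameter, optional scope and target), here is a minimal Python sketch of splitting such a line into fields. The helper parse_rule_line and the Rule tuple are hypothetical, not part of Epos. The sketch assumes whitespace-separated fields and treats everything from the first semicolon on as a comment, without honoring backslash escapes. It does not handle composite-rule delimiters such as braces.

```python
from typing import NamedTuple, Optional


class Rule(NamedTuple):
    opcode: str
    param: str
    scope: str
    target: str


def parse_rule_line(line: str, default_scope: str = "word",
                    default_target: str = "phone") -> Optional[Rule]:
    """Hypothetical parser for one simple rule line; None for blank/comment lines."""
    code = line.split(";", 1)[0]        # assumption: ';' starts a comment
    fields = code.split()               # assumption: whitespace-separated fields
    if not fields:
        return None
    if len(fields) > 4:
        raise ValueError(f"too many fields in rule line: {line!r}")
    opcode = fields[0]
    param = fields[1] if len(fields) > 1 else ""
    # Scope and target fall back to the usual defaults when omitted.
    scope = fields[2] if len(fields) > 2 else default_scope
    target = fields[3] if len(fields) > 3 else default_target
    return Rule(opcode, param, scope, target)
```

Keeping scope and target as plain strings leaves the validation of level names to whatever consumes the tuple.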
Rules are applied sequentially unless stated otherwise. Each rule operates on units of a certain level within a unit of some other level. For instance, one rule may assimilate phones within a word, and another may change the syllabic prosody within a colon. The smaller units being manipulated are called target units, and the larger unit is called the scope unit. The respective levels are called target and scope. Each scope unit is always processed separately from all other scope units, as if no other text existed. For example, if the scope of some assimilation happens to be "word", the rule is applied to every word in isolation, and the assimilation never applies across a word boundary. Any line of the rule file may contain at most one rule and possibly a comment. A rule begins with an operation code (opcode) specifier (what to do). This is followed by the parameter (one word, opcode-specific) and, if the defaults (usually word and phone, respectively) are not suitable, by a scope and target specification. The scope and the target can be one of the as defined with the .

Using comments and the @include directive

Any text starting with a semicolon is treated as a comment. The @include directive can be used to nest rule files. The same rules apply within .ini files; for more details, see .

Variables (macros)

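The macro rules in this section can be modeled in a few lines of Python. The sketch below illustrates the rules under stated assumptions and is not Epos code. The helper names expand and define are hypothetical. An identifier is assumed to be a run of letters, digits and underscores, a ':' or '&' immediately after a name is assumed to be consumed as a delimiter, and the external keyword is not handled. Because define expands the replacement immediately, redefining a macro later does not change macros that were built from it.

```python
import re

# Assumption: identifiers are runs of letters, digits and underscores;
# an optional ':' or '&' right after a bare name is a delimiter and is consumed.
_MACRO_REF = re.compile(r"\$(?:\{(\w+)\}|(\w+)[:&]?)")


def expand(text: str, macros: dict) -> str:
    """Replace each $name, ${name}, $name: or $name& reference by its value."""
    def substitute(match: re.Match) -> str:
        name = match.group(1) or match.group(2)
        if name not in macros:
            raise KeyError(f"undefined macro ${name}")
        return macros[name]
    return _MACRO_REF.sub(substitute, text)


def define(line: str, macros: dict) -> None:
    """Handle '$name = replacement'; the replacement is expanded once, right now."""
    lhs, sep, rhs = line.partition("=")
    name = lhs.strip()
    if not sep or not re.fullmatch(r"\$\w+", name):
        raise ValueError(f"not a macro definition: {line!r}")
    # The name itself is never expanded; only the replacement is.
    macros[name[1:]] = expand(rhs.strip(), macros)
```

For example, after defining $short and $long, defining $vowel as $short$long stores the already expanded string, so it contains no dollar signs.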
A line that doesn't contain a rule may contain a macro definition of the form $identifier = replacement, for example:

$vowel = aeiouy

Alternatively, the keyword external may follow an identifier instead of the equals sign and the replacement:

$some_pathname external

Here the identifier is assigned the value of its corresponding configuration parameter (for the current voice or the current language, if possible). Macros are expanded wherever they occur, except at their own point of definition. Therefore, $vowel = $short$long is a valid macro definition, provided that $short and $long have already been defined. The expansion is performed at definition time and is not iterated, because the replacement is not expected to contain the dollar sign. Macros can later be redefined if you wish, and they can be local to a block of rules as described below. If the exact length of the identifier is uncertain, you can enclose it in braces: ${name} is usually equal to $name, but $nameaeiou is not equal to ${name}aeiou. It is also possible to use a colon or an ampersand as a delimiter: $name&aeiou. For an abundance of examples, see the existing rule files.

Dictionary-oriented rules

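The per-line dictionary format described in this section can be sketched as follows. parse_dictionary_line is a hypothetical helper based on the rules stated here:

- Comments start with ; or # at the beginning of the line or after whitespace.
- A backslash makes the next character literal.
- The first word is the replacee, and the rest of the line is the replacer.

The quoted in-rule form with comma-separated pairs is not covered. A line containing a single word is returned with an empty replacer, which is an assumption.

```python
from typing import List, Optional, Tuple

Char = Tuple[str, bool]  # (character, was_escaped)


def _is_space(c: Char) -> bool:
    return c[0].isspace() and not c[1]


def parse_dictionary_line(line: str) -> Optional[Tuple[str, str]]:
    """Return (replacee, replacer) for one dictionary line, or None if nothing is left."""
    chars: List[Char] = []
    i = 0
    while i < len(line):
        c = line[i]
        if c == "\\" and i + 1 < len(line):
            chars.append((line[i + 1], True))      # escaped: taken literally
            i += 2
            continue
        if c in ";#" and (not chars or _is_space(chars[-1])):
            break                                   # comment runs to end of line
        chars.append((c, False))
        i += 1

    # Drop leading and trailing (unescaped) whitespace.
    start, end = 0, len(chars)
    while start < end and _is_space(chars[start]):
        start += 1
    while end > start and _is_space(chars[end - 1]):
        end -= 1
    chars = chars[start:end]
    if not chars:
        return None

    # First word is the replacee; the rest of the line is the replacer.
    j = 0
    while j < len(chars) and not _is_space(chars[j]):
        j += 1
    replacee = "".join(c for c, _ in chars[:j])
    while j < len(chars) and _is_space(chars[j]):
        j += 1
    replacer = "".join(c for c, _ in chars[j:])
    return replacee, replacer
```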
The rule types described in this subsection operate in some way on a list of words (or other strings). Such a list can range from a few items up to megabytes of machine-generated data. These strings are usually listed in a separate file, and the parameter of such a rule is the file name. Alternatively, the strings can be quoted inside the rule file, especially if only a few are listed. Such a collection of strings is called a dictionary. We represent the dictionary in memory with adaptive hash tables, using balanced AVL trees for collisions, so that any item can be looked up quickly, even in a huge dictionary. The replacee cannot contain whitespace (unless escaped with a backslash), but the replacer can. That is, if a line contains more than one word, the first one becomes the replacee. The rest of the line then becomes the replacer, except for the whitespace following the replacee and any trailing whitespace. However, some rule types may not allow multiple-word replacers. In addition to, or instead of, a dictionary item, a dictionary line may contain whitespace and comments. A comment begins with a semicolon or a hash mark that is preceded by whitespace or located at the beginning of the line, and it lasts until the end of the line. A preceding backslash can be used to escape special characters (to interpret them literally). Instead of a file name reference, the contents of the dictionary can be quoted directly by enclosing them in double quotes. Dictionary items are then whitespace-separated, and a comma separates the replacee from the replacer. The dictionary may be parsed and loaded into memory either at Epos startup or at the moment of its first use. The former option has the advantage of early error reporting, while the latter can sometimes avoid loading a huge unused dictionary altogether. Use the option .

Type subst

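Here is a simplified model of the subst semantics described in this section. It operates on a flat string and ignores the unit structure entirely. The function subst and its max_rounds guard are hypothetical. The sketch tries longer replacees first and restarts from the longest one after every change. It honors ^ and $ anchors on the replacee and reports a repeated string state as an infinite loop. Ties between replacees of equal length are resolved by dictionary order, which is an assumption.

```python
def _split_anchors(replacee: str):
    """Strip an optional leading ^ and trailing $ from a replacee."""
    at_start = replacee.startswith("^")
    core = replacee[1:] if at_start else replacee
    at_end = core.endswith("$")
    core = core[:-1] if at_end else core
    if not core:
        raise ValueError(f"empty replacee: {replacee!r}")
    return core, at_start, at_end


def _apply(text: str, core: str, at_start: bool, at_end: bool, replacer: str) -> str:
    if at_start and at_end:
        return replacer if text == core else text
    if at_start:
        return replacer + text[len(core):] if text.startswith(core) else text
    if at_end:
        return text[:len(text) - len(core)] + replacer if text.endswith(core) else text
    return text.replace(core, replacer)          # every occurrence


def subst(text: str, table: dict, max_rounds: int = 10_000) -> str:
    # Longer replacees (ignoring anchors) are tried first.
    entries = sorted(((_split_anchors(k), v) for k, v in table.items()),
                     key=lambda e: len(e[0][0]), reverse=True)
    seen = {text}
    for _ in range(max_rounds):
        for (core, at_start, at_end), replacer in entries:
            new = _apply(text, core, at_start, at_end, replacer)
            if new != text:
                break
        else:
            return text                              # no replacee occurs any more
        if new in seen:
            raise RuntimeError("infinitely looping substitution")
        seen.add(new)
        text = new
    raise RuntimeError("substitution did not terminate")
```

For example, with the replacees b and bc, the string abc becomes ay, because the longer replacee bc is matched first.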
Substring substitution. Each replacer replaces every occurrence of its respective replacee. Longer matches are tried first, and the process is iterated until no replacee occurs in the string. The target must either be phone, or all the replacers and replacees must be of the same length; otherwise, it is not obvious how to handle the children of the affected units. Note also that with the phone target, a match requires all characters other than phones to match as well: they must be present or absent at the same positions in both the replacee and the occurrence in question. The only exception is the terminating scope-level separator (if any), which is invisible to this rule type. Any replacee may begin with a ^ or end with a $. This forces the substring being replaced to be at the beginning or the end of the scope unit, respectively. The replacer should not contain units of the scope level or higher. Unless the target is phone, this rule type drops the internal structure of the replaced text as soon as a match is found. In other words, an affected scope unit with a replacer is parsed like any other plain text. Infinitely looping substitutions are currently reported as an error condition.

Type prep

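The merging behavior of prep (and, symmetrically, postp) can be illustrated on a list of word strings. The functions below are hypothetical sketches that ignore the internal unit structure. They also assume that a run of consecutive prepositions all merge into the next non-matching word, a case the description does not settle.

```python
def prep(units, table):
    """Replace each unit found in `table` and merge it into its right-hand neighbor."""
    result, pending = [], ""
    for i, unit in enumerate(units):
        if unit in table and i + 1 < len(units):
            pending += table[unit]      # carried over to the next unit
            continue
        result.append(pending + unit)   # no right-hand neighbor: left unchanged
        pending = ""
    return result


def postp(units, table):
    """Replace each unit found in `table` and merge it into its left-hand neighbor."""
    result = []
    for unit in units:
        if unit in table and result:
            result[-1] += table[unit]
        else:
            result.append(unit)         # no left-hand neighbor: left unchanged
    return result
```

With the table {"k": "g"}, prep turns the Czech words k domu into the single unit gdomu.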
Preposition. If the scope unit is identical to some replacee, it is replaced with the respective replacer and merged into its right-hand neighbor. If there is no such neighbor, nothing happens. Again, the target must currently be phone, or every replacer must be of the same size as its replacee.

Type postp

Postposition. See type prep, but the resulting unit is merged into its left-hand neighbor.

Type prosody

This rule type is a prosody modeling rule that uses a dictionary of prosodic adjustments to be applied.

Type segments

Sets up the segment layer below the phone layer. The parameter names a file that contains phone-to-segment mappings, again in the dictionary format. The replacees represent three-character segment identifiers, and the replacers are the respective segment numbers (decimal). It is possible, and indeed typical, to include multiple identifiers for the same segment number. The middle character denotes the phone the resulting segment will be assigned to. The left-hand and right-hand characters may each be a question mark, or they may require the left-hand and/or right-hand neighbor to match a specific character. The question mark is therefore a kind of wildcard. If both fully specified and partly specified segments exist for a given triplet of phones, they are placed from left to right in this order:

phones:   p l o u t e f
segments: 0p? pl? ?lo ?o? ou? ?u? ut? ?te ?e? ef? ?f0

or, with the traditional Czech segment inventory:

phones:   p l o u t e f
segments: 0p? ?pl pl? ?lo ?o? ou? ?u? ut? ?te ?e? ef? ?f0

(In this second example, for instance the diphones ?e? 20241 generates three identical segments number 241 for the stationary part of the specified vowel.)

Type with

This is actually a conditional rule, though it also uses a dictionary. It applies an arbitrary rule to the units (words) listed in the dictionary.

Contentual rules

The contentual rules manipulate unit contents. That is, they are suitable for implementing the more regular letter-to-sound rules, character replacement and other transformations. They are an order of magnitude faster than, e.g., the more general .

Type regress

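The regress and progress semantics described in this section (and under Type progress) can be checked with a small Python model that works on strings of one-character units. mutate and member are hypothetical helpers, not Epos functions. The sketch supports substitution and deletion (0 in n) and uses 0 in l and r for the scope boundary. It leaves out insertion (0 in o). Because units are updated in place, a regress pass sees the already changed right neighbor and a progress pass sees the already changed left neighbor. This is what produces bbb versus pbb for ppb.

```python
import re


def member(spec: str, token: str) -> bool:
    """Set membership with the right-associative "except" operator ('!')."""
    if "!" not in spec:
        return token in spec
    left, rest = spec.split("!", 1)
    return (left == "" or token in left) and not member(rest, token)


def mutate(units: str, param: str, direction: str) -> str:
    """Apply an o>n(l_r) rule; direction is "regress" (R-to-L) or "progress" (L-to-R)."""
    if direction not in ("regress", "progress"):
        raise ValueError(f"unknown direction: {direction!r}")
    m = re.fullmatch(r"(.*)>(.*)\((.*)_(.*)\)", param)
    if m is None:
        raise ValueError(f"malformed parameter: {param!r}")
    old, new, left_set, right_set = m.groups()
    if "0" in old:
        raise NotImplementedError("insertion (0 in o) is not covered by this sketch")
    if len(new) != 1 and len(new) != len(old):
        raise ValueError("n must be one character or as long as o")
    table = {src: (new if len(new) == 1 else new[k]) for k, src in enumerate(old)}

    s = list(units)
    step = -1 if direction == "regress" else 1
    i = len(s) - 1 if direction == "regress" else 0
    while 0 <= i < len(s):
        left = s[i - 1] if i > 0 else "0"            # '0' marks the scope boundary
        right = s[i + 1] if i + 1 < len(s) else "0"
        if s[i] in table and member(left_set, left) and member(right_set, right):
            if table[s[i]] == "0":
                del s[i]                             # '0' in n deletes the unit
                if direction == "progress":
                    continue                         # next unit moved into position i
            else:
                s[i] = table[s[i]]
        i += step
    return "".join(s)
```

With ptk standing in for $voiceless and bdg for $voiced, the parameter ptk>bdg(!_bdg) turns ppb into bbb under regress and into pbb under progress.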
Assimilation, elision or other mutation of phones or other units depending on their immediate environment. The parameter has the form o>n(l_r), where o, n, l and r are arbitrary strings. It means "change tokens in o to their corresponding tokens in n whenever the left neighbor is in l and the right neighbor is in r". The first two strings should therefore either be of equal length, or n should be a single character; "corresponding" is interpreted accordingly. The zero character (0) may be included in any of the strings. It means "no element" and can be used to insert new units, to delete old ones, and to limit the change to the beginning or the end of the scope unit, respectively. On the other hand, if the content of some unit is a literal 0 before this rule is applied, it should, in theory, stay untouched. To handle this case properly, use, e.g., regex or subst rules, because Epos presently confuses no-element zeroes with real zero characters. Examples:

regress 0>'(0_aeiou) word phone

inserts an apostrophe before the listed vowels at the beginning of a word.

regress $voiceless>$voiced(!_$voiced) word phone

assimilates voiceless consonants to their voiced counterparts when they are followed by a voiced consonant (assuming $voiced and $voiceless have been defined previously). The change proceeds from right to left, so ppb changes to bbb. See the section on the "except" (!) operator for an explanation of the exclamation mark (here: "everywhere").

Type progress

As above, but the change proceeds from left to right. In the second example for the regress rule, the result would be pbb if progress were used.

Structural rules

The structural rules can be used to restructure the text. They usually interact with multiple levels of description simultaneously.

Type raise

Moves a unit to another level of description, e.g. when a segment-level unit should directly affect the prosody. The parameter has the form from:to. Here from and to are arbitrary strings, and they can use the "except" operator (the exclamation mark). The tokens in from, if found at the target level, are copied to the scope level if the original scope token is listed in to. It is also possible to omit the colon and the to string; the default interpretation is then "everywhere".

Type syll

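The boundary-placement rule stated in this section can be written out directly. In the Python sketch below, syllabify is a hypothetical helper. The sonority ordering is passed as a list of groups from least to most sonorous rather than in the rule file's own parameter syntax, and unlisted tokens get the lowest rank. With the groups p, r, a, it reproduces the boundaries listed in the example.

```python
def syllabify(word: str, groups: list, boundary: str = "|") -> str:
    """Insert boundaries at local sonority minima; groups go from least to most sonorous."""
    rank = {tok: k for k, group in enumerate(groups) for tok in group}
    ranks = [rank.get(tok, -1) for tok in word]     # unlisted tokens: least sonorous
    cuts = set()
    i = 0
    while i < len(word):
        j = i
        while j + 1 < len(word) and ranks[j + 1] == ranks[i]:
            j += 1                                  # word[i..j] is one equisonorous group
        higher_before = i > 0 and ranks[i - 1] > ranks[i]
        higher_after = j + 1 < len(word) and ranks[j + 1] > ranks[i]
        if higher_before and higher_after:
            # Between the first and second element, or before a single-unit group.
            cuts.add(i + 1 if j > i else i)
        i = j + 1
    return "".join((boundary if k in cuts else "") + tok for k, tok in enumerate(word))
```

For example, syllabify("aprpa", ["p", "r", "a"]) marks both p's as local minima and yields a|pr|pa.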
Roughly speaking, this rule type can be used to split words into syllables according to the theory of sonority, i.e. at the least sonorous phones. More generally, it can be used for any kind of insertion of unit boundaries that depends on local values of a simple metric. The parameter is an ordering of the target units (typically, phones). Example: syll 0 inserts the following (and other) syllable boundaries: a|pa ap|pa ap|ppppa arp|pa ar|pra a|pr|pa. Tokens not listed are considered least sonorous. The order of tokens within the same sonority group (see the example) is irrelevant. As the example shows, a syllable boundary is inserted exactly once in every sequence of equivalent target units (e.g. equisonorous phones) whose preceding and following target units both have higher sonority. The boundary goes between the first and second element of the group or, if the group consists of a single unit, before that unit. These semantics suit the syllabification task in all languages known to us where syllabification is not primarily morphologically based. This rule type can also be used for other tasks that split a unit at some point defined by its contents, e.g. splitting a higher-level prosodic unit before or after certain words. The authors are eager to hear from you if you would prefer an extension or simplification of this rule type, or if you can comment on syllabification issues over a wide range of languages.

Prosody modeling rules

Epos models utterance prosody by assigning values of the following prosodic quantities to individual text structure units (possibly at multiple levels of description): pitch (fundamental frequency), volume (intensity) and duration (time factor). Currently, these values are percentages, with 100 as the neutral value. Epos does not currently provide segment inventories for multiple pitch ranges, so extreme values such as 15 or 1500 may sound very unnatural. The prosody adjustments at the different levels are summed to obtain the actual values assigned to the generated segments. For example, a phone with the frequency (pitch) value of 130 in a word with the value of 120 will contain segments (after the

Type contour

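The padding behavior of contour can be illustrated by expanding a parameter into one adjustment per target unit. expand_contour is a hypothetical helper. It accepts the f/i/t quantity letter in either case and raises an error when the number of target units cannot be matched.

```python
def expand_contour(param: str, n: int):
    """Return (quantity, per-unit adjustments) for a scope unit with n target units."""
    quantity, sep, body = param.partition("/")
    if not sep or quantity.lower() not in ("f", "i", "t"):
        raise ValueError(f"malformed contour: {param!r}")
    items = body.split(":")
    starred = [k for k, item in enumerate(items) if item.endswith("*")]
    if len(starred) > 1:
        raise ValueError("at most one adjustment may be followed by '*'")
    values = [int(item.rstrip("*")) for item in items]   # int("+2") == 2
    if not starred:
        if len(values) != n:
            raise ValueError(f"contour has {len(values)} adjustments, unit has {n}")
        return quantity, values
    k = starred[0]
    repeat = n - (len(values) - 1)           # the starred value fills the gap (0 or more)
    if repeat < 0:
        raise ValueError(f"contour is too long for {n} target units")
    return quantity, values[:k] + [values[k]] * repeat + values[k + 1:]
```

For example, f/+2:+0*:-2 stretches over five syllables as 2, 0, 0, 0, -2 and shrinks to 2, -2 for a disyllable.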
This rule assigns a specified prosody contour to units at some level of description within a unit that consists of them. For example, the rule can be used to assign pitch contours to stress units, with the individual values probably assigned to syllables. The parameter describes a single prosody contour. The first letter denotes the prosodic quantity to be specified (frequency, intensity or duration), and the second is a slash. The adjustments follow as colon-separated decimal integers. For example, contour f/+2:+0:-2 word syll assigns a falling pitch contour to a trisyllabic word. The number of syllables in a word or, more generally, of the target units in a scope unit, must match the number of adjustments specified in a to ensure that. As an exception, it is possible to specify padding in the contour: at most one adjustment may be immediately followed by an asterisk. This adjustment is then used for zero or more consecutive target units, as necessary to stretch the contour over the scope unit.

Type prosody

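The precedence among position-specific adjustments described in this section can be sketched as a selection function. parse_adjustment and select_adjustment are hypothetical. They accept the quantity letter in either case. When both a p:n entry and an equivalent "last" entry with an explicit length match, they prefer the one counting from the beginning, which is an assumption beyond the text.

```python
import re

_ADJ = re.compile(r"([fitFIT])/(\d+)(last)?:(\d+|\*)\s+([+-]?\d+)")


def parse_adjustment(line: str):
    """'i/3:4 -20' -> ('i', 3, False, 4, -20); a '*' length becomes None."""
    m = _ADJ.fullmatch(line.strip())
    if m is None:
        raise ValueError(f"malformed adjustment: {line!r}")
    quantity, pos, last, length, value = m.groups()
    return (quantity.lower(), int(pos), last is not None,
            None if length == "*" else int(length), int(value))


def select_adjustment(adjustments, quantity: str, p: int, n: int) -> int:
    """Value applied to the p-th (1-based) of n target units; 0 if nothing matches."""
    best = None
    for q, pos, last, length, value in adjustments:
        index = n - pos + 1 if last else pos        # "last" counts from the end
        if q != quantity or index != p or (length is not None and length != n):
            continue
        # An explicit length beats '*'; otherwise counting from the beginning wins.
        key = (length is not None, not last)
        if best is None or key > best[0]:
            best = (key, value)
    return 0 if best is None else best[1]
```

With the three adjustments from the precedence example (t/1:2 +30, t/1:* +20, t/2last:* +5), the first syllable of a disyllable gets +30, and the first syllable of a trisyllable gets +20.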
Individual prosodic feature generation. (See also type contour for assigning whole contours more conveniently.) Typically, there will be many instances of this rule in the rule file, each using a different configuration file for a different purpose. For example, one may handle word stress, another the sentence-final melody of wh- questions, and another the semantic emphasis corresponding to an exclamation mark. The parameter of a prosody rule is the name of a file formatted as a dictionary (see Dictionary-oriented rules), whose contents are further specified here. Each prosodic adjustment occupies one line and affects exactly one of frequency, intensity and duration (F, I or T, respectively) of units in the specified position. Their ordering is insignificant, because each of them affects different units or a different quantity of them. The structure of an adjustment is very simple, so let's just pick an example: i/3:4 -20. The first letter must be one of T, I, F and specifies the quantity to be adjusted. The first number denotes the position within a unit whose length must equal the second number. Here, the rule applies to every third syllable of every tetrasyllable, provided that the target of the rule is syllable and the scope is word (these are specified in the rule file as usual, not in the prosody file). The last number, separated by whitespace, is the intensity adjustment to be added wherever this specification applies; it is an integer value. An adjustment can also apply for any length of the scope unit (in the example above, for words of any number of syllables). To do this, use "*" as the second number of the adjustment. It may also make sense to count target units starting from the end of the scope unit; in this case, append the word "last" to the first number. An example could be f/1last:* -30, meaning "drop the pitch by 30 for the last syllable of every word".
Consequently, at most three distinct adjustments may affect a unit. If that happens, only one is chosen: the more specific one or, if both contain the asterisk, the one counting from the beginning. An example, in order of decreasing precedence:

t/1:2 +30
t/1:* +20
t/2last:* +5

You can therefore override general adjustments with exceptions for particular lengths that have to be handled separately. If multiple prosody rules (each using its own file) supply adjustments for a certain unit, the adjustments are summed. It is important to understand the difference between, e.g., a syllable and its phones: the syllable can have an entirely different prosodic value than its phones. For every given segment, the value of any prosodic quantity is obtained by totalling the values of all higher-level units it is contained in. This independence of levels of description might theoretically be useful for modeling tone languages.

Type smooth

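The smoothing formula in this section can be sketched as below. smooth is a hypothetical helper operating on a plain list of values. It assumes that the left weights are written in spatial order (farthest neighbor first, as the symmetric example suggests) and divides by the sum of all weights, so a zero weight sum would fail. Weights written as sums are not handled. With the weights from the example, it reproduces the coefficients 0.3, 0.4, 0.2 and 0.1 for the second syllable.

```python
def smooth(values, param):
    """Weighted moving average of `values`; boundary neighbors are clamped."""
    quantity, sep, rest = param.partition("/")
    if not sep:
        raise ValueError(f"malformed smooth parameter: {param!r}")
    head, _, tail = rest.partition("\\")
    *left, base = [float(w) for w in head.split("/")]     # left weights, then base
    right = [float(w) for w in tail.split("\\")] if tail else []
    total = sum(left) + base + sum(right)
    n = len(values)
    smoothed = []
    for i in range(n):                     # reads only the original values
        acc = base * values[i]
        for d, w in enumerate(reversed(left), start=1):   # nearest left neighbor first
            acc += w * values[max(i - d, 0)]              # clamp at the scope start
        for d, w in enumerate(right, start=1):
            acc += w * values[min(i + d, n - 1)]          # clamp at the scope end
        smoothed.append(acc / total)
    return quantity, smoothed
```

For un-ne-ce-ssa-ry with intensities 10, 20, 30, 40, 50, the second syllable becomes 0.3 x 10 + 0.4 x 20 + 0.2 x 30 + 0.1 x 40 = 21.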
Smoothing of one of the F, I, T quantities. The parameter has the form quantity/left_weights/base_weight\right_weights. Multiple left_weights are slash-separated, and multiple right_weights are backslash-separated. The new value of the specified quantity for any target is computed as a weighted average of the values of the surrounding units at the same level. If the target is too near the scope boundary to have enough neighbors in some direction, the value of the last unit in that direction is used instead. Example: smooth i/10/20/40\20\10 word syll applied to the word un-ne-ce-ssa-ry adjusts the intensity values of all its syllables. For example, the value for the second syllable is computed as 0.3 x i("un") + 0.4 x i("ne") + 0.2 x i("ce") + 0.1 x i("ssa"). The computations for different units do not interfere with each other. The weights can also be specified as negative quantities and/or as sums of several values, which permits linear parameterization of the rules. The unit::project method is responsible for that; it is called before the actual smoothing. Prosodic adjustments existing at lower levels than the one being smoothed are ignored by the

Composite rules

Multiple rules are occasionally needed where the syntax has a placeholder for only a single rule. In other cases, several rules have to be grouped in a certain way, for example when one rule has to be chosen nondeterministically out of a set of rules. To satisfy these needs, Epos rules include three types of composite rules with different semantics. A composite rule is syntactically treated as a single rule for all purposes.

Blocks of rules

A block is a sequence of rules enclosed in braces ("{" and "}"). Both the opening and the closing brace follow the rule syntax, but they take no parameters except for an optional scope specification. The block is treated as a single rule, which is useful with conditional rules like

if condition
{
    do this
    do that
}

The rules are applied sequentially, as you would expect, for every unit of the proper size as given by the scope of the opening brace. This means that every word (if the scope is word) is processed separately throughout all the rules in the block, which involves some splitting of execution on entering the block. By default, no such splitting is done, and the block inherits its scope from its master rule. The master rule is a conditional rule, an enclosing block, or the global implicit block that covers all the rules. Consequently, the scope of any enclosed rule may not be larger than the scope of the block. Any macros defined in the block are local to the block. The semantic details are C-like and are by no means important.

Choices of rules

A choice is a sequence of rules enclosed in brackets ("[" and "]"). Both the opening and the closing bracket follow the rule syntax, but they take no parameters except for an optional scope specification. The choice is treated as a single rule. Whenever the choice is applied, one of its subordinate rules is chosen nondeterministically for every unit of the proper size, as given by the scope of the opening bracket, and only this rule is applied. Generally, choices behave like blocks. The main difference is that with blocks, all of the rules are applied, whereas with choices, exactly one of them is applied (possibly different rules for different pieces of the processed text). Unlike empty blocks, empty choices (with no rules within) are not tolerated.

Length-based selection of rules

A (length-based) switch is a sequence of rules enclosed in angle brackets ("<" and ">"). Both the opening and the closing bracket follow the rule syntax, but they take no parameters except for an optional scope and target specification. The switch is treated as a single rule. Whenever the switch is applied to a scope unit, the target units contained in it are counted. If n units are found, the n-th subordinate rule is applied. If fewer than n rules are available, the last one is used. You can avoid this behavior by specifying "nothing" after the last rule.

Repeated rules and choice probabilities

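This section states that each repetition costs one pointer. One way to model the repeat counts consistently with that is to expand a choice into a list with one slot per repetition and pick a slot uniformly. build_slots and choose are hypothetical names, and the actual selection code in Epos may differ. In a block, the same slot list would instead be executed in order.

```python
import random


def build_slots(alternatives):
    """alternatives: (repeat_count, rule) pairs, e.g. (3, "prosody typical.dic")."""
    slots = []
    for count, rule in alternatives:
        if not isinstance(count, int) or count < 1:
            raise ValueError(f"repeat count must be a positive integer, got {count!r}")
        slots.extend([rule] * count)    # one list slot (one reference) per repetition
    if not slots:
        raise ValueError("empty choices are not tolerated")
    return slots


def choose(slots, rng=random):
    """Pick one subordinate rule uniformly among the slots."""
    return rng.choice(slots)
```

With 3x prosody typical.dic and a single prosody variant.dic, the list has four slots, three of which hold the first rule, so it is chosen with probability 3/4, i.e. 75%.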
Write "3x" before a rule to repeat it three times (in a block) or to make it three times more probable (in a choice):

[
    3x prosody typical.dic
    prosody variant.dic
]

(The first alternative now has a 75% chance of being chosen, while the other one is left with the remaining 25%.) The repeat count must be a positive integer. You cannot use this feature immediately after a conditional rule, because repeated rules are not counted as a single rule for syntactic purposes:

if $something
    2x regress 0>x(!_!)    #...wrong!

You should rewrite this as

if $something
{
    2x regress 0>x(!_!)
}

Huge integers (like one million) are disallowed, because the current implementation needs a few bytes of memory (one pointer) for every repetition.

Conditional rules

Conditional rules execute the following rule if and only if a condition is met. The condition is specified as the parameter. The following (conditioned) rule is given on a separate line, or on several lines if a block follows. (Comments, whitespace and empty lines may intervene as usual.) Indenting the conditioned rules with whitespace is not syntactically necessary, but it is strongly recommended for readability.

Type inside

Applies a rule or a block of rules within certain units only. The parameter is a list of values at the scope level within which the following rule should be applied; the "except" operator may be used. Every unit (a sentence, for example) that fulfills the criterion is processed separately. Therefore, the scope of the following rule may be at most that of the inside rule itself.

Type near

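The two forms of the near test can be expressed with a membership helper for the "except" operator. near_applies and member are hypothetical sketches that treat the scope unit as a string of one-character target tokens.

```python
def member(spec: str, token: str) -> bool:
    """Set membership with the right-associative "except" operator ('!')."""
    if "!" not in spec:
        return token in spec
    left, rest = spec.split("!", 1)
    return (left == "" or token in left) and not member(rest, token)


def near_applies(param: str, units: str) -> bool:
    """True if the rule following `near param` should be applied to this scope unit."""
    if param.startswith("*"):
        # Negated form: every target unit must match the set with '*' removed.
        return all(member(param[1:], u) for u in units)
    # Plain form: at least one target unit must match.
    return any(member(param, u) for u in units)
```

For example, near_applies("*!aeiou", "pst") is true because the unit contains no vowel at all.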
Applies a rule or a block of rules within units that contain at least one of the specified units. The parameter is a list of values at the target level, which are looked up in a unit of the scope level; the "except" operator may be used. If an occurrence is found, the following rule is applied to the scope-level unit. If the parameter begins with an asterisk, the asterisk is treated as an except operator and the test is negated. In other words, the following rule is applied if every contained target-level unit meets the set description with the leading asterisk ignored. You can combine the asterisk with an extra except operator to get tests of the "contains no characters of this class" type.

Type with

Applies a rule or a block of rules to a list of units. In contrast with the preceding rule type, this refers not only to the token at the scope level (such as a space), but to the whole structure (such as the string of phones delimited by the space). The parameter is a file name. The file should list the strings subject to the following rule, such as special words.

Type if

Applies a rule or a block of rules only if a condition (given by the parameter) is met. The condition must currently be specified as a boolean voice configuration option (possibly a soft option) or its negation (i.e. prefixed with an exclamation mark). Example:

if !colloquial
{
    ...
}

The rules within the block are applied only if the colloquial option is not set. The if rule inherits its scope from its parent rule unless a scope is specified explicitly. Again, the scope of a subordinate rule may not be larger than that of the if rule itself.

Special rules

Type regex

Regular expression substitution. The parameter has the form /regular_expression/replacement/. This rule type is similar to subst with only one dictionary item, but it is far more powerful and more arcane; its use is not intended for end wizards nor trivial tasks. For an overview of regular expressions, UNIX users can consult, e.g., the grep manual page, whereas Windows users can telnet to a nearby UNIX machine and type man grep there. Epos uses the extended regular expression syntax with the following difference. In "regular" regular expressions, parentheses match themselves, while the open group and close group operators are \( and \), respectively. As we use groups heavily and real parentheses hardly at all, we decided to do it the other way round. Also, sed users may be surprised by the iterative behavior of the n-th group within the regular expression: \1 to \9. \0 represents the entire match, but this is probably unusable under the current design, as it would cause an infinite substitution loop. To use this type of rule, you need to have the rx or regex library already installed and have WANT_REGEX enabled in common.h. This is because we don't implement the regex parsing ourselves; we leave it to your OS libraries. If you don't have such libraries installed, we use the glibc implementation (rx.c in the Epos distribution). Note that if your system supports neither locale setting nor a usable regex library, you can't use named character classes such as [:upper:] in your regular expressions. This is the case on Windows CE.

Type debug

Prints debugging information during the application of the rules. Scope and target are ignored, and the parameter is parsed lazily.

Parameter "elem": dump the current state of the text being processed.
Parameter "pause": wait until a key is pressed.

The "except" (!) operator

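The set arithmetic of the except operator can be modeled as a recursive membership test. member is a hypothetical helper. Macros such as $vowels are assumed to have been expanded into plain characters beforehand, and backslash escapes are not handled.

```python
def member(spec: str, token: str) -> bool:
    """Is `token` in the set described by `spec`? '!' means "except" (right-associative)."""
    if "!" not in spec:
        return token in spec                    # plain unordered list of tokens
    left, rest = spec.split("!", 1)
    in_left = left == "" or token in left       # no left operand means "everything"
    return in_left and not member(rest, token)  # a!b!c == a minus (b minus c)
```

With $vowels expanded to aeiou, the specification !aeiou!ou accepts o and x but rejects a, and ! alone accepts every token.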
Whenever an unordered list of tokens is to be specified within the parameter of some rule (use common sense and/or the individual rule descriptions above), you can also make negative specifications, such as "all consonants except l and r". To do this, use the exclamation mark as an "except" operator:

$consonants!lr

(The right operand is subtracted from the left one.) If there is no left operand, as in !x, the meaning is "all but x". It follows that ! alone means "everything". The operator is right-associative: !$vowels!ou means "all excluding vowels, but o and u don't count as vowels just now". Let us repeat that this operator never works for ordered lists, not even for the sonority groups of the syll rule.

Escaping special characters

You can use the backslash to escape any special character, including the backslash itself, anywhere in the rules or in .ini file strings.