The Epos Speech System: Introduction

1. Introduction

In the early nineties, the Institute of Phonetics, Faculty of Arts, Charles University, Prague, and the Institute of Radio-Engineering and Electronics, Academy of Sciences, Prague, had managed to assemble a complete TTS implementation for the Czech and Slovak languages, using a linear prediction based diphone synthesis. This TTS engine then served as a base for further research in speech synthesis, until the source became too large and complicated to be easily modified, ported to new hardware or operating system, or to be well understood by anybody except the authors. By the end of 1995, when a need for testing some new prosody modelling hypotheses had arisen, these limitations were slowly becoming a major burden. A new implementation of part of the system was eventually written from scratch (starting in 1996) and it is still expanding, integrating the original results with numerous recent improvements. It has been baptized Epos in 1998.

Our primary design goal is to allow the user the ultimate control over the TTS processing. We avoid hard-wired constants; we use configuration options instead, with sensible default values. Most of the language-dependent processing is driven by a rules file, a text file using an intuitive and well-documented syntax. A rules file lists the rules to be applied on a written text structure representation to yield a corresponding spoken text structure representation (in fact it could be the other way round in principle, but somehow no one seems to need that). Some aspects of user-definable behavior don't fit into the concept of a rules file, and are therefore settable with various options in conventional configuration files. Finally, many other external files can be referenced either by the rules file, or a configuration file, such as segment inventories or dictionaries.

Most of these files have to be processed before any actual TTS processing has finished. That's why Epos is implemented as a background process, i.e. as a daemon under UNIX-like OSes and as a service on Windows NT and similar OSes. Epos reserves a TCP/IP port for any communication with client applications using a custom, quite generic protocol for TTS data flow control, called TTSCP. A simple TTSCP client utility named say has been provided with Epos, but there are many more specialized TTSCP clients in existence.

Epos currently supports three main speech generation algorithms. Two ones employ segment concatenation in the time domain; these have been contributed by Zdenek Kadlec, Masaryk University, Brno, and Martin Petriska, Slovak Technical University, Bratislava, respectively. These algorithms, including low quality segment inventories, are available as part of Epos subject to GPL. The third one is a linear prediction coding speech synthesis written by Ellen Víchová, which actually gives good results, but is not freely available (existing commercial applications prohibit this). You can however use a virtual speech synthesis to synthesize your own texts using the linear prediction algorithm, if you are connected to the Internet. (Your text is partly processed and sent to our server. Then the generated speech signal is sent back to you.) We are constantly working on improving especially the freely distributable syntheses.

The name Epos is not an acronym. It is Greek for...um, go have a look yourself.

This section still has to be (re)written by somebody. At present, try to look at http://epos.ure.cas.cz for additional introductory information or ask the authors by email.

Meanwhile, tell us what kind of introductory information you would like to see here. The documentation is provided for you and we need your feedback.

Next Previous Contents