A Practical Introduction To The aeneas Package

Created 21 May 2015 • Written by Alberto Pettarin

This post is a practical introduction to the aeneas package, with concrete examples of how to use it to compute audio/text sync maps.

The Problem `aeneas` Solves

aeneas is a Python library and a set of tools to automagically synchronize audio and text.

In other words, the main function of this software is to automate the computation of a synchronization map file ("sync map" for short) between an audio file and a list of text fragments. Sync maps have a variety of uses, including reflowable EPUB 3 Audio-eBooks or FXL Read-aloud EPUB 3 ebooks (SMIL files) and closed captions for videos (SRT/WebVTT/TTML files).

In abstract terms, a sync map associates each text fragment with the time interval, in the audio file, when that text fragment is spoken:

[00:00:00.000, 00:00:02.680] <=> 1                                                      
[00:00:02.680, 00:00:05.480] <=> From fairest creatures we desire increase,            
[00:00:05.480, 00:00:08.640] <=> That thereby beauty's rose might never die,           
[00:00:08.640, 00:00:11.960] <=> But as the riper should by time decease,              
[00:00:11.960, 00:00:15.279] <=> His tender heir might bear his memory:                
[00:00:15.279, 00:00:18.519] <=> But thou contracted to thine own bright eyes,         
[00:00:18.519, 00:00:22.760] <=> Feed'st thy light's flame with self-substantial fuel, 
[00:00:22.760, 00:00:25.719] <=> Making a famine where abundance lies,                 
[00:00:25.719, 00:00:31.239] <=> Thy self thy foe, to thy sweet self too cruel:        
[00:00:31.239, 00:00:34.280] <=> Thou that art now the world's fresh ornament,         
[00:00:34.280, 00:00:36.960] <=> And only herald to the gaudy spring,                  
[00:00:36.960, 00:00:40.640] <=> Within thine own bud buriest thy content,             
[00:00:40.640, 00:00:43.600] <=> And tender churl mak'st waste in niggarding:          
[00:00:43.600, 00:00:48.000] <=> Pity the world, or else this glutton be,              
[00:00:48.000, 00:00:53.280] <=> To eat the world's due, by the grave and thee.
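To make the abstraction concrete, a sync map can be modeled as a list of (begin, end, text) triples. The following Python sketch is purely illustrative: the names format_time and render_sync_map are hypothetical helpers of mine, not part of aeneas, and the two fragments are copied from the listing above.

```python
def format_time(seconds):
    """Format a time in seconds as HH:MM:SS.mmm."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3600000)
    minutes, millis = divmod(millis, 60000)
    secs, millis = divmod(millis, 1000)
    return "%02d:%02d:%02d.%03d" % (hours, minutes, secs, millis)


def render_sync_map(fragments):
    """Render (begin, end, text) triples, one fragment per line."""
    return "\n".join(
        "[%s, %s] <=> %s" % (format_time(begin), format_time(end), text)
        for begin, end, text in fragments
    )


# First two fragments of the sonnet, with times in seconds.
sync_map = [
    (0.000, 2.680, "1"),
    (2.680, 5.480, "From fairest creatures we desire increase,"),
]
print(render_sync_map(sync_map))
```

Every output format (SMIL, SRT, and so on) is just a different serialization of the same list of triples.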

The major advantage of aeneas is that it eliminates the need for human labor to produce the timings (which usually involves painfully long "listen-and-mark" sessions), while still producing a "correct" output, that is, sync maps indistinguishable from those that a human operator would produce manually.

The repo on GitHub includes the library source code, some "pre built" programs to compute the sync maps (which cover the most frequent use cases), unit tests and the documentation.

Installing `aeneas`

Assuming you have Python 2.7.x and Git on your machine, installing aeneas is easy:

$ git clone https://github.com/readbeyond/aeneas.git
$ cd aeneas
$ pip install -r requirements.txt
$ python check_dependencies.py

Note that you might need to:

$ apt-get install ffmpeg*
$ apt-get install espeak*

if you do not have ffmpeg and espeak installed already.

If you are running an (old) stable version of Debian, you might get an error when installing the scikits.audiolab Python package. In that case, please see this thread. (I will see if I can remove the dependency from this library, by switching to a less-problematic-to-get one.)

Right now the only supported OS is Linux (Debian), but I have aeneas configured and running on my Mac Mini (OS X) and it was confirmed to be working on a Windows 8 machine too.

Please see the online documentation for more information.

Computing A Sync Map With `execute_task`

In the aeneas jargon, a Task represents the atomic unit of work, that is, an audio file and a list of text fragments to be synchronized, and for which you want to obtain a sync map file, in the format (SMIL, SRT, TXT, etc.) you need.

To generate the sync map file, you can use the execute_task script included in the package:

$ python -m aeneas.tools.execute_task audio.mp3 text.txt config_string map.smil

The script takes the following parameters:

the path to the audio file (audio.mp3)
the path to the file containing the text fragments (text.txt)
the configuration string (config_string)
the path to the sync map file to be created (map.smil)

Let's examine each argument.

The Audio File

The audio file contains the narration of the text to be synchronized. Any format readable by ffmpeg can be used, including the popular MP3, MP4, AAC, OGG, WAV, FLAC, WebM. (Make sure you have the relevant codecs installed.)

The Text File

The text file contains the text fragments to be synchronized. Currently, three formats are supported:

plain
parsed
unparsed

In all three cases, the file must be encoded using UTF-8 (without BOM).

`plain` Format

The first format, plain, simply lists the fragments, one per line. For example, if text.txt contains the following 15 lines:

1
From fairest creatures we desire increase,
That thereby beauty's rose might never die,
But as the riper should by time decease,
His tender heir might bear his memory:
But thou contracted to thine own bright eyes,
Feed'st thy light's flame with self-substantial fuel,
Making a famine where abundance lies,
Thy self thy foe, to thy sweet self too cruel:
Thou that art now the world's fresh ornament,
And only herald to the gaudy spring,
Within thine own bud buriest thy content,
And tender churl mak'st waste in niggarding:
Pity the world, or else this glutton be,
To eat the world's due, by the grave and thee.

execute_task will align 15 fragments, one for the title (1) and 14 others, one for each verse.

If text.txt contains the following 107 lines:

1
From
fairest
creatures
we
desire
increase,
That
thereby
beauty's
rose
might
never
die,
But
as
the
riper
should
by
time
decease,
His
tender
heir
might
bear
his
memory:
But
thou
contracted
to
thine
own
bright
eyes,
Feed'st
thy
light's
flame
with
self-substantial
fuel,
Making
a
famine
where
abundance
lies,
Thy
self
thy
foe,
to
thy
sweet
self
too
cruel:
Thou
that
art
now
the
world's
fresh
ornament,
And
only
herald
to
the
gaudy
spring,
Within
thine
own
bud
buriest
thy
content,
And
tender
churl
mak'st
waste
in
niggarding:
Pity
the
world,
or
else
this
glutton
be,
To
eat
the
world's
due,
by
the
grave
and
thee.

execute_task will align 107 fragments, at word-level granularity.

If you specify the text fragments using the plain text file format, aeneas will automatically assign to each fragment, in the same order they appear in the input text file, the following ids: f000001, f000002, f000003, etc. This is done because for certain sync map formats, like SMIL, you need a (unique) id for each text fragment.
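The id assignment described above can be sketched in a few lines of Python. This is not the aeneas source code: assign_ids is a hypothetical name, and skipping blank lines is my assumption, made only to keep the example tidy.

```python
def assign_ids(lines):
    """Pair each fragment with a progressive id: f000001, f000002, ..."""
    fragments = []
    for line in lines:
        text = line.strip()
        if not text:
            continue  # assumption: blank lines are not treated as fragments
        identifier = "f%06d" % (len(fragments) + 1)
        fragments.append((identifier, text))
    return fragments


plain_lines = [
    "1",
    "From fairest creatures we desire increase,",
    "That thereby beauty's rose might never die,",
]
for identifier, text in assign_ids(plain_lines):
    print(identifier, text)
```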

`parsed` Format

The second format, parsed, is similar, but it allows the user to explicitly provide the id of each text fragment.

To do so, each line still corresponds to a text fragment but now it must contain the id, the | (pipe) character as the separator, and the text of the fragment.

For example, the following text.txt:

f000001|1
f000002|From fairest creatures we desire increase,
f000003|That thereby beauty's rose might never die,
f000004|But as the riper should by time decease,
f000005|His tender heir might bear his memory:
f000006|But thou contracted to thine own bright eyes,
f000007|Feed'st thy light's flame with self-substantial fuel,
f000008|Making a famine where abundance lies,
f000009|Thy self thy foe, to thy sweet self too cruel:
f000010|Thou that art now the world's fresh ornament,
f000011|And only herald to the gaudy spring,
f000012|Within thine own bud buriest thy content,
f000013|And tender churl mak'st waste in niggarding:
f000014|Pity the world, or else this glutton be,
f000015|To eat the world's due, by the grave and thee.

is equivalent to the first plain example above.
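Reading such a line amounts to splitting it at the pipe character. In the Python sketch below, parse_parsed_line is a hypothetical helper, and splitting only at the first pipe (so that the text itself may contain further pipes) is my assumption, not documented aeneas behavior.

```python
def parse_parsed_line(line):
    """Split 'id|text' into an (id, text) pair."""
    identifier, separator, text = line.rstrip("\n").partition("|")
    if not separator:
        raise ValueError("missing '|' separator in line: %r" % line)
    return identifier, text


print(parse_parsed_line("f000002|From fairest creatures we desire increase,"))
```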

Clearly, it is best practice to use valid XML ids (e.g., as shown above, one letter followed by a fixed number of digits, forming progressive, consecutive numbers). However, nothing prevents you from providing something like:

avocado|1
banana|From fairest creatures we desire increase,
cherry|That thereby beauty's rose might never die,
date|But as the riper should by time decease,
elderberry|His tender heir might bear his memory:
fig|But thou contracted to thine own bright eyes,
grapefruit|Feed'st thy light's flame with self-substantial fuel,
hackberry|Making a famine where abundance lies,
indianprune|Thy self thy foe, to thy sweet self too cruel:
jackfruit|Thou that art now the world's fresh ornament,
kiwi|And only herald to the gaudy spring,
lime|Within thine own bud buriest thy content,
mango|And tender churl mak'st waste in niggarding:
nectarine|Pity the world, or else this glutton be,
orange|To eat the world's due, by the grave and thee.

(whatever logic is behind the choice of the ids!)

`unparsed` Format

If you are working with EPUB 3 eBooks with Media Overlays, you have probably already produced the (X)HTML file, where each text fragment to be highlighted has its own id attribute.

If this is the case, the unparsed text file format allows aeneas to extract the text fragments by directly parsing the XML DOM. Suppose you have the following text.xhtml file:

<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops" lang="en" xml:lang="en">
 <head>
  <meta charset="utf-8"/>
  <meta name="viewport" content="width=768,height=1024"/>
  <link rel="stylesheet" href="../Styles/style.css" type="text/css"/>
  <title>Sonnet I</title>
 </head>
 <body>
  <div id="divTitle">
   <h1><span id="f001">1</span></h1>
  </div>
  <div id="divSonnet"> 
   <p>
    <span id="f002">From fairest creatures we desire increase,</span><br/>
    <span id="f003">That thereby beauty’s rose might never die,</span><br/>
    <span id="f004">But as the riper should by time decease,</span><br/>
    <span id="f005">His tender heir might bear his memory:</span><br/>
    <span id="f006">But thou contracted to thine own bright eyes,</span><br/>
    <span id="f007">Feed’st thy light’s flame with self-substantial fuel,</span><br/>
    <span id="f008">Making a famine where abundance lies,</span><br/>
    <span id="f009">Thy self thy foe, to thy sweet self too cruel:</span><br/>
    <span id="f010">Thou that art now the world’s fresh ornament,</span><br/>
    <span id="f011">And only herald to the gaudy spring,</span><br/>
    <span id="f012">Within thine own bud buriest thy content,</span><br/>
    <span id="f013">And tender churl mak’st waste in niggarding:</span><br/>
    <span id="f014">Pity the world, or else this glutton be,</span><br/>
    <span id="f015">To eat the world’s due, by the grave and thee.</span>
   </p>
  </div>
 </body>
</html>

Clearly, you must instruct aeneas to identify the elements that contain the text to be actually used for the synchronization. In the above example, you want to extract the text from elements with an id attribute matching the following regular expression: f[0-9][0-9][0-9] (an f followed by three digits).

To do so, you will specify

is_text_unparsed_id_regex=f[0-9][0-9][0-9]

in the configuration string (see below).

If not ambiguous (know your source!), you can also use the regular expression quantifiers + and *. In the above example, you can use f[0-9]+ (an f followed by one or more digits) instead of f[0-9][0-9][0-9].
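To see how the two expressions differ, you can try them on a handful of ids with the Python 3 re module. The sample ids are made up for illustration, and matching the whole id (fullmatch) is my assumption about how the expression is applied.

```python
import re

# Hypothetical ids, only two of which have exactly three digits.
ids = ["f001", "f015", "f1", "f0001", "divTitle"]

fixed = re.compile(r"f[0-9][0-9][0-9]")  # an f followed by exactly three digits
loose = re.compile(r"f[0-9]+")           # an f followed by one or more digits

fixed_matches = [i for i in ids if fixed.fullmatch(i)]
loose_matches = [i for i in ids if loose.fullmatch(i)]

print(fixed_matches)
print(loose_matches)
```

The looser expression is more convenient, but it also accepts ids such as f1 or f0001, which is why you should use it only when you know your source.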

To reduce ambiguity, you might also instruct aeneas to look for elements with a given value in their class attribute. If your input file is:

<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops" lang="en" xml:lang="en">
 <head>
  <meta charset="utf-8"/>
  <meta name="viewport" content="width=768,height=1024"/>
  <link rel="stylesheet" href="../Styles/style.css" type="text/css"/>
  <title>Sonnet I</title>
 </head>
 <body>
  <div id="divTitle">
   <h1><span class="ra" id="f001">1</span></h1>
  </div>
  <div id="divSonnet"> 
   <p>
    <span class="ra" id="f002">From fairest creatures we desire increase,</span><br/>
    <span class="ra" id="f003">That thereby beauty’s rose might never die,</span><br/>
    <span class="ra" id="f004">But as the riper should by time decease,</span><br/>
    <span class="ra" id="f005">His tender heir might bear his memory:</span><br/>
    <span class="ra" id="f006">But thou contracted to thine own bright eyes,</span><br/>
    <span class="ra" id="f007">Feed’st thy light’s flame with self-substantial fuel,</span><br/>
    <span class="ra" id="f008">Making a famine where abundance lies,</span><br/>
    <span class="ra" id="f009">Thy self thy foe, to thy sweet self too cruel:</span><br/>
    <span class="ra" id="f010">Thou that art now the world’s fresh ornament,</span><br/>
    <span class="ra" id="f011">And only herald to the gaudy spring,</span><br/>
    <span class="ra" id="f012">Within thine own bud buriest thy content,</span><br/>
    <span class="ra" id="f013">And tender churl mak’st waste in niggarding:</span><br/>
    <span class="ra" id="f014">Pity the world, or else this glutton be,</span><br/>
    <span class="ra" id="f015">To eat the world’s due, by the grave and thee.</span>
   </p>
  </div>
 </body>
</html>

you might want to specify both the following requirements:

id must match f[0-9]+, and
class must match (that is, must contain the value) ra.

Similarly to the previous case, your configuration string will contain

is_text_unparsed_id_regex=f[0-9]+|is_text_unparsed_class_regex=ra
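The effect of the two filters can be sketched with the Python 3 standard library. This is not how aeneas is implemented: extract_fragments is a hypothetical function, the XHTML sample is a shortened version of the file above with one extra "note" element added by me, and I assume the id must match as a whole while the class expression only needs to be found inside the class attribute.

```python
import re
import xml.etree.ElementTree as ET

XHTML = """<html xmlns="http://www.w3.org/1999/xhtml">
 <body>
  <div id="divTitle"><h1><span class="ra" id="f001">1</span></h1></div>
  <p>
   <span class="ra" id="f002">From fairest creatures we desire increase,</span><br/>
   <span class="note" id="f900">(editorial note)</span>
  </p>
 </body>
</html>"""


def extract_fragments(xhtml, id_regex, class_regex=None):
    """Return (id, text) pairs, in document order, for the matching elements."""
    id_re = re.compile(id_regex)
    class_re = re.compile(class_regex) if class_regex else None
    fragments = []
    for element in ET.fromstring(xhtml).iter():
        identifier = element.get("id")
        if identifier is None or not id_re.fullmatch(identifier):
            continue  # no id, or id not matching the expression
        if class_re and not class_re.search(element.get("class", "")):
            continue  # class filter requested but not satisfied
        fragments.append((identifier, "".join(element.itertext()).strip()))
    return fragments


print(extract_fragments(XHTML, "f[0-9]+"))
print(extract_fragments(XHTML, "f[0-9]+", "ra"))
```

With the id filter alone, the editorial note (f900) is extracted too; adding the class filter removes it.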

Finally, aeneas asks you to specify the order in which the extracted text fragments should be aligned, because the order in which the elements appear in the DOM might differ from the order in which they are spoken in the audio file.

For example, you might have the following portion of DOM:

<h1 id="f001">1</h1>
<p>
<span id="f003">That thereby beauty's rose might never die,</span><br/>
<span id="f005">His tender heir might bear his memory:</span><br/>
<span id="f004">But as the riper should by time decease,</span><br/>
<span id="f002">From fairest creatures we desire increase,</span><br/>
</p>

and you want the extracted fragments to appear in this order:

f001 (1)
f002 (From fairest creatures we desire increase,)
f003 (That thereby beauty's rose might never die,)
f004 (But as the riper should by time decease,)
f005 (His tender heir might bear his memory:)

In this case, you will specify the following parameter in the configuration string:

is_text_unparsed_id_sort=numeric

which will instruct aeneas to disregard any non-digit appearing in the id values, and sort the text fragments according to the remaining numeric part (leading zeroes are ignored).

Other options for is_text_unparsed_id_sort include unsorted (do not reorder the text fragments) and lexicographic (sort the ids based on their lexicographic order).
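The three orderings can be sketched as follows. sort_ids is a hypothetical function written for this post, not the aeneas implementation, and the numeric branch assumes that every id contains at least one digit.

```python
import re


def sort_ids(ids, algorithm):
    """Order fragment ids as unsorted, lexicographic or numeric."""
    if algorithm == "unsorted":
        return list(ids)  # keep the DOM order
    if algorithm == "lexicographic":
        return sorted(ids)  # plain string comparison
    if algorithm == "numeric":
        # Drop every non-digit, then compare the remaining integer;
        # int() also discards leading zeroes.
        return sorted(ids, key=lambda i: int(re.sub(r"[^0-9]", "", i)))
    raise ValueError("unknown algorithm: %r" % algorithm)


dom_order = ["f001", "f003", "f005", "f004", "f002"]
print(sort_ids(dom_order, "numeric"))
print(sort_ids(["f10", "f9", "f2"], "lexicographic"))
print(sort_ids(["f10", "f9", "f2"], "numeric"))
```

The second and third calls show why the choice matters when the ids are not zero-padded: lexicographic order puts f10 before f2.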

The Configuration String

As mentioned above, there are a few parameters you must specify to execute_task, in order to have your input files processed correctly. To that end, you need to write a configuration string, which is a UTF-8 encoded string that looks like this:

key1=value1|key2=value2|key3=value with spaces in it| ... |keyN=valueN

The order of the key=value pairs does not matter, but you must use the | (pipe) character to separate them.
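If you want to check a configuration string before passing it to the tool, splitting it back into its pairs takes only a few lines of Python. parse_config_string is a hypothetical helper, not an aeneas function, and trimming the whitespace around keys and values is my assumption.

```python
def parse_config_string(config_string):
    """Split 'key1=value1|key2=value2' into a dictionary."""
    parameters = {}
    for pair in config_string.split("|"):
        pair = pair.strip()
        if not pair:
            continue  # tolerate a trailing or doubled pipe
        key, separator, value = pair.partition("=")
        if not separator:
            raise ValueError("not a key=value pair: %r" % pair)
        parameters[key.strip()] = value.strip()
    return parameters


print(parse_config_string("task_language=en|is_text_type=plain|os_task_file_format=srt"))
```

Since the pipe is the separator, a value cannot itself contain a pipe in this simple sketch.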

(I know this syntax looks a bit clumsy and cumbersome, but it is very compact and it can be directly passed to APIs, like we did in ReadBeyond Sync. If I have time, I will enhance execute_task and execute_job with an argument parser, allowing the user to specify parameters using switches like --language en or -f smil.)

You need to specify at least three parameters:

the language of your input materials (e.g., task_language=en)
the format of the text file (e.g., is_text_type=plain)
the format of the sync map to be output (e.g., os_task_file_format=srt)

The resulting string is:

task_language=en|is_text_type=plain|os_task_file_format=srt
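When the parameters come from a script, you might prefer to build the string from a dictionary and fail early if one of the three required keys is missing. The sketch below is Python 3.7+ (it relies on dictionaries preserving insertion order); build_config_string and REQUIRED_KEYS are hypothetical names, not part of aeneas.

```python
REQUIRED_KEYS = ("task_language", "is_text_type", "os_task_file_format")


def build_config_string(parameters):
    """Join a dictionary into 'key=value|key=value', checking required keys."""
    missing = [key for key in REQUIRED_KEYS if key not in parameters]
    if missing:
        raise ValueError("missing required parameters: " + ", ".join(missing))
    return "|".join("%s=%s" % (key, value) for key, value in parameters.items())


print(build_config_string({
    "task_language": "en",
    "is_text_type": "plain",
    "os_task_file_format": "srt",
}))
```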

For example, assuming you have an audio file /tmp/audio.mp3, a plain text file /tmp/subs.txt, both in English (en), and you want to output a file /tmp/subs.srt in SRT format (srt), you will issue the following command:

$ python -m aeneas.tools.execute_task /tmp/audio.mp3 /tmp/subs.txt "task_language=en|is_text_type=plain|os_task_file_format=srt" /tmp/subs.srt

If you need to run several tasks sharing the same configuration string, you might want to assign the latter to a shell variable CONFIG_STRING:

$ CONFIG_STRING="task_language=en|is_text_type=plain|os_task_file_format=srt"
$ python -m aeneas.tools.execute_task /tmp/audio1.mp3 /tmp/subs1.txt "$CONFIG_STRING" /tmp/subs1.srt
$ python -m aeneas.tools.execute_task /tmp/audio2.mp3 /tmp/subs2.txt "$CONFIG_STRING" /tmp/subs2.srt
$ python -m aeneas.tools.execute_task /tmp/audio3.mp3 /tmp/subs3.txt "$CONFIG_STRING" /tmp/subs3.srt

This mechanism is adequate as long as you have few tasks and/or you want to run them one-by-one. A handier mechanism leverages the execute_job program, described below.
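If you would rather drive the one-by-one runs from Python than from the shell, the same three commands can be assembled as argument lists. The sketch assumes that python on your PATH is the interpreter where aeneas is installed; build_command and run_tasks are hypothetical names, and the call at the bottom is left commented out because it needs the audio and text files to exist.

```python
import subprocess

CONFIG_STRING = "task_language=en|is_text_type=plain|os_task_file_format=srt"


def build_command(audio_path, text_path, config_string, sync_map_path):
    """Return the execute_task command line as a list of arguments."""
    return [
        "python", "-m", "aeneas.tools.execute_task",
        audio_path, text_path, config_string, sync_map_path,
    ]


def run_tasks(numbers):
    """Run execute_task once for each numbered audio/text pair in /tmp."""
    for number in numbers:
        command = build_command(
            "/tmp/audio%d.mp3" % number,
            "/tmp/subs%d.txt" % number,
            CONFIG_STRING,
            "/tmp/subs%d.srt" % number,
        )
        subprocess.check_call(command)  # raises if the tool exits with an error


# run_tasks([1, 2, 3])  # uncomment on a machine where aeneas is installed
```

Passing the arguments as a list means the pipes in the configuration string need no shell quoting.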

Optional Parameters

The configuration string might have additional, optional parameters.

The two most useful ones are:

is_audio_file_head_length=X: ignore the first X seconds of the audio file
is_audio_file_process_length=Y: synchronize only Y seconds of the audio file

which allow you to "cut" (for the synchronization purposes) the head of the audio file, its tail or both. For example, if you have an audio file of total length 60s:

is_audio_file_head_length=20: sync from 20s to 60s in the audio file
is_audio_file_process_length=50: sync from 0s to 50s in the audio file
is_audio_file_head_length=20|is_audio_file_process_length=30: sync from 20s to 20s+30s=50s in the audio file
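The arithmetic behind the three cases is simple enough to write down. synced_interval is a hypothetical helper, and clamping the result to the length of the audio file is my assumption about what should happen when the requested interval runs past the end.

```python
def synced_interval(total_length, head_length=0.0, process_length=None):
    """Return the (begin, end) interval, in seconds, that gets synchronized."""
    begin = min(head_length, total_length)
    if process_length is None:
        end = total_length  # no limit: go on until the end of the file
    else:
        end = min(begin + process_length, total_length)
    return begin, end


print(synced_interval(60, head_length=20))
print(synced_interval(60, process_length=50))
print(synced_interval(60, head_length=20, process_length=30))
```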

Implied Parameters

As discussed above while describing the unparsed text format, when you specify the is_text_type=unparsed parameter, you must also specify:

is_text_unparsed_id_regex
is_text_unparsed_id_sort
is_text_unparsed_class_regex (optional)

When you want to output in SMIL format (os_task_file_format=smil), you must also specify the values for the src attribute of:

the <audio> elements, with os_task_file_smil_audio_ref
the <text> elements, with os_task_file_smil_page_ref

For example, you might have the following configuration string:

task_language=en|os_task_file_format=smil|os_task_file_smil_audio_ref=../audio/audio.mp3|os_task_file_smil_page_ref=text.xhtml|is_text_type=unparsed|is_text_unparsed_id_regex=f[0-9]+|is_text_unparsed_id_sort=numeric

For the sake of clarity, I will break it down into pairs:

task_language=en
os_task_file_format=smil
os_task_file_smil_audio_ref=../audio/audio.mp3
os_task_file_smil_page_ref=text.xhtml
is_text_type=unparsed
is_text_unparsed_id_regex=f[0-9]+
is_text_unparsed_id_sort=numeric

which will instruct aeneas to produce a SMIL file like this:

<smil xmlns="http://www.w3.org/ns/SMIL" xmlns:epub="http://www.idpf.org/2007/ops" version="3.0">
 <body>
  <seq id="s000001" epub:textref="text.xhtml">
   <par id="p000001">
    <text src="text.xhtml#f001"/>
    <audio clipBegin="00:00:00.000" clipEnd="00:00:02.680" src="../audio/audio.mp3"/>
   </par>
   <par id="p000002">
    <text src="text.xhtml#f002"/>
    <audio clipBegin="00:00:02.680" clipEnd="00:00:05.480" src="../audio/audio.mp3"/>
   </par>
   <par id="p000003">
    <text src="text.xhtml#f003"/>
    <audio clipBegin="00:00:05.480" clipEnd="00:00:08.640" src="../audio/audio.mp3"/>
   </par>
   <par id="p000004">
    <text src="text.xhtml#f004"/>
    <audio clipBegin="00:00:08.640" clipEnd="00:00:11.960" src="../audio/audio.mp3"/>
   </par>
   <par id="p000005">
    <text src="text.xhtml#f005"/>
    <audio clipBegin="00:00:11.960" clipEnd="00:00:14.320" src="../audio/audio.mp3"/>
   </par>
   <par id="p000006">
    <text src="text.xhtml#f006"/>
    <audio clipBegin="00:00:14.320" clipEnd="00:00:18.839" src="../audio/audio.mp3"/>
   </par>
   <par id="p000007">
    <text src="text.xhtml#f007"/>
    <audio clipBegin="00:00:18.839" clipEnd="00:00:22.760" src="../audio/audio.mp3"/>
   </par>
   <par id="p000008">
    <text src="text.xhtml#f008"/>
    <audio clipBegin="00:00:22.760" clipEnd="00:00:25.320" src="../audio/audio.mp3"/>
   </par>
   <par id="p000009">
    <text src="text.xhtml#f009"/>
    <audio clipBegin="00:00:25.320" clipEnd="00:00:31.239" src="../audio/audio.mp3"/>
   </par>
   <par id="p000010">
    <text src="text.xhtml#f010"/>
    <audio clipBegin="00:00:31.239" clipEnd="00:00:34.280" src="../audio/audio.mp3"/>
   </par>
   <par id="p000011">
    <text src="text.xhtml#f011"/>
    <audio clipBegin="00:00:34.280" clipEnd="00:00:36.479" src="../audio/audio.mp3"/>
   </par>
   <par id="p000012">
    <text src="text.xhtml#f012"/>
    <audio clipBegin="00:00:36.479" clipEnd="00:00:40.640" src="../audio/audio.mp3"/>
   </par>
   <par id="p000013">
    <text src="text.xhtml#f013"/>
    <audio clipBegin="00:00:40.640" clipEnd="00:00:43.600" src="../audio/audio.mp3"/>
   </par>
   <par id="p000014">
    <text src="text.xhtml#f014"/>
    <audio clipBegin="00:00:43.600" clipEnd="00:00:48.000" src="../audio/audio.mp3"/>
   </par>
   <par id="p000015">
    <text src="text.xhtml#f015"/>
    <audio clipBegin="00:00:48.000" clipEnd="00:00:53.240" src="../audio/audio.mp3"/>
   </par>
  </seq>
 </body>
</smil>

Please note that, for <audio> elements, the relative path ../audio/audio.mp3 has been used, as specified in the configuration string.
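To show how the two src values end up in each par element, here is a Python sketch that renders one of them. smil_par is a hypothetical function written for this post, not the aeneas SMIL writer; the p000001-style numbering simply mimics the output above.

```python
from xml.sax.saxutils import quoteattr


def smil_par(index, fragment_id, begin, end, page_ref, audio_ref):
    """Render one <par> element pairing a text fragment with an audio clip."""
    return "\n".join([
        "<par id=%s>" % quoteattr("p%06d" % index),
        # The text src is the page reference plus the fragment id.
        " <text src=%s/>" % quoteattr("%s#%s" % (page_ref, fragment_id)),
        # The audio src is the audio reference, used verbatim.
        " <audio clipBegin=%s clipEnd=%s src=%s/>"
        % (quoteattr(begin), quoteattr(end), quoteattr(audio_ref)),
        "</par>",
    ])


print(smil_par(1, "f001", "00:00:00.000", "00:00:02.680",
               "text.xhtml", "../audio/audio.mp3"))
```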

References To The Documentation

Languages: docs
Input text formats: docs
Output sync map formats: docs
ID sorting algorithms: docs
Parameter keys: docs

Please also refer to the examples you can find in the aeneas/tests/res and long_tests directories of the cloned repo.

Computing Multiple Sync Maps At Once

As briefly mentioned above, especially if you work with EPUB 3 eBooks, you might have dozens of tasks to run, all with the same configuration parameters.

In this case, you can create a Job, that is, a set of Tasks, and process them in batch using the aeneas.tools.execute_job command.

In its simplest form, this command takes two arguments:

$ python -m aeneas.tools.execute_job /path/to/job.zip /path/to/output/dir/

/path/to/job.zip is a ZIP file containing all the input assets (i.e., a pair of audio/text files for each task) and a special configuration file config.txt (or config.xml) containing the runtime instructions
/path/to/output/dir/ is the directory where the output archive, containing the output sync maps, one for each task, should be created

Note that, instead of creating an input ZIP file, you can also pass a path to an uncompressed directory /path/to/job/:

$ python -m aeneas.tools.execute_job /path/to/job/ /path/to/output/dir/

In the aeneas/tests/res/example_jobs directory you can find several examples of job directories, with different ways of arranging the input files inside the input container directory hierarchy, and with different runtime parameters.

In what follows, I will describe the contents of the config.txt textual/INI-like configuration file, which is the simplest way of specifying a job configuration, yet it should cover a vast majority of use cases.

If you need finer control over the job configuration, for example if you have different tasks with different languages, you can create a config.xml XML configuration file: see the documentation for more details.

The `config.txt` Configuration File: Flat Case

Suppose you have the following files in the flat_example directory:

flat_example
├── config.txt
└── OEBPS
    └── Resources
        ├── sonnet001.mp3
        ├── sonnet001.xhtml
        ├── sonnet002.mp3
        ├── sonnet002.xhtml
        ├── sonnet003.mp3
        └── sonnet003.xhtml

The config.txt file contains the following:

is_hierarchy_type=flat
is_hierarchy_prefix=OEBPS/Resources/
is_text_file_relative_path=.
is_text_file_name_regex=.*\.xhtml
is_text_type=unparsed
is_text_unparsed_id_regex=f[0-9]+
is_text_unparsed_id_sort=numeric
is_audio_file_relative_path=.
is_audio_file_name_regex=.*\.mp3

os_job_file_name=output_flat
os_job_file_container=zip
os_job_file_hierarchy_type=flat
os_job_file_hierarchy_prefix=OEBPS/Resources/
os_task_file_name=$PREFIX.smil
os_task_file_format=smil
os_task_file_smil_page_ref=$PREFIX.xhtml
os_task_file_smil_audio_ref=$PREFIX.mp3

job_language=en
job_description=Example (flat hierarchy, unparsed text files, smil output)

If you run the following command:

$ python -m aeneas.tools.execute_job flat_example/ /tmp/

you will get a ZIP file /tmp/output_flat.zip containing three SMIL files, one for each of the three tasks found:

.
└── OEBPS
    └── Resources
        ├── sonnet001.smil
        ├── sonnet002.smil
        └── sonnet003.smil

The is_hierarchy_type=flat setting tells aeneas that the assets in the input container are contained within the same directory, positioned at is_hierarchy_prefix=OEBPS/Resources/. (Note that all the paths for the input assets are relative to the config.txt file.)

The audio and text files for each task are identified by matching the is_text_file_name_regex and is_audio_file_name_regex regular expressions. A task is created only if both the audio file and the text file are matched and they share the same name prefix.

Similarly, the os_job_file_hierarchy_type=flat and os_job_file_hierarchy_prefix=OEBPS/Resources/ specify the desired output directory hierarchy. Note that the $PREFIX placeholder will be replaced by each task name (i.e., sonnet001, sonnet002, sonnet003 in the example).
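The pairing rule and the $PREFIX substitution can be sketched in Python 3 as follows. find_flat_tasks and expand_prefix are hypothetical names, and I assume that the "name prefix" is the file name without its extension and that the regular expressions must match the whole file name.

```python
import os
import re


def find_flat_tasks(file_names, text_regex, audio_regex):
    """Return the prefixes that have both a text file and an audio file."""
    text_re = re.compile(text_regex)
    audio_re = re.compile(audio_regex)
    texts = {os.path.splitext(n)[0] for n in file_names if text_re.fullmatch(n)}
    audios = {os.path.splitext(n)[0] for n in file_names if audio_re.fullmatch(n)}
    return sorted(texts & audios)  # a task needs both files


def expand_prefix(template, prefix):
    """Replace the $PREFIX placeholder with the task name."""
    return template.replace("$PREFIX", prefix)


files = ["sonnet001.mp3", "sonnet001.xhtml", "sonnet002.mp3", "sonnet003.xhtml"]
for prefix in find_flat_tasks(files, r".*\.xhtml", r".*\.mp3"):
    print(prefix, "->", expand_prefix("$PREFIX.smil", prefix))
```

In this made-up listing, sonnet002 lacks its text file and sonnet003 lacks its audio file, so only sonnet001 becomes a task.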

Finally, please note that the language is set (for all the tasks) to English by the job_language=en line.

The `config.txt` Configuration File: Paged Case

If your tasks are divided into subdirectories of the main directory paged_example:

paged_example
├── config.txt
└── OEBPS
    └── Pages
        ├── 01
        │   ├── audio.mp3
        │   └── page.xhtml
        ├── 02
        │   ├── audio.mp3
        │   └── page.xhtml
        └── 03
            ├── audio.mp3
            └── page.xhtml

you must specify the paged hierarchy in your config.txt:

is_hierarchy_type=paged
is_hierarchy_prefix=OEBPS/Pages/
is_task_dir_name_regex=[0-9]+
is_text_file_relative_path=.
is_text_file_name_regex=page.xhtml
is_text_type=unparsed
is_text_unparsed_id_regex=f[0-9]+
is_text_unparsed_id_sort=numeric
is_audio_file_relative_path=.
is_audio_file_name_regex=audio.mp3

os_job_file_name=output_paged
os_job_file_container=zip
os_job_file_hierarchy_type=paged
os_job_file_hierarchy_prefix=OEBPS/Pages/
os_task_file_name=map.smil
os_task_file_format=smil
os_task_file_smil_page_ref=page.xhtml
os_task_file_smil_audio_ref=audio.mp3

job_language=en
job_description=Example (paged hierarchy, unparsed text files, smil output)

Executing:

$ python -m aeneas.tools.execute_job paged_example/ /tmp/

will create the ZIP file /tmp/output_paged.zip containing:

.
└── OEBPS
    └── Pages
        ├── 01
        │   └── map.smil
        ├── 02
        │   └── map.smil
        └── 03
            └── map.smil

Please note that if you use is_hierarchy_type=paged, you must provide a regex for is_task_dir_name_regex, which will be used to identify the tasks by matching the subdirectory names (is_task_dir_name_regex=[0-9]+ in the example).
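The role of is_task_dir_name_regex can be illustrated with a Python 3 sketch that works on a plain list of paths. find_paged_tasks is a hypothetical function, the "cover" directory is made up to show a non-matching name, and whole-name matching is my assumption.

```python
import re


def find_paged_tasks(paths, prefix, dir_regex, text_regex, audio_regex):
    """Return the task directories that contain both a text and an audio file."""
    dir_re = re.compile(dir_regex)
    text_re = re.compile(text_regex)
    audio_re = re.compile(audio_regex)
    names_by_dir = {}
    for path in paths:
        if not path.startswith(prefix):
            continue
        parts = path[len(prefix):].split("/")
        # Expect exactly <task directory>/<file name> below the prefix.
        if len(parts) != 2 or not dir_re.fullmatch(parts[0]):
            continue
        names_by_dir.setdefault(parts[0], []).append(parts[1])
    return sorted(
        directory
        for directory, names in names_by_dir.items()
        if any(text_re.fullmatch(n) for n in names)
        and any(audio_re.fullmatch(n) for n in names)
    )


paths = [
    "OEBPS/Pages/01/audio.mp3",
    "OEBPS/Pages/01/page.xhtml",
    "OEBPS/Pages/02/audio.mp3",
    "OEBPS/Pages/03/audio.mp3",
    "OEBPS/Pages/03/page.xhtml",
    "OEBPS/Pages/cover/audio.mp3",
    "OEBPS/Pages/cover/page.xhtml",
]
print(find_paged_tasks(paths, "OEBPS/Pages/", "[0-9]+", "page.xhtml", "audio.mp3"))
```

Here 02 is skipped because it has no text file, and cover is skipped because its name does not match [0-9]+.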

