Merge pull request #9 from pettarin/master
Release as v1.2.0.
readbeyond committed Sep 27, 2015
2 parents 47da67b + 18d9dfb commit ccab249
Showing 171 changed files with 8,797 additions and 2,672 deletions.
148 changes: 80 additions & 68 deletions README.md
@@ -2,8 +2,8 @@

**aeneas** is a Python library and a set of tools to automagically synchronize audio and text.

* Version: 1.1.2
* Date: 2015-09-24
* Version: 1.2.0
* Date: 2015-09-27
* Developed by: [ReadBeyond](http://www.readbeyond.it/)
* Lead Developer: [Alberto Pettarin](http://www.albertopettarin.it/)
* License: the GNU Affero General Public License Version 3 (AGPL v3)
@@ -17,7 +17,7 @@ and an audio file containing the narration of the (same) text.

For example, given [this text file](aeneas/tests/res/container/job/assets/p001.xhtml)
and [this audio file](aeneas/tests/res/container/job/assets/p001.mp3),
**aeneas** computes the following map:
**aeneas** computes the following abstract map:

```
[00:00:00.000, 00:00:02.680] <=> 1
@@ -37,28 +37,28 @@ and [this audio file](aeneas/tests/res/container/job/assets/p001.mp3),
[00:00:48.000, 00:00:53.280] <=> To eat the world's due, by the grave and thee.
```
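The abstract map above pairs a time interval with a text fragment on each line. As a minimal sketch of how such lines could be read back into numbers, the following parser assumes exactly the `[HH:MM:SS.mmm, HH:MM:SS.mmm] <=> text` layout shown in this README. The names `LINE_RE`, `to_seconds` and `parse_map_line` are hypothetical and are not part of the **aeneas** API.

```python
import re

# One line of the abstract map, as printed in this README:
# "[HH:MM:SS.mmm, HH:MM:SS.mmm] <=> fragment text"
LINE_RE = re.compile(
    r"^\[(\d{2}):(\d{2}):(\d{2}\.\d{3}), "
    r"(\d{2}):(\d{2}):(\d{2}\.\d{3})\] <=> (.*)$"
)


def to_seconds(hours, minutes, seconds):
    """Convert the three captured time fields into seconds as a float."""
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def parse_map_line(line):
    """Return (begin_seconds, end_seconds, fragment_text) for one map line."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError("not a sync map line: %r" % line)
    h1, m1, s1, h2, m2, s2, text = match.groups()
    return (to_seconds(h1, m1, s1), to_seconds(h2, m2, s2), text)
```

For real processing, the machine-readable output formats (for example JSON or CSV) are likely a better starting point than this human-readable listing.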

Moreover, the map can be output in several formats: SMIL for EPUB 3,
SRT/TTML/VTT for closed captioning, JS for Web usage,
The map can be output to file in several formats: SMIL for EPUB 3,
SRT/TTML/VTT for closed captioning, JSON/RBSE for Web usage,
or raw CSV/SSV/TSV/TXT/XML for further processing.


## System Requirements, Supported Platforms and Installation

### System Requirements

1. 2 GB RAM (4 GB recommended), 2 GHz CPU (3 GHz 64bit recommended)
2. `ffmpeg` and `ffprobe` executable available in your `$PATH` (`apt-get install ffmpeg*` from [`deb-multimedia`](http://www.deb-multimedia.org/))
3. `espeak` executable available in your `$PATH` (`apt-get install espeak*`)
1. a reasonably recent machine (recommended 4 GB RAM, 2 GHz 64bit CPU)
2. `ffmpeg` and `ffprobe` executables available in your `$PATH`
3. `espeak` executable available in your `$PATH`
4. Python 2.7.x
5. Python optional modules `BeautifulSoup`, `lxml`, `numpy`, and `scikits.audiolab` (`pip install ...`)
6. (Optional but strongly suggested) Python C headers to compile the Python C extensions (`apt-get install python-dev`)
5. Python modules `BeautifulSoup`, `lxml`, `numpy`, and `scikits.audiolab`
6. (Optional but strongly suggested) Python C headers to compile the Python C extensions

Depending on the format(s) of audio files you work with,
you might need to install additional audio codecs for `ffmpeg`.
Similarly, you might need to install additional voices
for `espeak`, depending on the language(s) you work on.
(Installing _all_ the codecs and _all_ the voices available
in the Debian repository might be a good idea.)
might be a good idea.)

If installing the above dependencies proves difficult on your OS,
consider using the [Vagrant box](http://www.vagrantup.com)
@@ -68,87 +68,92 @@ created by [aeneas-vagrant](https://github.com/readbeyond/aeneas-vagrant).

**aeneas** has been developed and tested on **Debian 64bit**,
which is the **only supported OS** at the moment.
Other Linux distributions should be good too.

However, it should work on Mac OS X and Windows as well,
once you make sure `ffmpeg`, `ffprobe` and `espeak`
However, **aeneas** has been confirmed to work
on other Linux distributions (Ubuntu, Slackware),
on Mac OS X (with developer tools installed) and on Windows Vista/7/8.1/10.

Whatever your OS is, make sure
`ffmpeg`, `ffprobe` (which is part of `ffmpeg` distribution), and `espeak`
are properly installed and
callable by the `subprocess` Python module.
A way to ensure the latter consists
in adding the three executables to your `$PATH`.
Alternatively, you can use VirtualBox
in adding these three executables to your `$PATH`.
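
A quick way to see whether the three executables are visible on your `$PATH`, and whether the `subprocess` module can actually launch a command, might look like the sketch below. It is not the project's `check_dependencies.py` script. The helper names `missing_executables` and `can_run` are made up for illustration, and the code assumes a modern Python 3 interpreter (for `shutil.which`) rather than the Python 2.7 that **aeneas** itself targets.

```python
import shutil
import subprocess

# The three external tools named in this README.
REQUIRED = ("ffmpeg", "ffprobe", "espeak")


def missing_executables(names=REQUIRED):
    """Return the names that cannot be found on the current PATH."""
    return [name for name in names if shutil.which(name) is None]


def can_run(command):
    """Return True if subprocess can start the given command list."""
    try:
        subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
    except OSError:
        # Raised when the executable is missing or cannot be executed.
        return False
    return True


if __name__ == "__main__":
    missing = missing_executables()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("ffmpeg, ffprobe and espeak are all on PATH")
```

An empty list from `missing_executables()` only shows that the tools can be located; the repository's own `check_dependencies.py` remains the authoritative check.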

If installing **aeneas** natively on your OS proves difficult,
you can use VirtualBox and [Vagrant](http://www.vagrantup.com)
to run **aeneas** inside a virtualized Debian image,
for example using [aeneas-vagrant](https://github.com/readbeyond/aeneas-vagrant).
using [aeneas-vagrant](https://github.com/readbeyond/aeneas-vagrant).

### Installation

```bash
$ git clone https://github.com/readbeyond/aeneas.git
$ cd aeneas
$ pip install -r requirements.txt
$ python setup.py build_ext --inplace
$ python check_dependencies.py
```
#### Linux and Mac OS X

If the last command prints a success message,
you have all the required dependencies installed
and you can confidently run **aeneas** in production.

If you are a user of a `deb`-based Linux distribution
(e.g., Debian, Ubuntu),
1. If you are a user of a `deb`-based Linux distribution
(e.g., Debian or Ubuntu),
you can install all the dependencies by running
[the provided `install_dependencies.sh` script](install_dependencies.sh)

```bash
$ sudo bash install_dependencies.sh
```
```bash
$ sudo bash install_dependencies.sh
```

2. If you have another Linux distribution or Mac OS X,
just make sure you have
`ffmpeg`, `ffprobe` (part of the `ffmpeg` package),
and `espeak` installed and available on your command line.
You also need Python 2.x and its "developer" package
containing the C headers.

3. Run the following commands:

```bash
$ git clone https://github.com/readbeyond/aeneas.git
$ cd aeneas
$ pip install -r requirements.txt
$ python setup.py build_ext --inplace
$ python check_dependencies.py
```

Then, run `python setup.py build_ext --inplace` and `python check_dependencies.py` as above.
If the last command prints a success message,
you have all the required dependencies installed
and you can confidently run **aeneas** in production.

If you are a Windows user, please read the installation instructions
#### Windows

Please read the installation instructions
contained in the
["Using aeneas for Audio-Text Synchronization" PDF](http://software.sil.org/scriptureappbuilder/resources/)
["Using aeneas for Audio-Text Synchronization" PDF](http://software.sil.org/scriptureappbuilder/resources/),
based on
[these directions](https://groups.google.com/d/msg/aeneas-forced-alignment/p9cb1FA0X0I/8phzUgIqBAAJ),
written by Richard Margetts.

If installing natively proves difficult on your OS,
consider using the [Vagrant box](http://www.vagrantup.com)
created by [aeneas-vagrant](https://github.com/readbeyond/aeneas-vagrant).


## Usage

1. Clone this GitHub repo:
1. Install `aeneas` as described above. (Only the first time!)

```bash
$ git clone https://github.com/readbeyond/aeneas.git
```
2. Open a command prompt/shell/terminal and go to the root directory
of the aeneas repository, that is, the one containing this `README.md` file.

2. Enter the root directory:
3. To compute a synchronization map `map.json` for a pair
(`audio.mp3`, `text.txt` in `plain` format), you can run:

```bash
$ cd aeneas
$ python -m aeneas.tools.execute_task audio.mp3 text.txt "task_language=en|os_task_file_format=json|is_text_type=plain" map.json
```

3. (Optional, but strongly suggested) Compile the Python C extensions:

```bash
$ python setup.py build_ext --inplace
```
The third parameter (the _configuration string_) can specify several parameters/options.
See the [documentation](http://www.readbeyond.it/aeneas/docs/) for details.

4. To compute a SMIL synchronization map `map.smil` for a pair
(`audio.mp3`, `text.txt`), you can run:
4. To compute a synchronization map `map.smil` for a pair
(`audio.mp3`, `page.xhtml` containing fragments marked by `id` attributes like `f001`),
you can run:

```bash
$ python -m aeneas.tools.execute_task audio.mp3 text.txt config_string map.smil
$ python -m aeneas.tools.execute_task audio.mp3 page.xhtml "task_language=en|os_task_file_format=smil|os_task_file_smil_audio_ref=audio.mp3|os_task_file_smil_page_ref=page.xhtml|is_text_type=unparsed|is_text_unparsed_id_regex=f[0-9]+|is_text_unparsed_id_sort=numeric" map.smil
```

`config_string` is a string containing all the
parameters to parse `text.txt` correctly and to
format `map.smil` as desired.
See the [documentation](http://www.readbeyond.it/aeneas/docs/) for details.

5. If you have several tasks to run,
you can create a job container and a configuration file,
and run them all at once:
@@ -163,8 +168,8 @@ and run them all at once:
and format the output sync map files.
See the [documentation](http://www.readbeyond.it/aeneas/docs/) for details.
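
The configuration strings passed to `execute_task` in the steps above are `key=value` pairs joined by `|`. Long strings are easy to mistype, so a small helper can assemble them from a list of pairs. This is only a sketch: `build_config_string` and `parse_config_string` are hypothetical names, not **aeneas** functions, and the refusal to accept `|` in values or `=` in keys is an assumption made here to keep the example unambiguous, not documented **aeneas** behavior.

```python
def build_config_string(parameters):
    """Join (key, value) pairs into a 'key=value|key=value' string."""
    pairs = []
    for key, value in parameters:
        # Assumption: no escaping is attempted, so reject ambiguous input.
        if "=" in key or "|" in key or "|" in value:
            raise ValueError("cannot encode %r=%r" % (key, value))
        pairs.append("%s=%s" % (key, value))
    return "|".join(pairs)


def parse_config_string(config_string):
    """Split a 'key=value|key=value' string back into a dictionary."""
    result = {}
    for pair in config_string.split("|"):
        # Split on the first '=' only, so values may contain '='.
        key, _, value = pair.partition("=")
        result[key] = value
    return result


if __name__ == "__main__":
    config = build_config_string([
        ("task_language", "en"),
        ("os_task_file_format", "json"),
        ("is_text_type", "plain"),
    ])
    print(config)
```

The resulting string can then be passed as the third argument to `python -m aeneas.tools.execute_task`, quoted so that the shell does not interpret the `|` characters.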

You might want to run the above modules without arguments
to get their manual:
You might want to run `execute_task` or `execute_job`
without arguments to get a usage message and some examples:

```bash
$ python -m aeneas.tools.execute_task
@@ -202,20 +207,20 @@ Changelog: [http://www.readbeyond.it/aeneas/docs/changelog.html](http://www.read
* Arbitrary text fragment granularity (single word, subphrase, phrase, paragraph, etc.)
* Input audio file formats: all those supported by `ffmpeg`
* Batch processing
* Output sync map formats: CSV, JS, SMIL, TSV, TTML, TXT, VTT, XML
* Supported (= tested) languages: BG, CA, CY, DA, DE, EL, EN, ES, ET, FI, FR, GA, GRC, HR, HU, IS, IT, LA, LT, LV, NL, NO, RO, RU, PL, PT, SK, SR, SV, TR, UK
* Output sync map formats: CSV, JSON, SMIL, SSV, TSV, TTML, TXT, VTT, XML
* Tested languages: BG, CA, CY, DA, DE, EL, EN, ES, ET, FA, FI, FR, GA, GRC, HR, HU, IS, IT, LA, LT, LV, NL, NO, RO, RU, PL, PT, SK, SR, SV, SW, TR, UK
* Robust against misspelled/mispronounced words, local rearrangements of words, background noise/sporadic spikes
* Code suitable for a Web app deployment (e.g., on-demand AWS instances)
* Adjustable splitting times, including a max character/second constraint for CC applications
* Automated detection of audio head/tail
* MFCC and DTW computed as Python C extensions to reduce the processing time


## Limitations and Missing Features

* Audio should match the text: large portions of spurious text or audio might produce a wrong sync map
* Audio is assumed to be spoken: not suitable/YMMV for song captioning
* DTW computation is memory hungry
* No protection against memory thrashing
* No protection against memory thrashing if you feed extremely long audio files


## TODO List
@@ -228,7 +233,6 @@ Changelog: [http://www.readbeyond.it/aeneas/docs/changelog.html](http://www.read
* Improving (removing?) dependency on `espeak`, `ffmpeg`, `ffprobe` executables
* Multilevel sync map granularity (e.g., multilevel SMIL output)
* Supporting input text encodings other than UTF-8
* Adding (i.e., testing) more languages
* Better documentation
* Testing other approaches, like HMM
* Publishing the package on PyPI
@@ -292,6 +296,8 @@ No copy rights were harmed in the making of this project.

* **August 2015**: [Michele Gianella](https://plus.google.com/+michelegianella/about) partially sponsored the port of the MFCC/DTW code to C (v1.1.0)

* **September 2015**: friends in West Africa partially sponsored the development of the head/tail detection code (v1.2.0)

### Supporting

Would you like to support the development of **aeneas**?
@@ -311,8 +317,11 @@ Feel free to [get in touch](mailto:[email protected]).

If you are able to contribute code directly,
that's great!
Feel free to open a pull request,
we will be glad to have a look at it.
Please do not work on the `master` branch.
Instead, please create a new branch,
and open a pull request from there.
I will be glad to have a look at it!
Please make your code consistent with
the existing code base style
@@ -366,6 +375,9 @@ and a Web application
**August 2015**: release of v1.1.0, including Python C extensions
to speed up the computation of audio/text alignment
**September 2015**: release of v1.2.0,
including code to automatically detect the audio head/tail
## Acknowledgments
Many thanks to **Nicola Montecchio**,