Merge pull request #9 from pettarin/master
Release as v1.2.0.
Showing 171 changed files with 8,797 additions and 2,672 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2,8 +2,8 @@ | |
|
||
**aeneas** is a Python library and a set of tools to automagically synchronize audio and text. | ||
|
||
* Version: 1.1.2 | ||
* Date: 2015-09-24 | ||
* Version: 1.2.0 | ||
* Date: 2015-09-27 | ||
* Developed by: [ReadBeyond](http://www.readbeyond.it/) | ||
* Lead Developer: [Alberto Pettarin](http://www.albertopettarin.it/) | ||
* License: the GNU Affero General Public License Version 3 (AGPL v3) | ||
|
@@ -17,7 +17,7 @@ and an audio file containing the narration of the (same) text. | |
|
||
For example, given [this text file](aeneas/tests/res/container/job/assets/p001.xhtml) | ||
and [this audio file](aeneas/tests/res/container/job/assets/p001.mp3), | ||
**aeneas** computes the following map: | ||
**aeneas** computes the following abstract map: | ||
|
||
``` | ||
[00:00:00.000, 00:00:02.680] <=> 1 | ||
|
@@ -37,28 +37,28 @@ and [this audio file](aeneas/tests/res/container/job/assets/p001.mp3), | |
[00:00:48.000, 00:00:53.280] <=> To eat the world's due, by the grave and thee. | ||
``` | ||
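The lines of the abstract map above follow a regular `[begin, end] <=> text` shape, so they can be read back into numbers with a few lines of Python. This is only a sketch for the human-readable listing shown in this README; the names `LINE_RE`, `to_seconds` and `parse_map_line` are made up for the example and are not part of **aeneas**, whose real output formats (SMIL, JSON, etc.) have their own structure.

```python
import re

# One line of the abstract map:
# "[HH:MM:SS.mmm, HH:MM:SS.mmm] <=> fragment text"
LINE_RE = re.compile(
    r"^\[(\d{2}):(\d{2}):(\d{2}\.\d{3}), "
    r"(\d{2}):(\d{2}):(\d{2}\.\d{3})\] <=> (.*)$"
)


def to_seconds(hours, minutes, seconds):
    """Convert the three captured time fields to seconds as a float."""
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def parse_map_line(line):
    """Return (begin_seconds, end_seconds, text) for one abstract map line."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError("not an abstract map line: %r" % line)
    h1, m1, s1, h2, m2, s2, text = match.groups()
    return to_seconds(h1, m1, s1), to_seconds(h2, m2, s2), text


if __name__ == "__main__":
    sample = "[00:00:48.000, 00:00:53.280] <=> To eat the world's due, by the grave and thee."
    begin, end, text = parse_map_line(sample)
    print(begin, end, text)
```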
|
||
Moreover, the map can be output in several formats: SMIL for EPUB 3, | ||
SRT/TTML/VTT for closed captioning, JS for Web usage, | ||
The map can be output to file in several formats: SMIL for EPUB 3, | ||
SRT/TTML/VTT for closed captioning, JSON/RBSE for Web usage, | ||
or raw CSV/SSV/TSV/TXT/XML for further processing. | ||
|
||
|
||
## System Requirements, Supported Platforms and Installation | ||
|
||
### System Requirements | ||
|
||
1. 2 GB RAM (4 GB recommended), 2 GHz CPU (3 GHz 64bit recommended) | ||
2. `ffmpeg` and `ffprobe` executable available in your `$PATH` (`apt-get install ffmpeg*` from [`deb-multimedia`](http://www.deb-multimedia.org/)) | ||
3. `espeak` executable available in your `$PATH` (`apt-get install espeak*`) | ||
1. a reasonably recent machine (recommended 4 GB RAM, 2 GHz 64bit CPU) | ||
2. `ffmpeg` and `ffprobe` executables available in your `$PATH` | ||
3. `espeak` executable available in your `$PATH` | ||
4. Python 2.7.x | ||
5. Python optional modules `BeautifulSoup`, `lxml`, `numpy`, and `scikits.audiolab` (`pip install ...`) | ||
6. (Optional but strongly suggested) Python C headers to compile the Python C extensions (`apt-get install python-dev`) | ||
5. Python modules `BeautifulSoup`, `lxml`, `numpy`, and `scikits.audiolab` | ||
6. (Optional but strongly suggested) Python C headers to compile the Python C extensions | ||
|
||
Depending on the format(s) of audio files you work with, | ||
you might need to install additional audio codecs for `ffmpeg`. | ||
Similarly, you might need to install additional voices | ||
for `espeak`, depending on the language(s) you work on. | ||
(Installing _all_ the codecs and _all_ the voices available | ||
in the Debian repository might be a good idea.) | ||
might be a good idea.) | ||
|
||
If installing the above dependencies proves difficult on your OS, | ||
consider using the [Vagrant box](http://www.vagrantup.com) | ||
|
@@ -68,87 +68,92 @@ created by [aeneas-vagrant](https://github.com/readbeyond/aeneas-vagrant). | |
|
||
**aeneas** has been developed and tested on **Debian 64bit**, | ||
which is the **only supported OS** at the moment. | ||
Other Linux distributions should be good too. | ||
|
||
However, it should work on Mac OS X and Windows as well, | ||
once you make sure `ffmpeg`, `ffprobe` and `espeak` | ||
However, **aeneas** has been confirmed to work | ||
on other Linux distributions (Ubuntu, Slackware), | ||
on Mac OS X (with developer tools installed) and on Windows Vista/7/8.1/10. | ||
|
||
Whatever your OS is, make sure | ||
`ffmpeg`, `ffprobe` (which is part of `ffmpeg` distribution), and `espeak` | ||
are properly installed and | ||
callable by the `subprocess` Python module. | ||
A way to ensure the latter consists | ||
in adding the three executables to your `$PATH`. | ||
Alternatively, you can use VirtualBox | ||
in adding these three executables to your `$PATH`. | ||
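A quick way to see whether the three executables are visible on the `$PATH` is sketched below. It is a stand-alone helper, not the `check_dependencies.py` script shipped with the repository, and the name `find_missing_tools` is invented for this example. Note that `shutil.which` needs Python 3.3 or later, so this helper is assumed to run under Python 3 even though **aeneas** itself targets Python 2.7.

```python
import shutil

# Executables the README asks to be available on the PATH.
REQUIRED_TOOLS = ("ffmpeg", "ffprobe", "espeak")


def find_missing_tools(tools=REQUIRED_TOOLS):
    """Return the names in `tools` that cannot be located on the PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("ffmpeg, ffprobe and espeak were all found on PATH")
```

Finding an executable on the `$PATH` does not prove it has the codecs or voices you need; it only rules out the most common setup problem.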
|
||
If installing **aeneas** natively on your OS proves difficult, | ||
you can use VirtualBox and [Vagrant](http://www.vagrantup.com) | ||
to run **aeneas** inside a virtualized Debian image, | ||
for example using [aeneas-vagrant](https://github.com/readbeyond/aeneas-vagrant). | ||
using [aeneas-vagrant](https://github.com/readbeyond/aeneas-vagrant). | ||
|
||
### Installation | ||
|
||
```bash | ||
$ git clone https://github.com/readbeyond/aeneas.git | ||
$ cd aeneas | ||
$ pip install -r requirements.txt | ||
$ python setup.py build_ext --inplace | ||
$ python check_dependencies.py | ||
``` | ||
#### Linux and Mac OS X | ||
|
||
If the last command prints a success message, | ||
you have all the required dependencies installed | ||
and you can confidently run **aeneas** in production. | ||
|
||
If you are a user of a `deb`-based Linux distribution | ||
(e.g., Debian, Ubuntu), | ||
1. If you are a user of a `deb`-based Linux distribution | ||
(e.g., Debian or Ubuntu), | ||
you can install all the dependencies by running | ||
[the provided `install_dependencies.sh` script](install_dependencies.sh) | ||
|
||
```bash | ||
$ sudo bash install_dependencies.sh | ||
``` | ||
```bash | ||
$ sudo bash install_dependencies.sh | ||
``` | ||
|
||
2. If you have another Linux distribution or Mac OS X, | ||
just make sure you have | ||
`ffmpeg`, `ffprobe` (part of the `ffmpeg` package), | ||
and `espeak` installed and available on your command line. | ||
You also need Python 2.x and its "developer" package | ||
containing the C headers. | ||
|
||
3. Run the following commands: | ||
|
||
```bash | ||
$ git clone https://github.com/readbeyond/aeneas.git | ||
$ cd aeneas | ||
$ pip install -r requirements.txt | ||
$ python setup.py build_ext --inplace | ||
$ python check_dependencies.py | ||
``` | ||
|
||
Then, run `python setup.py build_ext --inplace` and `python check_dependencies.py` as above. | ||
If the last command prints a success message, | ||
you have all the required dependencies installed | ||
and you can confidently run **aeneas** in production. | ||
|
||
If you are a Windows user, please read the installation instructions | ||
#### Windows | ||
|
||
Please read the installation instructions | ||
contained in the | ||
["Using aeneas for Audio-Text Synchronization" PDF](http://software.sil.org/scriptureappbuilder/resources/) | ||
["Using aeneas for Audio-Text Synchronization" PDF](http://software.sil.org/scriptureappbuilder/resources/), | ||
based on | ||
[these directions](https://groups.google.com/d/msg/aeneas-forced-alignment/p9cb1FA0X0I/8phzUgIqBAAJ), | ||
written by Richard Margetts. | ||
|
||
If installing natively proves difficult on your OS, | ||
consider using the [Vagrant box](http://www.vagrantup.com) | ||
created by [aeneas-vagrant](https://github.com/readbeyond/aeneas-vagrant). | ||
|
||
|
||
## Usage | ||
|
||
1. Clone this GitHub repo: | ||
1. Install `aeneas` as described above. (Only the first time!) | ||
|
||
```bash | ||
$ git clone https://github.com/readbeyond/aeneas.git | ||
``` | ||
2. Open a command prompt/shell/terminal and go to the root directory | ||
of the aeneas repository, that is, the one containing this `README.md` file. | ||
|
||
2. Enter the root directory: | ||
3. To compute a synchronization map `map.json` for a pair | ||
(`audio.mp3`, `text.txt` in `plain` format), you can run: | ||
|
||
```bash | ||
$ cd aeneas | ||
$ python -m aeneas.tools.execute_task audio.mp3 text.txt "task_language=en|os_task_file_format=json|is_text_type=plain" map.json | ||
``` | ||
|
||
3. (Optional, but strongly suggested) Compile the Python C extensions: | ||
|
||
```bash | ||
$ python setup.py build_ext --inplace | ||
``` | ||
The third parameter (the _configuration string_) can specify several parameters/options. | ||
See the [documentation](http://www.readbeyond.it/aeneas/docs/) for details. | ||
|
||
4. To compute a SMIL synchronization map `map.smil` for a pair | ||
(`audio.mp3`, `text.txt`), you can run: | ||
4. To compute a synchronization map `map.smil` for a pair | ||
(`audio.mp3`, `page.xhtml` containing fragments marked by `id` attributes like `f001`), | ||
you can run: | ||
|
||
```bash | ||
$ python -m aeneas.tools.execute_task audio.mp3 text.txt config_string map.smil | ||
$ python -m aeneas.tools.execute_task audio.mp3 page.xhtml "task_language=en|os_task_file_format=smil|os_task_file_smil_audio_ref=audio.mp3|os_task_file_smil_page_ref=page.xhtml|is_text_type=unparsed|is_text_unparsed_id_regex=f[0-9]+|is_text_unparsed_id_sort=numeric" map.smil | ||
``` | ||
|
||
`config_string` is a string containing all the | ||
parameters to parse `text.txt` correctly and to | ||
format `map.smil` as desired. | ||
See the [documentation](http://www.readbeyond.it/aeneas/docs/) for details. | ||
|
||
5. If you have several tasks to run, | ||
you can create a job container and a configuration file, | ||
and run them all at once: | ||
|
@@ -163,8 +168,8 @@ and run them all at once: | |
and format the output sync map files. | ||
See the [documentation](http://www.readbeyond.it/aeneas/docs/) for details. | ||
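The configuration strings used with `execute_task` above are `key=value` pairs joined by `|`. When calling the tool from a script, it can be less error-prone to build that string from a list of pairs and to pass the command as an argument list, so the `|` characters never reach a shell. The sketch below assumes only the command line shown in this README; the helper names `build_config_string` and `build_execute_task_command` are hypothetical.

```python
def build_config_string(options):
    """Join (key, value) pairs as key=value, separated by '|'."""
    return "|".join("%s=%s" % (key, value) for key, value in options)


def build_execute_task_command(audio, text, options, output):
    """Return the argument list for `python -m aeneas.tools.execute_task`."""
    return [
        "python", "-m", "aeneas.tools.execute_task",
        audio, text, build_config_string(options), output,
    ]


# Options copied from the plain-text example in this README.
PLAIN_OPTIONS = [
    ("task_language", "en"),
    ("os_task_file_format", "json"),
    ("is_text_type", "plain"),
]

if __name__ == "__main__":
    command = build_execute_task_command(
        "audio.mp3", "text.txt", PLAIN_OPTIONS, "map.json")
    print(command)
    # To actually run it (aeneas must be installed, and the working
    # directory must be the repository root):
    #   import subprocess
    #   subprocess.check_call(command)
```

Passing a list to `subprocess.check_call` avoids quoting the configuration string by hand, which matters here because `|` is the pipe operator in most shells.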
|
||
You might want to run the above modules without arguments | ||
to get their manual: | ||
You might want to run `execute_task` or `execute_job` | ||
without arguments to get a usage message and some examples: | ||
|
||
```bash | ||
$ python -m aeneas.tools.execute_task | ||
|
@@ -202,20 +207,20 @@ Changelog: [http://www.readbeyond.it/aeneas/docs/changelog.html](http://www.read | |
* Arbitrary text fragment granularity (single word, subphrase, phrase, paragraph, etc.) | ||
* Input audio file formats: all those supported by `ffmpeg` | ||
* Batch processing | ||
* Output sync map formats: CSV, JS, SMIL, TSV, TTML, TXT, VTT, XML | ||
* Supported (= tested) languages: BG, CA, CY, DA, DE, EL, EN, ES, ET, FI, FR, GA, GRC, HR, HU, IS, IT, LA, LT, LV, NL, NO, RO, RU, PL, PT, SK, SR, SV, TR, UK | ||
* Output sync map formats: CSV, JSON, SMIL, SSV, TSV, TTML, TXT, VTT, XML | ||
* Tested languages: BG, CA, CY, DA, DE, EL, EN, ES, ET, FA, FI, FR, GA, GRC, HR, HU, IS, IT, LA, LT, LV, NL, NO, RO, RU, PL, PT, SK, SR, SV, SW, TR, UK | ||
* Robust against misspelled/mispronounced words, local rearrangements of words, background noise/sporadic spikes | ||
* Code suitable for a Web app deployment (e.g., on-demand AWS instances) | ||
* Adjustable splitting times, including a max character/second constraint for CC applications | ||
* Automated detection of audio head/tail | ||
* MFCC and DTW computed as Python C extensions to reduce the processing time | ||
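To make the character/second constraint mentioned in the feature list concrete: for closed captions, a fragment is harder to read the more characters it shows per second of audio. The sketch below only computes that rate for one fragment and compares it with a limit. The function names and the limit passed in are assumptions for illustration; this is not how **aeneas** implements or names the constraint.

```python
def characters_per_second(begin, end, text):
    """Return the reading rate of a fragment shown from `begin` to `end` (seconds)."""
    duration = end - begin
    if duration <= 0:
        raise ValueError("fragment must have a positive duration")
    return len(text) / float(duration)


def exceeds_rate(begin, end, text, max_rate):
    """Return True if the fragment shows more than `max_rate` characters per second."""
    return characters_per_second(begin, end, text) > max_rate


if __name__ == "__main__":
    # Last fragment of the example map in this README.
    text = "To eat the world's due, by the grave and thee."
    print(characters_per_second(48.000, 53.280, text))
    # 20.0 is an arbitrary limit chosen only for this example.
    print(exceeds_rate(48.000, 53.280, text, 20.0))
```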
|
||
|
||
## Limitations and Missing Features | ||
|
||
* Audio should match the text: large portions of spurious text or audio might produce a wrong sync map | ||
* Audio is assumed to be spoken: not suitable/YMMV for song captioning | ||
* DTW computation is memory hungry | ||
* No protection against memory thrashing | ||
* No protection against memory thrashing if you feed extremely long audio files | ||
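The memory cost of DTW can be seen in a textbook version of the algorithm, which fills an accumulated-cost matrix with one cell per pair of frames. The sketch below is that textbook form on plain numbers, not the C extension used by **aeneas** (which works on MFCC vectors and may store less than the full matrix). The frame rate in the usage note is a made-up figure.

```python
def dtw_cost(a, b):
    """Return the DTW alignment cost between two sequences of numbers."""
    n, m = len(a), len(b)
    inf = float("inf")
    # (n + 1) x (m + 1) accumulated-cost matrix: memory grows with n * m.
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor cells.
            acc[i][j] = local + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m]


def full_matrix_bytes(n, m, bytes_per_cell=8):
    """Estimate the size of a dense n x m cost matrix of 8-byte floats."""
    return n * m * bytes_per_cell


if __name__ == "__main__":
    print(dtw_cost([1, 2, 3], [1, 2, 2, 3]))
    # One hour of audio at a hypothetical 25 frames per second, per side.
    frames = 3600 * 25
    print(full_matrix_bytes(frames, frames))
```

With these assumed numbers, each side has 90,000 frames and a dense matrix would need about 64.8 GB, which illustrates why very long audio files are a problem for a quadratic-memory formulation.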
|
||
|
||
## TODO List | ||
|
@@ -228,7 +233,6 @@ Changelog: [http://www.readbeyond.it/aeneas/docs/changelog.html](http://www.read | |
* Improving (removing?) dependency from `espeak`, `ffmpeg`, `ffprobe` executables | ||
* Multilevel sync map granularity (e.g., multilevel SMIL output) | ||
* Supporting input text encodings other than UTF-8 | ||
* Adding (i.e., testing) more languages | ||
* Better documentation | ||
* Testing other approaches, like HMM | ||
* Publishing the package on PyPI | ||
|
@@ -292,6 +296,8 @@ No copy rights were harmed in the making of this project. | |
|
||
* **August 2015**: [Michele Gianella](https://plus.google.com/+michelegianella/about) partially sponsored the port of the MFCC/DTW code to C (v1.1.0) | ||
|
||
* **September 2015**: friends in West Africa partially sponsored the development of the head/tail detection code (v1.2.0) | ||
|
||
### Supporting | ||
|
||
Would you like to support the development of **aeneas**? | ||
|
@@ -311,8 +317,11 @@ Feel free to [get in touch](mailto:[email protected]). | |
|
||
If you are able to contribute code directly, | ||
that's great! | ||
Feel free to open a pull request, | ||
we will be glad to have a look at it. | ||
Please do not work on the `master` branch. | ||
Instead, please create a new branch, | ||
and open a pull request from there. | ||
I will be glad to have a look at it! | ||
Please make your code consistent with | ||
the existing code base style | ||
|
@@ -366,6 +375,9 @@ and a Web application | |
**August 2015**: release of v1.1.0, including Python C extensions | ||
to speed up the computation of audio/text alignment
**September 2015**: release of v1.2.0, | ||
including code to automatically detect the audio head/tail | ||
## Acknowledgments | ||
Many thanks to **Nicola Montecchio**, | ||
|