Merge pull request #9 from pettarin/master

Release as v1.2.0.
readbeyond · Sep 27, 2015 · ccab249
2 parents 47da67b + 18d9dfb

Showing 171 changed files with 8,797 additions and 2,672 deletions.
diff --git a/README.md b/README.md
@@ -2,8 +2,8 @@
 
 **aeneas** is a Python library and a set of tools to automagically synchronize audio and text.
 
-* Version: 1.1.2
-* Date: 2015-09-24
+* Version: 1.2.0
+* Date: 2015-09-27
 * Developed by: [ReadBeyond](http://www.readbeyond.it/)
 * Lead Developer: [Alberto Pettarin](http://www.albertopettarin.it/)
 * License: the GNU Affero General Public License Version 3 (AGPL v3)
@@ -17,7 +17,7 @@ and an audio file containing the narration of the (same) text.
 
 For example, given [this text file](aeneas/tests/res/container/job/assets/p001.xhtml)
 and [this audio file](aeneas/tests/res/container/job/assets/p001.mp3),
-**aeneas** computes the following map:
+**aeneas** computes the following abstract map:
 
 ```
 [00:00:00.000, 00:00:02.680] <=> 1                                                      
@@ -37,28 +37,28 @@ and [this audio file](aeneas/tests/res/container/job/assets/p001.mp3),
 [00:00:48.000, 00:00:53.280] <=> To eat the world's due, by the grave and thee.  
 ```
 
-Moreover, the map can be output in several formats: SMIL for EPUB 3,
-SRT/TTML/VTT for closed captioning, JS for Web usage,
+The map can be output to file in several formats: SMIL for EPUB 3,
+SRT/TTML/VTT for closed captioning, JSON/RBSE for Web usage,
 or raw CSV/SSV/TSV/TXT/XML for further processing.
 
 
 ## System Requirements, Supported Platforms and Installation
 
 ### System Requirements
 
-1. 2 GB RAM (4 GB recommended), 2 GHz CPU (3 GHz 64bit recommended)
-2. `ffmpeg` and `ffprobe` executable available in your `$PATH` (`apt-get install ffmpeg*` from [`deb-multimedia`](http://www.deb-multimedia.org/))
-3. `espeak` executable available in your `$PATH` (`apt-get install espeak*`)
+1. a reasonably recent machine (recommended 4 GB RAM, 2 GHz 64bit CPU)
+2. `ffmpeg` and `ffprobe` executables available in your `$PATH`
+3. `espeak` executable available in your `$PATH`
 4. Python 2.7.x
-5. Python optional modules `BeautifulSoup`, `lxml`, `numpy`, and `scikits.audiolab` (`pip install ...`)
-6. (Optional but strongly suggested) Python C headers to compile the Python C extensions (`apt-get install python-dev`)
+5. Python modules `BeautifulSoup`, `lxml`, `numpy`, and `scikits.audiolab`
+6. (Optional but strongly suggested) Python C headers to compile the Python C extensions
 
 Depending on the format(s) of audio files you work with,
 you might need to install additional audio codecs for `ffmpeg`.
 Similarly, you might need to install additional voices
 for `espeak`, depending on the language(s) you work on.
 (Installing _all_ the codecs and _all_ the voices available
-in the Debian repository might be a good idea.)
+might be a good idea.)
 
 If installing the above dependencies proves difficult on your OS,
 consider using the [Vagrant box](http://www.vagrantup.com)
@@ -68,87 +68,92 @@ created by [aeneas-vagrant](https://github.com/readbeyond/aeneas-vagrant).
 
 **aeneas** has been developed and tested on **Debian 64bit**,
 which is the **only supported OS** at the moment.
-Other Linux distributions should be good too.
 
-However, it should work on Mac OS X and Windows as well,
-once you make sure `ffmpeg`, `ffprobe` and `espeak`
+However, **aeneas** has been confirmed to work
+on other Linux distributions (Ubuntu, Slackware),
+on Mac OS X (with developer tools installed) and on Windows Vista/7/8.1/10.
+
+Whatever your OS is, make sure
+`ffmpeg`, `ffprobe` (which is part of `ffmpeg` distribution), and `espeak`
 are properly installed and
 callable by the `subprocess` Python module.
 A way to ensure the latter consists
-in adding the three executables to your `$PATH`.
-Alternatively, you can use VirtualBox
+in adding these three executables to your `$PATH`.
+
+If installing **aeneas** natively on your OS proves difficult,
+you can use VirtualBox and [Vagrant](http://www.vagrantup.com)
 to run **aeneas** inside a virtualized Debian image,
-for example using [aeneas-vagrant](https://github.com/readbeyond/aeneas-vagrant).
+using [aeneas-vagrant](https://github.com/readbeyond/aeneas-vagrant).
 
 ### Installation
 
-```bash
-$ git clone https://github.com/readbeyond/aeneas.git
-$ cd aeneas
-$ pip install -r requirements.txt
-$ python setup.py build_ext --inplace
-$ python check_dependencies.py
-```
+#### Linux and Mac OS X
 
-If the last command prints a success message,
-you have all the required dependencies installed
-and you can confidently run **aeneas** in production.
-
-If you are a user of a `deb`-based Linux distribution
-(e.g., Debian, Ubuntu),
+1. If you are a user of a `deb`-based Linux distribution
+(e.g., Debian or Ubuntu),
 you can install all the dependencies by running
 [the provided `install_dependencies.sh` script](install_dependencies.sh)
 
-```bash
-$ sudo bash install_dependencies.sh
-```
+    ```bash
+    $ sudo bash install_dependencies.sh
+    ```
+
+2. If you have another Linux distribution or Mac OS X,
+just make sure you have
+`ffmpeg`, `ffprobe` (part of the `ffmpeg` package),
+and `espeak` installed and available on your command line.
+You also need Python 2.7.x and its "developer" package
+containing the C headers.
+
+3. Run the following commands:
+
+    ```bash
+    $ git clone https://github.com/readbeyond/aeneas.git
+    $ cd aeneas
+    $ pip install -r requirements.txt
+    $ python setup.py build_ext --inplace
+    $ python check_dependencies.py
+    ```
 
-Then, run `python setup.py build_ext --inplace` and `python check_dependencies.py` as above.
+If the last command prints a success message,
+you have all the required dependencies installed
+and you can confidently run **aeneas** in production.
 
-If you are a Windows user, please read the installation instructions
+#### Windows
+
+Please read the installation instructions
 contained in the
-["Using aeneas for Audio-Text Synchronization" PDF](http://software.sil.org/scriptureappbuilder/resources/)
+["Using aeneas for Audio-Text Synchronization" PDF](http://software.sil.org/scriptureappbuilder/resources/),
 based on
 [these directions](https://groups.google.com/d/msg/aeneas-forced-alignment/p9cb1FA0X0I/8phzUgIqBAAJ),
 written by Richard Margetts.
 
-If installing natively proves difficult on your OS,
-consider using the [Vagrant box](http://www.vagrantup.com)
-created by [aeneas-vagrant](https://github.com/readbeyond/aeneas-vagrant).
-
 
 ## Usage
 
-1. Clone this GitHub repo:
+1. Install `aeneas` as described above. (Only the first time!)
 
-    ```bash
-    $ git clone https://github.com/readbeyond/aeneas.git
-    ```
+2. Open a command prompt/shell/terminal and go to the root directory
+of the aeneas repository, that is, the one containing this `README.md` file.
 
-2. Enter the root directory:
+3. To compute a synchronization map `map.json` for a pair
+(`audio.mp3`, `text.txt` in `plain` format), you can run:
 
     ```bash
-    $ cd aeneas
+    $ python -m aeneas.tools.execute_task audio.mp3 text.txt "task_language=en|os_task_file_format=json|is_text_type=plain" map.json
     ```
 
-3. (Optional, but strongly suggested) Compile the Python C extensions:
-
-    ```bash
-    $ python setup.py build_ext --inplace
-    ```
+    The third argument (the _configuration string_) is a `|`-separated list of `key=value` options.
+    See the [documentation](http://www.readbeyond.it/aeneas/docs/) for details.
 
-4. To compute a SMIL synchronization map `map.smil` for a pair
-(`audio.mp3`, `text.txt`), you can run:
+4. To compute a synchronization map `map.smil` for a pair
+(`audio.mp3`, `page.xhtml` containing fragments marked by `id` attributes like `f001`),
+you can run:
 
     ```bash
-    $ python -m aeneas.tools.execute_task audio.mp3 text.txt config_string map.smil 
+    $ python -m aeneas.tools.execute_task audio.mp3 page.xhtml "task_language=en|os_task_file_format=smil|os_task_file_smil_audio_ref=audio.mp3|os_task_file_smil_page_ref=page.xhtml|is_text_type=unparsed|is_text_unparsed_id_regex=f[0-9]+|is_text_unparsed_id_sort=numeric" map.smil
     ```
 
-    `config_string` is string containing all the
-    parameters to parse `text.txt` correctly and to
-    format `map.smil` as desired.
-    See the [documentation](http://www.readbeyond.it/aeneas/docs/) for details.
-
 5. If you have several tasks to run,
 you can create a job container and a configuration file,
 and run them all at once:
@@ -163,8 +168,8 @@ and run them all at once:
     and format the output sync map files.
     See the [documentation](http://www.readbeyond.it/aeneas/docs/) for details.
 
-You might want to run the above modules without arguments
-to get their manual:
+You might want to run `execute_task` or `execute_job`
+without arguments to get a usage message and some examples:
 
 ```bash
 $ python -m aeneas.tools.execute_task
@@ -202,20 +207,20 @@ Changelog: [http://www.readbeyond.it/aeneas/docs/changelog.html](http://www.read
 * Arbitrary text fragment granularity (single word, subphrase, phrase, paragraph, etc.)
 * Input audio file formats: all those supported by `ffmpeg`
 * Batch processing
-* Output sync map formats: CSV, JS, SMIL, TSV, TTML, TXT, VTT, XML
-* Supported (= tested) languages: BG, CA, CY, DA, DE, EL, EN, ES, ET, FI, FR, GA, GRC, HR, HU, IS, IT, LA, LT, LV, NL, NO, RO, RU, PL, PT, SK, SR, SV, TR, UK
+* Output sync map formats: CSV, JSON, SMIL, SSV, TSV, TTML, TXT, VTT, XML
+* Tested languages: BG, CA, CY, DA, DE, EL, EN, ES, ET, FA, FI, FR, GA, GRC, HR, HU, IS, IT, LA, LT, LV, NL, NO, RO, RU, PL, PT, SK, SR, SV, SW, TR, UK
 * Robust against misspelled/mispronounced words, local rearrangements of words, background noise/sporadic spikes
 * Code suitable for a Web app deployment (e.g., on-demand AWS instances)
 * Adjustable splitting times, including a max character/second constraint for CC applications
+* Automated detection of audio head/tail
 * MFCC and DTW computed as Python C extensions to reduce the processing time
 
 
 ## Limitations and Missing Features 
 
 * Audio should match the text: large portions of spurious text or audio might produce a wrong sync map
 * Audio is assumed to be spoken: not suitable/YMMV for song captioning
-* DTW computation is memory hungry
-* No protection against memory trashing
+* No protection against memory thrashing when processing extremely long audio files
 
 
 ## TODO List
@@ -228,7 +233,6 @@ Changelog: [http://www.readbeyond.it/aeneas/docs/changelog.html](http://www.read
 * Improving (removing?) dependency from `espeak`, `ffmpeg`, `ffprobe` executables
 * Multilevel sync map granularity (e.g., multilevel SMIL output)
 * Supporting input text encodings other than UTF-8
-* Adding (i.e., testing) more languages
 * Better documentation
 * Testing other approaches, like HMM
 * Publishing the package on PyPI
@@ -292,6 +296,8 @@ No copy rights were harmed in the making of this project.
 
 * **August 2015**: [Michele Gianella](https://plus.google.com/+michelegianella/about) partially sponsored the port of the MFCC/DTW code to C (v1.1.0)
 
+* **September 2015**: friends in West Africa partially sponsored the development of the head/tail detection code (v1.2.0)
+
 ### Supporting
 
 Would you like to support the development of **aeneas**?
@@ -311,8 +317,11 @@ Feel free to [get in touch](mailto:[email protected]).
 
 If you are able to contribute code directly,
 that's great!
-Feel free to open a pull request,
-we will be glad to have a look at it.
+
+Please do not work on the `master` branch.
+Instead, please create a new branch,
+and open a pull request from there.
+I will be glad to have a look at it!
 
 Please make your code consistent with
 the existing code base style
@@ -366,6 +375,9 @@ and a Web application
 **August 2015**: release of v1.1.0, including Python C extensions
 to speed up the computation of audio/text alignment
 
+**September 2015**: release of v1.2.0,
+including code to automatically detect the audio head/tail
+
 ## Acknowledgments
 
 Many thanks to **Nicola Montecchio**,
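As a rough companion to the installation and usage steps changed in this README diff, the following Bash sketch checks that the external executables named under System Requirements (`ffmpeg`, `ffprobe`, `espeak`) are callable from `$PATH`, and joins `key=value` options with `|` to build a configuration string like the ones passed to `execute_task`. The helper names `check_tools` and `build_config` are hypothetical and not part of **aeneas**; `python check_dependencies.py` remains the README's own way to verify the setup.

```bash
#!/usr/bin/env bash

# Hypothetical helper (not part of aeneas): returns non-zero and reports
# each executable that cannot be found on $PATH.
check_tools() {
  local missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf 'missing: %s\n' "$tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Hypothetical helper: joins its key=value arguments with "|",
# the separator used by the configuration strings in the README examples.
build_config() {
  local IFS='|'
  printf '%s\n' "$*"
}
```

For example, `check_tools ffmpeg ffprobe espeak && python -m aeneas.tools.execute_task audio.mp3 text.txt "$(build_config task_language=en os_task_file_format=json is_text_type=plain)" map.json` would run the first README usage example only if all three tools are found. Building the string from separate arguments keeps each option readable and avoids hand-editing long `|`-separated lines.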