Document phonetic recognizer

Daniel Wolf 2019-01-01 23:11:19 +01:00
parent bfc98a1c81
commit d029458c70
2 changed files with 22 additions and 0 deletions

@@ -1,5 +1,9 @@
# Version history
## Unreleased
* **Added** basic support for non-English recordings through phonetic recognition ([issue #45](https://github.com/DanielSWolf/rhubarb-lip-sync/issues/45)).
## Version 1.8.0
* **Added** support for Ogg Vorbis (.ogg) file format ([issue #40](https://github.com/DanielSWolf/rhubarb-lip-sync/issues/40)).

@@ -123,6 +123,11 @@ The following command-line options are the most common:
| _<input file>_
| The audio file to be analyzed. This must be the last command-line argument. Supported file formats are WAVE (.wav) and Ogg Vorbis (.ogg).
| `-r` _<recognizer>_, `--recognizer` _<recognizer>_
| Specifies how Rhubarb Lip Sync recognizes speech within the recording. Options: `pocketSphinx` (use for English recordings), `phonetic` (use for non-English recordings). For details, see <<recognizers>>.
_Default value: ``pocketSphinx``_
| `-f` _<format>_, `--exportFormat` _<format>_
| The export format. Options: `tsv` (tab-separated values, see <<tsv,details>>), `xml` (see <<xml,details>>), `json` (see <<json,details>>).
@@ -192,6 +197,19 @@ Note that for short audio files, Rhubarb Lip Sync may choose to use fewer thread
_Default value: as many threads as your CPU has cores_
|===
[[recognizers]]
== Recognizers
The first step in processing an audio file is determining what is being said. More specifically, Rhubarb Lip Sync uses speech recognition to figure out what sound is being said at what point in time. You can choose between two recognizers:
=== PocketSphinx
PocketSphinx is an open-source speech recognition library that generally gives good results. This is the default recognizer. The downside is that PocketSphinx recognizes only English dialog, so it is not a good choice for recordings in any other language.
=== Phonetic
Rhubarb Lip Sync also comes with a phonetic recognizer. _Phonetic_ means that this recognizer won't try to understand entire (English) words and phrases. Instead, it will recognize individual sounds and syllables. The results are usually less precise than those from the PocketSphinx recognizer. The advantage is that this recognizer is language-independent. Use it if your recordings are not in English.
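
As a rough illustration of choosing between the two recognizers, the following Python sketch assembles a command line from the options documented above (`-r` and `-f`, with the input file as the last argument). The helper name `build_rhubarb_command`, its `english` flag, and the executable name `rhubarb` being available on the `PATH` are assumptions made for this example; they are not part of Rhubarb Lip Sync itself.

```python
from typing import List

# Export formats listed for the -f / --exportFormat option.
EXPORT_FORMATS = ("tsv", "xml", "json")


def build_rhubarb_command(
    audio_path: str,
    english: bool = True,
    export_format: str = "json",
    executable: str = "rhubarb",  # assumption: the binary is on PATH under this name
) -> List[str]:
    """Build an argument list that selects a recognizer by recording language."""
    if export_format not in EXPORT_FORMATS:
        raise ValueError(f"unknown export format: {export_format!r}")

    # pocketSphinx is the documented default and is meant for English recordings;
    # phonetic is the language-independent alternative.
    recognizer = "pocketSphinx" if english else "phonetic"

    # The input file must be the last command-line argument.
    return [executable, "-r", recognizer, "-f", export_format, audio_path]
```

The resulting list can be handed to `subprocess.run` to start the analysis. Building it as a list rather than as a single string avoids shell quoting problems with file names that contain spaces.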
[[outputFormats]]
== Output formats