From d029458c7021e5a0a4fd7a43d2258f28c75f243f Mon Sep 17 00:00:00 2001
From: Daniel Wolf
Date: Tue, 1 Jan 2019 23:11:19 +0100
Subject: [PATCH] Document phonetic recognizer

---
 CHANGELOG.md |  4 ++++
 README.adoc  | 18 ++++++++++++++++++
 2 files changed, 22 insertions(+)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 516cbe2..c466bc4 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,5 +1,9 @@
 # Version history
 
+## Unreleased
+
+* **Added** basic support for non-English recordings through phonetic recognition ([issue #45](https://github.com/DanielSWolf/rhubarb-lip-sync/issues/45)).
+
 ## Version 1.8.0
 
 * **Added** support for Ogg Vorbis (.ogg) file format ([issue #40](https://github.com/DanielSWolf/rhubarb-lip-sync/issues/40)).
diff --git a/README.adoc b/README.adoc
index fb08c03..e0f4646 100644
--- a/README.adoc
+++ b/README.adoc
@@ -123,6 +123,11 @@ The following command-line options are the most common:
 | __
 | The audio file to be analyzed. This must be the last command-line argument. Supported file formats are WAVE (.wav) and Ogg Vorbis (.ogg).
 
+| `-r` __, `--recognizer` __
+| Specifies how Rhubarb Lip Sync recognizes speech within the recording. Options: `pocketSphinx` (use for English recordings), `phonetic` (use for non-English recordings). For details, see <>.
+
+_Default value: ``pocketSphinx``_
+
 | `-f` __, `--exportFormat` __
 | The export format. Options: `tsv` (tab-separated values, see <>), `xml` (see <>), `json` (see <>).
 
@@ -192,6 +197,19 @@ Note that for short audio files, Rhubarb Lip Sync may choose to use fewer thread
 _Default value: as many threads as your CPU has cores_
 |===
 
+[[recognizers]]
+== Recognizers
+
+The first step in processing an audio file is determining what is being said. More specifically, Rhubarb Lip Sync uses speech recognition to figure out what sound is being said at what point in time. You can choose between two recognizers:
+
+=== PocketSphinx
+
+PocketSphinx is an open-source speech recognition library that generally gives good results. This is the default recognizer. The downside is that PocketSphinx only recognizes English dialog. So if your recordings are in a language other than English, this is not a good choice.
+
+=== Phonetic
+
+Rhubarb Lip Sync also comes with a phonetic recognizer. _Phonetic_ means that this recognizer won't try to understand entire (English) words and phrases. Instead, it will recognize individual sounds and syllables. The results are usually less precise than those from the PocketSphinx recognizer. The advantage is that this recognizer is language-independent. Use it if your recordings are not in English.
+
 [[outputFormats]]
 == Output formats