Fix gaps in phonetic recognition
Seemingly at random, entire utterances yielded no phones from the phonetic recognizer. The cause was a check for empty utterances that made sense for word recognition, but not for phonetic recognition.
parent 357cb0b65e
commit f3d4cfbb31
@@ -219,9 +219,18 @@ BoundedTimeline<string> recognizeWords(const vector<int16_t>& audioBuffer, ps_de
 	BoundedTimeline<string> result(
 		TimeRange(0_cs, centiseconds(100 * audioBuffer.size() / sphinxSampleRate))
 	);
-	const bool noWordsRecognized = reinterpret_cast<ngram_search_t*>(decoder.search)->bpidx == 0;
-	if (noWordsRecognized) {
-		return result;
+	const bool phonetic = cmd_ln_boolean_r(decoder.config, "-allphone_ci");
+	if (!phonetic) {
+		// If the decoder is in word mode (as opposed to phonetic recognition), it expects each
+		// utterance to contain speech. If it doesn't, ps_seg_word() logs the annoying error
+		// "Couldn't find <s> in first frame".
+		// Not every utterance does contain speech, however. In this case, we exit early to prevent
+		// the log output.
+		// We *don't* do that in phonetic mode because here, the same code would omit valid phones.
+		const bool noWordsRecognized = reinterpret_cast<ngram_search_t*>(decoder.search)->bpidx == 0;
+		if (noWordsRecognized) {
+			return result;
+		}
 	}
 
 	// Collect words
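The new check relies on the decoder's configuration: a decoder set up for phonetic recognition carries the "-allphone_ci" option, while a word decoder does not. Below is a minimal sketch of how such a phonetic configuration might look, assuming the standard PocketSphinx cmd_ln API; the model paths are placeholders, and this is not Rhubarb's actual setup code.

	// Minimal sketch, assuming the PocketSphinx C API; paths are placeholders.
	#include <pocketsphinx.h>

	cmd_ln_t* createPhoneticConfig() {
		return cmd_ln_init(
			nullptr, ps_args(), true,
			"-hmm", "model/acoustic",           // placeholder acoustic model directory
			"-allphone", "model/phones.lm.bin", // placeholder phonetic language model
			"-allphone_ci", "yes",              // the flag the new check reads
			nullptr);
	}

With a decoder created from such a config (ps_init(createPhoneticConfig())), cmd_ln_boolean_r(decoder.config, "-allphone_ci") evaluates to true, so recognizeWords() no longer exits early on utterances that contain no recognized words.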