43 Commits

Author SHA1 Message Date
Daniel Wolf 81111ef96a Fixed infinite loop with short recordings 2016-08-11 15:45:16 +02:00
Daniel Wolf 78027ea63c Thread count can be limited via command-line argument 2016-08-11 10:29:01 +02:00
Daniel Wolf 206cde4658 Supporting noises (breathing, smacking, etc.) 2016-08-11 10:18:03 +02:00
Daniel Wolf 734d06ad38 Disabling PocketSphinx's VAD
We're performing VAD ourselves
2016-08-10 20:46:32 +02:00
Daniel Wolf 6888dadd04 Speedup through better multithreading
* Fixed excessive locking
* Using more threads for voice recognition
2016-08-04 19:39:43 +02:00
Daniel Wolf 1cb41b8309 Workaround for another kind of decoder corruption 2016-08-03 21:33:13 +02:00
Daniel Wolf 26cae93478 Refactored audio handling
Now audio clips can be passed around as const references
and don't carry state any more.
2016-07-27 21:58:37 +02:00
Daniel Wolf b3b2366468 Re-written library code for parallel execution
The new implementation correctly re-throws exceptions on the calling thread
instead of terminating the application.
2016-07-27 21:44:39 +02:00
Daniel Wolf ed27b8470c Workaround for PocketSphinx bug
See https://sourceforge.net/p/cmusphinx/discussion/help/thread/f1dd91c5/#7529
Also minor refactoring.
2016-06-30 20:06:38 +02:00
Daniel Wolf 2d314f4bc7 Multithreaded recognition: refactoring and fixes
* Decoders are correctly released after use
* Determining optimal thread count for multithreading
2016-06-29 21:47:25 +02:00
Daniel Wolf 9bf8355742 Sped up recognition via multithreading 2016-06-26 21:06:21 +02:00
Daniel Wolf c9b17e1937 Improved tokenization by taking dictionary into account 2016-06-25 21:52:04 +02:00
Daniel Wolf c6c31a831c Using WebRTC for voice activity detection (VAD)
My simple power-based approach wasn't reliable enough.
2016-06-21 22:20:18 +02:00
Daniel Wolf 0e00e58d91 Gracefully handling failed audio alignment 2016-06-21 19:20:27 +02:00
Daniel Wolf 944c374415 Migrated to latest CMU Sphinx version 2016-06-19 21:18:40 +02:00
Daniel Wolf 4346552312 Improved speed of voice activity detection
... by factor 2 by removing second pass.
Also added voice activity detection to progress calculation.
2016-06-15 20:14:51 +02:00
Daniel Wolf d1bbe8538e Added more logging 2016-06-15 20:14:43 +02:00
Daniel Wolf 0d488e8de2 Restored dialog option, this time based on language model
This approach should be more robust and error-tolerant.
2016-06-10 22:35:27 +02:00
Daniel Wolf f1563919e1 Removing redundant prefixes from PocketSphinx log output 2016-05-17 17:56:11 +02:00
Daniel Wolf c67e916185 Splitting audio into utterances before processing
Advantages:
* No problems with long silences (PocketSphinx doesn't like them)
* Potential for parallelization
* Potential for improved phone timing accuracy
2016-05-17 16:01:10 +02:00
Daniel Wolf bbc933a821 Temporarily removed --dialog option 2016-05-17 14:28:18 +02:00
Daniel Wolf 2f31c5aa61 Refactoring
* Rewriting Timeline<T> to be sparse, i.e., allow gaps
* Added specialized subclasses BoundedTimeline<T> and ContinuousTimeline<T>
* Timed<T> and TimeRange: has-a, not is-a
* Introducing Timed<void>
2016-05-17 14:28:18 +02:00
Daniel Wolf 8d2d100376 Refactored enum serialization/deserialization 2016-04-17 20:22:16 +02:00
Daniel Wolf 7ce79f9c08 Replaced Boost.Log with small custom logger
Boost.Log is a complex monstrosity and I can't get it to build on OS X.
2016-04-14 09:42:47 +02:00
Daniel Wolf 90e1375f1b Handling zero-length audio files 2016-04-12 20:45:47 +02:00
Daniel Wolf 04c828506d Simplified code using Timeline<T> 2016-04-09 22:07:25 +02:00
Daniel Wolf a8900f80ec Removing DC offset from audio
Also a bit of refactoring regarding audio processing
2016-03-16 21:01:43 +01:00
Daniel Wolf 35ec1f8a45 Introduced template functions to unify enum<->string conversions 2016-03-08 22:20:40 +01:00
Daniel Wolf ad9d8e6567 Renamed `audioInput` directory to `audio` 2016-03-08 18:21:17 +01:00
Daniel Wolf b78e418a8f Refactored audio streams
* All streams are now mono (simplifies reasoning about samples)
* Streams can be cloned
* Streams can be seeked within
2016-03-07 21:28:31 +01:00
Daniel Wolf 04ca644cca Added structured logging 2016-03-03 22:31:16 +01:00
Daniel Wolf cdffb56613 Redirecting pocketsphinx log to main log 2016-03-03 22:31:16 +01:00
Daniel Wolf 7a1f446ca3 Using GSL 2016-02-29 20:58:58 +01:00
Daniel Wolf 667edf9485 Improved dialog handling 2016-02-10 21:53:58 +01:00
Daniel Wolf 05ef692706 Added (primitive) option to explicitly supply the dialog 2016-02-09 22:08:11 +01:00
Daniel Wolf 75872fe45d Using -dither to prevent recognition errors in connection with zero silence 2016-02-01 20:26:14 +01:00
Daniel Wolf 7aa6057b8e Allowing for long pauses in speech without breaking sync 2016-01-28 21:52:50 +01:00
Daniel Wolf c425885929 Showing combined progress for entire task 2016-01-28 19:13:40 +01:00
Daniel Wolf 8e7fcc4efe Implemented two-step phone detection for better accuracy 2016-01-28 14:19:32 +01:00
Daniel Wolf 2bfe671f82 Simplified directory structure to make Visual Studio build work 2016-01-08 16:59:18 +01:00
Daniel Wolf 0f33fcfbd0 Removing zero silence, seems like Sphinx doesn't like it
See http://cmusphinx.sourceforge.net/wiki/faq#qwhy_my_accuracy_is_poor
I couldn't reproduce the original problem, but it doesn't seem to hurt, either.
2016-01-08 16:44:03 +01:00
Daniel Wolf 31cb3b195c Showing progress bar 2016-01-08 10:53:35 +01:00
Daniel Wolf 5c0fe24fae Refactoring: Using camelCase throughout 2016-01-06 20:47:37 +01:00
Renamed from src/phone_extraction.cpp