37 Commits

Author SHA1 Message Date
Daniel Wolf 26cae93478 Refactored audio handling
Now audio clips can be passed around as const references
and don't carry state any more.
2016-07-27 21:58:37 +02:00
Daniel Wolf b3b2366468 Re-written library code for parallel execution
The new implementation correctly re-throws exceptions on the calling thread
instead of terminating the application.
2016-07-27 21:44:39 +02:00
Daniel Wolf ed27b8470c Workaround for PocketSphinx bug
See https://sourceforge.net/p/cmusphinx/discussion/help/thread/f1dd91c5/#7529
Also minor refactoring.
2016-06-30 20:06:38 +02:00
Daniel Wolf 2d314f4bc7 Multithreaded recognition: refactoring and fixes
* Decoders are correctly released after use
* Determining optimal thread count for multithreading
2016-06-29 21:47:25 +02:00
Daniel Wolf 9bf8355742 Sped up recognition via multithreading 2016-06-26 21:06:21 +02:00
Daniel Wolf c9b17e1937 Improved tokenization by taking dictionary into account 2016-06-25 21:52:04 +02:00
Daniel Wolf c6c31a831c Using WebRTC for voice activity detection (VAD)
My simple power-based approach wasn't reliable enough.
2016-06-21 22:20:18 +02:00
Daniel Wolf 0e00e58d91 Gracefully handling failed audio alignment 2016-06-21 19:20:27 +02:00
Daniel Wolf 944c374415 Migrated to latest CMU Sphinx version 2016-06-19 21:18:40 +02:00
Daniel Wolf 4346552312 Improved speed of voice activity detection
... by factor 2 by removing second pass.
Also added voice activity detection to progress calculation.
2016-06-15 20:14:51 +02:00
Daniel Wolf d1bbe8538e Added more logging 2016-06-15 20:14:43 +02:00
Daniel Wolf 0d488e8de2 Restored dialog option, this time based on language model
This approach should be more robust and error-tolerant.
2016-06-10 22:35:27 +02:00
Daniel Wolf f1563919e1 Removing redundant prefixes from PocketSphinx log output 2016-05-17 17:56:11 +02:00
Daniel Wolf c67e916185 Splitting audio into utterances before processing
Advantages:
* No problems with long silences (PocketSphinx doesn't like them)
* Potential for parallelization
* Potential for improved phone timing accuracy
2016-05-17 16:01:10 +02:00
Daniel Wolf bbc933a821 Temporarily removed --dialog option 2016-05-17 14:28:18 +02:00
Daniel Wolf 2f31c5aa61 Refactoring
* Rewriting Timeline<T> to be sparse, i.e., allow gaps
* Added specialized subclasses BoundedTimeline<T> and ContinuousTimeline<T>
* Timed<T> and TimeRange: has-a, not is-a
* Introducing Timed<void>
2016-05-17 14:28:18 +02:00
Daniel Wolf 8d2d100376 Refactored enum serialization/deserialization 2016-04-17 20:22:16 +02:00
Daniel Wolf 7ce79f9c08 Replaced Boost.Log with small custom logger
Boost.Log is a complex monstrosity and I can't get it to build on OS X.
2016-04-14 09:42:47 +02:00
Daniel Wolf 90e1375f1b Handling zero-length audio files 2016-04-12 20:45:47 +02:00
Daniel Wolf 04c828506d Simplified code using Timeline<T> 2016-04-09 22:07:25 +02:00
Daniel Wolf a8900f80ec Removing DC offset from audio
Also a bit of refactoring regarding audio processing
2016-03-16 21:01:43 +01:00
Daniel Wolf 35ec1f8a45 Introduced template functions to unify enum<->string conversions 2016-03-08 22:20:40 +01:00
Daniel Wolf ad9d8e6567 Renamed `audioInput` directory to `audio` 2016-03-08 18:21:17 +01:00
Daniel Wolf b78e418a8f Refactored audio streams
* All streams are now mono (simplifies reasoning about samples)
* Streams can be cloned
* Streams can be seeked within
2016-03-07 21:28:31 +01:00
Daniel Wolf 04ca644cca Added structured logging 2016-03-03 22:31:16 +01:00
Daniel Wolf cdffb56613 Redirecting pocketsphinx log to main log 2016-03-03 22:31:16 +01:00
Daniel Wolf 7a1f446ca3 Using GSL 2016-02-29 20:58:58 +01:00
Daniel Wolf 667edf9485 Improved dialog handling 2016-02-10 21:53:58 +01:00
Daniel Wolf 05ef692706 Added (primitive) option to explicitly supply the dialog 2016-02-09 22:08:11 +01:00
Daniel Wolf 75872fe45d Using -dither to prevent recognition errors in connection with zero silence 2016-02-01 20:26:14 +01:00
Daniel Wolf 7aa6057b8e Allowing for long pauses in speech without breaking sync 2016-01-28 21:52:50 +01:00
Daniel Wolf c425885929 Showing combined progress for entire task 2016-01-28 19:13:40 +01:00
Daniel Wolf 8e7fcc4efe Implemented two-step phone detection for better accuracy 2016-01-28 14:19:32 +01:00
Daniel Wolf 2bfe671f82 Simplified directory structure to make Visual Studio build work 2016-01-08 16:59:18 +01:00
Daniel Wolf 0f33fcfbd0 Removing zero silence, seems like Sphinx doesn't like it
See http://cmusphinx.sourceforge.net/wiki/faq#qwhy_my_accuracy_is_poor
I couldn't reproduce the original problem, but it doesn't seem to hurt, either.
2016-01-08 16:44:03 +01:00
Daniel Wolf 31cb3b195c Showing progress bar 2016-01-08 10:53:35 +01:00
Daniel Wolf 5c0fe24fae Refactoring: Using camelCase throughout 2016-01-06 20:47:37 +01:00
Renamed from src/phone_extraction.cpp