Commit Graph

397 Commits

Author SHA1 Message Date
Daniel Wolf 8fa494fb77 Improved VAD quality via dry run 2016-06-30 20:42:36 +02:00
Daniel Wolf 6de7ba020a Fixed VAD error handling 2016-06-30 20:17:28 +02:00
Daniel Wolf ed27b8470c Workaround for PocketSphinx bug
See https://sourceforge.net/p/cmusphinx/discussion/help/thread/f1dd91c5/#7529
Also minor refactoring.
2016-06-30 20:06:38 +02:00
Daniel Wolf 2c0471e79f Improved lip animation for B/P and L sounds 2016-06-29 22:35:14 +02:00
Daniel Wolf 2d314f4bc7 Multithreaded recognition: refactoring and fixes
* Decoders are correctly released after use
* Determining optimal thread count for multithreading
2016-06-29 21:47:25 +02:00
Daniel Wolf f13449f810 Added thread info to logging 2016-06-29 21:47:25 +02:00
Daniel Wolf 75407dab54 Augmenting each detected voice activity to give recognizer some silence samples to work with 2016-06-29 21:47:25 +02:00
Daniel Wolf 2a5ed95698 Improved animation quality through new algorithm
Using "lazy" ruleset instead of 1:1 mapping from phones
2016-06-29 21:46:08 +02:00
Daniel Wolf 8c9466bcf3 Removed mouth shape H (special shape for 'L' sound) 2016-06-26 21:06:22 +02:00
Daniel Wolf 9bf8355742 Sped up recognition via multithreading 2016-06-26 21:06:21 +02:00
Daniel Wolf 3a0a38575f Sped up VAD via multithreading 2016-06-26 21:06:21 +02:00
Daniel Wolf 84097756c8 Added ThreadPool class 2016-06-26 14:02:17 +02:00
Daniel Wolf 0aeb35c42e Fixed deprecated library calls 2016-06-26 11:06:44 +02:00
Daniel Wolf 96b0ad9b1d Switched to better acoustic model 2016-06-25 22:07:28 +02:00
Daniel Wolf da78375a10 Added CMU Sphinx US English acoustic model 2016-06-25 22:00:47 +02:00
Daniel Wolf c9b17e1937 Improved tokenization by taking dictionary into account 2016-06-25 21:52:04 +02:00
Daniel Wolf 8502256241 Updated LICENSE.md 2016-06-25 21:51:06 +02:00
Daniel Wolf f275267ac7 Small VAD improvements
* RAII
* Slightly fewer false positives
2016-06-24 22:35:33 +02:00
Daniel Wolf faa3f2b4bb Fixed overflow with long audio files 2016-06-24 21:51:17 +02:00
Daniel Wolf c6c31a831c Using WebRTC for voice activity detection (VAD)
My simple power-based approach wasn't reliable enough.
2016-06-21 22:20:18 +02:00
Daniel Wolf aec3dbae01 Added WebRTC library 2016-06-21 22:13:05 +02:00
Daniel Wolf 97f172282d Fixed off-by-one error in wave file reader 2016-06-21 21:47:08 +02:00
Daniel Wolf 0e00e58d91 Gracefully handling failed audio alignment 2016-06-21 19:20:27 +02:00
Daniel Wolf 944c374415 Migrated to latest CMU Sphinx version 2016-06-19 21:18:40 +02:00
Daniel Wolf 478766ff6e Updated CMU SphinxBase and PocketSphinx 2016-06-19 20:53:24 +02:00
Daniel Wolf b2f702c8f4 Fixed OS X build 2016-06-16 19:41:49 +02:00
Daniel Wolf 6c9612d2c3 Raised low-pass threshold to better cope with high-pitched voices 2016-06-15 20:14:51 +02:00
Daniel Wolf 4346552312 Improved speed of voice activity detection
... by factor 2 by removing second pass.
Also added voice activity detection to progress calculation.
2016-06-15 20:14:51 +02:00
Daniel Wolf c4b054176c Fixed WAVE file reader position calculation
The bug only showed through massive seek times.
2016-06-15 20:14:44 +02:00
Daniel Wolf 522f6c2019 Made audio stream handling safe for long streams 2016-06-15 20:14:43 +02:00
Daniel Wolf d1bbe8538e Added more logging 2016-06-15 20:14:43 +02:00
Daniel Wolf 542a5ee3d8 Added join function for strings 2016-06-15 20:07:51 +02:00
Daniel Wolf 1e29151974 Fixed string conversion for Timed<void> 2016-06-14 17:36:54 +02:00
Daniel Wolf 5cc13cb16f Improved error message 2016-06-14 17:36:18 +02:00
Daniel Wolf 0d488e8de2 Restored dialog option, this time based on language model
This approach should be more robust and error-tolerant.
2016-06-10 22:35:27 +02:00
Daniel Wolf 4ed5908627 Implemented US-English G2P using sound change rules 2016-06-03 20:02:34 +02:00
Daniel Wolf 7a763e8755 Fixed syntax error in sound change data 2016-06-03 20:00:46 +02:00
Daniel Wolf bf19d267ee Added sound change code and data 2016-06-03 10:37:47 +02:00
Daniel Wolf 8be6485685 Implemented string conversion from Latin-1 to Unicode 2016-06-02 22:21:37 +02:00
Daniel Wolf 4d45bf7c89 Merged ascii.cpp into stringTools.cpp 2016-06-02 20:09:37 +02:00
Daniel Wolf 4d95b4c2c5 Implemented text tokenization using Flite 2016-06-02 18:24:27 +02:00
Daniel Wolf 8d1c618cec Patched Flite to prevent name collision with PocketSphinx 2016-06-02 18:24:27 +02:00
Daniel Wolf 942cabd773 Added Flite as library 2016-06-02 18:24:26 +02:00
Daniel Wolf 9f4ebd23e3 Added Flite 1.4 code
I'm not using version 2.0 because that version makes it almost impossible
to create a slim build without compiling all the voice synth code (which
we don't need).
2016-06-02 18:24:26 +02:00
Daniel Wolf d4b9a8e0c6 Implemented simple conversion from Unicode string to ASCII 2016-06-02 18:24:25 +02:00
Daniel Wolf f1563919e1 Removing redundant prefixes from PocketSphinx log output 2016-05-17 17:56:11 +02:00
Daniel Wolf c67e916185 Splitting audio into utterances before processing
Advantages:
* No problems with long silences (PocketSphinx doesn't like them)
* Potential for parallelization
* Potential for improved phone timing accuracy
2016-05-17 16:01:10 +02:00
Daniel Wolf bbc933a821 Temporarily removed --dialog option 2016-05-17 14:28:18 +02:00
Daniel Wolf 2f31c5aa61 Refactoring
* Rewriting Timeline<T> to be sparse, i.e., allow gaps
* Added specialized subclasses BoundedTimeline<T> and ContinuousTimeline<T>
* Timed<T> and TimeRange: has-a, not is-a
* Introducing Timed<void>
2016-05-17 14:28:18 +02:00
Daniel Wolf 9eef09145e Added getPairs function 2016-05-12 21:44:46 +02:00