Commit Graph

204 Commits

Author SHA1 Message Date
Daniel Wolf c19ad1c8d0 Using biased language model to handle dialog more forgivingly
Using a fixed 0.1-0.9 ratio between default and dialog language model
2016-10-21 21:41:50 +02:00
Daniel Wolf 9cfe577612 Fixed bad config when creating language model from dialog 2016-10-21 21:17:17 +02:00
Daniel Wolf 529a32e1b2 Better animation of short pauses 2016-10-14 20:25:30 +02:00
Daniel Wolf 503ba9104a Treating schwa as a separate phone 2016-09-30 17:12:10 +02:00
Daniel Wolf 1f6f6d6175 Added convenience function Timed<T>.getDuration() 2016-09-29 12:06:47 +02:00
Daniel Wolf f5b7971f52 Refactoring: Replaced audio "length" with "duration" 2016-09-29 12:06:28 +02:00
Daniel Wolf f44baaa05f Improve noise detection heuristic 2016-09-29 12:06:06 +02:00
Daniel Wolf 760f6c2ce6 Refactoring and better logging 2016-09-29 10:44:34 +02:00
Daniel Wolf 750078618c Sharing audio buffer between operations 2016-09-26 13:11:01 +02:00
Daniel Wolf de05f69507 Fixed compiler warning 2016-09-23 21:15:55 +02:00
Daniel Wolf 2fdd98f5b3 Removed potentially unsafe conversion 2016-09-23 21:15:34 +02:00
Daniel Wolf 938079a75f Renamed phoneExtraction to phoneRecognition 2016-09-21 10:32:26 +02:00
Daniel Wolf 600b3429a7 No longer discarding "burnt" decoders
See https://sourceforge.net/p/cmusphinx/discussion/help/thread/f1dd91c5/#1d89/0491/7f0c/60fc
2016-09-21 10:28:31 +02:00
Daniel Wolf eea1eb381c Refactored ObjectPool to correctly handle custom deleters 2016-09-21 10:25:08 +02:00
Daniel Wolf d97c880754 Performing per-utterance cepstral mean normalization
See discussion in https://sourceforge.net/p/cmusphinx/discussion/help/thread/51e2979b/
2016-09-18 22:02:02 +02:00
Daniel Wolf f4f9ffe883 Logging bin path, hoping to crack that elusive segfault 2016-09-18 22:00:55 +02:00
Daniel Wolf cf13499158 Caching bin path 2016-09-18 22:00:08 +02:00
Daniel Wolf 0ab009e17a Workaround for off-by-one error in whereami library 2016-09-11 13:17:52 +02:00
Daniel Wolf 2607b9a12b Fixed Boost version check 2016-09-11 12:59:09 +02:00
Daniel Wolf c679b8fb71 Using different xml_writer_settings signature for old Boost versions 2016-09-11 11:40:17 +02:00
Daniel Wolf 261a768e0d Removed Boost.Predef since it's not available in Boost 1.54 2016-09-11 11:40:17 +02:00
Daniel Wolf d4b86357cf Using boost::optional<T>.get_value_or() instead of value_or() for old Boost versions 2016-09-11 11:40:16 +02:00
Daniel Wolf d98de34b98 Replaced calls to boost::optional<T>.value() with operator*
Boost 1.54 doesn't support value() yet, plus * is cleaner
2016-09-11 11:40:16 +02:00
Daniel Wolf 2aef178eb0 Better error messages for incompatible WAVE files 2016-09-10 21:19:12 +02:00
Daniel Wolf b95a3f621c Fixed Linux build 2016-08-31 22:21:53 +02:00
Daniel Wolf 8fd78d63cf Animating pauses only between words, not at start or end of recording 2016-08-11 16:28:04 +02:00
Daniel Wolf a632e7a3b3 Fixed TSV export
Exporter now terminates with shape X rather than A.
2016-08-11 15:49:51 +02:00
Daniel Wolf 81111ef96a Fixed infinite loop with short recordings 2016-08-11 15:45:16 +02:00
Daniel Wolf 78027ea63c Thread count can be limited via command-line argument 2016-08-11 10:29:01 +02:00
Daniel Wolf 206cde4658 Supporting noises (breathing, smacking, etc.) 2016-08-11 10:18:03 +02:00
Daniel Wolf bd1f8226ec Added TimeRange.trim() method 2016-08-11 10:16:50 +02:00
Daniel Wolf 734d06ad38 Disabling PocketSphinx's VAD
We're performing VAD ourselves
2016-08-10 20:46:32 +02:00
Daniel Wolf a851a76ce5 Minor improvements to animation rules 2016-08-10 20:13:05 +02:00
Daniel Wolf 8b025a3522 Fixed predictive mouth animation 2016-08-10 18:53:01 +02:00
Daniel Wolf 16892ae991 Fixed OS X build 2016-08-10 18:24:24 +02:00
Daniel Wolf b22378221f Better AH animation 2016-08-07 20:38:02 +02:00
Daniel Wolf c65c8b4eb3 Better animation of pauses in speech 2016-08-05 19:34:57 +02:00
Daniel Wolf 1c50ece142 Refactoring 2016-08-05 17:17:25 +02:00
Daniel Wolf b62fe8af98 Improved timing of bilabial stops ("B", "P") 2016-08-04 22:21:48 +02:00
Daniel Wolf c566ac56cc Suppressing log messages in console for non-debug builds 2016-08-04 21:02:40 +02:00
Daniel Wolf 229105a965 Fixed erratic progress display 2016-08-04 20:39:40 +02:00
Daniel Wolf 6888dadd04 Speedup through better multithreading
* Fixed excessive locking
* Using more threads for voice recognition
2016-08-04 19:39:43 +02:00
Daniel Wolf 1cb41b8309 Workaround for another kind of decoder corruption 2016-08-03 21:33:13 +02:00
Daniel Wolf 0a577d1947 Fixed audio resampling
Audio was cut off due to incorrect length calculation
2016-08-03 20:55:45 +02:00
Daniel Wolf f356855bbd Implemented tweening for smoother animation 2016-08-02 22:02:59 +02:00
Daniel Wolf 95d46ef0b7 Re-written animation code
* Still uses (almost) the same rules, but more powerful underlying concept
* Re-introduced shape H for "L" sounds
* Introduced shape X for idle position
2016-07-31 21:42:37 +02:00
Daniel Wolf 26cae93478 Refactored audio handling
Now audio clips can be passed around as const references
and don't carry state any more.
2016-07-27 21:58:37 +02:00
Daniel Wolf 799f334fa7 Using unique_ptr instead of raw pointers in object pool 2016-07-27 21:44:39 +02:00
Daniel Wolf b3b2366468 Re-written library code for parallel execution
The new implementation correctly re-throws exceptions on the calling thread
instead of terminating the application.
2016-07-27 21:44:39 +02:00
Daniel Wolf 5198ee9230 Made Lazy<T> copyable 2016-07-20 20:16:23 +02:00
Daniel Wolf 17b43ad205 Added class Lazy<T> 2016-07-19 21:33:07 +02:00
Daniel Wolf ddcadad710 Introduced user-defined literal "cs" for centiseconds
Now that ReSharper supports it (see https://youtrack.jetbrains.com/issue/RSCPP-14653)
2016-07-05 21:17:51 +02:00
Daniel Wolf 0447cbb4ff Refactored VAD multithreading 2016-06-30 20:52:29 +02:00
Daniel Wolf 8fa494fb77 Improved VAD quality via dry run 2016-06-30 20:42:36 +02:00
Daniel Wolf 6de7ba020a Fixed VAD error handling 2016-06-30 20:17:28 +02:00
Daniel Wolf ed27b8470c Workaround for PocketSphinx bug
See https://sourceforge.net/p/cmusphinx/discussion/help/thread/f1dd91c5/#7529
Also minor refactoring.
2016-06-30 20:06:38 +02:00
Daniel Wolf 2c0471e79f Improved lip animation for B/P and L sounds 2016-06-29 22:35:14 +02:00
Daniel Wolf 2d314f4bc7 Multithreaded recognition: refactoring and fixes
* Decoders are correctly released after use
* Determining optimal thread count for multithreading
2016-06-29 21:47:25 +02:00
Daniel Wolf f13449f810 Added thread info to logging 2016-06-29 21:47:25 +02:00
Daniel Wolf 75407dab54 Augmenting each detected voice activity to give recognizer some silence samples to work with 2016-06-29 21:47:25 +02:00
Daniel Wolf 2a5ed95698 Improved animation quality through new algorithm
Using "lazy" ruleset instead of 1:1 mapping from phones
2016-06-29 21:46:08 +02:00
Daniel Wolf 8c9466bcf3 Removed mouth shape H (special shape for 'L' sound) 2016-06-26 21:06:22 +02:00
Daniel Wolf 9bf8355742 Sped up recognition via multithreading 2016-06-26 21:06:21 +02:00
Daniel Wolf 3a0a38575f Sped up VAD via multithreading 2016-06-26 21:06:21 +02:00
Daniel Wolf 84097756c8 Added ThreadPool class 2016-06-26 14:02:17 +02:00
Daniel Wolf 0aeb35c42e Fixed deprecated library calls 2016-06-26 11:06:44 +02:00
Daniel Wolf c9b17e1937 Improved tokenization by taking dictionary into account 2016-06-25 21:52:04 +02:00
Daniel Wolf f275267ac7 Small VAD improvements
* RAII
* Slightly fewer false positives
2016-06-24 22:35:33 +02:00
Daniel Wolf faa3f2b4bb Fixed overflow with long audio files 2016-06-24 21:51:17 +02:00
Daniel Wolf c6c31a831c Using WebRTC for voice activity detection (VAD)
My simple power-based approach wasn't reliable enough.
2016-06-21 22:20:18 +02:00
Daniel Wolf 97f172282d Fixed off-by-one error in wave file reader 2016-06-21 21:47:08 +02:00
Daniel Wolf 0e00e58d91 Gracefully handling failed audio alignment 2016-06-21 19:20:27 +02:00
Daniel Wolf 944c374415 Migrated to latest CMU Sphinx version 2016-06-19 21:18:40 +02:00
Daniel Wolf b2f702c8f4 Fixed OS X build 2016-06-16 19:41:49 +02:00
Daniel Wolf 6c9612d2c3 Raised low-pass threshold to better cope with high-pitched voices 2016-06-15 20:14:51 +02:00
Daniel Wolf 4346552312 Improved speed of voice activity detection
... by factor 2 by removing second pass.
Also added voice activity detection to progress calculation.
2016-06-15 20:14:51 +02:00
Daniel Wolf c4b054176c Fixed WAVE file reader position calculation
The bug only showed through massive seek times.
2016-06-15 20:14:44 +02:00
Daniel Wolf 522f6c2019 Made audio stream handling safe for long streams 2016-06-15 20:14:43 +02:00
Daniel Wolf d1bbe8538e Added more logging 2016-06-15 20:14:43 +02:00
Daniel Wolf 542a5ee3d8 Added join function for strings 2016-06-15 20:07:51 +02:00
Daniel Wolf 1e29151974 Fixed string conversion for Timed<void> 2016-06-14 17:36:54 +02:00
Daniel Wolf 5cc13cb16f Improved error message 2016-06-14 17:36:18 +02:00
Daniel Wolf 0d488e8de2 Restored dialog option, this time based on language model
This approach should be more robust and error-tolerant.
2016-06-10 22:35:27 +02:00
Daniel Wolf 4ed5908627 Implemented US-English G2P using sound change rules 2016-06-03 20:02:34 +02:00
Daniel Wolf 8be6485685 Implemented string conversion from Latin-1 to Unicode 2016-06-02 22:21:37 +02:00
Daniel Wolf 4d45bf7c89 Merged ascii.cpp into stringTools.cpp 2016-06-02 20:09:37 +02:00
Daniel Wolf 4d95b4c2c5 Implemented text tokenization using Flite 2016-06-02 18:24:27 +02:00
Daniel Wolf d4b9a8e0c6 Implemented simple conversion from Unicode string to ASCII 2016-06-02 18:24:25 +02:00
Daniel Wolf f1563919e1 Removing redundant prefixes from PocketSphinx log output 2016-05-17 17:56:11 +02:00
Daniel Wolf c67e916185 Splitting audio into utterances before processing
Advantages:
* No problems with long silences (PocketSphinx doesn't like them)
* Potential for parallelization
* Potential for improved phone timing accuracy
2016-05-17 16:01:10 +02:00
Daniel Wolf bbc933a821 Temporarily removed --dialog option 2016-05-17 14:28:18 +02:00
Daniel Wolf 2f31c5aa61 Refactoring
* Rewriting Timeline<T> to be sparse, i.e., allow gaps
* Added specialized subclasses BoundedTimeline<T> and ContinuousTimeline<T>
* Timed<T> and TimeRange: has-a, not is-a
* Introducing Timed<void>
2016-05-17 14:28:18 +02:00
Daniel Wolf 9eef09145e Added getPairs function 2016-05-12 21:44:46 +02:00
Daniel Wolf baf2423b27 Added time manipulation functions to TimeRange and Timeline 2016-04-19 22:06:20 +02:00
Daniel Wolf 895b942df3 Implemented AudioStreamSegment 2016-04-19 22:04:43 +02:00
Daniel Wolf ce204c68de Fixed constness 2016-04-19 21:12:44 +02:00
Daniel Wolf c14fb1c7b2 Fixed output format for structured logging 2016-04-19 19:30:38 +02:00
Daniel Wolf 8d2d100376 Refactored enum serialization/deserialization 2016-04-17 20:22:16 +02:00
Daniel Wolf 44d18d00f8 Added header file to CMakeLists.txt
This makes navigation easier for me. Plus, ReSharper didn't like not knowing the header files.
2016-04-14 22:14:57 +02:00
Daniel Wolf 7ce79f9c08 Replaced Boost.Log with small custom logger
Boost.Log is a complex monstrosity and I can't get it to build on OS X.
2016-04-14 09:42:47 +02:00
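
Note on the "cs" literal (ddcadad710) and Timed<T>.getDuration() (1f6f6d6175): the log names these features but does not show their code. The following is a minimal sketch of how a centisecond literal and a duration helper could look on top of std::chrono. The names centiseconds, _cs, and this simplified TimeRange are assumptions for illustration, not the project's actual declarations.

```cpp
#include <cassert>
#include <chrono>
#include <ratio>

// Assumed type alias: a duration counted in hundredths of a second.
using centiseconds = std::chrono::duration<long long, std::centi>;

// User-defined literal. Literals defined outside the standard library
// must start with an underscore, hence _cs rather than cs.
constexpr centiseconds operator""_cs(unsigned long long count) {
    return centiseconds(static_cast<long long>(count));
}

// Simplified, hypothetical stand-in for a time range with a duration helper.
class TimeRange {
public:
    TimeRange(centiseconds start, centiseconds end) : start_(start), end_(end) {
        assert(start <= end);  // a range must not end before it starts
    }

    centiseconds getStart() const { return start_; }
    centiseconds getEnd() const { return end_; }

    // Convenience accessor: end minus start.
    centiseconds getDuration() const { return end_ - start_; }

private:
    centiseconds start_;
    centiseconds end_;
};
```

With these definitions, TimeRange(10_cs, 35_cs).getDuration() compares equal to 25_cs, and std::chrono::duration_cast converts it to other units where needed.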
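
Note on parallel execution (b3b2366468): the commit states that exceptions are re-thrown on the calling thread instead of terminating the application. The log does not show how this is done. One common way to get that behavior in standard C++ is sketched below using std::async and std::exception_ptr. The function names runParallel and demoRethrow are hypothetical, and this is not necessarily the technique the commit uses.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <future>
#include <stdexcept>
#include <vector>

// Hypothetical helper: runs every task on its own thread, waits for all of
// them, then re-throws the first captured exception on the calling thread.
inline void runParallel(const std::vector<std::function<void()>>& tasks) {
    std::vector<std::future<void>> futures;
    futures.reserve(tasks.size());
    for (const auto& task : tasks) {
        futures.push_back(std::async(std::launch::async, task));
    }

    std::exception_ptr firstError;
    for (auto& future : futures) {
        try {
            // get() blocks until the task is done and re-throws
            // whatever the task threw.
            future.get();
        } catch (...) {
            if (!firstError) {
                firstError = std::current_exception();
            }
        }
    }

    // All tasks have finished at this point; surface the failure to the caller.
    if (firstError) {
        std::rethrow_exception(firstError);
    }
}

// Usage sketch: returns true if the worker's exception reached the caller.
inline bool demoRethrow() {
    std::vector<std::function<void()>> tasks;
    tasks.push_back([] {});
    tasks.push_back([] { throw std::runtime_error("worker failed"); });
    try {
        runParallel(tasks);
    } catch (const std::runtime_error&) {
        return true;
    }
    return false;
}
```

An exception thrown inside a raw std::thread that escapes the thread function calls std::terminate, which is the failure mode the commit message describes. Routing the exception through a future (or an std::exception_ptr) lets the caller handle it with an ordinary try/catch.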