Commit Graph

139 Commits

Author SHA1 Message Date
Daniel Wolf 8c9466bcf3 Removed mouth shape H (special shape for 'L' sound) 2016-06-26 21:06:22 +02:00
Daniel Wolf 9bf8355742 Sped up recognition via multithreading 2016-06-26 21:06:21 +02:00
Daniel Wolf 3a0a38575f Sped up VAD via multithreading 2016-06-26 21:06:21 +02:00
Daniel Wolf 84097756c8 Added ThreadPool class 2016-06-26 14:02:17 +02:00
Daniel Wolf 0aeb35c42e Fixed deprecated library calls 2016-06-26 11:06:44 +02:00
Daniel Wolf 96b0ad9b1d Switched to better acoustic model 2016-06-25 22:07:28 +02:00
Daniel Wolf da78375a10 Added CMU Sphinx US English acoustic model 2016-06-25 22:00:47 +02:00
Daniel Wolf c9b17e1937 Improved tokenization by taking dictionary into account 2016-06-25 21:52:04 +02:00
Daniel Wolf 8502256241 Updated LICENSE.md 2016-06-25 21:51:06 +02:00
Daniel Wolf f275267ac7 Small VAD improvements
* RAII
* Slightly fewer false positives
2016-06-24 22:35:33 +02:00
Daniel Wolf faa3f2b4bb Fixed overflow with long audio files 2016-06-24 21:51:17 +02:00
Daniel Wolf c6c31a831c Using WebRTC for voice activity detection (VAD)
My simple power-based approach wasn't reliable enough.
2016-06-21 22:20:18 +02:00
Daniel Wolf aec3dbae01 Added WebRTC library 2016-06-21 22:13:05 +02:00
Daniel Wolf 97f172282d Fixed off-by-one error in wave file reader 2016-06-21 21:47:08 +02:00
Daniel Wolf 0e00e58d91 Gracefully handling failed audio alignment 2016-06-21 19:20:27 +02:00
Daniel Wolf 944c374415 Migrated to latest CMU Sphinx version 2016-06-19 21:18:40 +02:00
Daniel Wolf 478766ff6e Updated CMU SphinxBase and PocketSphinx 2016-06-19 20:53:24 +02:00
Daniel Wolf b2f702c8f4 Fixed OS X build 2016-06-16 19:41:49 +02:00
Daniel Wolf 6c9612d2c3 Raised low-pass threshold to better cope with high-pitched voices 2016-06-15 20:14:51 +02:00
Daniel Wolf 4346552312 Improved speed of voice activity detection
... by factor 2 by removing second pass.
Also added voice activity detection to progress calculation.
2016-06-15 20:14:51 +02:00
Daniel Wolf c4b054176c Fixed WAVE file reader position calculation
The bug only showed through massive seek times.
2016-06-15 20:14:44 +02:00
Daniel Wolf 522f6c2019 Made audio stream handling safe for long streams 2016-06-15 20:14:43 +02:00
Daniel Wolf d1bbe8538e Added more logging 2016-06-15 20:14:43 +02:00
Daniel Wolf 542a5ee3d8 Added join function for strings 2016-06-15 20:07:51 +02:00
Daniel Wolf 1e29151974 Fixed string conversion for Timed<void> 2016-06-14 17:36:54 +02:00
Daniel Wolf 5cc13cb16f Improved error message 2016-06-14 17:36:18 +02:00
Daniel Wolf 0d488e8de2 Restored dialog option, this time based on language model
This approach should be more robust and error-tolerant.
2016-06-10 22:35:27 +02:00
Daniel Wolf 4ed5908627 Implemented US-English G2P using sound change rules 2016-06-03 20:02:34 +02:00
Daniel Wolf 7a763e8755 Fixed syntax error in sound change data 2016-06-03 20:00:46 +02:00
Daniel Wolf bf19d267ee Added sound change code and data 2016-06-03 10:37:47 +02:00
Daniel Wolf 8be6485685 Implemented string conversion from Latin-1 to Unicode 2016-06-02 22:21:37 +02:00
Daniel Wolf 4d45bf7c89 Merged ascii.cpp into stringTools.cpp 2016-06-02 20:09:37 +02:00
Daniel Wolf 4d95b4c2c5 Implemented text tokenization using Flite 2016-06-02 18:24:27 +02:00
Daniel Wolf 8d1c618cec Patched Flite to prevent name collision with PocketSphinx 2016-06-02 18:24:27 +02:00
Daniel Wolf 942cabd773 Added Flite as library 2016-06-02 18:24:26 +02:00
Daniel Wolf 9f4ebd23e3 Added Flite 1.4 code
I'm not using version 2.0 because that version makes it almost impossible
to create a slim build without compiling all the voice synth code (which
we don't need).
2016-06-02 18:24:26 +02:00
Daniel Wolf d4b9a8e0c6 Implemented simple conversion from Unicode string to ASCII 2016-06-02 18:24:25 +02:00
Daniel Wolf f1563919e1 Removing redundant prefixes from PocketSphinx log output 2016-05-17 17:56:11 +02:00
Daniel Wolf c67e916185 Splitting audio into utterances before processing
Advantages:
* No problems with long silences (PocketSphinx doesn't like them)
* Potential for parallelization
* Potential for improved phone timing accuracy
2016-05-17 16:01:10 +02:00
Daniel Wolf bbc933a821 Temporarily removed --dialog option 2016-05-17 14:28:18 +02:00
Daniel Wolf 2f31c5aa61 Refactoring
* Rewriting Timeline<T> to be sparse, i.e., allow gaps
* Added specialized subclasses BoundedTimeline<T> and ContinuousTimeline<T>
* Timed<T> and TimeRange: has-a, not is-a
* Introducing Timed<void>
2016-05-17 14:28:18 +02:00
Daniel Wolf 9eef09145e Added getPairs function 2016-05-12 21:44:46 +02:00
Daniel Wolf baf2423b27 Added time manipulation functions to TimeRange and Timeline 2016-04-19 22:06:20 +02:00
Daniel Wolf 895b942df3 Implemented AudioStreamSegment 2016-04-19 22:04:43 +02:00
Daniel Wolf ce204c68de Fixed constness 2016-04-19 21:12:44 +02:00
Daniel Wolf c14fb1c7b2 Fixed output format for structured logging 2016-04-19 19:30:38 +02:00
Daniel Wolf 560281807e Version 0.2.0 2016-04-17 20:22:17 +02:00
Daniel Wolf 8d2d100376 Refactored enum serialization/deserialization 2016-04-17 20:22:16 +02:00
Daniel Wolf 44d18d00f8 Added header file to CMakeLists.txt
This makes navigation easier for me. Plus, ReSharper didn't like not knowing the header files.
2016-04-14 22:14:57 +02:00
Daniel Wolf 7ce79f9c08 Replaced Boost.Log with small custom logger
Boost.Log is a complex monstrosity and I can't get it to build on OS X.
2016-04-14 09:42:47 +02:00