48 Commits

Author SHA1 Message Date
Daniel Wolf 8e1d1fbdd3 Unified acronym capitalization
See http://stackoverflow.com/a/27172000/52041
2016-11-16 11:56:52 +01:00
Daniel Wolf 3e34425c11 Refactoring: Split code into multiple projects 2016-11-16 11:01:01 +01:00
Daniel Wolf 1f6f6d6175 Added convenience function Timed<T>.getDuration() 2016-09-29 12:06:47 +02:00
Daniel Wolf f5b7971f52 Refactoring: Replaced audio "length" with "duration" 2016-09-29 12:06:28 +02:00
Daniel Wolf f44baaa05f Improve noise detection heuristic 2016-09-29 12:06:06 +02:00
Daniel Wolf 750078618c Sharing audio buffer between operations 2016-09-26 13:11:01 +02:00
Daniel Wolf d97c880754 Performing per-utterance cepstral mean normalization
See discussion in https://sourceforge.net/p/cmusphinx/discussion/help/thread/51e2979b/
2016-09-18 22:02:02 +02:00
Daniel Wolf 2aef178eb0 Better error messages for incompatible WAVE files 2016-09-10 21:19:12 +02:00
Daniel Wolf 78027ea63c Thread count can be limited via command-line argument 2016-08-11 10:29:01 +02:00
Daniel Wolf 206cde4658 Supporting noises (breathing, smacking, etc.) 2016-08-11 10:18:03 +02:00
Daniel Wolf 16892ae991 Fixed OS X build 2016-08-10 18:24:24 +02:00
Daniel Wolf 229105a965 Fixed erratic progress display 2016-08-04 20:39:40 +02:00
Daniel Wolf 0a577d1947 Fixed audio resampling
Audio was cut off due to incorrect length calculation
2016-08-03 20:55:45 +02:00
Daniel Wolf 26cae93478 Refactored audio handling
Now audio clips can be passed around as const references
and don't carry state any more.
2016-07-27 21:58:37 +02:00
Daniel Wolf b3b2366468 Re-written library code for parallel execution
The new implementation correctly re-throws exceptions on the calling thread
instead of terminating the application.
2016-07-27 21:44:39 +02:00
Daniel Wolf ddcadad710 Introduced user-defined literal "cs" for centiseconds
Now that ReSharper supports it (see https://youtrack.jetbrains.com/issue/RSCPP-14653)
2016-07-05 21:17:51 +02:00
Daniel Wolf 0447cbb4ff Refactored VAD multithreading 2016-06-30 20:52:29 +02:00
Daniel Wolf 8fa494fb77 Improved VAD quality via dry run 2016-06-30 20:42:36 +02:00
Daniel Wolf 6de7ba020a Fixed VAD error handling 2016-06-30 20:17:28 +02:00
Daniel Wolf 2d314f4bc7 Multithreaded recognition: refactoring and fixes
* Decoders are correctly released after use
* Determining optimal thread count for multithreading
2016-06-29 21:47:25 +02:00
Daniel Wolf 75407dab54 Augmenting each detected voice activity to give recognizer some silence samples to work with 2016-06-29 21:47:25 +02:00
Daniel Wolf 3a0a38575f Sped up VAD via multithreading 2016-06-26 21:06:21 +02:00
Daniel Wolf 0aeb35c42e Fixed deprecated library calls 2016-06-26 11:06:44 +02:00
Daniel Wolf f275267ac7 Small VAD improvements
* RAII
* Slightly fewer false positives
2016-06-24 22:35:33 +02:00
Daniel Wolf faa3f2b4bb Fixed overflow with long audio files 2016-06-24 21:51:17 +02:00
Daniel Wolf c6c31a831c Using WebRTC for voice activity detection (VAD)
My simple power-based approach wasn't reliable enough.
2016-06-21 22:20:18 +02:00
Daniel Wolf 97f172282d Fixed off-by-one error in wave file reader 2016-06-21 21:47:08 +02:00
Daniel Wolf 6c9612d2c3 Raised low-pass threshold to better cope with high-pitched voices 2016-06-15 20:14:51 +02:00
Daniel Wolf 4346552312 Improved speed of voice activity detection
... by factor 2 by removing second pass.
Also added voice activity detection to progress calculation.
2016-06-15 20:14:51 +02:00
Daniel Wolf c4b054176c Fixed WAVE file reader position calculation
The bug only showed through massive seek times.
2016-06-15 20:14:44 +02:00
Daniel Wolf 522f6c2019 Made audio stream handling safe for long streams 2016-06-15 20:14:43 +02:00
Daniel Wolf d1bbe8538e Added more logging 2016-06-15 20:14:43 +02:00
Daniel Wolf c67e916185 Splitting audio into utterances before processing
Advantages:
* No problems with long silences (PocketSphinx doesn't like them)
* Potential for parallelization
* Potential for improved phone timing accuracy
2016-05-17 16:01:10 +02:00
Daniel Wolf 2f31c5aa61 Refactoring
* Rewriting Timeline<T> to be sparse, i.e., allow gaps
* Added specialized subclasses BoundedTimeline<T> and ContinuousTimeline<T>
* Timed<T> and TimeRange: has-a, not is-a
* Introducing Timed<void>
2016-05-17 14:28:18 +02:00
Daniel Wolf 895b942df3 Implemented AudioStreamSegment 2016-04-19 22:04:43 +02:00
Daniel Wolf ce204c68de Fixed constness 2016-04-19 21:12:44 +02:00
Daniel Wolf 7ce79f9c08 Replaced Boost.Log with small custom logger
Boost.Log is a complex monstrosity and I can't get it to build on OS X.
2016-04-14 09:42:47 +02:00
Daniel Wolf 4941bff739 Replaced strerror_s with (less safe) strerror
libc++ (Xcode) doesn't seem to support it.
2016-04-13 10:37:10 +02:00
Daniel Wolf d8fbd3596b Fixed UnboundedStream constructor 2016-04-13 10:37:10 +02:00
Daniel Wolf db6f2e076b Fixed GCC build 2016-04-12 23:04:16 +02:00
Daniel Wolf 90e1375f1b Handling zero-length audio files 2016-04-12 20:45:47 +02:00
Daniel Wolf 7bc4e37a1a Improved error handling and error messages 2016-04-12 18:02:52 +02:00
Daniel Wolf 04c828506d Simplified code using Timeline<T> 2016-04-09 22:07:25 +02:00
Daniel Wolf 2be3751a4f Renamed TimeSegment to TimeRange 2016-03-28 20:30:55 +02:00
Daniel Wolf 8c1e24e9c8 Implemented voice activity detection 2016-03-16 21:01:44 +01:00
Daniel Wolf 425f47491c Fixed compiler warnings 2016-03-16 21:01:43 +01:00
Daniel Wolf a8900f80ec Removing DC offset from audio
Also a bit of refactoring regarding audio processing
2016-03-16 21:01:43 +01:00
Daniel Wolf ad9d8e6567 Renamed `audioInput` directory to `audio` 2016-03-08 18:21:17 +01:00