
158 Commits

Author SHA1 Message Date
Daniel Wolf 26cae93478 Refactored audio handling
Now audio clips can be passed around as const references
and don't carry state any more.
2016-07-27 21:58:37 +02:00
Daniel Wolf 799f334fa7 Using unique_ptr instead of raw pointers in object pool 2016-07-27 21:44:39 +02:00
Daniel Wolf b3b2366468 Re-written library code for parallel execution
The new implementation correctly re-throws exceptions on the calling thread
instead of terminating the application.
2016-07-27 21:44:39 +02:00
Daniel Wolf 5198ee9230 Made Lazy<T> copyable 2016-07-20 20:16:23 +02:00
Daniel Wolf 17b43ad205 Added class Lazy<T> 2016-07-19 21:33:07 +02:00
Daniel Wolf ddcadad710 Introduced user-defined literal "cs" for centiseconds
Now that ReSharper supports it (see https://youtrack.jetbrains.com/issue/RSCPP-14653)
2016-07-05 21:17:51 +02:00
Daniel Wolf 0447cbb4ff Refactored VAD multithreading 2016-06-30 20:52:29 +02:00
Daniel Wolf 8fa494fb77 Improved VAD quality via dry run 2016-06-30 20:42:36 +02:00
Daniel Wolf 6de7ba020a Fixed VAD error handling 2016-06-30 20:17:28 +02:00
Daniel Wolf ed27b8470c Workaround for PocketSphinx bug
See https://sourceforge.net/p/cmusphinx/discussion/help/thread/f1dd91c5/#7529
Also minor refactoring.
2016-06-30 20:06:38 +02:00
Daniel Wolf 2c0471e79f Improved lip animation for B/P and L sounds 2016-06-29 22:35:14 +02:00
Daniel Wolf 2d314f4bc7 Multithreaded recognition: refactoring and fixes
* Decoders are correctly released after use
* Determining optimal thread count for multithreading
2016-06-29 21:47:25 +02:00
Daniel Wolf f13449f810 Added thread info to logging 2016-06-29 21:47:25 +02:00
Daniel Wolf 75407dab54 Augmenting each detected voice activity to give recognizer some silence samples to work with 2016-06-29 21:47:25 +02:00
Daniel Wolf 2a5ed95698 Improved animation quality through new algorithm
Using "lazy" ruleset instead of 1:1 mapping from phones
2016-06-29 21:46:08 +02:00
Daniel Wolf 8c9466bcf3 Removed mouth shape H (special shape for 'L' sound) 2016-06-26 21:06:22 +02:00
Daniel Wolf 9bf8355742 Sped up recognition via multithreading 2016-06-26 21:06:21 +02:00
Daniel Wolf 3a0a38575f Sped up VAD via multithreading 2016-06-26 21:06:21 +02:00
Daniel Wolf 84097756c8 Added ThreadPool class 2016-06-26 14:02:17 +02:00
Daniel Wolf 0aeb35c42e Fixed deprecated library calls 2016-06-26 11:06:44 +02:00
Daniel Wolf c9b17e1937 Improved tokenization by taking dictionary into account 2016-06-25 21:52:04 +02:00
Daniel Wolf f275267ac7 Small VAD improvements
* RAII
* Slightly fewer false positives
2016-06-24 22:35:33 +02:00
Daniel Wolf faa3f2b4bb Fixed overflow with long audio files 2016-06-24 21:51:17 +02:00
Daniel Wolf c6c31a831c Using WebRTC for voice activity detection (VAD)
My simple power-based approach wasn't reliable enough.
2016-06-21 22:20:18 +02:00
Daniel Wolf 97f172282d Fixed off-by-one error in wave file reader 2016-06-21 21:47:08 +02:00
Daniel Wolf 0e00e58d91 Gracefully handling failed audio alignment 2016-06-21 19:20:27 +02:00
Daniel Wolf 944c374415 Migrated to latest CMU Sphinx version 2016-06-19 21:18:40 +02:00
Daniel Wolf b2f702c8f4 Fixed OS X build 2016-06-16 19:41:49 +02:00
Daniel Wolf 6c9612d2c3 Raised low-pass threshold to better cope with high-pitched voices 2016-06-15 20:14:51 +02:00
Daniel Wolf 4346552312 Improved speed of voice activity detection
... by a factor of 2 by removing the second pass.
Also added voice activity detection to progress calculation.
2016-06-15 20:14:51 +02:00
Daniel Wolf c4b054176c Fixed WAVE file reader position calculation
The bug only showed through massive seek times.
2016-06-15 20:14:44 +02:00
Daniel Wolf 522f6c2019 Made audio stream handling safe for long streams 2016-06-15 20:14:43 +02:00
Daniel Wolf d1bbe8538e Added more logging 2016-06-15 20:14:43 +02:00
Daniel Wolf 542a5ee3d8 Added join function for strings 2016-06-15 20:07:51 +02:00
Daniel Wolf 1e29151974 Fixed string conversion for Timed<void> 2016-06-14 17:36:54 +02:00
Daniel Wolf 5cc13cb16f Improved error message 2016-06-14 17:36:18 +02:00
Daniel Wolf 0d488e8de2 Restored dialog option, this time based on language model
This approach should be more robust and error-tolerant.
2016-06-10 22:35:27 +02:00
Daniel Wolf 4ed5908627 Implemented US-English G2P using sound change rules 2016-06-03 20:02:34 +02:00
Daniel Wolf 8be6485685 Implemented string conversion from Latin-1 to Unicode 2016-06-02 22:21:37 +02:00
Daniel Wolf 4d45bf7c89 Merged ascii.cpp into stringTools.cpp 2016-06-02 20:09:37 +02:00
Daniel Wolf 4d95b4c2c5 Implemented text tokenization using Flite 2016-06-02 18:24:27 +02:00
Daniel Wolf d4b9a8e0c6 Implemented simple conversion from Unicode string to ASCII 2016-06-02 18:24:25 +02:00
Daniel Wolf f1563919e1 Removing redundant prefixes from PocketSphinx log output 2016-05-17 17:56:11 +02:00
Daniel Wolf c67e916185 Splitting audio into utterances before processing
Advantages:
* No problems with long silences (PocketSphinx doesn't like them)
* Potential for parallelization
* Potential for improved phone timing accuracy
2016-05-17 16:01:10 +02:00
Daniel Wolf bbc933a821 Temporarily removed --dialog option 2016-05-17 14:28:18 +02:00
Daniel Wolf 2f31c5aa61 Refactoring
* Rewriting Timeline<T> to be sparse, i.e., allow gaps
* Added specialized subclasses BoundedTimeline<T> and ContinuousTimeline<T>
* Timed<T> and TimeRange: has-a, not is-a
* Introducing Timed<void>
2016-05-17 14:28:18 +02:00
Daniel Wolf 9eef09145e Added getPairs function 2016-05-12 21:44:46 +02:00
Daniel Wolf baf2423b27 Added time manipulation functions to TimeRange and Timeline 2016-04-19 22:06:20 +02:00
Daniel Wolf 895b942df3 Implemented AudioStreamSegment 2016-04-19 22:04:43 +02:00
Daniel Wolf ce204c68de Fixed constness 2016-04-19 21:12:44 +02:00
Daniel Wolf c14fb1c7b2 Fixed output format for structured logging 2016-04-19 19:30:38 +02:00
Daniel Wolf 8d2d100376 Refactored enum serialization/deserialization 2016-04-17 20:22:16 +02:00
Daniel Wolf 44d18d00f8 Added header file to CMakeLists.txt
This makes navigation easier for me. Plus, ReSharper didn't like not knowing the header files.
2016-04-14 22:14:57 +02:00
Daniel Wolf 7ce79f9c08 Replaced Boost.Log with small custom logger
Boost.Log is a complex monstrosity and I can't get it to build on OS X.
2016-04-14 09:42:47 +02:00
Daniel Wolf 4941bff739 Replaced strerror_s with (less safe) strerror
libc++ (Xcode) doesn't seem to support it.
2016-04-13 10:37:10 +02:00
Daniel Wolf d8fbd3596b Fixed UnboundedStream constructor 2016-04-13 10:37:10 +02:00
Daniel Wolf db6f2e076b Fixed GCC build 2016-04-12 23:04:16 +02:00
Daniel Wolf 4b8e38970a Added hanging indent to help output to make it more readable 2016-04-12 21:23:15 +02:00
Daniel Wolf fd6b3b1e2f Supporting multiple export formats
- Simplified XML export format
- Added TSV and JSON formats
- Using TSV as standard export format
2016-04-12 21:08:23 +02:00
Daniel Wolf 90e1375f1b Handling zero-length audio files 2016-04-12 20:45:47 +02:00
Daniel Wolf 7bc4e37a1a Improved error handling and error messages 2016-04-12 18:02:52 +02:00
Daniel Wolf 04c828506d Simplified code using Timeline<T> 2016-04-09 22:07:25 +02:00
Daniel Wolf 83291aa96c Implemented class Timeline<T> 2016-04-09 20:56:25 +02:00
Daniel Wolf 2be3751a4f Renamed TimeSegment to TimeRange 2016-03-28 20:30:55 +02:00
Daniel Wolf 8c1e24e9c8 Implemented voice activity detection 2016-03-16 21:01:44 +01:00
Daniel Wolf 425f47491c Fixed compiler warnings 2016-03-16 21:01:43 +01:00
Daniel Wolf a8900f80ec Removing DC offset from audio
Also a bit of refactoring regarding audio processing
2016-03-16 21:01:43 +01:00
Daniel Wolf af5a6649c1 Implemented logging to log file 2016-03-08 22:59:44 +01:00
Daniel Wolf 35ec1f8a45 Introduced template functions to unify enum<->string conversions 2016-03-08 22:20:40 +01:00
Daniel Wolf ad9d8e6567 Renamed `audioInput` directory to `audio` 2016-03-08 18:21:17 +01:00
Daniel Wolf b78e418a8f Refactored audio streams
* All streams are now mono (simplifies reasoning about samples)
* Streams can be cloned
* Streams support seeking
2016-03-07 21:28:31 +01:00
Daniel Wolf 419b0ec469 Making sure log is written in case of exception 2016-03-06 20:40:31 +01:00
Daniel Wolf 04ca644cca Added structured logging 2016-03-03 22:31:16 +01:00
Daniel Wolf cdffb56613 Redirecting PocketSphinx log to main log 2016-03-03 22:31:16 +01:00
Daniel Wolf 7efea6f56b Prepared for logging using Boost.Log v2 2016-02-29 21:48:27 +01:00
Daniel Wolf 7a1f446ca3 Using GSL 2016-02-29 20:58:58 +01:00
Daniel Wolf 667edf9485 Improved dialog handling 2016-02-10 21:53:58 +01:00
Daniel Wolf 05ef692706 Added (primitive) option to explicitly supply the dialog 2016-02-09 22:08:11 +01:00
Daniel Wolf 9b10f38bcb Added missing include 2016-02-02 10:13:07 +01:00
Daniel Wolf f09155e486 Using raw pointers instead of iterators for string manipulation
This avoids an assertion error when I temporarily move one past the end
2016-02-01 20:47:27 +01:00
Daniel Wolf 75872fe45d Using -dither to prevent recognition errors in connection with zero silence 2016-02-01 20:26:14 +01:00
Daniel Wolf 0cb0153874 Improved phone-to-mouth mapping 2016-01-31 21:39:49 +01:00
Daniel Wolf 7aa6057b8e Allowing for long pauses in speech without breaking sync 2016-01-28 21:52:50 +01:00
Daniel Wolf c425885929 Showing combined progress for entire task 2016-01-28 19:13:40 +01:00
Daniel Wolf 8e7fcc4efe Implemented two-step phone detection for better accuracy 2016-01-28 14:19:32 +01:00
Daniel Wolf 2bfe671f82 Simplified directory structure to make Visual Studio build work 2016-01-08 16:59:18 +01:00
Daniel Wolf 0f33fcfbd0 Removing zero silence, seems like Sphinx doesn't like it
See http://cmusphinx.sourceforge.net/wiki/faq#qwhy_my_accuracy_is_poor
I couldn't reproduce the original problem, but it doesn't seem to hurt, either.
2016-01-08 16:44:03 +01:00
Daniel Wolf 31cb3b195c Showing progress bar 2016-01-08 10:53:35 +01:00
Daniel Wolf f14feefeb0 Using #pragma once instead of include guards
Just looks cleaner
2016-01-06 21:08:39 +01:00
Daniel Wolf 9e9a432f70 Improved formatting of command-line output 2016-01-06 21:08:39 +01:00
Daniel Wolf 5c0fe24fae Refactoring: Using camelCase throughout 2016-01-06 20:47:37 +01:00
Daniel Wolf acd13e2890 Added a number of string-related tools. 2016-01-06 20:47:29 +01:00
Daniel Wolf 3e5d6e3625 Using TCLAP to parse command line 2016-01-06 20:47:27 +01:00
Daniel Wolf e2840dba3f Fixed warning 2015-12-21 13:26:56 +01:00
Daniel Wolf 4baab9b207 Fixed Windows build 2015-12-21 13:17:14 +01:00
Daniel Wolf 932803d5ad Ported platform-dependent code
Added code for Windows, OS X, Solaris, BSD, and Linux.
Right now, only the Windows version has been tested.
2015-12-14 20:46:31 +01:00
Daniel Wolf e4b5b39504 Fixed corner cases
Handling silences and last mouth shape
2015-12-03 23:07:15 +01:00
Daniel Wolf 7b282ce50f Using std::string instead of std::wstring for command-line args
Turns out that even if I manage to get Unicode command line args,
there still is no portable way of opening a file from a Unicode path.
2015-12-03 23:07:15 +01:00
Daniel Wolf 27ba3ef357 Generating XML output 2015-12-03 23:07:15 +01:00
Daniel Wolf 2ef99119b0 Generating mouth shapes using simple lookup table 2015-12-01 22:55:53 +01:00