135 Commits

Author SHA1 Message Date
Daniel Wolf c6c31a831c Using WebRTC for voice activity detection (VAD)
My simple power-based approach wasn't reliable enough.
2016-06-21 22:20:18 +02:00
Daniel Wolf 97f172282d Fixed off-by-one error in wave file reader 2016-06-21 21:47:08 +02:00
Daniel Wolf 0e00e58d91 Gracefully handling failed audio alignment 2016-06-21 19:20:27 +02:00
Daniel Wolf 944c374415 Migrated to latest CMU Sphinx version 2016-06-19 21:18:40 +02:00
Daniel Wolf b2f702c8f4 Fixed OS X build 2016-06-16 19:41:49 +02:00
Daniel Wolf 6c9612d2c3 Raised low-pass threshold to better cope with high-pitched voices 2016-06-15 20:14:51 +02:00
Daniel Wolf 4346552312 Improved speed of voice activity detection
... by factor 2 by removing second pass.
Also added voice activity detection to progress calculation.
2016-06-15 20:14:51 +02:00
Daniel Wolf c4b054176c Fixed WAVE file reader position calculation
The bug only showed through massive seek times.
2016-06-15 20:14:44 +02:00
Daniel Wolf 522f6c2019 Made audio stream handling safe for long streams 2016-06-15 20:14:43 +02:00
Daniel Wolf d1bbe8538e Added more logging 2016-06-15 20:14:43 +02:00
Daniel Wolf 542a5ee3d8 Added join function for strings 2016-06-15 20:07:51 +02:00
Daniel Wolf 1e29151974 Fixed string conversion for Timed<void> 2016-06-14 17:36:54 +02:00
Daniel Wolf 5cc13cb16f Improved error message 2016-06-14 17:36:18 +02:00
Daniel Wolf 0d488e8de2 Restored dialog option, this time based on language model
This approach should be more robust and error-tolerant.
2016-06-10 22:35:27 +02:00
Daniel Wolf 4ed5908627 Implemented US-English G2P using sound change rules 2016-06-03 20:02:34 +02:00
Daniel Wolf 8be6485685 Implemented string conversion from Latin-1 to Unicode 2016-06-02 22:21:37 +02:00
Daniel Wolf 4d45bf7c89 Merged ascii.cpp into stringTools.cpp 2016-06-02 20:09:37 +02:00
Daniel Wolf 4d95b4c2c5 Implemented text tokenization using Flite 2016-06-02 18:24:27 +02:00
Daniel Wolf d4b9a8e0c6 Implemented simple conversion from Unicode string to ASCII 2016-06-02 18:24:25 +02:00
Daniel Wolf f1563919e1 Removing redundant prefixes from PocketSphinx log output 2016-05-17 17:56:11 +02:00
Daniel Wolf c67e916185 Splitting audio into utterances before processing
Advantages:
* No problems with long silences (PocketSphinx doesn't like them)
* Potential for parallelization
* Potential for improved phone timing accuracy
2016-05-17 16:01:10 +02:00
Daniel Wolf bbc933a821 Temporarily removed --dialog option 2016-05-17 14:28:18 +02:00
Daniel Wolf 2f31c5aa61 Refactoring
* Rewriting Timeline<T> to be sparse, i.e., allow gaps
* Added specialized subclasses BoundedTimeline<T> and ContinuousTimeline<T>
* Timed<T> and TimeRange: has-a, not is-a
* Introducing Timed<void>
2016-05-17 14:28:18 +02:00
Daniel Wolf 9eef09145e Added getPairs function 2016-05-12 21:44:46 +02:00
Daniel Wolf baf2423b27 Added time manipulation functions to TimeRange and Timeline 2016-04-19 22:06:20 +02:00
Daniel Wolf 895b942df3 Implemented AudioStreamSegment 2016-04-19 22:04:43 +02:00
Daniel Wolf ce204c68de Fixed constness 2016-04-19 21:12:44 +02:00
Daniel Wolf c14fb1c7b2 Fixed output format for structured logging 2016-04-19 19:30:38 +02:00
Daniel Wolf 8d2d100376 Refactored enum serialization/deserialization 2016-04-17 20:22:16 +02:00
Daniel Wolf 44d18d00f8 Added header file to CMakeLists.txt
This makes navigation easier for me. Plus, ReSharper didn't like not knowing the header files.
2016-04-14 22:14:57 +02:00
Daniel Wolf 7ce79f9c08 Replaced Boost.Log with small custom logger
Boost.Log is a complex monstrosity and I can't get it to build on OS X.
2016-04-14 09:42:47 +02:00
Daniel Wolf 4941bff739 Replaced strerror_s with (less safe) strerror
libc++ (Xcode) doesn't seem to support it.
2016-04-13 10:37:10 +02:00
Daniel Wolf d8fbd3596b Fixed UnboundedStream constructor 2016-04-13 10:37:10 +02:00
Daniel Wolf db6f2e076b Fixed GCC build 2016-04-12 23:04:16 +02:00
Daniel Wolf 4b8e38970a Added hanging indent to help output to make it more readable 2016-04-12 21:23:15 +02:00
Daniel Wolf fd6b3b1e2f Supporting multiple export formats
- Simplified XML export format
- Added TSV and JSON formats
- Using TSV as standard export format
2016-04-12 21:08:23 +02:00
Daniel Wolf 90e1375f1b Handling zero-length audio files 2016-04-12 20:45:47 +02:00
Daniel Wolf 7bc4e37a1a Improved error handling and error messages 2016-04-12 18:02:52 +02:00
Daniel Wolf 04c828506d Simplified code using Timeline<T> 2016-04-09 22:07:25 +02:00
Daniel Wolf 83291aa96c Implemented class Timeline<T> 2016-04-09 20:56:25 +02:00
Daniel Wolf 2be3751a4f Renamed TimeSegment to TimeRange 2016-03-28 20:30:55 +02:00
Daniel Wolf 8c1e24e9c8 Implemented voice activity detection 2016-03-16 21:01:44 +01:00
Daniel Wolf 425f47491c Fixed compiler warnings 2016-03-16 21:01:43 +01:00
Daniel Wolf a8900f80ec Removing DC offset from audio
Also a bit of refactoring regarding audio processing
2016-03-16 21:01:43 +01:00
Daniel Wolf af5a6649c1 Implemented logging to log file 2016-03-08 22:59:44 +01:00
Daniel Wolf 35ec1f8a45 Introduced template functions to unify enum<->string conversions 2016-03-08 22:20:40 +01:00
Daniel Wolf ad9d8e6567 Renamed `audioInput` directory to `audio` 2016-03-08 18:21:17 +01:00
Daniel Wolf b78e418a8f Refactored audio streams
* All streams are now mono (simplifies reasoning about samples)
* Streams can be cloned
* Streams can be seeked within
2016-03-07 21:28:31 +01:00
Daniel Wolf 419b0ec469 Making sure log is written in case of exception 2016-03-06 20:40:31 +01:00
Daniel Wolf 04ca644cca Added structured logging 2016-03-03 22:31:16 +01:00
Daniel Wolf cdffb56613 Redirecting pocketsphinx log to main log 2016-03-03 22:31:16 +01:00
Daniel Wolf 7efea6f56b Prepared for logging using Boost.Log v2 2016-02-29 21:48:27 +01:00
Daniel Wolf 7a1f446ca3 Using GSL 2016-02-29 20:58:58 +01:00
Daniel Wolf 667edf9485 Improved dialog handling 2016-02-10 21:53:58 +01:00
Daniel Wolf 05ef692706 Added (primitive) option to explicitly supply the dialog 2016-02-09 22:08:11 +01:00
Daniel Wolf 9b10f38bcb Added missing include 2016-02-02 10:13:07 +01:00
Daniel Wolf f09155e486 Using raw pointers instead of iterators for string manipulation
This avoids an assertion error when I temporarily move 1 past end
2016-02-01 20:47:27 +01:00
Daniel Wolf 75872fe45d Using -dither to prevent recognition errors in connection with zero silence 2016-02-01 20:26:14 +01:00
Daniel Wolf 0cb0153874 Improved phone-to-mouth mapping 2016-01-31 21:39:49 +01:00
Daniel Wolf 7aa6057b8e Allowing for long pauses in speech without breaking sync 2016-01-28 21:52:50 +01:00
Daniel Wolf c425885929 Showing combined progress for entire task 2016-01-28 19:13:40 +01:00
Daniel Wolf 8e7fcc4efe Implemented two-step phone detection for better accuracy 2016-01-28 14:19:32 +01:00
Daniel Wolf 2bfe671f82 Simplified directory structure to make Visual Studio build work 2016-01-08 16:59:18 +01:00
Daniel Wolf 0f33fcfbd0 Removing zero silence, seems like Sphinx doesn't like it
See http://cmusphinx.sourceforge.net/wiki/faq#qwhy_my_accuracy_is_poor
I couldn't reproduce the original problem, but it doesn't seem to hurt, either.
2016-01-08 16:44:03 +01:00
Daniel Wolf 31cb3b195c Showing progress bar 2016-01-08 10:53:35 +01:00
Daniel Wolf f14feefeb0 Using #pragma once instead of include guards
Just looks cleaner
2016-01-06 21:08:39 +01:00
Daniel Wolf 9e9a432f70 Improved formatting of command-line output 2016-01-06 21:08:39 +01:00
Daniel Wolf 5c0fe24fae Refactoring: Using camelCase throughout 2016-01-06 20:47:37 +01:00
Daniel Wolf acd13e2890 Added a number of string-related tools. 2016-01-06 20:47:29 +01:00
Daniel Wolf 3e5d6e3625 Using TCLAP to parse command line 2016-01-06 20:47:27 +01:00
Daniel Wolf e2840dba3f Fixed warning 2015-12-21 13:26:56 +01:00
Daniel Wolf 4baab9b207 Fixed Windows build 2015-12-21 13:17:14 +01:00
Daniel Wolf 932803d5ad Ported platform-dependent code
Added code for Windows, OS X, Solaris, BSD, and Linux.
Right now, only the Windows version has been tested.
2015-12-14 20:46:31 +01:00
Daniel Wolf e4b5b39504 Fixed corner cases
Handling silences and last mouth shape
2015-12-03 23:07:15 +01:00
Daniel Wolf 7b282ce50f Using std::string instead of std::wstring for command-line args
Turns out that even if I manage to get Unicode command line args,
there still is no portable way of opening a file from a Unicode path.
2015-12-03 23:07:15 +01:00
Daniel Wolf 27ba3ef357 Generating XML output 2015-12-03 23:07:15 +01:00
Daniel Wolf 2ef99119b0 Generating mouth shapes using simple lookup table 2015-12-01 22:55:53 +01:00
Daniel Wolf 994e2be314 Redirecting PocketSphinx log output 2015-12-01 22:55:53 +01:00
Daniel Wolf d6f5c2ed1e Reading sound file name from command line 2015-12-01 22:55:53 +01:00
Daniel Wolf 132adb1083 Improved error handling
Plus some refactoring
2015-12-01 22:55:53 +01:00
Daniel Wolf f2f6f75932 Refactoring
- Moved phone recognition code to phone_extraction.cpp
- Introduced type centiseconds
- Code reorganization
2015-12-01 22:55:52 +01:00
Daniel Wolf 713e8b5d7f Fixed comment 2015-10-31 20:41:17 +01:00
Daniel Wolf d96bf12c96 Fixed model path; enabled fast mode 2015-10-19 22:03:29 +02:00
Daniel Wolf 3cd82e89f8 Reading WAVE file 2015-10-19 22:03:29 +02:00
Daniel Wolf 641f64022d Implemented WAVE reading, writing, and conversion 2015-10-19 22:03:20 +02:00