Daniel Wolf
75407dab54
Augmenting each detected voice activity to give the recognizer some silence samples to work with
2016-06-29 21:47:25 +02:00
Daniel Wolf
2a5ed95698
Improved animation quality through new algorithm
Using "lazy" ruleset instead of 1:1 mapping from phones
2016-06-29 21:46:08 +02:00
Daniel Wolf
8c9466bcf3
Removed mouth shape H (special shape for 'L' sound)
2016-06-26 21:06:22 +02:00
Daniel Wolf
9bf8355742
Sped up recognition via multithreading
2016-06-26 21:06:21 +02:00
Daniel Wolf
3a0a38575f
Sped up VAD via multithreading
2016-06-26 21:06:21 +02:00
Daniel Wolf
84097756c8
Added ThreadPool class
2016-06-26 14:02:17 +02:00
Daniel Wolf
0aeb35c42e
Fixed deprecated library calls
2016-06-26 11:06:44 +02:00
Daniel Wolf
96b0ad9b1d
Switched to better acoustic model
2016-06-25 22:07:28 +02:00
Daniel Wolf
da78375a10
Added CMU Sphinx US English acoustic model
2016-06-25 22:00:47 +02:00
Daniel Wolf
c9b17e1937
Improved tokenization by taking dictionary into account
2016-06-25 21:52:04 +02:00
Daniel Wolf
8502256241
Updated LICENSE.md
2016-06-25 21:51:06 +02:00
Daniel Wolf
f275267ac7
Small VAD improvements
* RAII
* Slightly fewer false positives
2016-06-24 22:35:33 +02:00
Daniel Wolf
faa3f2b4bb
Fixed overflow with long audio files
2016-06-24 21:51:17 +02:00
Daniel Wolf
c6c31a831c
Using WebRTC for voice activity detection (VAD)
My simple power-based approach wasn't reliable enough.
2016-06-21 22:20:18 +02:00
Daniel Wolf
aec3dbae01
Added WebRTC library
2016-06-21 22:13:05 +02:00
Daniel Wolf
97f172282d
Fixed off-by-one error in wave file reader
2016-06-21 21:47:08 +02:00
Daniel Wolf
0e00e58d91
Gracefully handling failed audio alignment
2016-06-21 19:20:27 +02:00
Daniel Wolf
944c374415
Migrated to latest CMU Sphinx version
2016-06-19 21:18:40 +02:00
Daniel Wolf
478766ff6e
Updated CMU SphinxBase and PocketSphinx
2016-06-19 20:53:24 +02:00
Daniel Wolf
b2f702c8f4
Fixed OS X build
2016-06-16 19:41:49 +02:00
Daniel Wolf
6c9612d2c3
Raised low-pass threshold to better cope with high-pitched voices
2016-06-15 20:14:51 +02:00
Daniel Wolf
4346552312
Improved speed of voice activity detection
... by a factor of 2 by removing the second pass.
Also added voice activity detection to progress calculation.
2016-06-15 20:14:51 +02:00
Daniel Wolf
c4b054176c
Fixed WAVE file reader position calculation
The bug only showed up as massive seek times.
2016-06-15 20:14:44 +02:00
Daniel Wolf
522f6c2019
Made audio stream handling safe for long streams
2016-06-15 20:14:43 +02:00
Daniel Wolf
d1bbe8538e
Added more logging
2016-06-15 20:14:43 +02:00
Daniel Wolf
542a5ee3d8
Added join function for strings
2016-06-15 20:07:51 +02:00
Daniel Wolf
1e29151974
Fixed string conversion for Timed<void>
2016-06-14 17:36:54 +02:00
Daniel Wolf
5cc13cb16f
Improved error message
2016-06-14 17:36:18 +02:00
Daniel Wolf
0d488e8de2
Restored dialog option, this time based on language model
This approach should be more robust and error-tolerant.
2016-06-10 22:35:27 +02:00
Daniel Wolf
4ed5908627
Implemented US-English G2P using sound change rules
2016-06-03 20:02:34 +02:00
Daniel Wolf
7a763e8755
Fixed syntax error in sound change data
2016-06-03 20:00:46 +02:00
Daniel Wolf
bf19d267ee
Added sound change code and data
2016-06-03 10:37:47 +02:00
Daniel Wolf
8be6485685
Implemented string conversion from Latin-1 to Unicode
2016-06-02 22:21:37 +02:00
Daniel Wolf
4d45bf7c89
Merged ascii.cpp into stringTools.cpp
2016-06-02 20:09:37 +02:00
Daniel Wolf
4d95b4c2c5
Implemented text tokenization using Flite
2016-06-02 18:24:27 +02:00
Daniel Wolf
8d1c618cec
Patched Flite to prevent name collision with PocketSphinx
2016-06-02 18:24:27 +02:00
Daniel Wolf
942cabd773
Added Flite as library
2016-06-02 18:24:26 +02:00
Daniel Wolf
9f4ebd23e3
Added Flite 1.4 code
I'm not using version 2.0 because that version makes it almost impossible
to create a slim build without compiling all the voice synth code (which
we don't need).
2016-06-02 18:24:26 +02:00
Daniel Wolf
d4b9a8e0c6
Implemented simple conversion from Unicode string to ASCII
2016-06-02 18:24:25 +02:00
Daniel Wolf
f1563919e1
Removing redundant prefixes from PocketSphinx log output
2016-05-17 17:56:11 +02:00
Daniel Wolf
c67e916185
Splitting audio into utterances before processing
Advantages:
* No problems with long silences (PocketSphinx doesn't like them)
* Potential for parallelization
* Potential for improved phone timing accuracy
2016-05-17 16:01:10 +02:00
Daniel Wolf
bbc933a821
Temporarily removed --dialog option
2016-05-17 14:28:18 +02:00
Daniel Wolf
2f31c5aa61
Refactoring
* Rewriting Timeline<T> to be sparse, i.e., allow gaps
* Added specialized subclasses BoundedTimeline<T> and ContinuousTimeline<T>
* Timed<T> and TimeRange: has-a, not is-a
* Introducing Timed<void>
2016-05-17 14:28:18 +02:00
Daniel Wolf
9eef09145e
Added getPairs function
2016-05-12 21:44:46 +02:00
Daniel Wolf
baf2423b27
Added time manipulation functions to TimeRange and Timeline
2016-04-19 22:06:20 +02:00
Daniel Wolf
895b942df3
Implemented AudioStreamSegment
2016-04-19 22:04:43 +02:00
Daniel Wolf
ce204c68de
Fixed constness
2016-04-19 21:12:44 +02:00
Daniel Wolf
c14fb1c7b2
Fixed output format for structured logging
2016-04-19 19:30:38 +02:00
Daniel Wolf
560281807e
Version 0.2.0
2016-04-17 20:22:17 +02:00
Daniel Wolf
8d2d100376
Refactored enum serialization/deserialization
2016-04-17 20:22:16 +02:00