.TH POCKETSPHINX_CONTINUOUS 1 "2016-04-01"
.SH NAME
pocketsphinx_continuous \- Run speech recognition in continuous listening mode
.SH SYNOPSIS
.B pocketsphinx_continuous
.RI [ \fB\-infile\fR
\fIfilename.wav\fR ]
[ \fB\-inmic yes\fR ]
[ \fIoptions\fR ]...
.SH DESCRIPTION
.PP
This program opens the audio device or a file and waits for speech. When it
detects an utterance, it performs speech recognition on it.
.PP
To record from the microphone and decode, use
.TP
.B \-inmic yes
.PP
To decode a 16 kHz, 16-bit mono WAV file, use
.TP
.B \-infile \fIfilename.wav\fR
.PP
You can also specify
.B \-lm
or
.B \-fsg
or
.B \-kws
depending on whether you are using a statistical language
model, a finite-state grammar, or a list of keyphrases to spot.
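.PP
As an illustrative sketch of the invocations described above, the commands
below pass an acoustic model directory, a language model and a dictionary
explicitly. All file names and paths are placeholders, not files shipped
with this package; substitute the locations of your own models.
.PP
.nf
# Decode a 16 kHz, 16-bit mono WAV file (paths are hypothetical)
pocketsphinx_continuous \-infile recording.wav \e
    \-hmm /path/to/acoustic\-model \e
    \-lm /path/to/model.lm \e
    \-dict /path/to/words.dict

# Decode live audio from the microphone with the same models
pocketsphinx_continuous \-inmic yes \e
    \-hmm /path/to/acoustic\-model \e
    \-lm /path/to/model.lm \e
    \-dict /path/to/words.dict
.fi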
.SH OPTIONS
.TP
.B \-adcdev
Name of audio device to use for input.
.TP
.B \-agc
Automatic gain control for c0 ('max', 'emax', 'noise', or 'none')
.TP
.B \-agcthresh
Initial threshold for automatic gain control
.TP
.B \-allphone
Perform phoneme decoding with phonetic language model
.TP
.B \-allphone_ci
Perform phoneme decoding with phonetic language model and context-independent units only
.TP
.B \-alpha
Preemphasis parameter
.TP
.B \-argfile
Argument file giving extra arguments.
.TP
.B \-ascale
Inverse of acoustic model scale for confidence score calculation
.TP
.B \-aw
Inverse weight applied to acoustic scores.
.TP
.B \-backtrace
Print results and backtraces to log file.
.TP
.B \-beam
Beam width applied to every frame in Viterbi search (smaller values mean wider beam)
.TP
.B \-bestpath
Run bestpath (Dijkstra) search over word lattice (3rd pass)
.TP
.B \-bestpathlw
Language model probability weight for bestpath search
.TP
.B \-ceplen
Number of components in the input feature vector
.TP
.B \-cmn
Cepstral mean normalization scheme ('current', 'prior', or 'none')
.TP
.B \-cmninit
Initial values (comma-separated) for cepstral mean when 'prior' is used
.TP
.B \-compallsen
Compute all senone scores in every frame (can be faster when there are many senones)
.TP
.B \-debug
Verbosity level for debugging messages
.TP
.B \-dict
Main pronunciation dictionary (lexicon) input file
.TP
.B \-dictcase
Dictionary is case sensitive (NOTE: case insensitivity applies to ASCII characters only)
.TP
.B \-dither
Add 1/2-bit noise
.TP
.B \-doublebw
Use double bandwidth filters (same center frequency)
.TP
.B \-ds
Frame GMM computation downsampling ratio
.TP
.B \-fdict
Noise word pronunciation dictionary input file
.TP
.B \-feat
Feature stream type, depends on the acoustic model
.TP
.B \-featparams
File containing feature extraction parameters.
.TP
.B \-fillprob
Filler word transition probability
.TP
.B \-frate
Frame rate
.TP
.B \-fsg
Sphinx format finite state grammar file
.TP
.B \-fsgusealtpron
Add alternate pronunciations to FSG
.TP
.B \-fsgusefiller
Insert filler words at each state.
.TP
.B \-fwdflat
Run forward flat-lexicon search over word lattice (2nd pass)
.TP
.B \-fwdflatbeam
Beam width applied to every frame in second-pass flat search
.TP
.B \-fwdflatefwid
Minimum number of end frames for a word to be searched in fwdflat search
.TP
.B \-fwdflatlw
Language model probability weight for flat lexicon (2nd pass) decoding
.TP
.B \-fwdflatsfwin
Window of frames in lattice to search for successor words in fwdflat search
.TP
.B \-fwdflatwbeam
Beam width applied to word exits in second-pass flat search
.TP
.B \-fwdtree
Run forward lexicon-tree search (1st pass)
.TP
.B \-hmm
Directory containing acoustic model files.
.TP
.B \-infile
Audio file to transcribe.
.TP
.B \-inmic
Transcribe audio from microphone.
.TP
.B \-input_endian
Endianness of input data, big or little, ignored if NIST or MS WAV
.TP
.B \-jsgf
JSGF grammar file
.TP
.B \-keyphrase
Keyphrase to spot
.TP
.B \-kws
A file with keyphrases to spot, one per line
.TP
.B \-kws_delay
Delay to wait for best detection score
.TP
.B \-kws_plp
Phone loop probability for keyword spotting
.TP
.B \-kws_threshold
Threshold for p(hyp)/p(alternatives) ratio
.TP
.B \-latsize
Initial backpointer table size
.TP
.B \-lda
File containing transformation matrix to be applied to features (single-stream features only)
.TP
.B \-ldadim
Dimensionality of output of feature transformation (0 to use entire matrix)
.TP
.B \-lifter
Length of sin-curve for liftering, or 0 for no liftering.
.TP
.B \-lm
Word trigram language model input file
.TP
.B \-lmctl
Specify a set of language models
.TP
.B \-lmname
Which language model in \fB\-lmctl\fR to use by default
.TP
.B \-logbase
Base in which all log-likelihoods are calculated
.TP
.B \-logfn
File to write log messages in
.TP
.B \-logspec
Write out logspectral files instead of cepstra
.TP
.B \-lowerf
Lower edge of filters
.TP
.B \-lpbeam
Beam width applied to last phone in words
.TP
.B \-lponlybeam
Beam width applied to last phone in single-phone words
.TP
.B \-lw
Language model probability weight
.TP
.B \-maxhmmpf
Maximum number of active HMMs to maintain at each frame (or \fB\-1\fR for no pruning)
.TP
.B \-maxwpf
Maximum number of distinct word exits at each frame (or \fB\-1\fR for no pruning)
.TP
.B \-mdef
Model definition input file
.TP
.B \-mean
Mixture Gaussian means input file
.TP
.B \-mfclogdir
Directory to log feature files to
.TP
.B \-min_endfr
Nodes ignored in lattice construction if they persist for fewer than N frames
.TP
.B \-mixw
Senone mixture weights input file (uncompressed)
.TP
.B \-mixwfloor
Senone mixture weights floor (applied to data from \fB\-mixw\fR file)
.TP
.B \-mllr
MLLR transformation to apply to means and variances
.TP
.B \-mmap
Use memory-mapped I/O (if possible) for model files
.TP
.B \-ncep
Number of cepstral coefficients
.TP
.B \-nfft
Size of FFT
.TP
.B \-nfilt
Number of filter banks
.TP
.B \-nwpen
New word transition penalty
.TP
.B \-pbeam
Beam width applied to phone transitions
.TP
.B \-pip
Phone insertion penalty
.TP
.B \-pl_beam
Beam width applied to phone loop search for lookahead
.TP
.B \-pl_pbeam
Beam width applied to phone loop transitions for lookahead
.TP
.B \-pl_pip
Phone insertion penalty for phone loop
.TP
.B \-pl_weight
Weight for phoneme lookahead penalties
.TP
.B \-pl_window
Phoneme lookahead window size, in frames
.TP
.B \-rawlogdir
Directory to log raw audio files to
.TP
.B \-remove_dc
Remove DC offset from each frame
.TP
.B \-remove_noise
Remove noise with spectral subtraction in mel-energies
.TP
.B \-remove_silence
Enables VAD, removes silence frames from processing
.TP
.B \-round_filters
Round mel filter frequencies to DFT points
.TP
.B \-samprate
Sampling rate
.TP
.B \-seed
Seed for random number generator; if less than zero, pick our own
.TP
.B \-sendump
Senone dump (compressed mixture weights) input file
.TP
.B \-senlogdir
Directory to log senone score files to
.TP
.B \-senmgau
Senone to codebook mapping input file (usually not needed)
.TP
.B \-silprob
Silence word transition probability
.TP
.B \-smoothspec
Write out cepstral-smoothed logspectral files
.TP
.B \-svspec
Subvector specification (e.g., 24,0-11/25,12-23/26-38 or 0-12/13-25/26-38)
.TP
.B \-time
Print word times in file transcription.
.TP
.B \-tmat
HMM state transition matrix input file
.TP
.B \-tmatfloor
HMM state transition probability floor (applied to \fB\-tmat\fR file)
.TP
.B \-topn
Maximum number of top Gaussians to use in scoring.
.TP
.B \-topn_beam
Beam width used to determine top-N Gaussians (or a list, per-feature)
.TP
.B \-toprule
Start rule for JSGF (first public rule is default)
.TP
.B \-transform
Which type of transform to use to calculate cepstra (legacy, dct, or htk)
.TP
.B \-unit_area
Normalize mel filters to unit area
.TP
.B \-upperf
Upper edge of filters
.TP
.B \-uw
Unigram weight
.TP
.B \-vad_postspeech
Number of silence frames to keep after the transition from speech to silence.
.TP
.B \-vad_prespeech
Number of speech frames to keep before the transition from silence to speech.
.TP
.B \-vad_startspeech
Number of speech frames needed to trigger VAD from silence to speech.
.TP
.B \-vad_threshold
Threshold for decision between noise and silence frames. Log-ratio between signal level and noise level.
.TP
.B \-var
Mixture Gaussian variances input file
.TP
.B \-varfloor
Mixture Gaussian variance floor (applied to data from \fB\-var\fR file)
.TP
.B \-varnorm
Variance normalize each utterance (only if CMN == current)
.TP
.B \-verbose
Show input filenames
.TP
.B \-warp_params
Parameters defining the warping function
.TP
.B \-warp_type
Warping function type (or shape)
.TP
.B \-wbeam
Beam width applied to word exits
.TP
.B \-wip
Word insertion penalty
.TP
.B \-wlen
Hamming window length
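.SH EXAMPLES
.PP
The following sketches combine options documented above. The file names,
the keyphrase and the threshold value are placeholders chosen for
illustration, not recommended settings.
.PP
Spot a single keyphrase in microphone input, writing log messages to a file:
.PP
.nf
pocketsphinx_continuous \-inmic yes \e
    \-keyphrase "hello computer" \e
    \-kws_threshold 1e\-20 \e
    \-logfn decode.log
.fi
.PP
Transcribe a file and print word times, reading the model options from an
argument file. This assumes the argument file holds option names and values
separated by whitespace, in the same form as on the command line:
.PP
.nf
# contents of the hypothetical file decode.args
\-hmm /path/to/acoustic\-model
\-lm /path/to/model.lm
\-dict /path/to/words.dict

# pass the argument file together with the input file
pocketsphinx_continuous \-argfile decode.args \-infile recording.wav \-time yes
.fi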
.SH AUTHOR
Written by numerous people at CMU from 1994 onwards. This manual page
was written by David Huggins-Daines <dhuggins@cs.cmu.edu>.
.SH COPYRIGHT
Copyright \(co 1994-2016 Carnegie Mellon University. See the file
\fILICENSE\fR included with this package for more information.
.br
.SH "SEE ALSO"
.BR pocketsphinx_batch (1),
.BR sphinx_fe (1).
.br