.TH POCKETSPHINX_BATCH 1 "2007-08-27"
.SH NAME
pocketsphinx_batch \- Run speech recognition in batch mode
.SH SYNOPSIS
.B pocketsphinx_batch
.RI \fB\-ctl\fR
\fIctlfile\fR
\fB\-cepdir\fR
\fIcepdir\fR
\fB\-cepext\fR
\fI.mfc\fR
[\fI options \fR]...
.SH DESCRIPTION
.PP
Run speech recognition over a list of utterances in batch mode. A list
of arguments follows:
.TP
.B \-adchdr
Size of audio file header in bytes (headers are ignored)
.TP
.B \-adcin
Input is raw audio data
.TP
.B \-agc
Automatic gain control for c0 ('max', 'emax', 'noise', or 'none')
.TP
.B \-agcthresh
Initial threshold for automatic gain control
.TP
.B \-allphone
Perform phoneme decoding with phonetic lm
.TP
.B \-allphone_ci
Perform phoneme decoding with phonetic lm and context-independent units only
.TP
.B \-alpha
Preemphasis parameter
.TP
.B \-argfile
Argument file giving extra arguments.
.TP
.B \-ascale
Inverse of acoustic model scale for confidence score calculation
.TP
.B \-aw
Inverse weight applied to acoustic scores.
.TP
.B \-backtrace
Print results and backtraces to log file.
.TP
.B \-beam
Beam width applied to every frame in Viterbi search (smaller values mean wider beam)
.TP
.B \-bestpath
Run bestpath (Dijkstra) search over word lattice (3rd pass)
.TP
.B \-bestpathlw
Language model probability weight for bestpath search
.TP
.B \-build_outdirs
Create missing subdirectories in output directory
.TP
.B \-cepdir
Input files directory (prefixed to filespecs in control file)
.TP
.B \-cepext
Input files extension (suffixed to filespecs in control file)
.TP
.B \-ceplen
Number of components in the input feature vector
.TP
.B \-cmn
Cepstral mean normalization scheme ('current', 'prior', or 'none')
.TP
.B \-cmninit
Initial values (comma-separated) for cepstral mean when 'prior' is used
.TP
.B \-compallsen
Compute all senone scores in every frame (can be faster when there are many senones)
.TP
.B \-ctl
Control file listing utterances to be processed
.TP
.B \-ctlcount
Number of utterances to be processed (after skipping \fB\-ctloffset\fR entries)
.TP
.B \-ctlincr
Do every Nth line in the control file
.TP
.B \-ctloffset
Number of utterances at the beginning of \fB\-ctl\fR file to be skipped
.TP
.B \-ctm
Recognition output in CTM file format (may require post-sorting)
.TP
.B \-debug
Verbosity level for debugging messages
.TP
.B \-dict
Main pronunciation dictionary (lexicon) input file
.TP
.B \-dictcase
Dictionary is case sensitive (NOTE: case insensitivity applies to ASCII characters only)
.TP
.B \-dither
Add 1/2-bit noise
.TP
.B \-doublebw
Use double bandwidth filters (same center freq)
.TP
.B \-ds
Frame GMM computation downsampling ratio
.TP
.B \-fdict
Noise word pronunciation dictionary input file
.TP
.B \-feat
Feature stream type, depends on the acoustic model
.TP
.B \-featparams
File containing feature extraction parameters.
.TP
.B \-fillprob
Filler word transition probability
.TP
.B \-frate
Frame rate
.TP
.B \-fsg
Sphinx format finite state grammar file
.TP
.B \-fsgctl
Control file listing FSG file to use for each utterance
.TP
.B \-fsgdir
Base directory for FSG files
.TP
.B \-fsgext
File extension for FSG files (including leading dot)
.TP
.B \-fsgusealtpron
Add alternate pronunciations to FSG
.TP
.B \-fsgusefiller
Insert filler words at each state.
.TP
.B \-fwdflat
Run forward flat-lexicon search over word lattice (2nd pass)
.TP
.B \-fwdflatbeam
Beam width applied to every frame in second-pass flat search
.TP
.B \-fwdflatefwid
Minimum number of end frames for a word to be searched in fwdflat search
.TP
.B \-fwdflatlw
Language model probability weight for flat lexicon (2nd pass) decoding
.TP
.B \-fwdflatsfwin
Window of frames in lattice to search for successor words in fwdflat search
.TP
.B \-fwdflatwbeam
Beam width applied to word exits in second-pass flat search
.TP
.B \-fwdtree
Run forward lexicon-tree search (1st pass)
.TP
.B \-hmm
Directory containing acoustic model files.
.TP
.B \-hyp
Recognition output file name
.TP
.B \-hypseg
Recognition output with segmentation file name
.TP
.B \-input_endian
Endianness of input data, big or little, ignored if NIST or MS Wav
.TP
.B \-jsgf
JSGF grammar file
.TP
.B \-keyphrase
Keyphrase to spot
.TP
.B \-kws
A file with keyphrases to spot, one per line
.TP
.B \-kws_delay
Delay to wait for best detection score
.TP
.B \-kws_plp
Phone loop probability for keyword spotting
.TP
.B \-kws_threshold
Threshold for p(hyp)/p(alternatives) ratio
.TP
.B \-latsize
Initial backpointer table size
.TP
.B \-lda
File containing transformation matrix to be applied to features (single-stream features only)
.TP
.B \-ldadim
Dimensionality of output of feature transformation (0 to use entire matrix)
.TP
.B \-lifter
Length of sin-curve for liftering, or 0 for no liftering.
.TP
.B \-lm
Word trigram language model input file
.TP
.B \-lmctl
Specify a set of language models
.TP
.B \-lmname
Which language model in \fB\-lmctl\fR to use by default
.TP
.B \-lmnamectl
Control file listing LM name to use for each utterance
.TP
.B \-logbase
Base in which all log-likelihoods are calculated
.TP
.B \-logfn
File to write log messages in
.TP
.B \-logspec
Write out logspectral files instead of cepstra
.TP
.B \-lowerf
Lower edge of filters
.TP
.B \-lpbeam
Beam width applied to last phone in words
.TP
.B \-lponlybeam
Beam width applied to last phone in single-phone words
.TP
.B \-lw
Language model probability weight
.TP
.B \-maxhmmpf
Maximum number of active HMMs to maintain at each frame (or \fB\-1\fR for no pruning)
.TP
.B \-maxwpf
Maximum number of distinct word exits at each frame (or \fB\-1\fR for no pruning)
.TP
.B \-mdef
Model definition input file
.TP
.B \-mean
Mixture Gaussian means input file
.TP
.B \-mfclogdir
Directory to log feature files to
.TP
.B \-min_endfr
Nodes ignored in lattice construction if they persist for fewer than N frames
.TP
.B \-mixw
Senone mixture weights input file (uncompressed)
.TP
.B \-mixwfloor
Senone mixture weights floor (applied to data from \fB\-mixw\fR file)
.TP
.B \-mllr
MLLR transformation to apply to means and variances
.TP
.B \-mllrctl
Control file listing MLLR transforms to use for each utterance
.TP
.B \-mllrdir
Base directory for MLLR transforms
.TP
.B \-mllrext
File extension for MLLR transforms (including leading dot)
.TP
.B \-mmap
Use memory-mapped I/O (if possible) for model files
.TP
.B \-nbest
Number of N-best hypotheses to write to \fB\-nbestdir\fR (0 for no N-best)
.TP
.B \-nbestdir
Directory for writing N-best hypothesis lists
.TP
.B \-nbestext
Extension for N-best hypothesis list files
.TP
.B \-ncep
Number of cepstral coefficients
.TP
.B \-nfft
Size of FFT
.TP
.B \-nfilt
Number of filter banks
.TP
.B \-nwpen
New word transition penalty
.TP
.B \-outlatbeam
Minimum posterior probability for output lattice nodes
.TP
.B \-outlatdir
Directory for dumping word lattices
.TP
.B \-outlatext
Filename extension for dumping word lattices
.TP
.B \-outlatfmt
Format for dumping word lattices (s3 or htk)
.TP
.B \-pbeam
Beam width applied to phone transitions
.TP
.B \-pip
Phone insertion penalty
.TP
.B \-pl_beam
Beam width applied to phone loop search for lookahead
.TP
.B \-pl_pbeam
Beam width applied to phone loop transitions for lookahead
.TP
.B \-pl_pip
Phone insertion penalty for phone loop
.TP
.B \-pl_weight
Weight for phoneme lookahead penalties
.TP
.B \-pl_window
Phoneme lookahead window size, in frames
.TP
.B \-rawlogdir
Directory to log raw audio files to
.TP
.B \-remove_dc
Remove DC offset from each frame
.TP
.B \-remove_noise
Remove noise with spectral subtraction in mel-energies
.TP
.B \-remove_silence
Enables VAD, removes silence frames from processing
.TP
.B \-round_filters
Round mel filter frequencies to DFT points
.TP
.B \-samprate
Sampling rate
.TP
.B \-seed
Seed for random number generator; if less than zero, pick our own
.TP
.B \-sendump
Senone dump (compressed mixture weights) input file
.TP
.B \-senin
Input is senone score dump files
.TP
.B \-senlogdir
Directory to log senone score files to
.TP
.B \-senmgau
Senone to codebook mapping input file (usually not needed)
.TP
.B \-silprob
Silence word transition probability
.TP
.B \-smoothspec
Write out cepstral-smoothed logspectral files
.TP
.B \-svspec
Subvector specification (e.g., 24,0-11/25,12-23/26-38 or 0-12/13-25/26-38)
.TP
.B \-tmat
HMM state transition matrix input file
.TP
.B \-tmatfloor
HMM state transition probability floor (applied to \fB\-tmat\fR file)
.TP
.B \-topn
Maximum number of top Gaussians to use in scoring.
.TP
.B \-topn_beam
Beam width used to determine top-N Gaussians (or a list, per-feature)
.TP
.B \-toprule
Start rule for JSGF (first public rule is default)
.TP
.B \-transform
Which type of transform to use to calculate cepstra (legacy, dct, or htk)
.TP
.B \-unit_area
Normalize mel filters to unit area
.TP
.B \-upperf
Upper edge of filters
.TP
.B \-uw
Unigram weight
.TP
.B \-vad_postspeech
Number of silence frames to keep after the transition from speech to silence.
.TP
.B \-vad_prespeech
Number of speech frames to keep before the transition from silence to speech.
.TP
.B \-vad_startspeech
Number of speech frames needed to trigger VAD from silence to speech.
.TP
.B \-vad_threshold
Threshold for decision between noise and silence frames. Log-ratio between signal level and noise level.
.TP
.B \-var
Mixture Gaussian variances input file
.TP
.B \-varfloor
Mixture Gaussian variance floor (applied to data from \fB\-var\fR file)
.TP
.B \-varnorm
Variance normalize each utterance (only if CMN == current)
.TP
.B \-verbose
Show input filenames
.TP
.B \-warp_params
Parameters defining the warping function
.TP
.B \-warp_type
Warping function type (or shape)
.TP
.B \-wbeam
Beam width applied to word exits
.TP
.B \-wip
Word insertion penalty
.TP
.B \-wlen
Hamming window length
.PP
To do batch mode recognition, you
will need to specify a control file, using
.BR \-ctl .
This is a simple text file containing one entry per line. Each entry
is the name of an input file relative to the
.B \-cepdir
directory, and without the filename extension (which is given in the
.B \-cepext
argument).
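.PP
As an illustrative sketch (the file and directory names here are
hypothetical and not part of this package), a control file named
.I test.ctl
listing two raw audio files,
.I wav/utt01.raw
and
.IR wav/utt02.raw ,
could contain:
.PP
.RS
.nf
utt01
utt02
.fi
.RE
.PP
It might then be decoded with a command along the following lines.
This assumes that boolean options such as
.B \-adcin
take the value
.BR yes ,
and that the acoustic model, dictionary and language model are passed with
.BR \-hmm ,
.B \-dict
and
.BR \-lm ;
adjust the paths to match your installation:
.PP
.RS
.nf
pocketsphinx_batch \-adcin yes \-cepdir wav \-cepext .raw \e
    \-ctl test.ctl \-hyp test.hyp \e
    \-hmm model_dir \-dict words.dict \-lm words.lm
.fi
.RE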
.PP
If you are using acoustic feature files as input (see
.BR sphinx_fe (1)
for information on how to generate these), you can also specify a subpart
of a file, using the following format:
.PP
.RS
.B FILENAME START\-FRAME END\-FRAME UTTERANCE\-ID
.RE
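.PP
For instance, assuming a hypothetical feature file
.I utt01.mfc
in the
.B \-cepdir
directory, control file entries such as the following would select
two frame ranges of that file and label them as separate utterances
(the utterance IDs are arbitrary names chosen for this sketch):
.PP
.RS
.nf
utt01 0 299 utt01\-part1
utt01 300 599 utt01\-part2
.fi
.RE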
.SH AUTHOR
Written by numerous people at CMU from 1994 onwards. This manual page
was written by David Huggins-Daines <dhuggins@cs.cmu.edu>.
.SH COPYRIGHT
Copyright \(co 1994-2016 Carnegie Mellon University. See the file
\fILICENSE\fR included with this package for more information.
.br
.SH "SEE ALSO"
.BR pocketsphinx_continuous (1),
.BR sphinx_fe (1).
.br