.TH POCKETSPHINX_BATCH 1 "2007-08-27"
.SH NAME
pocketsphinx_batch \- Run speech recognition in batch mode
.SH SYNOPSIS
.B pocketsphinx_batch
.RI \fB\-ctl\fR
\fIctlfile\fR
\fB\-cepdir\fR
\fIcepdir\fR
\fB\-cepext\fR
\fI.mfc\fR
[\fI options \fR]...
.SH DESCRIPTION
.PP
Run speech recognition over a list of utterances in batch mode. A list
of arguments follows:
.TP
.B \-adchdr
Size of audio file header in bytes (headers are ignored)
.TP
.B \-adcin
Input is raw audio data
.TP
.B \-agc
Automatic gain control for c0 ('max', 'emax', 'noise', or 'none')
.TP
.B \-agcthresh
Initial threshold for automatic gain control
.TP
.B \-allphone
Perform phoneme decoding with phonetic lm
.TP
.B \-allphone_ci
Perform phoneme decoding with phonetic lm and context-independent units only
.TP
.B \-alpha
Preemphasis parameter
.TP
.B \-argfile
Argument file giving extra arguments.
.TP
.B \-ascale
Inverse of acoustic model scale for confidence score calculation
.TP
.B \-aw
Inverse weight applied to acoustic scores.
.TP
.B \-backtrace
Print results and backtraces to log file.
.TP
.B \-beam
Beam width applied to every frame in Viterbi search (smaller values mean wider beam)
.TP
.B \-bestpath
Run bestpath (Dijkstra) search over word lattice (3rd pass)
.TP
.B \-bestpathlw
Language model probability weight for bestpath search
.TP
.B \-build_outdirs
Create missing subdirectories in output directory
.TP
.B \-cepdir
Input files directory (prefixed to filespecs in control file)
.TP
.B \-cepext
Input files extension (suffixed to filespecs in control file)
.TP
.B \-ceplen
Number of components in the input feature vector
.TP
.B \-cmn
Cepstral mean normalization scheme ('current', 'prior', or 'none')
.TP
.B \-cmninit
Initial values (comma-separated) for cepstral mean when 'prior' is used
.TP
.B \-compallsen
Compute all senone scores in every frame (can be faster when there are many senones)
.TP
.B \-ctl
Control file listing utterances to be processed
.TP
.B \-ctlcount
No. of utterances to be processed (after skipping \fB\-ctloffset\fR entries)
.TP
.B \-ctlincr
Do every Nth line in the control file
.TP
.B \-ctloffset
No. of utterances at the beginning of \fB\-ctl\fR file to be skipped
.TP
.B \-ctm
Recognition output in CTM file format (may require post-sorting)
.TP
.B \-debug
Verbosity level for debugging messages
.TP
.B \-dict
Main pronunciation dictionary (lexicon) input file
.TP
.B \-dictcase
Dictionary is case sensitive (NOTE: case insensitivity applies to ASCII characters only)
.TP
.B \-dither
Add 1/2-bit noise
.TP
.B \-doublebw
Use double bandwidth filters (same center freq)
.TP
.B \-ds
Frame GMM computation downsampling ratio
.TP
.B \-fdict
Noise word pronunciation dictionary input file
.TP
.B \-feat
Feature stream type, depends on the acoustic model
.TP
.B \-featparams
File containing feature extraction parameters.
.TP
.B \-fillprob
Filler word transition probability
.TP
.B \-frate
Frame rate
.TP
.B \-fsg
Sphinx format finite state grammar file
.TP
.B \-fsgctl
Control file listing FSG file to use for each utterance
.TP
.B \-fsgdir
Base directory for FSG files
.TP
.B \-fsgext
File extension for FSG files (including leading dot)
.TP
.B \-fsgusealtpron
Add alternate pronunciations to FSG
.TP
.B \-fsgusefiller
Insert filler words at each state.
.TP
.B \-fwdflat
Run forward flat-lexicon search over word lattice (2nd pass)
.TP
.B \-fwdflatbeam
Beam width applied to every frame in second-pass flat search
.TP
.B \-fwdflatefwid
Minimum number of end frames for a word to be searched in fwdflat search
.TP
.B \-fwdflatlw
Language model probability weight for flat lexicon (2nd pass) decoding
.TP
.B \-fwdflatsfwin
Window of frames in lattice to search for successor words in fwdflat search
.TP
.B \-fwdflatwbeam
Beam width applied to word exits in second-pass flat search
.TP
.B \-fwdtree
Run forward lexicon-tree search (1st pass)
.TP
.B \-hmm
Directory containing acoustic model files.
.TP
.B \-hyp
Recognition output file name
.TP
.B \-hypseg
Recognition output with segmentation file name
.TP
.B \-input_endian
Endianness of input data, big or little, ignored if NIST or MS Wav
.TP
.B \-jsgf
JSGF grammar file
.TP
.B \-keyphrase
Keyphrase to spot
.TP
.B \-kws
A file with keyphrases to spot, one per line
.TP
.B \-kws_delay
Delay to wait for best detection score
.TP
.B \-kws_plp
Phone loop probability for keyword spotting
.TP
.B \-kws_threshold
Threshold for p(hyp)/p(alternatives) ratio
.TP
.B \-latsize
Initial backpointer table size
.TP
.B \-lda
File containing transformation matrix to be applied to features (single-stream features only)
.TP
.B \-ldadim
Dimensionality of output of feature transformation (0 to use entire matrix)
.TP
.B \-lifter
Length of sin-curve for liftering, or 0 for no liftering.
.TP
.B \-lm
Word trigram language model input file
.TP
.B \-lmctl
Specify a set of language models
.TP
.B \-lmname
Which language model in \fB\-lmctl\fR to use by default
.TP
.B \-lmnamectl
Control file listing LM name to use for each utterance
.TP
.B \-logbase
Base in which all log-likelihoods are calculated
.TP
.B \-logfn
File to write log messages in
.TP
.B \-logspec
Write out logspectral files instead of cepstra
.TP
.B \-lowerf
Lower edge of filters
.TP
.B \-lpbeam
Beam width applied to last phone in words
.TP
.B \-lponlybeam
Beam width applied to last phone in single-phone words
.TP
.B \-lw
Language model probability weight
.TP
.B \-maxhmmpf
Maximum number of active HMMs to maintain at each frame (or \fB\-1\fR for no pruning)
.TP
.B \-maxwpf
Maximum number of distinct word exits at each frame (or \fB\-1\fR for no pruning)
.TP
.B \-mdef
Model definition input file
.TP
.B \-mean
Mixture gaussian means input file
.TP
.B \-mfclogdir
Directory to log feature files to
.TP
.B \-min_endfr
Nodes ignored in lattice construction if they persist for fewer than N frames
.TP
.B \-mixw
Senone mixture weights input file (uncompressed)
.TP
.B \-mixwfloor
Senone mixture weights floor (applied to data from \fB\-mixw\fR file)
.TP
.B \-mllr
MLLR transformation to apply to means and variances
.TP
.B \-mllrctl
Control file listing MLLR transforms to use for each utterance
.TP
.B \-mllrdir
Base directory for MLLR transforms
.TP
.B \-mllrext
File extension for MLLR transforms (including leading dot)
.TP
.B \-mmap
Use memory-mapped I/O (if possible) for model files
.TP
.B \-nbest
Number of N-best hypotheses to write to \fB\-nbestdir\fR (0 for no N-best)
.TP
.B \-nbestdir
Directory for writing N-best hypothesis lists
.TP
.B \-nbestext
Extension for N-best hypothesis list files
.TP
.B \-ncep
Number of cep coefficients
.TP
.B \-nfft
Size of FFT
.TP
.B \-nfilt
Number of filter banks
.TP
.B \-nwpen
New word transition penalty
.TP
.B \-outlatbeam
Minimum posterior probability for output lattice nodes
.TP
.B \-outlatdir
Directory for dumping word lattices
.TP
.B \-outlatext
Filename extension for dumping word lattices
.TP
.B \-outlatfmt
Format for dumping word lattices (s3 or htk)
.TP
.B \-pbeam
Beam width applied to phone transitions
.TP
.B \-pip
Phone insertion penalty
.TP
.B \-pl_beam
Beam width applied to phone loop search for lookahead
.TP
.B \-pl_pbeam
Beam width applied to phone loop transitions for lookahead
.TP
.B \-pl_pip
Phone insertion penalty for phone loop
.TP
.B \-pl_weight
Weight for phoneme lookahead penalties
.TP
.B \-pl_window
Phoneme lookahead window size, in frames
.TP
.B \-rawlogdir
Directory to log raw audio files to
.TP
.B \-remove_dc
Remove DC offset from each frame
.TP
.B \-remove_noise
Remove noise with spectral subtraction in mel-energies
.TP
.B \-remove_silence
Enables VAD, removes silence frames from processing
.TP
.B \-round_filters
Round mel filter frequencies to DFT points
.TP
.B \-samprate
Sampling rate
.TP
.B \-seed
Seed for random number generator; if less than zero, pick our own
.TP
.B \-sendump
Senone dump (compressed mixture weights) input file
.TP
.B \-senin
Input is senone score dump files
.TP
.B \-senlogdir
Directory to log senone score files to
.TP
.B \-senmgau
Senone to codebook mapping input file (usually not needed)
.TP
.B \-silprob
Silence word transition probability
.TP
.B \-smoothspec
Write out cepstral-smoothed logspectral files
.TP
.B \-svspec
Subvector specification (e.g., 24,0-11/25,12-23/26-38 or 0-12/13-25/26-38)
.TP
.B \-tmat
HMM state transition matrix input file
.TP
.B \-tmatfloor
HMM state transition probability floor (applied to \fB\-tmat\fR file)
.TP
.B \-topn
Maximum number of top Gaussians to use in scoring.
.TP
.B \-topn_beam
Beam width used to determine top-N Gaussians (or a list, per-feature)
.TP
.B \-toprule
Start rule for JSGF (first public rule is default)
.TP
.B \-transform
Which type of transform to use to calculate cepstra (legacy, dct, or htk)
.TP
.B \-unit_area
Normalize mel filters to unit area
.TP
.B \-upperf
Upper edge of filters
.TP
.B \-uw
Unigram weight
.TP
.B \-vad_postspeech
Number of silence frames to keep after the transition from speech to silence.
.TP
.B \-vad_prespeech
Number of speech frames to keep before the transition from silence to speech.
.TP
.B \-vad_startspeech
Number of speech frames needed to trigger VAD from silence to speech.
.TP
.B \-vad_threshold
Threshold for decision between noise and silence frames. Log-ratio between signal level and noise level.
.TP
.B \-var
Mixture gaussian variances input file
.TP
.B \-varfloor
Mixture gaussian variance floor (applied to data from \fB\-var\fR file)
.TP
.B \-varnorm
Variance normalize each utterance (only if CMN == current)
.TP
.B \-verbose
Show input filenames
.TP
.B \-warp_params
Parameters string defining the warping function
.TP
.B \-warp_type
Warping function type (or shape)
.TP
.B \-wbeam
Beam width applied to word exits
.TP
.B \-wip
Word insertion penalty
.TP
.B \-wlen
Hamming window length
.PP
To do batch mode recognition, you
will need to specify a control file, using
.BR \-ctl .
This is a simple text file containing one entry per line. Each entry
is the name of an input file relative to the
.B \-cepdir
directory, and without the filename extension (which is given in the
.B \-cepext
argument).
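.PP
As a rough illustration, assuming hypothetical file and directory names
(none of them ship with this package), a control file named
.I test.ctl
might contain:
.PP
.RS
.nf
speaker1/utt001
speaker1/utt002
.fi
.RE
.PP
With that control file, an invocation along the following lines would be
expected to read
.I feats/speaker1/utt001.mfc
and
.I feats/speaker1/utt002.mfc
and write the recognition output to
.IR test.hyp :
.PP
.RS
.nf
pocketsphinx_batch \-hmm model_dir \-lm model.lm \-dict model.dict \e
    \-ctl test.ctl \-cepdir feats \-cepext .mfc \-hyp test.hyp
.fi
.RE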
.PP
If you are using acoustic feature files as input (see
.BR sphinx_fe (1)
for information on how to generate these), you can also specify a subpart
of a file, using the following format:
.PP
.RS
.B FILENAME START\-FRAME END\-FRAME UTTERANCE\-ID
.RE
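.PP
For example, a control file entry such as the following (the file name
and utterance ID are hypothetical) would select the span from frame 0 to
frame 499 of
.I meeting01
and label the result
.IR meeting01_seg1 :
.PP
.RS
.nf
meeting01 0 499 meeting01_seg1
.fi
.RE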
.SH AUTHOR
Written by numerous people at CMU from 1994 onwards. This manual page
by David Huggins-Daines <dhuggins@cs.cmu.edu>.
.SH COPYRIGHT
Copyright \(co 1994-2016 Carnegie Mellon University. See the file
\fILICENSE\fR included with this package for more information.
.br
.SH "SEE ALSO"
.BR pocketsphinx_continuous (1),
.BR sphinx_fe (1).
.br