.TH POCKETSPHINX_BATCH 1 "2007-08-27"
.SH NAME
pocketsphinx_batch \- Run speech recognition in batch mode
.SH SYNOPSIS
.B pocketsphinx_batch
.RI \fB\-hmm\fR
\fIhmmdir\fR
\fB\-dict\fR
\fIdictfile\fR
[\fI options \fR]...
.SH DESCRIPTION
.PP
Run speech recognition over a list of utterances in batch mode. The
following arguments are accepted:
.TP
.B \-adchdr
Size of audio file header in bytes (headers are ignored)
.TP
.B \-adcin
Input is raw audio data
.TP
.B \-agc
Automatic gain control for c0 ('max', 'emax', 'noise', or 'none')
.TP
.B \-agcthresh
Initial threshold for automatic gain control
.TP
.B \-allphone
Perform phoneme decoding with a phonetic language model
.TP
.B \-allphone_ci
Perform phoneme decoding with a phonetic language model and context-independent units only
.TP
.B \-alpha
Preemphasis parameter
.TP
.B \-argfile
Argument file giving extra arguments.
.TP
.B \-ascale
Inverse of acoustic model scale for confidence score calculation
.TP
.B \-aw
Inverse weight applied to acoustic scores.
.TP
.B \-backtrace
Print results and backtraces to log file.
.TP
.B \-beam
Beam width applied to every frame in Viterbi search (smaller values mean wider beam)
.TP
.B \-bestpath
Run bestpath (Dijkstra) search over word lattice (3rd pass)
.TP
.B \-bestpathlw
Language model probability weight for bestpath search
.TP
.B \-build_outdirs
Create missing subdirectories in output directory
.TP
.B \-cepdir
Input files directory (prefixed to filespecs in control file)
.TP
.B \-cepext
Input files extension (suffixed to filespecs in control file)
.TP
.B \-ceplen
Number of components in the input feature vector
.TP
.B \-cmn
Cepstral mean normalization scheme ('current', 'prior', or 'none')
.TP
.B \-cmninit
Initial values (comma-separated) for cepstral mean when 'prior' is used
.TP
.B \-compallsen
Compute all senone scores in every frame (can be faster when there are many senones)
.TP
.B \-ctl
Control file listing utterances to be processed
.TP
.B \-ctlcount
Number of utterances to be processed (after skipping \fB\-ctloffset\fR entries)
.TP
.B \-ctlincr
Do every Nth line in the control file
.TP
.B \-ctloffset
Number of utterances at the beginning of \fB\-ctl\fR file to be skipped
.TP
.B \-ctm
Recognition output in CTM file format (may require post-sorting)
.TP
.B \-debug
Verbosity level for debugging messages
.TP
.B \-dict
Main pronunciation dictionary (lexicon) input file
.TP
.B \-dictcase
Dictionary is case sensitive (NOTE: case insensitivity applies to ASCII characters only)
.TP
.B \-dither
Add 1/2-bit noise
.TP
.B \-doublebw
Use double bandwidth filters (same center freq)
.TP
.B \-ds
Frame GMM computation downsampling ratio
.TP
.B \-fdict
Noise word pronunciation dictionary input file
.TP
.B \-feat
Feature stream type, depends on the acoustic model
.TP
.B \-featparams
File containing feature extraction parameters.
.TP
.B \-fillprob
Filler word transition probability
.TP
.B \-frate
Frame rate
.TP
.B \-fsg
Sphinx format finite state grammar file
.TP
.B \-fsgctl
Control file listing FSG file to use for each utterance
.TP
.B \-fsgdir
Base directory for FSG files
.TP
.B \-fsgext
File extension for FSG files (including leading dot)
.TP
.B \-fsgusealtpron
Add alternate pronunciations to FSG
.TP
.B \-fsgusefiller
Insert filler words at each state.
.TP
.B \-fwdflat
Run forward flat-lexicon search over word lattice (2nd pass)
.TP
.B \-fwdflatbeam
Beam width applied to every frame in second-pass flat search
.TP
.B \-fwdflatefwid
Minimum number of end frames for a word to be searched in fwdflat search
.TP
.B \-fwdflatlw
Language model probability weight for flat lexicon (2nd pass) decoding
.TP
.B \-fwdflatsfwin
Window of frames in lattice to search for successor words in fwdflat search
.TP
.B \-fwdflatwbeam
Beam width applied to word exits in second-pass flat search
.TP
.B \-fwdtree
Run forward lexicon-tree search (1st pass)
.TP
.B \-hmm
Directory containing acoustic model files.
.TP
.B \-hyp
Recognition output file name
.TP
.B \-hypseg
Recognition output with segmentation file name
.TP
.B \-input_endian
Endianness of input data, big or little, ignored if NIST or MS Wav
.TP
.B \-jsgf
JSGF grammar file
.TP
.B \-keyphrase
Keyphrase to spot
.TP
.B \-kws
A file with keyphrases to spot, one per line
.TP
.B \-kws_delay
Delay to wait for best detection score
.TP
.B \-kws_plp
Phone loop probability for keyword spotting
.TP
.B \-kws_threshold
Threshold for p(hyp)/p(alternatives) ratio
.TP
.B \-latsize
Initial backpointer table size
.TP
.B \-lda
File containing transformation matrix to be applied to features (single-stream features only)
.TP
.B \-ldadim
Dimensionality of output of feature transformation (0 to use entire matrix)
.TP
.B \-lifter
Length of sin-curve for liftering, or 0 for no liftering.
.TP
.B \-lm
Word trigram language model input file
.TP
.B \-lmctl
Specify a set of language models
.PP
The
.B \-hmm
and
.B \-dict
arguments are always required. Either
.B \-lm
or
.B \-fsg
is required, depending on whether you are using a statistical language
model or a finite-state grammar. To do batch mode recognition, you
will need to specify a control file, using
.BR \-ctl .
This is a simple text file containing one entry per line. Each entry
is the name of an input file relative to the
.B \-cepdir
directory, and without the filename extension (which is given in the
.B \-cepext
argument).
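.PP
As an illustrative sketch only (the file names and directory layout
below are hypothetical, not part of this package), a control file
naming three raw audio files
.IR audio/utt01.raw ,
.IR audio/utt02.raw ,
and
.I audio/sub/utt03.raw
might look like this:
.PP
.RS
.nf
utt01
utt02
sub/utt03
.fi
.RE
.PP
Assuming the control file is saved as
.I utts.ctl
and that the acoustic model, dictionary, and language model live at the
hypothetical paths shown, a matching invocation could then be:
.PP
.RS
.nf
pocketsphinx_batch \e
    \-hmm model/hmm \-dict model/words.dic \-lm model/words.lm \e
    \-adcin yes \-cepdir audio \-cepext .raw \e
    \-ctl utts.ctl \-hyp utts.hyp
.fi
.RE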
.PP
If you are using acoustic feature files as input (see
.BR sphinx_fe (1)
for information on how to generate these), you can also specify a subpart
of a file, using the following format:
.PP
.RS
.B FILENAME START\-FRAME END\-FRAME UTTERANCE\-ID
.RE
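.PP
For instance, a control file that extracts two utterances from one
feature file could contain the lines below. The file name, frame
numbers, and utterance IDs are made up for illustration; choose frame
ranges that match your own feature files.
.PP
.RS
.nf
meeting01 0 450 meeting01_utt1
meeting01 520 1200 meeting01_utt2
.fi
.RE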
.SH AUTHOR
Written by numerous people at CMU from 1994 onwards. This manual page
was written by David Huggins-Daines <dhuggins@cs.cmu.edu>.
.SH COPYRIGHT
Copyright \(co 1994-2007 Carnegie Mellon University. See the file
\fICOPYING\fR included with this package for more information.
.br
.SH "SEE ALSO"
.BR pocketsphinx_continuous (1),
.BR sphinx_fe (1).
.br