Algorithm::Viterbi
NAME
Algorithm::Viterbi - Decoding HMMs
DESCRIPTION
This module is a fairly straightforward implementation of the Viterbi algorithm for decoding hidden Markov models. The code is based on a Common Lisp implementation I wrote as coursework, itself based on pseudo-code from Jurafsky & Martin, Speech and Language Processing (2nd ed.).
SYNOPSIS
    use Algorithm::Viterbi;
    my Algorithm::Viterbi $hmm .= new(:alphabet<H C>);
    $hmm.train("training-data.tt"); # Train from file
    $hmm.train([ [a => 'H', b => 'C', a => 'H'],
                 [b => 'C', c => 'C', a => 'H'] ]); # Train from hardcoded data
    $hmm.decode(<a b c>);
FIELDS
%.p-transition
The transition probabilities. A hash of hashes, indexed first by the preceding tag, then by the following tag.
%.p-emission
The emission probabilities for a given tag. A hash of hashes, indexed first by tag, then by observation. A sketch of the shape of both hashes follows below.
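As an illustration, here is a minimal sketch of how a trained two-tag model might be shaped. The tag names and probability values are made up, and the from-tag/to-tag nesting order is my reading of the description above:

    # %p-transition{$previous-tag}{$next-tag} = P(next-tag | previous-tag)
    my %p-transition = H => { H => 0.7, C => 0.3 },
                       C => { H => 0.4, C => 0.6 };

    # %p-emission{$tag}{$observation} = P(observation | tag)
    my %p-emission = H => { a => 0.5, b => 0.4, c => 0.1 },
                     C => { a => 0.1, b => 0.3, c => 0.6 };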
METHODS
method new(:@alphabet!, :%p-transition, :%p-emission)
The alphabet parameter is required (an alphabet-less HMM doesn't make much sense). The transition and emission probabilities are also required for correct operation of decode, but can be specified either on construction, with the train method, or by manual specification via the corresponding fields.
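For instance, reusing the illustrative hashes sketched under FIELDS, a model can be built directly from hand-specified probabilities:

    use Algorithm::Viterbi;

    # :%p-transition is shorthand for p-transition => %p-transition,
    # and likewise for the other named arguments.
    my Algorithm::Viterbi $hmm .= new(
        :alphabet<H C>,
        :%p-transition,
        :%p-emission,
    );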
method decode(Str @input)
The decode method decodes the input according to the probabilities specified in the %.p-transition and %.p-emission fields.
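Continuing the hand-built model above, decoding a short observation sequence might look like this. I'm assuming here that decode returns the most probable tag sequence; consult the source for the precise return value.

    # Decode three observations; with the illustrative probabilities
    # above, this should yield the likeliest tag sequence for <a b c>.
    my @tags = $hmm.decode(<a b c>);
    say @tags;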
method train(Str $file)
Computes unsmoothed bigram probabilities from an input file. The input format is described by this grammar:
    grammar G {
        token TOP    { <chunk>+ }
        token chunk  { <record>+ \n }
        token record { \w+ \t \w+ \n }
    }
Each record is an observation followed by its associated tag, separated by a tab; a blank line terminates each sequence.
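A minimal training file in this format might look as follows (the observations a/b/c and tags H/C are illustrative; the fields are tab-separated):

    a	H
    b	H
    a	C

    b	C
    c	C
    a	H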
method train(Array of Pair @data)
Computes unsmoothed bigram probabilities from an Array of Arrays of Pairs. Each Pair is a single observation => tag mapping, and each element of the top-level array is one sequence to learn from.
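For example, the in-memory equivalent of the training file shown above would be (again with illustrative observations and tags):

    $hmm.train([ [a => 'H', b => 'H', a => 'C'],
                 [b => 'C', c => 'C', a => 'H'] ]);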
AUTHOR
Arne Skjærholt - mailto:[email protected].