Algorithm::Viterbi
NAME
Algorithm::Viterbi - Decoding HMMs
DESCRIPTION
This module is a fairly straightforward implementation of the Viterbi algorithm for decoding hidden Markov models. The code is based on a Common Lisp implementation I wrote as coursework, itself based on the pseudo-code in Jurafsky & Martin, Speech and Language Processing (2nd ed.).
SYNOPSIS
    use Algorithm::Viterbi;

    my Algorithm::Viterbi $hmm .= new(:alphabet<H C>);
    $hmm.train("training-data.tt");                    # Train from file
    $hmm.train([ [ a => 'H', b => 'C', a => 'H' ],
                 [ b => 'C', c => 'H', a => 'C' ] ]);  # Train from hardcoded data
    $hmm.decode(<a b c>);
FIELDS
=over 4
=item %.p-transition

The transition probabilities. A hash of hashes, indexed first by the preceding tag, then by the following tag.
=item %.p-emission

The emission probabilities. A hash of hashes, indexed first by tag, then by observation.
=back
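For concreteness, here is a hand-specified toy model in this layout. All probability values are invented for illustration, and the nesting order (outer key first, as described above) is what the example assumes:

    my %p-transition = H => { H => 0.7, C => 0.3 },
                       C => { H => 0.4, C => 0.6 };
    my %p-emission   = H => { a => 0.5, b => 0.4, c => 0.1 },
                       C => { a => 0.1, b => 0.3, c => 0.6 };

    # %p-transition<H><C>: probability of tag C following tag H (0.3).
    # %p-emission<C><c>:   probability of tag C emitting observation c (0.6).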
METHODS
=over 4
=item method new(:@alphabet!, :%p-transition, :%p-emission)

The alphabet parameter is required (an alphabet-less HMM doesn't make too much
sense). The transition and emission probabilities are also required for
correct operation of decode, but they can be specified in one of three ways:
at construction, with the train method, or by setting the corresponding
fields directly.
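For instance, a fully hand-specified model (reusing the invented toy values from the FIELDS example above) could be constructed in one go:

    my Algorithm::Viterbi $hmm .= new(
        :alphabet<H C>,
        :p-transition{ H => { H => 0.7, C => 0.3 },
                       C => { H => 0.4, C => 0.6 } },
        :p-emission{ H => { a => 0.5, b => 0.4, c => 0.1 },
                     C => { a => 0.1, b => 0.3, c => 0.6 } },
    );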
=item method decode(Str @input)

The decode method decodes the input according to the probabilities specified
in the %.p-transition and %.p-emission fields, finding the most probable
sequence of tags for the input observations.
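Under the hood this is the classic dynamic-programming recurrence from Jurafsky & Martin. The following self-contained sketch illustrates that recurrence against the field layout described above; it is not the module's actual source, and the '<s>' start tag and the zero-probability fallbacks are assumptions of the sketch:

    sub viterbi-sketch(@input, @tags, %p-transition, %p-emission) {
        my @v;     # @v[$i]{$t}: probability of the best tag path ending in $t at position $i
        my @back;  # @back[$i]{$t}: predecessor tag on that best path

        # Initialisation: transitions out of the assumed start tag '<s>'.
        for @tags -> $t {
            @v[0]{$t} = (%p-transition{'<s>'}{$t} // 0)
                      * (%p-emission{$t}{@input[0]} // 0);
        }

        # Recursion: extend the best path to every tag at every position.
        for 1 ..^ @input.elems -> $i {
            for @tags -> $t {
                my $prev = @tags.max(-> $p {
                    (@v[$i - 1]{$p} // 0) * (%p-transition{$p}{$t} // 0)
                });
                @back[$i]{$t} = $prev;
                @v[$i]{$t} = (@v[$i - 1]{$prev} // 0)
                           * (%p-transition{$prev}{$t} // 0)
                           * (%p-emission{$t}{@input[$i]} // 0);
            }
        }

        # Termination: pick the best final tag and follow the backpointers.
        my @path = @tags.max(-> $t { @v[@input.end]{$t} // 0 });
        for (1 ..^ @input.elems).reverse -> $i {
            @path.unshift: @back[$i]{@path[0]};
        }
        return @path;
    }

With the toy hashes from the FIELDS example, this would be called as viterbi-sketch(<a b c>, <H C>, %p-transition, %p-emission).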
=item method train(Str $file)
Computes unsmoothed bigram probabilities from an input file. The input format is described by this grammar:
    grammar G {
        token TOP    { <chunk>+ }
        token chunk  { <record>+ \n }
        token record { \w+ \t \w+ \n }
    }
Each record is an observation followed by its associated tag.
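A hypothetical training-data.tt fragment in this format (observations and tags are invented; fields are TAB-separated, and a blank line terminates each sequence):

    a	H
    b	H
    c	C

    b	C
    a	H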
=item method train(Array of Pair @data)

Computes unsmoothed bigram probabilities from an array of arrays of pairs.
Each pair is a single observation => tag pair, and each element of the
top-level array is a single training sequence.
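For example (toy training data; decode is assumed to return the decoded tag sequence):

    my Algorithm::Viterbi $hmm .= new(:alphabet<H C>);

    # Two training sequences of observation => tag pairs.
    $hmm.train([ [ a => 'H', b => 'H', c => 'C' ],
                 [ c => 'C', b => 'C', a => 'H' ] ]);

    say $hmm.decode(<a b c>);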
=back
AUTHOR
Arne Skjærholt - mailto:[email protected].