Algorithm::Viterbi

HMM decoding with the Viterbi algorithm.

NAME

Algorithm::Viterbi - Decoding HMMs

DESCRIPTION

This module is a fairly straightforward implementation of the Viterbi algorithm for decoding hidden Markov models (HMMs). The code is based on a Common Lisp implementation I wrote as coursework, which was in turn based on the pseudocode in Jurafsky & Martin, Speech and Language Processing (2nd ed.).

SYNOPSIS


    use Algorithm::Viterbi;

    # The alphabet is required at construction time
    my Algorithm::Viterbi $hmm .= new(:alphabet<H C>);
    $hmm.train("training-data.tt");               # Train from a file
    $hmm.train([ [a => 1, b => 2, a => 1],
                 [b => 3, c => 1, a => 2] ]);     # Train from hardcoded data
    $hmm.decode(<a b c>);                         # Decode a sequence of observations

FIELDS

  • %.p-transition

The transition probabilities. A hash of hashes, indexed by tag name.

  • %.p-emission

The emission probabilities for a given tag. A hash of hashes, indexed first by tag, then by observation.
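
As a rough illustration of the hash-of-hashes shape described for these fields, the data might look like the sketch below. The tag names, observations, and probabilities are all made up, and the orientation of the transition table (previous tag as outer key, next tag as inner key) is an assumption that this document does not confirm.

```raku
# Illustrative values only; none of these numbers come from the module.
# Assumption: the outer key is the previous tag, the inner key the next tag.
my %p-transition =
    H => { H => 0.7, C => 0.3 },
    C => { H => 0.4, C => 0.6 };

# Outer key: tag. Inner key: observation emitted under that tag.
my %p-emission =
    H => { '1' => 0.2, '2' => 0.4, '3' => 0.4 },
    C => { '1' => 0.5, '2' => 0.4, '3' => 0.1 };

say %p-transition<H><C>;    # chance of moving from tag H to tag C
say %p-emission<C>{'3'};    # chance that tag C emits observation 3
```

Each inner hash sums to 1 here, which is what unsmoothed relative frequencies would produce.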

METHODS

  • method new(:@alphabet!, :%p-transition, :%p-emission)

The alphabet parameter is required, since an HMM without an alphabet makes little sense. The transition and emission probabilities must also be set before decode can work correctly; they can be passed to the constructor, computed with the train method, or assigned directly to the corresponding fields.

  • method decode(Str @input)

Finds the most probable tag sequence for the input, using the probabilities in the %.p-transition and %.p-emission fields.

  • method train(Str $file)

Computes unsmoothed bigram probabilities from an input file. The input format is described by this grammar:

    grammar G {
        token TOP { <chunk>+ }
        token chunk { <record>+ \n }
        token record { \w+ \t \w+ \n }
    }

Each record is an observation followed by its associated tag, separated by a tab. Each chunk is one training sequence and ends with an empty line.

  • method train(Array of Pair @data)

Computes unsmoothed bigram probabilities from an Array of Arrays of Pairs. Each Pair is a single observation-tag pair, and each element of the top-level array is one training sequence.
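
To make the file format for train concrete, the sketch below runs the documented grammar over a small in-memory string. The sample data is invented, and parsing a string directly is only for illustration, since the train method itself takes a file name.

```raku
# The grammar as given in the documentation.
grammar G {
    token TOP    { <chunk>+ }
    token chunk  { <record>+ \n }
    token record { \w+ \t \w+ \n }
}

# Hypothetical training data: two sequences of observation<TAB>tag lines,
# each sequence closed by an empty line.
my $text = "1\tH\n3\tH\n\n2\tC\n1\tC\n\n";

my $match = G.parse($text);
say $match ?? "parsed" !! "no match";
say $match<chunk>.elems;               # number of sequences found
say $match<chunk>[0]<record>.elems;    # records in the first sequence
```

Note that the final sequence also needs its closing empty line, because every chunk must end with a newline of its own.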
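
The training and decoding steps described above can be sketched end to end as follows. This is not the module's code: the helper names normalise, train, and decode are hypothetical stand-alone subs, the 'START' pseudo-tag is invented for the sketch, and it assumes each Pair has the observation as key and the tag as value. End-of-sequence transitions are ignored.

```raku
# Stand-alone sketch, not the module's code. Helper names are hypothetical.

# Convert nested counts into relative frequencies (no smoothing).
sub normalise(%counts) {
    my %p;
    for %counts.kv -> $outer, %inner {
        my $total = %inner.values.sum;
        for %inner.kv -> $key, $n {
            %p{$outer}{$key} = $n / $total;
        }
    }
    return %p;
}

# Count tag bigrams and tag/observation pairs over all training sequences.
# Assumptions: Pair key = observation, Pair value = tag, and a made-up
# 'START' pseudo-tag marks the beginning of each sequence.
sub train(@sequences) {
    my (%trans, %emit);
    for @sequences -> @seq {
        my $prev = 'START';
        for @seq -> $pair {
            %trans{$prev}{$pair.value}++;
            %emit{$pair.value}{$pair.key}++;
            $prev = $pair.value;
        }
    }
    return %( transition => normalise(%trans), emission => normalise(%emit) );
}

# Viterbi search: for every tag, keep the best-scoring path ending in it.
sub decode(%model, @obs) {
    my $trans = %model<transition>;
    my $emit  = %model<emission>;
    my @tags  = $emit.keys.sort;

    my (%best, %path);
    for @tags -> $t {
        %best{$t} = ($trans<START>{$t} // 0) * ($emit{$t}{@obs[0]} // 0);
        %path{$t} = [$t];
    }
    for @obs[1..*] -> $o {
        my (%next-best, %next-path);
        for @tags -> $t {
            # Predecessor tag that maximises the probability of reaching $t.
            my $prev = @tags.max(-> $p { %best{$p} * ($trans{$p}{$t} // 0) });
            %next-best{$t} = %best{$prev} * ($trans{$prev}{$t} // 0) * ($emit{$t}{$o} // 0);
            %next-path{$t} = [|%path{$prev}, $t];
        }
        %best = %next-best;
        %path = %next-path;
    }
    return %path{ @tags.max(-> $t { %best{$t} }) };
}

# Invented training data: observations a and b, tags X and Y.
my @data =
    [ a => 'X', b => 'Y', a => 'X' ],
    [ b => 'Y', a => 'X', b => 'Y' ];

my %model = train(@data);
say decode(%model, <a b a>);
```

Multiplying raw probabilities keeps the sketch short; for long inputs, summing log-probabilities is the usual way to avoid numeric underflow.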

AUTHOR

Arne Skjærholt - mailto:[email protected].
