Lingua::EN::Sentence

Module for splitting text into sentences

Lingua::EN::Sentence - Module for splitting text into sentences.

SYNOPSIS

use Lingua::EN::Sentence;
add_acronyms('lt','gen');  ## adding support for 'Lt. Gen.'

my $text =
Q[First sentence with some abbreviations,  Mr. J. Smith, 2 Jones St. SomeTown Ariz. U.S.A. is an address.
Sentence 2: Sequences like ellipsis ... are handled. Sentence 3, numbered sections such as point 1. are ok.];
my @sentences = $text.sentences;
for @sentences -> $sentence {
    say $sentence;
}
#`[
Output is:
First sentence with some abbreviations,  Mr. J. Smith, 2 Jones St. SomeTown Ariz. U.S.A. is an address.
Sentence 2: Sequences like ellipsis ...  are handled.
Sentence 3, numbered sections such as point 1. are ok.
]

DESCRIPTION

The Lingua::EN::Sentence module provides the method sentences, which splits text into its constituent sentences using regular expressions, a list of abbreviations (built-in and user-supplied) and other rules.
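The general idea of rule-based splitting with an abbreviation list can be illustrated with a minimal sketch. This is not the module's implementation: the names `split-sentences`, `add-abbreviations` and `is-non-terminal`, the small abbreviation list, and the rule "a sentence ends at `.`, `!` or `?` followed by a capitalized word or the end of the text" are all assumptions made for illustration. A period is treated as non-terminal after a listed abbreviation, a single capital initial such as `J.`, or a dotted acronym such as `U.S.A.`.

```raku
# Hypothetical sketch of abbreviation-aware sentence splitting.
# This is NOT the Lingua::EN::Sentence implementation.

# Built-in abbreviations, stored without the trailing period (case sensitive).
my $abbreviations = SetHash.new(<Mr Mrs Dr St Ave Calif Ariz>);

# Hypothetical helper: extend the built-in list with user-supplied entries.
sub add-abbreviations(*@words) {
    $abbreviations{$_} = True for @words;
}

# True if a word ending in "." should NOT be treated as a sentence end.
sub is-non-terminal(Str $word --> Bool) {
    my $stem = $word.subst(/ '.' $ /, '');
    return True if $abbreviations{$stem};              # listed, e.g. "St."
    return True if $stem ~~ / ^ <:Lu> $ /;             # initial, e.g. "J."
    return True if $stem ~~ / ^ [<:L> '.']+ <:L> $ /;  # acronym, e.g. "U.S.A."
    False;
}

sub split-sentences(Str $text --> List) {
    my @words = $text.words;
    my (@sentences, @current);
    for @words.kv -> $i, $word {
        @current.push: $word;
        my $next = @words[$i + 1];
        # A boundary needs terminal punctuation, no abbreviation exception,
        # and either the end of the text or a capitalized next word.
        my $ends = $word ~~ / <[.!?]> $ /
            && !($word.ends-with('.') && is-non-terminal($word))
            && (!$next.defined || $next ~~ / ^ <:Lu> /);
        if $ends {
            @sentences.push: @current.join(' ');
            @current = ();
        }
    }
    # Keep any trailing words that lack terminal punctuation.
    @sentences.push: @current.join(' ') if @current;
    @sentences.List;
}

.say for split-sentences('Mr. J. Smith lives on Main St. in Sometown. He moved from the U.S.A. last year. Did he like it? Yes!');
```

Because words are re-joined with single spaces, this sketch does not preserve the original whitespace, and it never starts a new sentence at a lowercase word or a digit; the module's actual rules may differ.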

Certain well-known exceptions, such as abbreviations like Mr., Calif. and Ave., can cause incorrect segmentation. Many of these are already included in the module's built-in list and are handled. Note that abbreviations are case sensitive.
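What "case sensitive" means for an abbreviation lookup can be shown with a small hypothetical illustration. The sets, their three entries and the `is-listed-abbreviation` / `is-listed-abbreviation-ci` names are assumptions, not the module's internals; the second function only shows how a case-folded lookup with Raku's `fc` method would behave by contrast.

```raku
# Hypothetical illustration; not the module's internals.
my $exact  = <Mr Calif Ave>.Set;
my $folded = <Mr Calif Ave>.map(*.fc).Set;

# Strip one trailing period, then look the word up exactly as written.
sub is-listed-abbreviation(Str $word --> Bool) {
    $exact{ $word.subst(/ '.' $ /, '') }
}

# Case-insensitive variant: fold both the entries and the word.
sub is-listed-abbreviation-ci(Str $word --> Bool) {
    $folded{ $word.subst(/ '.' $ /, '').fc }
}

say is-listed-abbreviation('Calif.');     # True
say is-listed-abbreviation('CALIF.');     # False
say is-listed-abbreviation-ci('CALIF.');  # True
```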

The add_acronyms subroutine allows you to add custom abbreviations (see SYNOPSIS).

Lingua::EN::Sentence v0.1.4

License

Artistic-2.0

Provides

• Lingua::EN::Sentence
