Lingua::Stem::Portuguese Raku package

Introduction

This Raku package is for stemming Portuguese words. It implements the Snowball algorithm presented in [SNa1].

Usage examples

The PortugueseStem function is used to find stems:

use Lingua::Stem::Portuguese;
say PortugueseStem('brotação')

PortugueseStem also works with lists of words:

say PortugueseStem('Os brotos são aguardados com paciência, bebida e bacon.'.words)

The function portuguese-word-stem can be used as a synonym of PortugueseStem.
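
As a small illustration of the two names, the sketch below stems a few words one at a time with portuguese-word-stem and checks that PortugueseStem gives the same result. It assumes that both functions are exported by default; the variable names are only illustrative.

```raku
use Lingua::Stem::Portuguese;

# A few sample words taken from the sentence used above.
my @words = <brotos aguardados paciência bebida>;

for @words -> $word {
    # Stem the same word through both names.
    my $stem-a = PortugueseStem($word);
    my $stem-b = portuguese-word-stem($word);

    # Print the word, its stem, and whether the two functions agree.
    say "$word => $stem-a (same result: { $stem-a eq $stem-b })";
}
```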

Command Line Interface (CLI)

The package provides the CLI function PortugueseStem. The following command prints its usage message:

PortugueseStem --help

Here are example shell commands that use the CLI function PortugueseStem:

PortugueseStem Boataria
PortugueseStem --format=raku "Módulo Raku que fornece um procedimento para a língua portuguesa."
PortugueseStem Verificar a exatidão da seleção usando dicionários e regras

Here is a pipeline example using the CLI function get-tokens of the package "Grammar::TokenProcessing", [AAp1]:

get-tokens ./DataQueryPhrases-template | PortugueseStem --format=raku

Remark: These kinds of token (literal) transformations are used in the packages "DSL::Bulgarian" [AAp2], "DSL::Portuguese" [AAp3], and "DSL::Russian" [AAp4].
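
For readers who prefer to stay inside Raku, the following is a rough in-process counterpart of the pipeline idea: it reads whitespace-separated tokens from standard input and prints one stem per line. It is only a sketch and not how the CLI is implemented; the output layout is an assumption and does not reproduce the --format=raku option.

```raku
use Lingua::Stem::Portuguese;

# Read whitespace-separated tokens from standard input.
for $*IN.words -> $token {
    # Print the stem of each token on its own line.
    say PortugueseStem($token);
}
```

Saved under a hypothetical name such as stem-tokens.raku, it could be fed from another command through a pipe in the same way as the CLI function.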

Implementation notes

TODO

  • TODO Respect the word case in the returned result.

    • PortugueseStem('TABLADO') should return 'TABL'.

    • (Not 'tabl' as it currently does.)

  • DONE CLI that can be inserted in UNIX pipelines.

  • TODO Galician stemmer.

  • TODO Performance statistics.

  • TODO More detailed documentation.
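
Until the first TODO item above is addressed, the word case can be restored on the caller's side. The sketch below is one possible workaround, not the package's planned solution: apply-case and stem-keep-case are hypothetical helpers, and the code assumes that PortugueseStem returns a lowercase string for a single word.

```raku
use Lingua::Stem::Portuguese;

# Hypothetical helper (not part of the package): copy the case pattern
# of the original word onto an already computed stem.
sub apply-case(Str:D $word, Str:D $stem) {
    # All-uppercase word with at least one cased letter: uppercase the stem.
    return $stem.uc if $word eq $word.uc && $word ne $word.lc;
    # Capitalized word (first letter upper, rest lower): capitalize the stem.
    return $stem.tc if $word eq $word.tclc && $word ne $word.lc;
    # Otherwise leave the stem as it was returned.
    $stem
}

# Hypothetical wrapper around the package's PortugueseStem.
sub stem-keep-case(Str:D $word) {
    apply-case($word, PortugueseStem($word).Str)
}

say stem-keep-case('TABLADO');
```

If PortugueseStem('TABLADO') returns 'tabl', as noted in the TODO item, the wrapper turns it into 'TABL'. Mixed-case words such as 'TaBlado' are left with whatever case the stemmer returns.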

References

Articles

[SNa1] Snowball Team, Portuguese stemming algorithm, (2002), snowball.tartarus.org.

Packages

[AAp1] Anton Antonov, Grammar::TokenProcessing Raku package, (2022), GitHub/antononcube.

[AAp2] Anton Antonov, DSL::Bulgarian Raku package, (2022), GitHub/antononcube.

[AAp3] Anton Antonov, DSL::Portuguese Raku package, (2023), GitHub/antononcube.

[AAp4] Anton Antonov, DSL::Russian Raku package, (2022), GitHub/antononcube.

Lingua::Stem::Portuguese v0.1.0

Package for stemming Portuguese words.

Authors

  • Anton Antonov

License

Artistic-2.0

Provides

  • Lingua::Stem::Portuguese
