Lingua::Stem::Portuguese Raku package
Introduction
This Raku package is for stemming Portuguese words. It implements the Snowball algorithm presented in [SNa1].
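The Snowball Portuguese algorithm in [SNa1] only removes a suffix if the suffix lies inside one of three word regions: RV, R1, or R2. The Raku code below is a minimal sketch of how those regions can be computed. The names region-start, r1-r2, and rv-start are hypothetical and are not part of this package's API. The sketch also assumes that the Snowball preprocessing step, which rewrites ã and õ as a~ and o~, has already been applied.

```raku
# Sketch of the Snowball Portuguese word regions RV, R1, R2 (see [SNa1]).
# Helper names are hypothetical; they are not part of Lingua::Stem::Portuguese.
# Assumes ã/õ were already rewritten as a~/o~, as the Snowball definition does.
my $vowels = set <a e i o u á é í ó ú â ê ô>;

# Start index of the region after the first non-vowel that follows a vowel,
# scanning from position $from; the word length means "empty region".
sub region-start(Str $word, Int $from = 0 --> Int) {
    my @c = $word.comb;
    for ($from + 1) ..^ @c.elems -> $i {
        return $i + 1 if $vowels{@c[$i - 1]} && !$vowels{@c[$i]};
    }
    @c.elems;
}

# R1 starts after the first vowel/non-vowel pair; R2 applies the same rule inside R1.
sub r1-r2(Str $word) {
    my $r1 = region-start($word);
    my $r2 = region-start($word, $r1);
    ($word.substr($r1), $word.substr($r2));
}

# RV follows the three-case rule of the Snowball Portuguese description.
sub rv-start(Str $word --> Int) {
    my @c = $word.comb;
    return @c.elems if @c.elems < 2;
    if !$vowels{@c[1]} {
        # Second letter is a consonant: RV starts after the next vowel.
        for 2 ..^ @c.elems -> $i { return $i + 1 if $vowels{@c[$i]} }
    }
    elsif $vowels{@c[0]} {
        # First two letters are vowels: RV starts after the next consonant.
        for 2 ..^ @c.elems -> $i { return $i + 1 unless $vowels{@c[$i]} }
    }
    else {
        # Consonant-vowel start: RV starts after the third letter.
        return @c.elems >= 3 ?? 3 !! @c.elems;
    }
    @c.elems;   # no such position: RV is empty
}

say r1-r2('paciência');                           # (iência cia)
say 'aguardados'.substr(rv-start('aguardados'));  # ardados
```

In [SNa1], the verb-suffix step checks its suffixes against RV, while most standard-suffix deletions require the suffix to be in R2. This is why all three regions are computed up front.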
Usage examples
The function PortugueseStem is used to find stems:
use Lingua::Stem::Portuguese;
say PortugueseStem('brotação')
PortugueseStem also works with lists of words:
say PortugueseStem('Os brotos são aguardados com paciência, bebida e bacon.'.words)
The function portuguese-word-stem can be used as a synonym of PortugueseStem.
Command Line Interface (CLI)
The package provides the CLI function PortugueseStem. Here is its usage message:
PortugueseStem --help
Here are example shell commands that use the CLI function PortugueseStem:
PortugueseStem Boataria
PortugueseStem --format=raku "Módulo Raku que fornece um procedimento para a língua portuguesa."
PortugueseStem Verificar a exatidão da seleção usando dicionários e regras
Here is a pipeline example using the CLI function get-tokens of the package "Grammar::TokenProcessing", [AAp1]:
get-tokens ./DataQueryPhrases-template | PortugueseStem --format=raku
Remark: This kind of token (literal) transformation is used in the packages "DSL::Bulgarian", [AAp2], "DSL::Portuguese", [AAp3], and "DSL::Russian", [AAp4].
Implementation notes
Reprogrammed to Raku from the Perl module Lingua::PT::Stemmer: https://github.com/neilb/Lingua-PT-Stemmer/blob/master/lib/Lingua/PT/Stemmer.pm
TODO
TODO Respect the word case in the returned result. PortugueseStem('TABLADO') should return 'TABL' (not 'tabl', as it currently does).
DONE CLI that can be inserted in UNIX pipelines.
TODO Galician stemmer.
TODO Performance statistics.
TODO More detailed documentation.
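The case-preservation TODO item could be handled as a post-processing step. Below is a minimal sketch, assuming the stemmer keeps returning a lowercase stem. The hypothetical routine restore-case (not part of this package) copies the case pattern of the original word onto the stem. Only two patterns are handled, all-uppercase and capitalized words. The stem 'tabl' is taken from the TABLADO example in the TODO list.

```raku
# Hypothetical helper (not part of Lingua::Stem::Portuguese): transfer the
# letter case of the original word onto a lowercase stem.
sub restore-case(Str $word, Str $stem --> Str) {
    # Nothing to transfer from an empty word.
    return $stem unless $word.chars;
    # All-uppercase word (with at least one cased letter): uppercase the stem.
    return $stem.uc if $word eq $word.uc && $word ne $word.lc;
    # Capitalized word (first letter upper, rest lower): title-case the stem.
    return $stem.tc if $word.substr(0, 1) ne $word.substr(0, 1).lc
                    && $word.substr(1) eq $word.substr(1).lc;
    # Any other pattern (lowercase or mixed case): keep the stem as produced.
    $stem;
}

say restore-case('TABLADO', 'tabl');   # TABL
say restore-case('Tablado', 'tabl');   # Tabl
```

A per-letter mapping is deliberately avoided, because a stem is not always a prefix of the original word. For example, the Snowball algorithm can rewrite suffix letters.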
References
Articles
[SNa1] Snowball Team, Portuguese stemming algorithm, (2002), snowball.tartarus.org.
Packages
[AAp1] Anton Antonov, Grammar::TokenProcessing Raku package, (2022), GitHub/antononcube.
[AAp2] Anton Antonov, DSL::Bulgarian Raku package, (2022), GitHub/antononcube.
[AAp3] Anton Antonov, DSL::Portuguese Raku package, (2023), GitHub/antononcube.
[AAp4] Anton Antonov, DSL::Russian Raku package, (2022), GitHub/antononcube.