Lingua::Stem::Russian Raku package
Introduction
This Raku package stems Russian words. It implements the Snowball stemming algorithm for Russian described in [SNa1].
Usage examples
The RussianStem function is used to find stems:
use Lingua::Stem::Russian;
say RussianStem('всходы')
RussianStem also works with lists of words:
say RussianStem('Всходы урожая ожидаются с терпением, питьем и беконом.'.words)
The function russian-word-stem can be used as a synonym of RussianStem.
Command Line Interface (CLI)
The package provides the CLI function RussianStem. Here is its usage message:
RussianStem --help
Here are example shell commands that use the CLI function RussianStem:
RussianStem Какие
RussianStem --format=raku "Модуль Raku, предоставляющий процедуру для русского языка."
RussianStem Проверить корректность подбора по словарям и правилам
Here is a pipeline example using the CLI function get-tokens of the package "Grammar::TokenProcessing", [AAp1]:
get-tokens ./DataQueryPhrases-template | RussianStem --format=raku
# ("ассоциац", "ассоциирован", "ассоциирова", "безопасн", "восходя", "выбер", "заказа", "комбайн", "крестообразн",
# "поверхност", "мутирова", "обзор", "обобщ", "переименова", "пол", "просмотрет", "разгруппирова", "разделител",
# "распла", "расстав", "символ", "слит", "слиян", "сплит", "табулирова", "тольк", "убыва", "уверен", "форм",
# "формат", "формирова", "формул", "широк")
Remark: This kind of token (literal) transformation is used in the packages "DSL::Bulgarian", [AAp2], and "DSL::Russian", [AAp3].
Implementation notes
Ported to Raku from the Perl module Lingua::Stem::Ru (https://github.com/neilb/Lingua-Stem-Ru/blob/master/lib/Lingua/Stem/Ru.pm).
TODO
DONE Respect the word case in the returned result.
RussianStem('ТАБЛА') should return 'ТАБЛ'. (Not 'табл' as it currently does.)
DONE CLI that can be inserted in UNIX pipelines.
TODO Performance statistics.
TODO More detailed documentation.
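The case-preservation item above ("Respect the word case in the returned result") can be illustrated with a minimal sketch. It assumes the stemmer itself works on lowercase text and that case is restored afterwards; the helper restore-case is hypothetical and is not part of Lingua::Stem::Russian, and the package may handle case differently. The 'ТАБЛА' and 'табл' pair is taken from that item.

```raku
# Hypothetical helper (not part of Lingua::Stem::Russian):
# carry the case pattern of the original word over to a lowercase stem.
sub restore-case(Str $word, Str $stem --> Str) {
    # All-uppercase input (and not caseless): return the stem in uppercase.
    return $stem.uc if $word eq $word.uc && $word ne $word.lc;

    # Capitalized input (first letter upper, rest lower): capitalize the stem.
    return $stem.tc if $word eq $word.tclc;

    # Otherwise leave the stem as it is.
    $stem
}

say restore-case('ТАБЛА', 'табл');   # ТАБЛ
say restore-case('Табла', 'табл');   # Табл
say restore-case('табла', 'табл');   # табл
```

Only three patterns are handled here (all uppercase, capitalized, everything else); mixed-case words fall through to the unchanged stem.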
References
Articles
[SNa1] Snowball Team, Russian stemming algorithm, (2002), snowball.tartarus.org.
Packages
[AAp1] Anton Antonov, Grammar::TokenProcessing Raku package, (2022), GitHub/antononcube.
[AAp2] Anton Antonov, DSL::Bulgarian Raku package, (2022), GitHub/antononcube.
[AAp3] Anton Antonov, DSL::Russian Raku package, (2023), GitHub/antononcube.