README-work
Raku DSL::Bulgarian
In brief
This Raku package facilitates the specification of computational workflows using natural language commands in Bulgarian.
Using the Domain Specific Languages (DSLs), executable code is generated for different programming languages: Julia, Python, R, Raku, Wolfram Language.
Translation to other natural languages is also done: English, Korean, Russian, Spanish.
Data query (wrangling) workflows
Translate Bulgarian data wrangling specifications to different natural and programming languages:
use DSL::English::DataQueryWorkflows;
my $command = '
зареди данните iris;
вземи елементите от 1 до 120;
филтрирай чрез Sepal.Width е по-голямо от 2.4 и Petal.Length е по-малко от 5.5;
групирай с колоната Species;
покажи размерите
';
for <English Python::pandas Raku::Reshapers Spanish Russian> -> $t {
    say '=' x 60, "\n", $t, "\n", '-' x 60;
    say ToDataQueryWorkflowCode($command, $t, language => 'Bulgarian', format => 'code');
}
Classification workflows
use DSL::English::ClassificationWorkflows;
my $command = '
използвай dfTitanic;
раздели данните с цепещо съотношение 0.82;
направи gradient boosted trees класификатор;
покажи TruePositiveRate и FalsePositiveRate;
';
for <English Russian WL::ClCon> -> $t {
    say '=' x 60, "\n", $t, "\n", '-' x 60;
    say ToClassificationWorkflowCode($command, $t, language => 'Bulgarian', format => 'code');
}
Latent Semantic Analysis
use DSL::English::LatentSemanticAnalysisWorkflows;
my $command = '
създай със textHamlet;
направи документ-термин матрица със автоматични стоп думи;
приложи LSI функциите IDF, TermFrequency, и Cosine;
извади 12 теми чрез NNMF и максимален брой стъпки 12;
покажи таблица на темите с 12 термина;
покажи текущата лентова стойност
';
for <English Python::LSAMon R::LSAMon Russian> -> $t {
    say '=' x 60, "\n", $t, "\n", '-' x 60;
    say ToLatentSemanticAnalysisWorkflowCode($command, $t, language => 'Bulgarian', format => 'code');
}
Quantile Regression Workflows
use DSL::English::QuantileRegressionWorkflows;
my $command = '
създай с dfTemperatureData;
премахни липсващите стойности;
покажи данново обобщение;
премащабирай двете оси;
изчисли квантилна регресия с 20 възела и вероятности от 0.1 до 0.9 със стъпка 0.1;
покажи диаграма с дати;
покажи чертеж на абсолютните грешки;
покажи текущата лентова стойност
';
for <English R::QRMon Russian WL::QRMon> -> $t {
    say '=' x 60, "\n", $t, "\n", '-' x 60;
    say ToQuantileRegressionWorkflowCode($command, $t, language => 'Bulgarian', format => 'code');
}
Recommender workflows
use DSL::English::RecommenderWorkflows;
my $command = '
създай чрез dfTitanic;
препоръчай със профила "male" и "died";
покажи текущата лентова стойност
';
for <English Python::SMRMon R::SMRMon Russian> -> $t {
    say '=' x 60, "\n", $t, "\n", '-' x 60;
    say ToRecommenderWorkflowCode($command, $t, language => 'Bulgarian', format => 'code');
}
Implementation notes
The rules in the file "DataQueryPhrases.rakumod" are derived from the file "DataQueryPhrases-template" using the package "Grammar::TokenProcessing", [AAp3].
In order to have Bulgarian commands parsed and interpreted into code, the work was split into four phases:
Utilities preparation
Bulgarian words and phrases addition and preparation
Preliminary functionality experiments
Packages code refactoring
Utilities preparation
From the beginning of the work on translating the computational DSLs into programming code, it was clear that some of the required code transformations had to be automated.
While doing the preparation work -- and in general, while the DSL-translation work matured -- it became clear that there are several directives to follow:
Make and use Command Line Interface (CLI) scripts that do code transformation or generation.
Adhere to two of Eric Raymond's 17 Unix Rules, [Wk1]:
Make data complicated when required, not the program
Write abstract programs that generate code instead of writing code by hand
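The directives above can be illustrated with a minimal sketch of a code-generating program. It is not the actual CLI script: the template lines, the sub name `add-fuzzy-matching`, and the helper name `is-fuzzy-match` that appears in the generated text are all assumptions made for illustration. The sketch reads plain token definitions and emits token definitions that also accept near-misses of the same word.

```raku
# Hypothetical template: one plain token definition per line.
my $template = q:to/END/;
token load-verb { 'зареди' }
token data-noun { 'данните' }
END

# Turn "token NAME { 'WORD' }" into a token that also accepts misspellings.
# The generated text refers to a helper named is-fuzzy-match (an assumption).
sub add-fuzzy-matching(Str $line, Int :$max-distance = 2 --> Str) {
    if $line ~~ / ^ \s* 'token' \s+ (<[\w\-]>+) \s* '{' \s* \' (<-[\']>+) \' \s* '}' \s* $ / {
        my ($name, $word) = ~$0, ~$1;
        # Q[...] is a fully literal string, so braces and backslashes stay as written.
        return sprintf(
            Q[token %s { '%s' | (\w+) <?{ is-fuzzy-match($0.Str, '%s', %d) }> }],
            $name, $word, $word, $max-distance);
    }
    # Lines that are not simple one-word tokens are passed through unchanged.
    return $line;
}

for $template.lines -> $line {
    say add-fuzzy-matching($line);
}
```

Keeping the word lists in a template (data) and deriving the grammar code from it follows both rules: the data carries the complexity, and the repetitive code is generated rather than written by hand.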
In order to facilitate the "from Bulgarian" project, the package "Grammar::TokenProcessing", [AAp3], was "finalized". The initial versions of that package were used from the very beginning of the DSL grammar development in order to facilitate handling of misspellings.
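To make "handling of misspellings" concrete, here is a minimal, self-contained sketch of a grammar token that accepts near-misses of a word. It is an illustration under stated assumptions, not the code of "Grammar::TokenProcessing" or DSL::Shared: the names `edit-distance`, `is-fuzzy-match`, and `FuzzyLoad`, the plain Levenshtein distance, and the threshold of 2 are all chosen here for the example.

```raku
# Plain dynamic-programming Levenshtein (edit) distance between two strings.
sub edit-distance(Str $a, Str $b --> Int) {
    my @s = $a.comb;
    my @t = $b.comb;
    my @prev = 0 .. @t.elems;
    for 1 .. @s.elems -> $i {
        my @curr = $i;
        for 1 .. @t.elems -> $j {
            my $cost = @s[$i - 1] eq @t[$j - 1] ?? 0 !! 1;
            @curr[$j] = min(@prev[$j] + 1, @curr[$j - 1] + 1, @prev[$j - 1] + $cost);
        }
        @prev = @curr;
    }
    return @prev[@t.elems];
}

# Hypothetical helper: accept a candidate within a small edit distance of the word.
sub is-fuzzy-match(Str $candidate, Str $word, Int $max-distance = 2 --> Bool) {
    return edit-distance($candidate.lc, $word) <= $max-distance;
}

grammar FuzzyLoad {
    rule  TOP          { <load-verb> <dataset-name> }
    # Either the exact word, or any word that is close enough to it.
    token load-verb    { 'зареди' | (\w+) <?{ is-fuzzy-match(~$0, 'зареди') }> }
    token dataset-name { \w+ }
}

say so FuzzyLoad.parse('зареди iris');  # exact spelling
say so FuzzyLoad.parse('зариди iris');  # one wrong letter, still accepted
say so FuzzyLoad.parse('покажи iris');  # a different word, rejected
```

A small fixed threshold is a trade-off: it tolerates typical typos, but for very short words it can accept unrelated words, so a length-dependent threshold may be preferable in practice.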
(Current) recipe
This sub-section lists the steps for endowing an already developed workflows DSL package with Bulgarian translations.
Denote the DSL workflows we focus on as DOMAIN (workflows).
For example, DOMAIN can stand for DataQueryWorkflows or RecommenderWorkflows.
Remark: In the recipe steps below, DOMAIN would be DataQueryWorkflows.
It is assumed that:
The DOMAIN workflows in English are already developed.
Since both English and Bulgarian are analytical, non-agglutinative languages, "just" replacing English words with Bulgarian words in DOMAIN would produce good enough parsers of Bulgarian.
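The second assumption can be sketched with a toy grammar in which the command structure is written once and only the word tokens are swapped. All role, grammar, and token names below are hypothetical and much simpler than the actual DSL packages.

```raku
# Hypothetical English speech-part tokens.
role EnglishWords {
    token load-verb { 'load' }
    token data-noun { 'data' }
}

# Hypothetical Bulgarian replacements with the same token names.
role BulgarianWords {
    token load-verb { 'зареди' }
    token data-noun { 'данните' }
}

# The command structure refers to token names only, never to concrete words.
role LoadCommandStructure {
    rule  TOP          { <load-verb> <data-noun> <dataset-name> }
    token dataset-name { \w+ }
}

grammar EnglishLoad   does EnglishWords   does LoadCommandStructure { }
grammar BulgarianLoad does BulgarianWords does LoadCommandStructure { }

# Both parsers extract the same dataset name from the two commands.
say EnglishLoad.parse('load data iris')<dataset-name>;
say BulgarianLoad.parse('зареди данните iris')<dataset-name>;
```

The sketch works because the word order of the two commands is the same; constructs where Bulgarian and English word order differ would need structural rules of their own.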
Here are the steps:
Add global Bulgarian words (optional)
Add Bulgarian words and phrases in the DSL::Shared file "Roles/Bulgarian/CommonSpeechParts-template".
Generate the file Roles/Bulgarian/CommonSpeechParts.rakumod using the CLI script AddFuzzyMatching.
Consider translating, changing, or refactoring global files, such as Roles/English/TimeIntervalSpec.rakumod.
Translate DOMAIN English words and phrases into Bulgarian
Take the file DOMAIN/Grammar/DOMAIN-template and translate its words into Bulgarian.
Add the corresponding files into DSL::Bulgarian, [AAp1].
Use the DOMAIN/Grammarish.rakumod role. The English DOMAIN package should have such a role. If it does not, do the corresponding code refactoring.
Test with implemented DOMAIN languages.
See the example grammar and role in DataQueryWorkflows in DSL::Bulgarian.
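The word-translation step of the recipe can be sketched as a dictionary-driven rewrite of a template. This is only an illustration: the template fragment, the dictionary `%to-bulgarian`, and the sub `translate-template` are hypothetical, and in practice the translation of a template is largely manual work.

```raku
# Hypothetical English-to-Bulgarian dictionary for a few speech parts.
my %to-bulgarian =
    load => 'зареди',
    data => 'данните',
    show => 'покажи';

# Hypothetical fragment of a DOMAIN template with English words.
my $template = q:to/END/;
token load-verb { 'load' }
token data-noun { 'data' }
token show-verb { 'show' | 'display' }
END

# Replace each single-quoted word that has a dictionary entry;
# words without an entry are left unchanged for manual review.
sub translate-template(Str $text, %dictionary --> Str) {
    return $text.subst(
        / \' (<-[\']>+) \' /,
        -> $m { "'" ~ (%dictionary{~$m[0]} // ~$m[0]) ~ "'" },
        :g);
}

print translate-template($template, %to-bulgarian);
```

In this sketch the word 'display' has no dictionary entry, so it stays in English and is easy to spot when reviewing the output.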
References
Articles
[AA1] Anton Antonov, "Introduction to data wrangling with Raku", (2021), RakuForPrediction at WordPress.
[Wk1] Wikipedia entry, UNIX-philosophy rules.
Packages
[AAp1] Anton Antonov, DSL::Bulgarian, Raku package, (2022), GitHub/antononcube.
[AAp2] Anton Antonov, DSL::Shared, Raku package, (2018-2022), GitHub/antononcube.
[AAp3] Anton Antonov, Grammar::TokenProcessing, Raku project (2022), GitHub/antononcube.
[AAp4] Anton Antonov, DSL::English::ClassificationWorkflows, Raku package, (2018-2022), GitHub/antononcube.
[AAp5] Anton Antonov, DSL::English::DataQueryWorkflows, Raku package, (2020-2022), GitHub/antononcube.
[AAp6] Anton Antonov, DSL::English::LatentSemanticAnalysisWorkflows, Raku package, (2018-2022), GitHub/antononcube.
[AAp7] Anton Antonov, DSL::English::QuantileRegressionWorkflows, Raku package, (2018-2022), GitHub/antononcube.
[AAp8] Anton Antonov, DSL::English::RecommenderWorkflows, Raku package, (2018-2022), GitHub/antononcube.