Raku DSL::Bulgarian

In brief

This Raku package facilitates the specification of computational workflows using natural language commands in Bulgarian.

From the Domain Specific Language (DSL) commands, executable code is generated for different programming languages: Julia, Python, R, Raku, Wolfram Language.

Translation to other natural languages is also supported: English, Korean, Russian, Spanish.

Data query (wrangling) workflows

Translate Bulgarian data wrangling specifications to different natural and programming languages:

use DSL::English::DataQueryWorkflows;

my $command = '
зареди данните iris;
вземи елементите от 1 до 120;
филтрирай чрез Sepal.Width е по-голямо от 2.4 и Petal.Length е по-малко от 5.5;
групирай с колоната Species;
покажи размерите
';
for <English Python::pandas Raku::Reshapers Spanish Russian> -> $t {
    say '=' x 60, "\n", $t, "\n", '-' x 60;
    say ToDataQueryWorkflowCode($command, $t, language => 'Bulgarian', format => 'code');
}

Classification workflows

use DSL::English::ClassificationWorkflows;

my $command = '
използвай dfTitanic;
раздели данните с цепещо съотношение 0.82;
направи gradient boosted trees класификатор;
покажи TruePositiveRate и FalsePositiveRate;
';

for <English Russian WL::ClCon> -> $t {
    say '=' x 60, "\n", $t, "\n", '-' x 60;
    say ToClassificationWorkflowCode($command, $t, language => 'Bulgarian', format => 'code');
}

Latent Semantic Analysis

use DSL::English::LatentSemanticAnalysisWorkflows;

my $command = '
създай със textHamlet;
направи документ-термин матрица със автоматични стоп думи;
приложи LSI функциите IDF, TermFrequency, и Cosine;
извади 12 теми чрез NNMF и максимален брой стъпки 12;
покажи таблица на темите с 12 термина;
покажи текущата лентова стойност
';

for <English Python::LSAMon R::LSAMon Russian> -> $t {
    say '=' x 60, "\n", $t, "\n", '-' x 60;
    say ToLatentSemanticAnalysisWorkflowCode($command, $t, language => 'Bulgarian', format => 'code');
}

Quantile Regression Workflows

use DSL::English::QuantileRegressionWorkflows;

my $command = '
създай с dfTemperatureData;
премахни липсващите стойности;
покажи данново обобщение;
премащабирай двете оси;
изчисли квантилна регресия с 20 възела и вероятности от 0.1 до 0.9 със стъпка 0.1;
покажи диаграма с дати;
покажи чертеж на абсолютните грешки;
покажи текущата лентова стойност
';

for <English R::QRMon Russian WL::QRMon> -> $t {
    say '=' x 60, "\n", $t, "\n", '-' x 60;
    say ToQuantileRegressionWorkflowCode($command, $t, language => 'Bulgarian', format => 'code');
}

Recommender workflows

use DSL::English::RecommenderWorkflows;

my $command = '
създай чрез dfTitanic;
препоръчай със профила "male" и "died";
покажи текущата лентова стойност
';

for <English Python::SMRMon R::SMRMon Russian> -> $t {
    say '=' x 60, "\n", $t, "\n", '-' x 60;
    say ToRecommenderWorkflowCode($command, $t, language => 'Bulgarian', format => 'code');
}

Implementation notes

The rules in the file "DataQueryPhrases.rakumod" are derived from the file "DataQueryPhrases-template" using the package "Grammar::TokenProcessing", [AAp3].

In order to have Bulgarian commands parsed and interpreted into code, the work was split into four phases:

  1. Utilities preparation

  2. Bulgarian words and phrases addition and preparation

  3. Preliminary functionality experiments

  4. Packages code refactoring

Utilities preparation

From the beginning of the work on translating the computational DSLs into programming code, it was clear that some of the required code transformations have to be automated.

While doing the preparation work -- and in general, while the DSL-translation work matured -- it became clear that there are several directives to follow:

  1. Make and use Command Line Interface (CLI) scripts that do code transformation or generation.

  2. Adhere to two of Eric Raymond's 17 Unix Rules, [Wk1]:

    • Make data complicated when required, not the program

    • Write abstract programs that generate code instead of writing code by hand
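The second rule can be illustrated with a minimal sketch of a generator that turns word lists into the source text of grammar tokens. The word lists and the names `token-source`, `role-source`, and `BulgarianPhrases` are hypothetical and chosen only for illustration; this is not the code of the actual CLI scripts.

```raku
# Hypothetical word lists; the real phrase templates live in the DSL packages.
my %phrases =
    'load-verb' => <зареди зареждай>,
    'data-noun' => <данни данните>;

# Build the source text of one token definition from its alternatives.
sub token-source(Str $name, @alternatives --> Str) {
    my $body = @alternatives.map({ "'" ~ $_ ~ "'" }).join(' | ');
    return 'token ' ~ $name ~ ' { ' ~ $body ~ ' }';
}

# Emit the source of a role holding all generated tokens, sorted by token name.
sub role-source(Str $role-name, %word-lists --> Str) {
    my @tokens = %word-lists.sort(*.key).map({ '    ' ~ token-source(.key, .value) });
    return ('role ' ~ $role-name ~ ' {', |@tokens, '}').join("\n");
}

say role-source('BulgarianPhrases', %phrases);
```

Keeping the vocabulary in data and generating the token definitions means that adding a word changes only the word list, not hand-written grammar code.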

In order to facilitate the "from Bulgarian" project the package "Grammar::TokenProcessing", [AAp3], was "finalized." The initial versions of that package were used from the very beginning of the DSLs grammar development in order to facilitate handling of misspellings.
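To make the idea of misspelling handling concrete, here is a minimal sketch of a token that accepts either the exact word or any word within a small edit distance of it. The helper `edit-distance`, the grammar `FuzzyLoad`, and the threshold of 2 are assumptions made for this illustration; the tokens produced by "Grammar::TokenProcessing" may look different.

```raku
# Plain Levenshtein distance between two strings (two-row dynamic programming).
sub edit-distance(Str $a, Str $b --> Int) {
    my @s = $a.comb;
    my @t = $b.comb;
    my @prev = 0 .. @t.elems;
    for 1 .. @s.elems -> $i {
        my @curr = $i;
        for 1 .. @t.elems -> $j {
            my $cost = @s[$i - 1] eq @t[$j - 1] ?? 0 !! 1;
            @curr[$j] = min(@prev[$j] + 1, @curr[$j - 1] + 1, @prev[$j - 1] + $cost);
        }
        @prev = @curr;
    }
    return @prev[@t.elems];
}

# Hypothetical grammar: the verb matches exactly or within edit distance 2.
grammar FuzzyLoad {
    rule  TOP       { <load-verb> <dataset> }
    token load-verb { 'зареди' | (\w+) <?{ edit-distance($0.Str, 'зареди') <= 2 }> }
    token dataset   { \w+ }
}

say so FuzzyLoad.parse('зареди iris');   # exact spelling
say so FuzzyLoad.parse('зарди iris');    # one missing letter
say so FuzzyLoad.parse('покажи iris');   # a different verb
```

Under these assumptions the first two commands should parse and the third should not, because "покажи" is more than two edits away from "зареди".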

(Current) recipe

This sub-section lists the steps for endowing an already developed workflows DSL package with Bulgarian translations.

Denote the DSL workflows we focus on as DOMAIN (workflows). For example, DOMAIN can stand for DataQueryWorkflows or RecommenderWorkflows.

Remark: In the recipe steps below, DOMAIN would be DataQueryWorkflows.

It is assumed that:

  • The English DOMAIN workflows are already developed.

  • Since both English and Bulgarian are analytic, non-agglutinative languages, "just" replacing English words with Bulgarian words in DOMAIN would produce good enough parsers of Bulgarian.

Here are the steps:

  1. Add global Bulgarian words (optional)

    1. Add Bulgarian words and phrases in the DSL::Shared file "Roles/Bulgarian/CommonSpeechParts-template".

    2. Generate the file Roles/Bulgarian/CommonSpeechParts.rakumod using the CLI script AddFuzzyMatching.

    3. Consider translating, changing, or refactoring global files, such as Roles/English/TimeIntervalSpec.rakumod.

  2. Translate DOMAIN English words and phrases into Bulgarian

    1. Take the file DOMAIN/Grammar/DOMAIN-template and translate its words into Bulgarian.

  3. Add the corresponding files into DSL::Bulgarian, [AAp1].

    1. Use the DOMAIN/Grammarish.rakumod role.

      • The English DOMAIN package should have such a role. If it does not, do the corresponding code refactoring.

    2. Test with implemented DOMAIN languages.

    3. See the example grammar and role in DataQueryWorkflows in DSL::Bulgarian.
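The word-replacement idea behind the recipe can be sketched with a toy grammar. The language-independent structure sits in one role, playing the part of a DOMAIN/Grammarish.rakumod role, and each natural language supplies only its own phrase role. All names below (`LoadGrammarish`, `EnglishPhrases`, `BulgarianPhrases`, and the two grammars) are hypothetical and much simpler than the actual grammars in DSL::Bulgarian.

```raku
# Shared command structure; it refers to tokens that the phrase roles provide.
role LoadGrammarish {
    rule  TOP     { <load-verb> <data-noun> <dataset> }
    token dataset { \w+ }
}

# English words and phrases.
role EnglishPhrases {
    token load-verb { 'load' }
    token data-noun { 'the data' | 'data' }
}

# The same tokens with Bulgarian words substituted.
role BulgarianPhrases {
    token load-verb { 'зареди' }
    token data-noun { 'данните' | 'данни' }
}

grammar EnglishLoad   does LoadGrammarish does EnglishPhrases   { }
grammar BulgarianLoad does LoadGrammarish does BulgarianPhrases { }

my $en = EnglishLoad.parse('load the data iris');
my $bg = BulgarianLoad.parse('зареди данните iris');

# Both parses expose the same structure, so the same actions could interpret them.
say $en<dataset>.Str;
say $bg<dataset>.Str;
```

Because only the phrase role differs, whatever interprets the English parse tree can, in this sketch, interpret the Bulgarian one unchanged.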

References

Articles

[AA1] Anton Antonov, "Introduction to data wrangling with Raku", (2021), RakuForPrediction at WordPress.

[Wk1] Wikipedia entry, UNIX-philosophy rules.

Packages

[AAp1] Anton Antonov, DSL::Bulgarian, Raku package, (2022), GitHub/antononcube.

[AAp2] Anton Antonov, DSL::Shared, Raku package, (2018-2022), GitHub/antononcube.

[AAp3] Anton Antonov, Grammar::TokenProcessing, Raku project, (2022), GitHub/antononcube.

[AAp4] Anton Antonov, DSL::English::ClassificationWorkflows, Raku package, (2018-2022), GitHub/antononcube.

[AAp5] Anton Antonov, DSL::English::DataQueryWorkflows, Raku package, (2020-2022), GitHub/antononcube.

[AAp6] Anton Antonov, DSL::English::LatentSemanticAnalysisWorkflows, Raku package, (2018-2022), GitHub/antononcube.

[AAp7] Anton Antonov, DSL::English::QuantileRegressionWorkflows, Raku package, (2018-2022), GitHub/antononcube.

[AAp8] Anton Antonov, DSL::English::RecommenderWorkflows, Raku package, (2018-2022), GitHub/antononcube.

DSL::Bulgarian v0.1.0

Computational workflows building by natural language commands in Bulgarian.

Authors

  • Anton Antonov

License

GPL-3.0-or-later

Dependencies

  • DSL::Shared:ver<0.1.2+>
  • DSL::English::ClassificationWorkflows:<0.8.0+>
  • DSL::English::DataQueryWorkflows:ver<0.5.9+>
  • DSL::English::LatentSemanticAnalysisWorkflows:ver<0.8.0+>
  • DSL::English::QuantileRegressionWorkflows:<0.8.0+>
  • DSL::English::RecommenderWorkflows:<0.8.0+>

Provides

  • DSL::Bulgarian::ClassificationWorkflows::Grammar
  • DSL::Bulgarian::ClassificationWorkflows::Grammar::ClassificationPhrases
  • DSL::Bulgarian::DataQueryWorkflows::Grammar
  • DSL::Bulgarian::DataQueryWorkflows::Grammar::DataQueryPhrases
  • DSL::Bulgarian::LatentSemanticAnalysisWorkflows::Grammar
  • DSL::Bulgarian::LatentSemanticAnalysisWorkflows::Grammar::LatentSemanticAnalysisPhrases
  • DSL::Bulgarian::QuantileRegressionWorkflows::Grammar
  • DSL::Bulgarian::QuantileRegressionWorkflows::Grammar::TimeSeriesAndRegressionPhrases
  • DSL::Bulgarian::RecommenderWorkflows::Grammar
  • DSL::Bulgarian::RecommenderWorkflows::Grammar::RecommenderPhrases