DSL::Bulgarian
In brief
This Raku package facilitates the specification of computational workflows using natural language commands in Bulgarian.
Using Domain Specific Languages (DSLs), executable code is generated for different programming languages: Julia, Python, R, Raku, and Wolfram Language.
Translation to other natural languages is also provided: English, Korean, Russian, and Spanish.
Data query (wrangling) workflows
Translate Bulgarian data wrangling specifications to different natural and programming languages:
use DSL::English::DataQueryWorkflows;
my $command = '
зареди данните iris;
вземи елементите от 1 до 120;
филтрирай чрез Sepal.Width е по-голямо от 2.4 и Petal.Length е по-малко от 5.5;
групирай с колоната Species;
покажи размерите
';
for <English Python::pandas Raku::Reshapers Spanish Russian> -> $t {
say '=' x 60, "\n", $t, "\n", '-' x 60;
say ToDataQueryWorkflowCode($command, $t, language => 'Bulgarian', format => 'code');
}
# ============================================================
# English
# ------------------------------------------------------------
# load the data table: "iris"
# take elements from 1 to 120
# filter with the predicate: ((Sepal.Width greater than 2.4) и (Petal.Length less than 5.5))
# group by the columns: Species
# show the count(s)
# ============================================================
# Python::pandas
# ------------------------------------------------------------
# obj = example_dataset('iris')
# obj = obj.iloc[1-1:120]
# obj = obj[((obj["Sepal.Width"]> 2.4) & (obj["Petal.Length"]< 5.5))]
# obj = obj.groupby(["Species"])
# print(obj.size())
# ============================================================
# Raku::Reshapers
# ------------------------------------------------------------
# my $obj = example-dataset('iris') ;
# $obj = $obj[ (1 - 1) ... (120 - 1 ) ] ;
# $obj = $obj.grep({ $_{"Sepal.Width"} > 2.4 and $_{"Petal.Length"} < 5.5 }).Array ;
# $obj = group-by($obj, "Species") ;
# say "counts: ", $obj>>.elems
# ============================================================
# Spanish
# ------------------------------------------------------------
# cargar la tabla: "iris"
# tomar los elementos de 1 a 120
# filtrar con la condicion: ((Sepal.Width más grande 2.4) y (Petal.Length menos 5.5))
# agrupar con columnas: "Species"
# mostrar recuentos
# ============================================================
# Russian
# ------------------------------------------------------------
# загрузить таблицу: "iris"
# взять элементы с 1 по 120
# фильтровать с предикатом: ((Sepal.Width больше 2.4) и (Petal.Length меньше 5.5))
# группировать с колонками: Species
# показать число
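To see what the generated Python::pandas pipeline computes, here is a minimal sketch that runs the same steps on a small stand-in table (the actual generated code loads the full iris dataset via example_dataset; the toy frame below is only for illustration):

```python
import pandas as pd

# Small stand-in for the iris table (the generated code loads it
# via example_dataset('iris'); here we build a toy frame instead).
obj = pd.DataFrame({
    "Sepal.Width":  [3.5, 2.0, 3.0, 2.5, 3.1, 2.2],
    "Petal.Length": [1.4, 6.0, 5.0, 4.0, 1.5, 5.8],
    "Species":      ["setosa", "virginica", "versicolor",
                     "versicolor", "setosa", "virginica"],
})

# take elements from 1 to 6 (1-based, inclusive range)
obj = obj.iloc[1-1:6]

# filter with the predicate from the Bulgarian command
obj = obj[(obj["Sepal.Width"] > 2.4) & (obj["Petal.Length"] < 5.5)]

# group by the column Species and show the group sizes
counts = obj.groupby(["Species"]).size()
print(counts.to_dict())
```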
Classification workflows
use DSL::English::ClassificationWorkflows;
my $command = '
използвай dfTitanic;
раздели данните с цепещо съотношение 0.82;
направи gradient boosted trees класификатор;
покажи TruePositiveRate и FalsePositiveRate;
';
for <English Russian WL::ClCon> -> $t {
say '=' x 60, "\n", $t, "\n", '-' x 60;
say ToClassificationWorkflowCode($command, $t, language => 'Bulgarian', format => 'code');
}
# ============================================================
# English
# ------------------------------------------------------------
# use the data: dfTitanic
# split into training and testing data with the proportion 0.82
# train classifier with method: gradient boosted trees
# ============================================================
# Russian
# ------------------------------------------------------------
# использовать данные: dfTitanic
# разделить данные на пропорцию 0.82
# обучить классификатор методом: gradient boosted trees
# ============================================================
# WL::ClCon
# ------------------------------------------------------------
# ClConUnit[ dfTitanic ] \[DoubleLongRightArrow]
# ClConSplitData[ 0.82 ] \[DoubleLongRightArrow]
# ClConMakeClassifier[ "GradientBoostedTrees" ] \[DoubleLongRightArrow]
# ClConClassifierMeasurements[ {"Recall", "FalsePositiveRate"} ] \[DoubleLongRightArrow] ClConEchoValue[]
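The ClConSplitData step above splits the data into training and testing parts with training proportion 0.82. A plain-Python sketch of what such a proportional split does (the helper name split_data is hypothetical, not part of ClCon):

```python
import random

def split_data(rows, proportion, seed=42):
    """Split rows into training/testing parts with the given training
    proportion. Illustrative sketch only, not the ClCon implementation."""
    rnd = random.Random(seed)
    shuffled = rows[:]
    rnd.shuffle(shuffled)
    cut = round(len(shuffled) * proportion)
    return shuffled[:cut], shuffled[cut:]

rows = list(range(100))
train, test = split_data(rows, 0.82)
print(len(train), len(test))
```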
Latent Semantic Analysis
use DSL::English::LatentSemanticAnalysisWorkflows;
my $command = '
създай със textHamlet;
направи документ-термин матрица със автоматични стоп думи;
приложи LSI функциите IDF, TermFrequency, и Cosine;
извади 12 теми чрез NNMF и максимален брой стъпки 12;
покажи таблица на темите с 12 термина;
покажи текущата лентова стойност
';
for <English Python::LSAMon R::LSAMon Russian> -> $t {
say '=' x 60, "\n", $t, "\n", '-' x 60;
say ToLatentSemanticAnalysisWorkflowCode($command, $t, language => 'Bulgarian', format => 'code');
}
#ERROR: Possible misspelling of 'термини' as 'термина'.
#ERROR: Possible misspelling of 'термини' as 'термина'.
#ERROR: Possible misspelling of 'термини' as 'термина'.
#ERROR: Possible misspelling of 'термини' as 'термина'.
# ============================================================
# English
# ------------------------------------------------------------
# create LSA object with the data: textHamlet
# make the document-term matrix with the parameters: use the stop words: NULL
# apply the latent semantic analysis (LSI) functions: global weight function : "IDF", local weight function : "None", normalizer function : "Cosine"
# extract 12 topics using the parameters: method : Non-Negative Matrix Factorization (NNMF), max number of steps : 12
# show topics table using the parameters: numberOfTerms = 12
# show the pipeline value
# ============================================================
# Python::LSAMon
# ------------------------------------------------------------
# LatentSemanticAnalyzer(textHamlet).make_document_term_matrix( stop_words = None).apply_term_weight_functions(global_weight_func = "IDF", local_weight_func = "None", normalizer_func = "Cosine").extract_topics(number_of_topics = 12, method = "NNMF", max_steps = 12).echo_topics_table(numberOfTerms = 12).echo_value()
# ============================================================
# R::LSAMon
# ------------------------------------------------------------
# LSAMonUnit(textHamlet) %>%
# LSAMonMakeDocumentTermMatrix( stopWords = NULL) %>%
# LSAMonApplyTermWeightFunctions(globalWeightFunction = "IDF", localWeightFunction = "None", normalizerFunction = "Cosine") %>%
# LSAMonExtractTopics( numberOfTopics = 12, method = "NNMF", maxSteps = 12) %>%
# LSAMonEchoTopicsTable(numberOfTerms = 12) %>%
# LSAMonEchoValue()
# ============================================================
# Russian
# ------------------------------------------------------------
# создать латентный семантический анализатор с данных: textHamlet
# сделать матрицу документов-терминов с параметрами: стоп-слова: null
# применять функции латентного семантического индексирования (LSI): глобальная весовая функция: "IDF", локальная весовая функция: "None", нормализующая функция: "Cosine"
# извлечь 12 тем с параметрами: метод: Разложение Неотрицательных Матричных Факторов (NNMF), максимальное число шагов: 12
# показать таблицу темы по параметрам: numberOfTerms = 12
# показать текущее значение конвейера
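The LSI step above applies a global weight function (IDF), a local weight function ("None", i.e. identity), and a cosine normalizer to the document-term matrix. A small pure-Python sketch of those three functions on a toy matrix (illustrative only, not the LSAMon implementation):

```python
import math

# Toy document-term matrix: rows = documents, columns = terms.
doc_term = [
    [2, 0, 1],
    [0, 1, 1],
    [1, 1, 0],
]
n_docs = len(doc_term)
n_terms = len(doc_term[0])

# Global weight IDF: log(number of documents / document frequency).
df = [sum(1 for row in doc_term if row[j] > 0) for j in range(n_terms)]
idf = [math.log(n_docs / d) for d in df]

# Local weight "None": entries used as-is, then scaled by the IDF weights.
weighted = [[row[j] * idf[j] for j in range(n_terms)] for row in doc_term]

# Normalizer "Cosine": scale each document row to unit Euclidean length.
def cosine_normalize(row):
    norm = math.sqrt(sum(x * x for x in row))
    return [x / norm for x in row] if norm > 0 else row

lsi_matrix = [cosine_normalize(row) for row in weighted]
for row in lsi_matrix:
    print([round(x, 3) for x in row])
```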
Quantile regression workflows
use DSL::English::QuantileRegressionWorkflows;
my $command = '
създай с dfTemperatureData;
премахни липсващите стойности;
покажи данново обобщение;
премащабирай двете оси;
изчисли квантилна регресия с 20 възела и вероятности от 0.1 до 0.9 със стъпка 0.1;
покажи диаграма с дати;
покажи чертеж на абсолютните грешки;
покажи текущата лентова стойност
';
for <English R::QRMon Russian WL::QRMon> -> $t {
say '=' x 60, "\n", $t, "\n", '-' x 60;
say ToQuantileRegressionWorkflowCode($command, $t, language => 'Bulgarian', format => 'code');
}
#ERROR: Possible misspelling of 'възли' as 'възела'.
#ERROR: Possible misspelling of 'възли' as 'възела'.
#ERROR: Possible misspelling of 'възли' as 'възела'.
#ERROR: Possible misspelling of 'възли' as 'възела'.
# ============================================================
# English
# ------------------------------------------------------------
# create quantile regression object with the data: dfTemperatureData
# delete missing values
# show data summary
# rescale: over both regressor and value axes
# compute quantile regression with parameters: degrees of freedom (knots): 20, automatic probabilities
# show plot with parameters: use date axis
# show plot of relative errors
# show the pipeline value
# ============================================================
# R::QRMon
# ------------------------------------------------------------
# QRMonUnit( data = dfTemperatureData) %>%
# QRMonDeleteMissing() %>%
# QRMonEchoDataSummary() %>%
# QRMonRescale(regressorAxisQ = TRUE, valueAxisQ = TRUE) %>%
# QRMonQuantileRegression(df = 20, probabilities = seq(0.1, 0.9, 0.1)) %>%
# QRMonPlot( datePlotQ = TRUE) %>%
# QRMonErrorsPlot( relativeErrorsQ = TRUE) %>%
# QRMonEchoValue()
# ============================================================
# Russian
# ------------------------------------------------------------
# создать объект квантильной регрессии с данными: dfTemperatureData
# удалить пропущенные значения
# показать сводку данных
# перемасштабировать: по осям регрессии и значений
# рассчитать квантильную регрессию с параметрами: степени свободы (узлы): 20, автоматическими вероятностями
# показать диаграмму с параметрами: использованием оси дат
# показать диаграму на относительных ошибок
# показать текущее значение конвейера
# ============================================================
# WL::QRMon
# ------------------------------------------------------------
# QRMonUnit[dfTemperatureData] \[DoubleLongRightArrow]
# QRMonDeleteMissing[] \[DoubleLongRightArrow]
# QRMonEchoDataSummary[] \[DoubleLongRightArrow]
# QRMonRescale["Axes"->{True, True}] \[DoubleLongRightArrow]
# QRMonQuantileRegression["Knots" -> 20, "Probabilities" -> Range[0.1, 0.9, 0.1]] \[DoubleLongRightArrow]
# QRMonDateListPlot[] \[DoubleLongRightArrow]
# QRMonErrorPlots[ "RelativeErrors" -> True] \[DoubleLongRightArrow]
# QRMonEchoValue[]
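The quantile regression step above fits, for each probability from 0.1 to 0.9, a curve that minimizes the corresponding quantile ("pinball") loss. A minimal illustrative sketch of that loss function (not code from QRMon):

```python
def pinball_loss(y_true, y_pred, prob):
    """Average pinball (quantile) loss at probability level `prob`.
    Quantile regression fits a curve minimizing this loss for each
    requested probability; this is only an illustrative sketch."""
    total = 0.0
    for yt, yp in zip(y_true, y_pred):
        diff = yt - yp
        total += prob * diff if diff >= 0 else (prob - 1) * diff
    return total / len(y_true)

y = [1.0, 2.0, 3.0, 4.0]
# Predicting the constant 2.5 for every point, at probability 0.5:
loss = pinball_loss(y, [2.5] * 4, 0.5)
print(loss)
```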
Recommender workflows
use DSL::English::RecommenderWorkflows;
my $command = '
създай чрез dfTitanic;
препоръчай със профила "male" и "died";
покажи текущата лентова стойност
';
for <English Python::SMRMon R::SMRMon Russian> -> $t {
say '=' x 60, "\n", $t, "\n", '-' x 60;
say ToRecommenderWorkflowCode($command, $t, language => 'Bulgarian', format => 'code');
}
# ============================================================
# English
# ------------------------------------------------------------
# create with data table: dfTitanic
# recommend with the profile: ["male", "died"]
# show the pipeline value
# ============================================================
# Python::SMRMon
# ------------------------------------------------------------
# obj = SparseMatrixRecommender().create_from_wide_form(data = dfTitanic).recommend_by_profile( profile = ["male", "died"]).echo_value()
# ============================================================
# R::SMRMon
# ------------------------------------------------------------
# SMRMonCreate(data = dfTitanic) %>%
# SMRMonRecommendByProfile( profile = c("male", "died")) %>%
# SMRMonEchoValue()
# ============================================================
# Russian
# ------------------------------------------------------------
# создать с таблицы: dfTitanic
# рекомендуй с профилю: ["male", "died"]
# показать текущее значение конвейера
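Recommendation by profile, as in the SMRMon pipelines above, amounts to scoring each item by how strongly its sparse tag vector overlaps the given profile. A toy sketch of that idea, with hypothetical item and tag data (not the actual SparseMatrixRecommender implementation):

```python
# Toy item-tag "sparse matrix": each item maps to its nonzero tag weights.
item_tags = {
    "id.1": {"male": 1, "died": 1, "1st": 1},
    "id.2": {"female": 1, "survived": 1, "2nd": 1},
    "id.3": {"male": 1, "survived": 1, "3rd": 1},
}

def recommend_by_profile(item_tags, profile, n=3):
    """Score items by summed weight over the profile tags; return the
    top-n items with positive score, best first."""
    scores = {
        item: sum(tags.get(t, 0) for t in profile)
        for item, tags in item_tags.items()
    }
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, score in ranked[:n] if score > 0]

print(recommend_by_profile(item_tags, ["male", "died"]))
```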
Implementation notes
The rules in the file "DataQueryPhrases.rakumod" are derived from the file "DataQueryPhrases-template" using the package "Grammar::TokenProcessing", [AAp3].
In order to have Bulgarian commands parsed and interpreted into code, the work was split into four phases:
Utilities preparation
Bulgarian words and phrases addition and preparation
Preliminary functionality experiments
Packages code refactoring
Utilities preparation
Since the beginning of the work on translating the computational DSLs into programming code, it was clear that some of the required code transformations have to be automated.
While doing the preparation work, and in general as the DSL-translation work matured, it became clear that there are several directives to follow:
Make and use Command Line Interface (CLI) scripts that do code transformation or generation.
Adhere to two of Eric Raymond's 17 Unix rules, [Wk1]:
Make data complicated when required, not the program
Write abstract programs that generate code instead of writing code by hand
In order to facilitate the "from Bulgarian" project, the package "Grammar::TokenProcessing", [AAp3], was "finalized". The initial versions of that package were used from the very beginning of the DSL grammar development in order to facilitate the handling of misspellings.
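Fuzzy matching of misspelled tokens, as in the "#ERROR: Possible misspelling" messages shown earlier, can be sketched as follows (an illustrative pure-Python sketch of the general idea, not the Raku implementation in Grammar::TokenProcessing; the token list and threshold are made up):

```python
from difflib import SequenceMatcher

# A few known DSL words (illustrative subset).
KNOWN_TOKENS = ["термини", "възли", "филтрирай", "групирай"]

def match_token(token, known=KNOWN_TOKENS, threshold=0.75):
    """Return (best_match, is_exact); (None, False) if nothing is close.
    Near-misses are accepted with a misspelling warning."""
    best, best_ratio = None, 0.0
    for word in known:
        ratio = SequenceMatcher(None, token, word).ratio()
        if ratio > best_ratio:
            best, best_ratio = word, ratio
    if best_ratio == 1.0:
        return best, True
    if best_ratio >= threshold:
        print(f"Possible misspelling of '{best}' as '{token}'.")
        return best, False
    return None, False

match_token("термина")  # close to 'термини': warns, still matches
```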
(Current) recipe
This sub-section lists the steps for endowing an already developed workflows DSL package with Bulgarian translations.
Denote the DSL workflows we focus on as DOMAIN (workflows). For example, DOMAIN can stand for DataQueryWorkflows or RecommenderWorkflows.
Remark: In the recipe steps below DOMAIN stands for DataQueryWorkflows.
It is assumed that:
The English version of DOMAIN is already developed.
Since both English and Bulgarian are analytical, non-agglutinative languages, "just" replacing English words with Bulgarian words in DOMAIN would produce good enough parsers of Bulgarian.
Here are the steps:
Add global Bulgarian words (optional)
Add Bulgarian words and phrases in the DSL::Shared file "Roles/Bulgarian/CommonSpeechParts-template".
Generate the file Roles/Bulgarian/CommonSpeechParts.rakumod using the CLI script AddFuzzyMatching.
Consider translating, changing, or refactoring global files, like Roles/English/TimeIntervalSpec.rakumod.
Translate DOMAIN English words and phrases into Bulgarian
Take the file DOMAIN/Grammar/DOMAIN-template and translate its words into Bulgarian
Add the corresponding files into DSL::Bulgarian, [AAp1].
Use the DOMAIN/Grammarish.rakumod role. The English DOMAIN package should already have such a role; if it does not, do the corresponding code refactoring first.
Test with the already implemented DOMAIN languages.
See the example grammar and role in DataQueryWorkflows in DSL::Bulgarian.
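The word-replacement assumption behind the recipe can be illustrated with a toy template translation (the template string and dictionary below are made up for illustration; the real templates live in the DOMAIN/Grammar/DOMAIN-template files):

```python
# Hypothetical grammar-template fragment with English phrase tokens.
template = "<load-verb> the data table <table-name>"

# Hypothetical English-to-Bulgarian phrase dictionary.
english_to_bulgarian = {
    "<load-verb>": "зареди",
    "the data table": "данните",
}

def translate_template(template, dictionary):
    """Re-target a template by straightforward token substitution,
    exploiting that both languages are analytical."""
    out = template
    for english, bulgarian in dictionary.items():
        out = out.replace(english, bulgarian)
    return out

print(translate_template(template, english_to_bulgarian))
```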
References
Articles
[AA1] Anton Antonov, "Introduction to data wrangling with Raku", (2021), RakuForPrediction at WordPress.
[Wk1] Wikipedia entry, UNIX-philosophy rules.
Packages
[AAp1] Anton Antonov, DSL::Bulgarian, Raku package, (2022), GitHub/antononcube.
[AAp2] Anton Antonov, DSL::Shared, Raku package, (2018-2022), GitHub/antononcube.
[AAp3] Anton Antonov, Grammar::TokenProcessing, Raku project (2022), GitHub/antononcube.
[AAp4] Anton Antonov, DSL::English::ClassificationWorkflows, Raku package, (2018-2022), GitHub/antononcube.
[AAp5] Anton Antonov, DSL::English::DataQueryWorkflows, Raku package, (2020-2022), GitHub/antononcube.
[AAp6] Anton Antonov, DSL::English::LatentSemanticAnalysisWorkflows, Raku package, (2018-2022), GitHub/antononcube.
[AAp7] Anton Antonov, DSL::English::QuantileRegressionWorkflows, Raku package, (2018-2022), GitHub/antononcube.
[AAp8] Anton Antonov, DSL::English::RecommenderWorkflows, Raku package, (2018-2022), GitHub/antononcube.