README
Latent Semantic Analysis Workflows
In brief
This Raku (Perl 6) package provides grammar classes and action classes for parsing and interpreting natural-language Domain Specific Language (DSL) commands that specify Latent Semantic Analysis (LSA) workflows.
The interpreters (actions) target different programming languages (R, Mathematica, Python, Raku) as well as different natural languages (Bulgarian, English, Russian).
The generated pipelines are for the software monads "LSAMon-R" and "LSAMon-WL", [AAp2, AAp3], and the object-oriented Python implementation [AAp4].
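The split between grammar classes and action classes can be illustrated with a deliberately tiny sketch. The names below (MiniLSACommand, MiniLSAActionsR, MiniLSAActionsPython) are hypothetical and made up for this illustration; they are not the package's actual grammar or actions, and they cover only one command form. The point is the pattern: one grammar parses the command, and a separate actions class per target turns the same parse tree into different code.

```raku
# Hypothetical miniature grammar; it only mimics the grammar/actions pattern.
grammar MiniLSACommand {
    rule  TOP            { <extract-topics> }
    rule  extract-topics { 'extract' <integer> 'topics' 'using' 'method' <method-name> }
    token integer        { \d+ }
    token method-name    { \w+ }
}

# Actions for an R-like target: build a function call string from the parse tree.
class MiniLSAActionsR {
    method TOP($/) { make $<extract-topics>.made }
    method extract-topics($/) {
        my $n = ~$<integer>;
        my $m = ~$<method-name>;
        make "LSAMonExtractTopics(numberOfTopics = $n, method = \"$m\")";
    }
}

# Actions for a Python-like target: same parse tree, different output syntax.
class MiniLSAActionsPython {
    method TOP($/) { make $<extract-topics>.made }
    method extract-topics($/) {
        my $n = ~$<integer>;
        my $m = ~$<method-name>;
        make "extract_topics(number_of_topics = $n, method = \"$m\")";
    }
}

my $mini-command = 'extract 12 topics using method NNMF';

say MiniLSACommand.parse($mini-command, actions => MiniLSAActionsR).made;
# LSAMonExtractTopics(numberOfTopics = 12, method = "NNMF")

say MiniLSACommand.parse($mini-command, actions => MiniLSAActionsPython).made;
# extract_topics(number_of_topics = 12, method = "NNMF")
```

Swapping the actions class changes the generated code without touching the grammar, which is presumably why a single command specification can be rendered for several targets.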
Installation
Zef ecosystem:
zef install DSL::English::LatentSemanticAnalysisWorkflows
GitHub:
zef install https://github.com/antononcube/Raku-DSL-English-LatentSemanticAnalysisWorkflows.git
Examples
Programming languages
Here is a simple invocation:
use DSL::English::LatentSemanticAnalysisWorkflows;
ToLatentSemanticAnalysisWorkflowCode("extract 12 topics using method NNMF and max steps 12", "R::LSAMon");
# LSAMonExtractTopics( numberOfTopics = 12, method = "NNMF", maxSteps = 12)
Here is a more complicated pipeline specification, used to generate LSA workflow code for several target programming languages:
my $command = q:to/END/;
create from textHamlet;
make document term matrix with stemming FALSE and automatic stop words;
apply LSI functions global weight function IDF, local term weight function TermFrequency, normalizer function Cosine;
extract 12 topics using method NNMF and max steps 12 and 20 min number of documents per term;
show topics table with 12 terms;
show thesaurus table for king, castle, denmark
END
say $_.key, "\n", $_.value, "\n" for ($_ => ToLatentSemanticAnalysisWorkflowCode($command, $_ ) for <R::LSAMon WL::LSAMon Python::LSAMon>);
# R::LSAMon
# LSAMonUnit(textHamlet) %>%
# LSAMonMakeDocumentTermMatrix( stemWordsQ = FALSE, stopWords = NULL) %>%
# LSAMonApplyTermWeightFunctions(globalWeightFunction = "IDF", localWeightFunction = "None", normalizerFunction = "Cosine") %>%
# LSAMonExtractTopics( numberOfTopics = 12, method = "NNMF", maxSteps = 12, minNumberOfDocumentsPerTerm = 20) %>%
# LSAMonEchoTopicsTable(numberOfTerms = 12) %>%
# LSAMonEchoStatisticalThesaurus(words = c("king", "castle", "denmark"))
#
# WL::LSAMon
# LSAMonUnit[textHamlet] \[DoubleLongRightArrow]
# LSAMonMakeDocumentTermMatrix[ "StemmingRules" -> False, "StopWords" -> Automatic] \[DoubleLongRightArrow]
# LSAMonApplyTermWeightFunctions["GlobalWeightFunction" -> "IDF", "LocalWeightFunction" -> "None", "NormalizerFunction" -> "Cosine"] \[DoubleLongRightArrow]
# LSAMonExtractTopics["NumberOfTopics" -> 12, Method -> "NNMF", "MaxSteps" -> 12, "MinNumberOfDocumentsPerTerm" -> 20] \[DoubleLongRightArrow]
# LSAMonEchoTopicsTable["NumberOfTerms" -> 12] \[DoubleLongRightArrow]
# LSAMonEchoStatisticalThesaurus["Words" -> {"king", "castle", "denmark"}]
#
# Python::LSAMon
# LatentSemanticAnalyzer(textHamlet).make_document_term_matrix( stemming_rules = False, stop_words = None).apply_term_weight_functions(global_weight_func = "IDF", local_weight_func = "None", normalizer_func = "Cosine").extract_topics(number_of_topics = 12, method = "NNMF", max_steps = 12, min_number_of_documents_per_term = 20).echo_topics_table(numberOfTerms = 12).echo_statistical_thesaurus(["king", "castle", "denmark"])
Natural languages
say $_.key, "\n", $_.value, "\n" for ($_ => ToLatentSemanticAnalysisWorkflowCode($command, $_ ) for <Bulgarian English Russian>);
# Bulgarian
# създай латентно семантичен анализатор с данните: textHamlet
# направи документ-термин матрицата с параметри: намиране нa стъблата на думите: false, спиращи думи: null
# приложи латентно семантично идексиращите (LSI) функции:глобално теглова функция: "IDF", локално теглова функция: "None", нормализираща функция: "Cosine"
# добий 12 теми с параметри: метод: Разлагане по Неотрицателни Матрични Фактори (NNMF), максимален брой стъпки: 12, минимален брой от документи за термин: 20
# покажи таблицата на темите чрез параметрите: numberOfTerms = 12
# покажи таблица със статистическия тълковен речник: за думите: ["king", "castle", "denmark"]
#
# English
# create LSA object with the data: textHamlet
# make the document-term matrix with the parameters: use stemming rules: FALSE, use the stop words: NULL
# apply the latent semantic analysis (LSI) functions: global weight function : "IDF", local weight function : "None", normalizer function : "Cosine"
# extract 12 topics using the parameters: method : Non-Negative Matrix Factorization (NNMF), max number of steps : 12, min number of documents per term : 20
# show topics table using the parameters: numberOfTerms = 12
# show statistical thesaurus: for the words : ["king", "castle", "denmark"]
#
# Russian
# создать латентный семантический анализатор с данных: textHamlet
# сделать матрицу документов-терминов с параметрами: найти основы слов: false, стоп-слова: null
# применять функции латентного семантического индексирования (LSI): глобальная весовая функция: "IDF", локальная весовая функция: "None", нормализующая функция: "Cosine"
# извлечь 12 тем с параметрами: метод: Разложение Неотрицательных Матричных Факторов (NNMF), максимальное число шагов: 12, минимальное количество документов в слово: 20
# показать таблицу темы по параметрам: numberOfTerms = 12
# показать таблицу со статистической интерпретацией слов: для слов: ["king", "castle", "denmark"]
Versions
The original version of this Raku package was developed and hosted at [AAp1].
A dedicated GitHub repository was created to make installation with Raku's zef more direct, as shown in the Installation section above.
References
[AAp1] Anton Antonov, Latent Semantic Analysis Workflows Raku Package, (2019), ConversationalAgents at GitHub.
[AAp2] Anton Antonov, Latent Semantic Analysis Monad in R, (2019), R-packages at GitHub.
[AAp3] Anton Antonov, Monadic Latent Semantic Analysis Mathematica package, (2017), MathematicaForPrediction at GitHub.
[AAp4] Anton Antonov, LatentSemanticAnalyzer Python package, (2021), Python-packages at GitHub.