DSL::English::LatentSemanticAnalysisWorkflows

Building Latent Semantic Analysis (LSA) workflows from natural language commands.

Latent Semantic Analysis Workflows

In brief

This Raku (Perl 6) package provides grammar classes and action classes for parsing and interpreting natural-language Domain Specific Language (DSL) commands that specify Latent Semantic Analysis (LSA) workflows.

The interpreters (actions) target different programming languages: R, Mathematica (Wolfram Language), Python, and Raku. They also target different natural languages.

The generated pipelines are for the software monads "LSAMon-R" and "LSAMon-WL", [AAp2, AAp3], and the object-oriented Python implementation [AAp4].

Installation

Zef ecosystem:

zef install DSL::English::LatentSemanticAnalysisWorkflows

GitHub:

zef install https://github.com/antononcube/Raku-DSL-English-LatentSemanticAnalysisWorkflows.git

Examples

Programming languages

Here is a simple invocation:

use DSL::English::LatentSemanticAnalysisWorkflows;

ToLatentSemanticAnalysisWorkflowCode("extract 12 topics using method NNMF and max steps 12", "R::LSAMon");
# LSAMonExtractTopics( numberOfTopics = 12, method = "NNMF",  maxSteps = 12)
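For comparison, the same command can be interpreted for the other supported targets by changing the second argument. Here is a sketch (using only the function and target names shown in this README; run it with the package installed to see the exact generated code):

```raku
use DSL::English::LatentSemanticAnalysisWorkflows;

# The same topic-extraction command, now targeting the Wolfram Language monad.
# The generated code follows the bracketed WL::LSAMon style shown in the
# larger example below.
say ToLatentSemanticAnalysisWorkflowCode(
        "extract 12 topics using method NNMF and max steps 12",
        "WL::LSAMon");
```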

Here is a more complicated pipeline specification, used to generate LSA pipeline code for several programming languages:

my $command = q:to/END/;
create from textHamlet;
make document term matrix with stemming FALSE and automatic stop words;
apply LSI functions global weight function IDF, local term weight function TermFrequency, normalizer function Cosine;
extract 12 topics using method NNMF and max steps 12 and 20 min number of documents per term;
show topics table with 12 terms;
show thesaurus table for king, castle, denmark
END

say $_.key, "\n", $_.value, "\n"  for ($_ => ToLatentSemanticAnalysisWorkflowCode($command, $_ ) for <R::LSAMon WL::LSAMon Python::LSAMon>);
# R::LSAMon
# LSAMonUnit(textHamlet) %>%
# LSAMonMakeDocumentTermMatrix( stemWordsQ = FALSE, stopWords = NULL) %>%
# LSAMonApplyTermWeightFunctions(globalWeightFunction = "IDF", localWeightFunction = "None", normalizerFunction = "Cosine") %>%
# LSAMonExtractTopics( numberOfTopics = 12, method = "NNMF",  maxSteps = 12, minNumberOfDocumentsPerTerm = 20) %>%
# LSAMonEchoTopicsTable(numberOfTerms = 12) %>%
# LSAMonEchoStatisticalThesaurus(words = c("king", "castle", "denmark"))
#
# WL::LSAMon
# LSAMonUnit[textHamlet] \[DoubleLongRightArrow]
# LSAMonMakeDocumentTermMatrix[ "StemmingRules" -> False, "StopWords" -> Automatic] \[DoubleLongRightArrow]
# LSAMonApplyTermWeightFunctions["GlobalWeightFunction" -> "IDF", "LocalWeightFunction" -> "None", "NormalizerFunction" -> "Cosine"] \[DoubleLongRightArrow]
# LSAMonExtractTopics["NumberOfTopics" -> 12, Method -> "NNMF", "MaxSteps" -> 12, "MinNumberOfDocumentsPerTerm" -> 20] \[DoubleLongRightArrow]
# LSAMonEchoTopicsTable["NumberOfTerms" -> 12] \[DoubleLongRightArrow]
# LSAMonEchoStatisticalThesaurus["Words" -> {"king", "castle", "denmark"}]
#
# Python::LSAMon
# LatentSemanticAnalyzer(textHamlet).make_document_term_matrix( stemming_rules = False, stop_words = None).apply_term_weight_functions(global_weight_func = "IDF", local_weight_func = "None", normalizer_func = "Cosine").extract_topics(number_of_topics = 12, method = "NNMF", max_steps = 12, min_number_of_documents_per_term = 20).echo_topics_table(numberOfTerms = 12).echo_statistical_thesaurus(["king", "castle", "denmark"])

Natural languages

Here the same pipeline specification is translated into (annotated) natural language commands:

say $_.key, "\n", $_.value, "\n"  for ($_ => ToLatentSemanticAnalysisWorkflowCode($command, $_ ) for <Bulgarian English Russian>);
# Bulgarian
# създай Π»Π°Ρ‚Π΅Π½Ρ‚Π½ΠΎ сСмантичСн Π°Π½Π°Π»ΠΈΠ·Π°Ρ‚ΠΎΡ€ с Π΄Π°Π½Π½ΠΈΡ‚Π΅: textHamlet
# Π½Π°ΠΏΡ€Π°Π²ΠΈ Π΄ΠΎΠΊΡƒΠΌΠ΅Π½Ρ‚-Ρ‚Π΅Ρ€ΠΌΠΈΠ½ ΠΌΠ°Ρ‚Ρ€ΠΈΡ†Π°Ρ‚Π° с ΠΏΠ°Ρ€Π°ΠΌΠ΅Ρ‚Ρ€ΠΈ: Π½Π°ΠΌΠΈΡ€Π°Π½Π΅ Π½a ΡΡ‚ΡŠΠ±Π»Π°Ρ‚Π° Π½Π° Π΄ΡƒΠΌΠΈΡ‚Π΅: false, спиращи Π΄ΡƒΠΌΠΈ: null
# ΠΏΡ€ΠΈΠ»ΠΎΠΆΠΈ Π»Π°Ρ‚Π΅Π½Ρ‚Π½ΠΎ сСмантично идСксиращитС (LSI) Ρ„ΡƒΠ½ΠΊΡ†ΠΈΠΈ:Π³Π»ΠΎΠ±Π°Π»Π½ΠΎ Ρ‚Π΅Π³Π»ΠΎΠ²Π° функция: "IDF", Π»ΠΎΠΊΠ°Π»Π½ΠΎ Ρ‚Π΅Π³Π»ΠΎΠ²Π° функция: "None", Π½ΠΎΡ€ΠΌΠ°Π»ΠΈΠ·ΠΈΡ€Π°Ρ‰Π° функция: "Cosine"
# Π΄ΠΎΠ±ΠΈΠΉ 12 Ρ‚Π΅ΠΌΠΈ с ΠΏΠ°Ρ€Π°ΠΌΠ΅Ρ‚Ρ€ΠΈ: ΠΌΠ΅Ρ‚ΠΎΠ΄: Π Π°Π·Π»Π°Π³Π°Π½Π΅ ΠΏΠΎ НСотрицатСлни ΠœΠ°Ρ‚Ρ€ΠΈΡ‡Π½ΠΈ Π€Π°ΠΊΡ‚ΠΎΡ€ΠΈ (NNMF), максималСн Π±Ρ€ΠΎΠΉ ΡΡ‚ΡŠΠΏΠΊΠΈ: 12, ΠΌΠΈΠ½ΠΈΠΌΠ°Π»Π΅Π½ Π±Ρ€ΠΎΠΉ ΠΎΡ‚ Π΄ΠΎΠΊΡƒΠΌΠ΅Π½Ρ‚ΠΈ Π·Π° Ρ‚Π΅Ρ€ΠΌΠΈΠ½: 20
# ΠΏΠΎΠΊΠ°ΠΆΠΈ Ρ‚Π°Π±Π»ΠΈΡ†Π°Ρ‚Π° Π½Π° Ρ‚Π΅ΠΌΠΈΡ‚Π΅ Ρ‡Ρ€Π΅Π· ΠΏΠ°Ρ€Π°ΠΌΠ΅Ρ‚Ρ€ΠΈΡ‚Π΅: numberOfTerms = 12
# ΠΏΠΎΠΊΠ°ΠΆΠΈ Ρ‚Π°Π±Π»ΠΈΡ†Π° със статистичСския Ρ‚ΡŠΠ»ΠΊΠΎΠ²Π΅Π½ Ρ€Π΅Ρ‡Π½ΠΈΠΊ: Π·Π° Π΄ΡƒΠΌΠΈΡ‚Π΅: ["king", "castle", "denmark"]
#
# English
# create LSA object with the data: textHamlet
# make the document-term matrix with the parameters: use stemming rules: FALSE, use the stop words: NULL
# apply the latent semantic analysis (LSI) functions: global weight function : "IDF", local weight function : "None", normalizer function : "Cosine"
# extract 12 topics using the parameters: method : Non-Negative Matrix Factorization (NNMF), max number of steps : 12, min number of documents per term : 20
# show topics table using the parameters: numberOfTerms = 12
# show statistical thesaurus: for the words : ["king", "castle", "denmark"]
#
# Russian
# ΡΠΎΠ·Π΄Π°Ρ‚ΡŒ Π»Π°Ρ‚Π΅Π½Ρ‚Π½Ρ‹ΠΉ сСмантичСский Π°Π½Π°Π»ΠΈΠ·Π°Ρ‚ΠΎΡ€ с Π΄Π°Π½Π½Ρ‹Ρ…: textHamlet
# ΡΠ΄Π΅Π»Π°Ρ‚ΡŒ ΠΌΠ°Ρ‚Ρ€ΠΈΡ†Ρƒ Π΄ΠΎΠΊΡƒΠΌΠ΅Π½Ρ‚ΠΎΠ²-Ρ‚Π΅Ρ€ΠΌΠΈΠ½ΠΎΠ² с ΠΏΠ°Ρ€Π°ΠΌΠ΅Ρ‚Ρ€Π°ΠΌΠΈ: Π½Π°ΠΉΡ‚ΠΈ основы слов: false, стоп-слова: null
# ΠΏΡ€ΠΈΠΌΠ΅Π½ΡΡ‚ΡŒ Ρ„ΡƒΠ½ΠΊΡ†ΠΈΠΈ Π»Π°Ρ‚Π΅Π½Ρ‚Π½ΠΎΠ³ΠΎ сСмантичСского индСксирования (LSI): глобальная вСсовая функция: "IDF", локальная вСсовая функция: "None", Π½ΠΎΡ€ΠΌΠ°Π»ΠΈΠ·ΡƒΡŽΡ‰Π°Ρ функция: "Cosine"
# ΠΈΠ·Π²Π»Π΅Ρ‡ΡŒ 12 Ρ‚Π΅ΠΌ с ΠΏΠ°Ρ€Π°ΠΌΠ΅Ρ‚Ρ€Π°ΠΌΠΈ: ΠΌΠ΅Ρ‚ΠΎΠ΄: Π Π°Π·Π»ΠΎΠΆΠ΅Π½ΠΈΠ΅ ΠΠ΅ΠΎΡ‚Ρ€ΠΈΡ†Π°Ρ‚Π΅Π»ΡŒΠ½Ρ‹Ρ… ΠœΠ°Ρ‚Ρ€ΠΈΡ‡Π½Ρ‹Ρ… Π€Π°ΠΊΡ‚ΠΎΡ€ΠΎΠ² (NNMF), максимальноС число шагов: 12, минимальноС количСство Π΄ΠΎΠΊΡƒΠΌΠ΅Π½Ρ‚ΠΎΠ² Π² слово: 20
# ΠΏΠΎΠΊΠ°Π·Π°Ρ‚ΡŒ Ρ‚Π°Π±Π»ΠΈΡ†Ρƒ Ρ‚Π΅ΠΌΡ‹ ΠΏΠΎ ΠΏΠ°Ρ€Π°ΠΌΠ΅Ρ‚Ρ€Π°ΠΌ: numberOfTerms = 12
# ΠΏΠΎΠΊΠ°Π·Π°Ρ‚ΡŒ Ρ‚Π°Π±Π»ΠΈΡ†Ρƒ со статистичСской ΠΈΠ½Ρ‚Π΅Ρ€ΠΏΡ€Π΅Ρ‚Π°Ρ†ΠΈΠ΅ΠΉ слов: для слов: ["king", "castle", "denmark"]

Versions

The original version of this Raku package was developed and hosted at [AAp1].

A dedicated GitHub repository was created in order to make installation with Raku's zef more direct (as shown above).

References

[AAp1] Anton Antonov, Latent Semantic Analysis Workflows Raku Package, (2019), ConversationalAgents at GitHub.

[AAp2] Anton Antonov, Latent Semantic Analysis Monad in R, (2019), R-packages at GitHub.

[AAp3] Anton Antonov, Monadic Latent Semantic Analysis Mathematica package, (2017), MathematicaForPrediction at GitHub.

[AAp4] Anton Antonov, LatentSemanticAnalyzer Python package, (2021), Python-packages at GitHub.

DSL::English::LatentSemanticAnalysisWorkflows v0.8.0

Building Latent Semantic Analysis (LSA) workflows from natural language commands.

Authors

  • Anton Antonov

License

GPL-3.0-or-later

Dependencies

DSL::Shared

Provides

  • DSL::English::LatentSemanticAnalysisWorkflows
  • DSL::English::LatentSemanticAnalysisWorkflows::Actions::Bulgarian::Standard
  • DSL::English::LatentSemanticAnalysisWorkflows::Actions::English::Standard
  • DSL::English::LatentSemanticAnalysisWorkflows::Actions::Python::LSAMon
  • DSL::English::LatentSemanticAnalysisWorkflows::Actions::R::LSAMon
  • DSL::English::LatentSemanticAnalysisWorkflows::Actions::Russian::Standard
  • DSL::English::LatentSemanticAnalysisWorkflows::Actions::WL::LSAMon
  • DSL::English::LatentSemanticAnalysisWorkflows::Grammar
  • DSL::English::LatentSemanticAnalysisWorkflows::Grammar::LSIApplyCommand
  • DSL::English::LatentSemanticAnalysisWorkflows::Grammar::LatentSemanticAnalysisPhrases
  • DSL::English::LatentSemanticAnalysisWorkflows::Grammarish

The Camelia image is copyright 2009 by Larry Wall. "Raku" is a trademark of the Yet Another Society. All rights reserved.