README
ML::NLPTemplateEngine
This Raku package provides an NLP template engine for creating various computational workflows.
The package's data and implementation make up a Natural Language Processing (NLP) Template Engine (TE), [Wk1], that incorporates Question Answering Systems (QAS'), [Wk2], and Machine Learning (ML) classifiers.
The current version of the NLP-TE of the package heavily relies on Large Language Models (LLMs) for its QAS component.
Future plans involve incorporating other types of QAS implementations.
The Raku package implementation closely follows the Wolfram Language (WL) implementations in "NLP Template Engine", [AAr1, AAv1], and the WL paclet "NLPTemplateEngine", [AAp1, AAv2].
An alternative, more comprehensive approach to building workflow code is given in [AAp2].
Problem formulation
We want to have a system (i.e. TE) that:
Generates relevant, correct, executable programming code based on natural language specifications of computational workflows
Can automatically recognize the workflow types
Can generate code for different programming languages and related software packages
The points above are given in order of importance; the most important are placed first.
Installation
From Zef ecosystem:
zef install ML::NLPTemplateEngine;
From GitHub:
zef install https://github.com/antononcube/Raku-ML-NLPTemplateEngine.git
Usage examples
Quantile Regression (WL)
Here the template is automatically determined:
use ML::NLPTemplateEngine;
my $qrCommand = q:to/END/;
Compute quantile regression with probabilities 0.4 and 0.6, with interpolation order 2, for the dataset dfTempBoston.
END
concretize($qrCommand);
# qrObj=
# QRMonUnit[dfTempBoston]⟹
# QRMonEchoDataSummary[]⟹
# QRMonQuantileRegression[N/A, {0.4, 0.6}, InterpolationOrder->2]⟹
# QRMonPlot["DateListPlot"->True,PlotTheme->"Detailed"]⟹
# QRMonErrorPlots["RelativeErrors"->False,"DateListPlot"->True,PlotTheme->"Detailed"];
Remark: In the code above, the template type, "QuantileRegression", was determined using an LLM-based classifier.
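To skip the classifier step, the workflow type can be supplied explicitly via the template argument (a usage sketch reusing the specification above):

# Same specification, with the workflow type given explicitly so no classification is needed
concretize($qrCommand, template => 'QuantileRegression');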
Latent Semantic Analysis (R)
my $lsaCommand = q:to/END/;
Extract 20 topics from the text corpus aAbstracts using the method NNMF.
Show statistical thesaurus with the words neural, function, and notebook.
END
concretize($lsaCommand, template => 'LatentSemanticAnalysis', lang => 'R');
# lsaObj <-
# LSAMonUnit(aAbstracts) %>%
# LSAMonMakeDocumentTermMatrix(stemWordsQ = FALSE, stopWords = $*stopWords) %>%
# LSAMonEchoDocumentTermMatrixStatistics(logBase = 10) %>%
# LSAMonApplyTermWeightFunctions(globalWeightFunction = "$*globalWeightFunction", localWeightFunction = "$*localWeightFunction", normalizerFunction = "$*normalizerFunction") %>%
# LSAMonExtractTopics(numberOfTopics = 20, method = "NNMF", maxSteps = $*maxSteps, minNumberOfDocumentsPerTerm = $*minNumberOfDocumentsPerTerm) %>%
# LSAMonEchoTopicsTable(numberOfTerms = FALSE, wideFormQ = TRUE) %>%
# LSAMonEchoStatisticalThesaurus(words = c("neural", "function", "notebook"))
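The target language is controlled with the lang argument; for example, the same specification could target WL instead of R (a sketch, assuming the "LatentSemanticAnalysis" template also provides a WL version and that 'WL' is the corresponding lang value):

# Same specification, different target language (assumed lang value 'WL')
concretize($lsaCommand, template => 'LatentSemanticAnalysis', lang => 'WL');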
Random tabular data generation (Raku)
my $command = q:to/END/;
Make random table with 6 rows and 4 columns with the names <A1 B2 C3 D4>.
END
concretize($command, template => 'RandomTabularDataset', lang => 'Raku', llm => 'gemini');
# random-tabular-dataset(6, 4, "column-names-generator" => <A1, B2, C3, D4>, "form" => "Random table", "max-number-of-values" => 24, "min-number-of-values" => 24, "row-names" => $*rowKeys)
Remark: The code above specifies Google's Gemini LLM service via the llm argument.
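The generated code calls random-tabular-dataset from the Raku package "Data::Generators". A minimal sketch of running a cleaned-up version of it (assuming "Data::Generators" is installed; the unresolved $*rowKeys slot is dropped and the column names are passed without the stray commas):

use Data::Generators;

# Generate a 6 x 4 random tabular dataset with the specified column names
my @tbl = random-tabular-dataset(6, 4, "column-names-generator" => <A1 B2 C3 D4>);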
How it works?
The following flowchart describes the steps the NLP Template Engine takes to process a computation specification and execute the generated code to obtain results:
Here's a detailed narration of the process:
Computation Specification:
The process begins with a "Computation spec", which is the initial input defining the requirements or parameters for the computation task.
Workflow Type Decision:
A decision node asks if the workflow type is specified.
Guess Workflow Type:
If the workflow type is not specified, the system uses a classifier to guess the relevant workflow type.
Raw Answers:
Regardless of how the workflow type is determined (directly specified or guessed), the system retrieves "raw answers", which are crucial for further processing.
Processing and Templating:
The raw answers undergo processing ("Process raw answers") to organize or refine the data into a usable format.
Processed data is then utilized to "Complete computation template", preparing for executable operations.
Executable Code and Results:
The computation template is transformed into "Executable code", which when run, produces the final "Computation results".
LLM-Based Functionalities:
The classifier and the answers finder are LLM-based.
Data and Templates:
Code templates are selected based on the specifics of the initial spec and the processed data.
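In code, these steps correspond to arguments of concretize: template fixes the workflow type (so the classifier step is skipped), lang selects the target programming language of the completed template, and llm selects the LLM service used by the classifier and the answers finder. A sketch that combines argument values already shown in the usage examples above:

# Workflow type and target language fixed; Gemini used for question answering
concretize($lsaCommand, template => 'LatentSemanticAnalysis', lang => 'R', llm => 'gemini');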
Bring your own templates
0. Load the NLP-Template-Engine package (and others):
use ML::NLPTemplateEngine;
use Data::Importers;
use Data::Summarizers;
# (Any)
1. Get the "training" templates data (from a CSV file you have created or changed) for a new workflow ("SendMail"):
my $url = 'https://raw.githubusercontent.com/antononcube/NLP-Template-Engine/main/TemplateData/dsQASParameters-SendMail.csv';
my @dsSendMail = data-import($url, headers => 'auto');
records-summary(@dsSendMail, field-names => <DataType WorkflowType Group Key Value>);
# +-----------------+----------------+-----------------------------+----------------------------+----------------------------------------------------------------------------------+
# | DataType | WorkflowType | Group | Key | Value |
# +-----------------+----------------+-----------------------------+----------------------------+----------------------------------------------------------------------------------+
# | Questions => 48 | SendMail => 60 | All => 9 | TypePattern => 12 | 0.35 => 9 |
# | Defaults => 7 | | What subject => 4 | Parameter => 12 | {_String..} => 8 |
# | Templates => 3 | | Which email address => 4 | Threshold => 12 | {"to", "email", "mail", "send", "it", "recipient", "addressee", "address"} => 4 |
# | Shortcuts => 2 | | Who is the receiver => 4 | ContextWordsToRemove => 12 | to => 4 |
# | | | Who to send it to => 4 | Template => 3 | None => 4 |
# | | | Who is it from => 4 | from => 1 | _String => 4 |
# | | | Who the email is from => 4 | SendMail => 1 | {"content", "body"} => 3 |
# | | | (Other) => 27 | (Other) => 7 | (Other) => 24 |
# +-----------------+----------------+-----------------------------+----------------------------+----------------------------------------------------------------------------------+
2. Add the ingested data for the new workflow (from the CSV file) into the NLP-Template-Engine:
add-template-data(@dsSendMail);
# (Defaults ParameterTypePatterns Templates ParameterQuestions Questions Shortcuts)
3. Parse a natural language specification using the newly ingested and onboarded workflow ("SendMail"):
"Send email to [email protected] with content RandomReal[343], and the subject this is a random real call."
==> concretize(template => "SendMail")
# SendMail[<|"To"->{"[email protected]"},"Subject"->"this is a random real call","Body"->RandomReal[343],"AttachedFiles"->`attachedFiles`|>]
4. Experiment with running the generated code!
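For instance, further specifications can be parsed against the newly added workflow in the same feed-forward style (an illustrative sketch with a hypothetical email address; the generated SendMail code itself is WL, so it is meant to be run in a Wolfram Language session):

# Another (hypothetical) specification against the "SendMail" workflow
'Email the quarterly report to [email protected] with the subject Q3 results.'
==> concretize(template => "SendMail")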
References
Articles
[Wk1] Wikipedia entry, Template processor.
[Wk2] Wikipedia entry, Question answering.
Functions, packages, repositories
[AAr1] Anton Antonov, "NLP Template Engine", (2021-2022), GitHub/antononcube.
[AAp1] Anton Antonov, NLPTemplateEngine WL paclet, (2023), Wolfram Language Paclet Repository.
[AAp2] Anton Antonov, DSL::Translators Raku package, (2020-2024), GitHub/antononcube.
[WRI1] Wolfram Research, FindTextualAnswer, (2018), Wolfram Language function, (updated 2020).
Videos
[AAv1] Anton Antonov, "NLP Template Engine, Part 1", (2021), YouTube/@AAA4Prediction.
[AAv2] Anton Antonov, "Natural Language Processing Template Engine" presentation given at WTC-2022, (2023), YouTube/@Wolfram.