Data::Translators
Raku package for translation of JSON specs or JSON-like data structures into other formats.
The package is envisioned to have translators for multiple formats. For example:
DONE HTML
DONE R
DONE JSON
TODO Plain text
TODO Python
TODO Mermaid-JS
TODO Julia
TODO WL
TODO SQL
The main motivation for making the package is to have a convenient way of making tables while doing literate programming with Raku using:
Computational Markdown documents, [AAp4]
Jupyter notebooks, [BDp1]
Mathematica notebooks, [AAp4]
JSON came into focus because, when working with Large Language Model (LLM) functions, [AAp3], LLMs are very often requested to produce output in JSON format, [AA1, AA2].
The package "Data::Reshapers", [AAp1], would complement nicely "Data::Translators" and vice versa. The package "Data::TypeSystem", [AAp2], is used for "translation decisions" and for conversions into more regular datasets.
The package "Mathematica::Serializer", [AAp5], has a very similar mission: it translates Raku data structures into Mathematica (aka Wolfram Language or WL) code.
Remark: The provided converters are made for communication purposes, so they might not be very performant. I have used or tested them with datasets that have fewer than 5000 rows.
Installation
Package installations from both sources use the zef installer (which should be bundled with the "standard" Rakudo installation file).
To install the package from the Zef ecosystem use the shell command:
zef install Data::Translators
To install the package from the GitHub repository use the shell command:
zef install https://github.com/antononcube/Raku-JSON-Translators.git
Basic usage
Main use case
Here is a "main use case" example:
Get a dataset that is an array of hashes
Filter or sample the records
Make an HTML table with those records
The HTML table outputs can be used to present datasets nicely in:
Markdown documents
Jupyter notebooks
Here we get the Titanic dataset and sample it:
use Data::Reshapers;
use Data::TypeSystem;
use Data::Translators;
my $tbl = get-titanic-dataset.pick(3);
# ({id => 85, passengerAge => 40, passengerClass => 1st, passengerSex => male, passengerSurvival => died} {id => 1185, passengerAge => -1, passengerClass => 3rd, passengerSex => male, passengerSurvival => died} {id => 503, passengerAge => 40, passengerClass => 2nd, passengerSex => female, passengerSurvival => survived})
Here is the corresponding dataset type:
deduce-type($tbl);
# Vector(Assoc(Atom((Str)), Atom((Str)), 5), 3)
Here is the corresponding HTML table:
$tbl ==> data-translation
We can specify field names and HTML table attributes:
$tbl ==> data-translation(field-names => <id passengerSurvival>, table-attributes => 'id="info-table" class="table table-bordered table-hover" text-align="center"');
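To make the roles of the field names and the table attributes concrete, here is a minimal sketch in plain Raku of how an array of hashes can be laid out as an HTML table. The helper `sketch-html-table` and its default attributes are hypothetical and written only for illustration; this is not the code of "Data::Translators", and its output format may differ from what `data-translation` produces.

```raku
# Hypothetical helper for illustration only; it is not part of Data::Translators
# and it does not escape HTML special characters.
sub sketch-html-table(@rows, :@field-names, Str :$table-attributes = 'border="1"') {
    # Use the requested columns, or fall back to the sorted keys of the first record.
    my @cols = @field-names ?? @field-names !! @rows[0].keys.sort;

    # One header row, then one row per record; missing values become empty cells.
    my $head = '<tr>' ~ @cols.map(-> $c { '<th>' ~ $c ~ '</th>' }).join ~ '</tr>';
    my @body = @rows.map(-> %r {
        '<tr>' ~ @cols.map(-> $c { '<td>' ~ (%r{$c} // '') ~ '</td>' }).join ~ '</tr>'
    });

    '<table ' ~ $table-attributes ~ '>' ~ "\n" ~ ($head, |@body).join("\n") ~ "\n" ~ '</table>'
}

# A small hand-made sample in the shape of the Titanic records.
my @rows =
    %( id => 85,  passengerSex => 'male',   passengerSurvival => 'died' ),
    %( id => 503, passengerSex => 'female', passengerSurvival => 'survived' );

say sketch-html-table(@rows, field-names => <id passengerSurvival>,
                      table-attributes => 'id="info-table"');
```

In this sketch the field names select and order the columns, and the attribute string is copied verbatim into the opening table tag.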
Here is how the transposed dataset is tabulated:
$tbl ==> transpose() ==> data-translation;
From JSON strings
Here is a JSON string translation to HTML:
my $json1 = q:to/END/;
{
"sample": [
{"name": "json2html", "desc": "coverts json 2 html table format", "lang": "python"},
{"name": "testing", "desc": "clubbing same keys of array of objects", "lang": "python"}
]
}
END
data-translation($json1);
Cross-tabulated data
Here is a more involved data example:
data-translation(cross-tabulate(get-titanic-dataset, 'passengerSex', 'passengerSurvival'))
Compare the HTML table above with the following plain text table:
to-pretty-table(cross-tabulate(get-titanic-dataset, 'passengerSex', 'passengerSurvival'))
# +--------+------+----------+
# | | died | survived |
# +--------+------+----------+
# | female | 127 | 339 |
# | male | 682 | 161 |
# +--------+------+----------+
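For readers unfamiliar with contingency tables, the counting behind such a table can be sketched in plain Raku. This is only an illustration on a small hand-made sample; it is not the implementation of `cross-tabulate`, and the variable names are made up for the example.

```raku
# Hand-made sample records (not the Titanic dataset itself).
my @records =
    %( passengerSex => 'female', passengerSurvival => 'survived' ),
    %( passengerSex => 'male',   passengerSurvival => 'died' ),
    %( passengerSex => 'male',   passengerSurvival => 'died' ),
    %( passengerSex => 'male',   passengerSurvival => 'survived' );

# Count the co-occurrences of the two fields in a hash of hashes.
my %counts;
%counts{ $_<passengerSex> }{ $_<passengerSurvival> }++ for @records;

say %counts<male><died>;        # 2
say %counts<female><survived>;  # 1
```

The result is a hash of hashes, which is the kind of nested structure the translators above turn into a two-way table.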
Generation of R code
Here is the R code version of the Titanic data sample:
$tbl ==> data-translation(target => 'R', field-names => <id passengerClass passengerSex passengerAge passengerSurvival>)
data.frame(`passengerSex` = c("male", "male", "female"),
`id` = c("85", "1185", "503"),
`passengerSurvival` = c("died", "died", "survived"),
`passengerAge` = c("40", "-1", "40"),
`passengerClass` = c("1st", "3rd", "2nd"))
Here is the R code version of the contingency table:
data-translation(cross-tabulate(get-titanic-dataset, 'passengerSex', 'passengerSurvival'), target => 'R')
Nicer datasets
In order to obtain datasets or more regular datasets the function to-dataset can be used.
Here a ragged dataset is made regular and converted to an HTML table:
my @tbl2 = get-titanic-dataset.pick(6);
@tbl2 = @tbl2.map({ $_.pick((1..5).pick).Hash });
@tbl2 ==> to-dataset(missing-value=>'・') ==> data-translation
Here a hash is transformed into a dataset with columns <Key Value> and then converted into an HTML table:
{ 4 => 'a', 5 => 'b', 8 => 'c'} ==> to-dataset() ==> data-translation
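The two reshaping ideas used in this section can be sketched in plain Raku: filling in missing keys so that every record has the same columns, and turning a hash into records with `Key` and `Value` columns. This is a minimal sketch under the assumption that a "regular" dataset means "all records share the same keys"; it is not the implementation of `to-dataset`, and the variable names and the `'N/A'` placeholder are made up for the example.

```raku
# A ragged dataset: the records do not share the same keys.
my @ragged =
    %( id => 1, passengerSex => 'male' ),
    %( id => 2, passengerAge => 40 );

# Collect all keys, then give every record every key, using a placeholder for gaps.
my $missing = 'N/A';
my @cols = @ragged.map(*.keys.Slip).unique.sort;
my @regular = @ragged.map(-> %r {
    %( @cols.map(-> $c { $c => (%r{$c} // $missing) }) )
});
say @regular[0]<passengerAge>;   # N/A

# A hash becomes one record per pair, with the columns Key and Value.
my %h = 4 => 'a', 5 => 'b', 8 => 'c';
my @key-value = %h.sort(*.key).map(-> $p { %( Key => $p.key, Value => $p.value ) });
say @key-value[0]<Value>;        # a
```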
Implementation notes
The "need" for this package became evident while working on the notebooks/articles [AA1, AA2].
Initially, I translated plain text tables into HTML using LLMs or md-interpret, provided by "Markdown::Grammar".
I considered re-using the code behind to-pretty-table provided by "Data::Reshapers", [AAp1]. That seemed like "too much work", and I wanted a lighter-weight package.
Having a solution for the more general problem of translating JSON to HTML seemed a much better and easier option.
For example, I hoped that someone had already solved that problem for Raku.
Since I did not find Raku packages for the translation I wanted, I looked for solutions in the Python ecosystem.
... And found "json2html", [VMp1].
Using ChatGPT-4.0 I translated the only class of that package from Python into Raku.
The obtained translation could be executed with relatively minor changes.
I further refactored and enhanced the HTML translator to fit my most frequent Raku workflows.
The ingestion of JSON strings is done with the package "JSON::Fast".
Hence the conversion to JSON "comes for free" using to-json from that package.
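As a small illustration of that round trip, here is a sketch that uses "JSON::Fast" directly: `from-json` ingests a JSON string and `to-json` serializes the resulting Raku data. It assumes "JSON::Fast" is installed and shows only the library calls, not how "Data::Translators" wires them together.

```raku
use JSON::Fast;

# Parse a JSON string into Raku data (hashes and arrays).
my $data = from-json('{"sample": [{"name": "json2html", "lang": "python"}]}');
say $data<sample>[0]<name>;    # json2html

# Serialize the Raku data back to a compact JSON string.
my $json = to-json($data, :!pretty);
say $json;
```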
The initial versions of the package did not have the "umbrella" function data-translation. Only the "lower level" functions json-to-html and json-to-r were provided. (Still available.)
CLI
The package provides a Command Line Interface (CLI) script. Here is its usage message:
data-translation --help
# Usage:
# data-translation <data> [-t|--target=<Str>] [--encode] [--escape] [--field-names=<Str>] -- Convert data into another format.
#
# <data> Data to convert.
# -t|--target=<Str> Target to convert to, one of <JSON HTML R>. [default: 'HTML']
# --encode Whether to encode or not. [default: False]
# --escape Whether to escape or not. [default: False]
# --field-names=<Str> Field names to use for Map objects, separated with ';'. [default: '']
Here is an example application (to this file):
data-translation ./resources/professionals.json --field-names='data;id;name;age;profession'
References
Articles
[AA1] Anton Antonov, "Workflows with LLM functions", (2023), RakuForPrediction at WordPress.
[AA2] Anton Antonov, "TLDR LLM solutions for software manuals", (2023), RakuForPrediction at WordPress.
Packages
[AAp1] Anton Antonov, Data::Reshapers Raku package, (2021-2023), GitHub/antononcube.
[AAp2] Anton Antonov, Data::TypeSystem Raku package, (2023), GitHub/antononcube.
[AAp3] Anton Antonov, LLM::Functions Raku package, (2023), GitHub/antononcube.
[AAp4] Anton Antonov, Text::CodeProcessing Raku package, (2021-2023), GitHub/antononcube.
[AAp5] Anton Antonov, Mathematica::Serializer Raku package, (2021-2022), GitHub/antononcube.
[BDp1] Brian Duggan, Jupyter::Kernel Raku package, (2017-2023), GitHub/bduggan.
[VMp1] Varun Malhotra, json2html Python package, (2013-2021), GitHub/softvar.