Data::Summarizers
Raku Data::Summarizers
This Raku package provides data summarizing functions for different data structures that are coercible to full arrays.
The supported data structures (so far) are:
1D Arrays
1D Lists
Positional-of-hashes
Positional-of-arrays
Usage examples
Setup
Here we load the Raku modules Data::Generators, Data::Reshapers and this module, Data::Summarizers:
use Data::Generators;
use Data::Reshapers;
use Data::Summarizers;
# (Any)
Summarize vectors
Here we generate a numerical vector, append NaN, Whatever, and Nil values to it, and shuffle it (Nil shows up as (Any) once stored in the array):
my @vec = [^1001].roll(12);
@vec = @vec.append( [NaN, Whatever, Nil]);
@vec .= pick(@vec.elems);
@vec
# [740 311 434 300 (Whatever) 192 705 202 576 561 544 NaN (Any) 744 133]
Here we summarize the vector generated above:
records-summary(@vec)
# +------------------------------------+
# | numerical                          |
# +------------------------------------+
# | 1st-Qu                    => 251   |
# | Max                       => 744   |
# | Median                    => 489   |
# | (Any-Nan-Nil-or-Whatever) => 3     |
# | Mean                      => 453.5 |
# | Min                       => 133   |
# | 3rd-Qu                    => 640.5 |
# +------------------------------------+
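The numbers in the summary above can be checked by hand. The following is a minimal sketch in core Raku only, not the package's implementation: `summarize-numeric` and `median-of-sorted` are hypothetical helper names, the `Missing` key is made up for this sketch, and the quartile rule (median of the lower and upper halves of the sorted values) is an assumption that happens to agree with the output shown. It also assumes at least two numeric values.

```raku
# Hypothetical helpers for illustration; not part of Data::Summarizers.
sub median-of-sorted(@sorted) {
    my $n = @sorted.elems;
    # Even length: average the two middle values; odd length: take the middle one.
    $n %% 2 ?? (@sorted[$n div 2 - 1] + @sorted[$n div 2]) / 2
            !! @sorted[$n div 2];
}

sub summarize-numeric(@values) {
    # Keep defined numbers only; NaN is the one number that is not == to itself.
    my @nums  = @values.grep({ $_ ~~ Numeric:D && $_ == $_ }).sort;
    my $half  = @nums.elems div 2;
    my @lower = @nums.head($half);   # values below the median
    my @upper = @nums.tail($half);   # values above the median
    return %(
        'Min'     => @nums.min,
        '1st-Qu'  => median-of-sorted(@lower),
        'Mean'    => @nums.sum / @nums.elems,
        'Median'  => median-of-sorted(@nums),
        '3rd-Qu'  => median-of-sorted(@upper),
        'Max'     => @nums.max,
        'Missing' => @values.elems - @nums.elems,
    );
}

# The vector printed above, typed in literally.
my @vec = 740, 311, 434, 300, Whatever, 192, 705, 202, 576, 561, 544, NaN, Any, 744, 133;
my %summary = summarize-numeric(@vec);
say %summary<Min 1st-Qu Median Mean 3rd-Qu Max Missing>;
# (133 251 489 453.5 640.5 744 3)
```

The three non-numeric entries (Whatever, NaN, and Any) are counted separately rather than dropped silently, which mirrors the `(Any-Nan-Nil-or-Whatever) => 3` row.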
Summarize tabular datasets
Here we generate a random tabular dataset with 16 rows and 3 columns and display it:
srand(32);
my $tbl = random-tabular-dataset(16,
        <Pet Ref Code>,
        generators => [random-pet-name(4), -> $n { ((^20).rand xx $n).List }, random-string(6)]);
to-pretty-table($tbl)
# +----------------+-----------+----------+
# | Code           | Ref       | Pet      |
# +----------------+-----------+----------+
# | A2Ue69EWAMtJCi | 0.050176  | Guinness |
# | KNwmt0QmoqABwR | 0.731900  | Truffle  |
# | A2Ue69EWAMtJCi | 0.739763  | Jumba    |
# | aY             | 7.342107  | Guinness |
# | xgZjtSP6VrKbH  | 19.868591 | Jumba    |
# | 20CO9FGD       | 12.956172 | Jumba    |
# | 20CO9FGD       | 15.854088 | Guinness |
# | A2Ue69EWAMtJCi | 4.774780  | Guinness |
# | A2Ue69EWAMtJCi | 18.729798 | Guinness |
# | xgZjtSP6VrKbH  | 13.383997 | Guinness |
# | aY             | 9.837488  | Jumba    |
# | 20CO9FGD       | 2.912506  | Truffle  |
# | xgZjtSP6VrKbH  | 11.782221 | Truffle  |
# | KNwmt0QmoqABwR | 9.825102  | Truffle  |
# | xgZjtSP6VrKbH  | 16.277717 | Jumba    |
# | CQmrQcQ4YkXvaD | 1.740695  | Guinness |
# +----------------+-----------+----------+
Remark: The values of the column "Pet" are sampled from a set of four pet names, and the values of the column "Code" are sampled from a set of six strings.
Here we summarize the tabular dataset generated above:
records-summary($tbl)
# +---------------+------------------------------+---------------------+
# | Pet           | Ref                          | Code                |
# +---------------+------------------------------+---------------------+
# | Guinness => 7 | Min    => 0.0501758995572299 | xgZjtSP6VrKbH  => 4 |
# | Jumba    => 5 | 1st-Qu => 2.3266005718178704 | A2Ue69EWAMtJCi => 4 |
# | Truffle  => 4 | Mean   => 9.175443804770861  | 20CO9FGD       => 3 |
# |               | Median => 9.831294839627123  | KNwmt0QmoqABwR => 2 |
# |               | 3rd-Qu => 14.619042446877677 | aY             => 2 |
# |               | Max    => 19.868590809216744 | CQmrQcQ4YkXvaD => 1 |
# +---------------+------------------------------+---------------------+
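In the summary above, the non-numeric columns ("Pet" and "Code") are reported as value counts, most frequent first. A minimal sketch of such a tally in core Raku follows; it is an illustration under that reading of the output, not the package's code, and the variable names are made up for the example.

```raku
# Values of the "Pet" column, copied row by row from the table above.
my @pets = <Guinness Truffle Jumba Guinness Jumba Jumba Guinness Guinness
            Guinness Guinness Jumba Truffle Truffle Truffle Jumba Guinness>;

# A Bag counts occurrences of each distinct value;
# sorting by negated count lists the most frequent value first.
my @tally = @pets.Bag.pairs.sort(-*.value);
.say for @tally;
# Guinness => 7
# Jumba => 5
# Truffle => 4
```

Only three of the four candidate pet names occur in this sample, which is why the "Pet" column of the summary has three rows.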
Summarize collections of tabular datasets
Here we group the dataset by the column "Pet", which gives a hash of tabular datasets:
my %group = group-by($tbl, 'Pet');
%group.pairs.map({ say("{$_.key} =>"); say to-pretty-table($_.value) });
# Guinness =>
# +----------------+-----------+----------+
# | Code           | Ref       | Pet      |
# +----------------+-----------+----------+
# | A2Ue69EWAMtJCi | 0.050176  | Guinness |
# | aY             | 7.342107  | Guinness |
# | 20CO9FGD       | 15.854088 | Guinness |
# | A2Ue69EWAMtJCi | 4.774780  | Guinness |
# | A2Ue69EWAMtJCi | 18.729798 | Guinness |
# | xgZjtSP6VrKbH  | 13.383997 | Guinness |
# | CQmrQcQ4YkXvaD | 1.740695  | Guinness |
# +----------------+-----------+----------+
# Truffle =>
# +---------+-----------+----------------+
# | Pet     | Ref       | Code           |
# +---------+-----------+----------------+
# | Truffle | 0.731900  | KNwmt0QmoqABwR |
# | Truffle | 2.912506  | 20CO9FGD       |
# | Truffle | 11.782221 | xgZjtSP6VrKbH  |
# | Truffle | 9.825102  | KNwmt0QmoqABwR |
# +---------+-----------+----------------+
# Jumba =>
# +-----------+----------------+-------+
# | Ref       | Code           | Pet   |
# +-----------+----------------+-------+
# | 0.739763  | A2Ue69EWAMtJCi | Jumba |
# | 19.868591 | xgZjtSP6VrKbH  | Jumba |
# | 12.956172 | 20CO9FGD       | Jumba |
# | 9.837488  | aY             | Jumba |
# | 16.277717 | xgZjtSP6VrKbH  | Jumba |
# +-----------+----------------+-------+
Here is the summary of that collection of datasets:
records-summary(%group)
# summary of Guinness =>
# +------------------------------+---------------------+---------------+
# | Ref                          | Code                | Pet           |
# +------------------------------+---------------------+---------------+
# | Min    => 0.0501758995572299 | A2Ue69EWAMtJCi => 3 | Guinness => 7 |
# | 1st-Qu => 1.7406953436440742 | CQmrQcQ4YkXvaD => 1 |               |
# | Mean   => 8.839377375678543  | 20CO9FGD       => 1 |               |
# | Median => 7.34210706081909   | xgZjtSP6VrKbH  => 1 |               |
# | 3rd-Qu => 15.854088005472917 | aY             => 1 |               |
# | Max    => 18.72979803423013  |                     |               |
# +------------------------------+---------------------+---------------+
# summary of Truffle =>
# +--------------+------------------------------+---------------------+
# | Pet          | Ref                          | Code                |
# +--------------+------------------------------+---------------------+
# | Truffle => 4 | Min    => 0.7318998724597869 | KNwmt0QmoqABwR => 2 |
# |              | 1st-Qu => 1.822202836225727  | 20CO9FGD       => 1 |
# |              | Mean   => 6.312932174017679  | xgZjtSP6VrKbH  => 1 |
# |              | Median => 6.368803873269801  |                     |
# |              | 3rd-Qu => 10.803661511809633 |                     |
# |              | Max    => 11.782221077071329 |                     |
# +--------------+------------------------------+---------------------+
# summary of Jumba =>
# +------------------------------+------------+---------------------+
# | Ref                          | Pet        | Code                |
# +------------------------------+------------+---------------------+
# | Min    => 0.7397628145038704 | Jumba => 5 | xgZjtSP6VrKbH  => 2 |
# | 1st-Qu => 5.28862527360509   |            | 20CO9FGD       => 1 |
# | Mean   => 11.935946110102654 |            | A2Ue69EWAMtJCi => 1 |
# | Median => 12.956171789492936 |            | aY             => 1 |
# | 3rd-Qu => 18.073154106905072 |            |                     |
# | Max    => 19.868590809216744 |            |                     |
# +------------------------------+------------+---------------------+
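For readers without Data::Reshapers at hand, the grouping step can be approximated with the core `classify` method. The sketch below is an illustration only: it is not how `group-by` or `records-summary` are implemented, it keeps just the "Pet" and "Ref" columns, and it uses the rounded "Ref" values displayed in the table rather than the full-precision ones.

```raku
# Rows as a positional of hashes; Ref values are the rounded ones shown in the table.
my @rows =
    %( Pet => 'Guinness', Ref => 0.050176 ),
    %( Pet => 'Truffle',  Ref => 0.731900 ),
    %( Pet => 'Jumba',    Ref => 0.739763 ),
    %( Pet => 'Guinness', Ref => 7.342107 ),
    %( Pet => 'Jumba',    Ref => 19.868591 ),
    %( Pet => 'Jumba',    Ref => 12.956172 ),
    %( Pet => 'Guinness', Ref => 15.854088 ),
    %( Pet => 'Guinness', Ref => 4.774780 ),
    %( Pet => 'Guinness', Ref => 18.729798 ),
    %( Pet => 'Guinness', Ref => 13.383997 ),
    %( Pet => 'Jumba',    Ref => 9.837488 ),
    %( Pet => 'Truffle',  Ref => 2.912506 ),
    %( Pet => 'Truffle',  Ref => 11.782221 ),
    %( Pet => 'Truffle',  Ref => 9.825102 ),
    %( Pet => 'Jumba',    Ref => 16.277717 ),
    %( Pet => 'Guinness', Ref => 1.740695 );

# classify is core Raku: it maps each distinct "Pet" value to the array of its rows.
my %groups = @rows.classify({ .<Pet> });

# Report a few per-group figures, with groups in alphabetical order.
for %groups.keys.sort -> $pet {
    my @refs = %groups{$pet}.map({ .<Ref> });
    say "$pet: rows = {@refs.elems}, min = {@refs.min}, max = {@refs.max}";
}
# Guinness: rows = 7, min = 0.050176, max = 18.729798
# Jumba: rows = 5, min = 0.739763, max = 19.868591
# Truffle: rows = 4, min = 0.7319, max = 11.782221
```

The row counts and extremes agree, up to rounding, with the per-group summaries above.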
Skim
TBD...
TODO
User specified NA marker
Tabular dataset summarization tests
Skimmer
Peek-er
References
Functions, repositories
[AAf1] Anton Antonov, RecordsSummary, (2019), Wolfram Function Repository.