Dan::Polars

Bridge Dan to Rust Polars

WORK IN PROGRESS

THIS MODULE IS EXPERIMENTAL AND SUBJECT TO CHANGE WITHOUT NOTICE

raku Dan::Polars - WIP

This new module binds Raku Dan to Polars via Raku NativeCall / Rust FFI.

The following broad capabilities are included:

  • Polars structures (Series & DataFrames)

  • Polars lazy queries & expressions

  • raku Dan features (accessors, dtypes, base methods)

  • broad datatype support

  • concurrency

The aim is to emulate the examples in the Polars User Guide.

Installation

Based on the Dockerfile chain (1) FROM (2)

  • docker run -it p6steve/raku-dan:polars

  • cd ~/raku-Dan-Polars/bin (this repo was cloned to load test_data)

  • ./synopsis-dan-polars.raku or ./nutshell.raku

(see bin/setup.raku for manual / development install steps)

You are welcome to plunder the Dockerfiles for how to set up your own environment.

Nutshell

use Dan;
use Dan::Polars;

sub starwars {
    my \sw = DataFrame.new;
    sw.read_csv("test_data/dfStarwars.csv");
    sw
}

my $obj = starwars;
$obj .= select( [ <species mass height>>>.&col ] ) ;
$obj .= groupby([ <species> ]) ;
$obj .= sort( {$obj[$++]<species>, $obj[$++]<mass>} )[*].reverse^;

$obj.show;

shape: (87, 3)
┌────────────────┬──────┬────────┐
│ species        ┆ mass ┆ height │
│ ---            ┆ ---  ┆ ---    │
│ str            ┆ str  ┆ str    │
╞════════════════╪══════╪════════╡
│ Zabrak         ┆ NA   ┆ 171    │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ Zabrak         ┆ 80   ┆ 175    │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ Yoda's species ┆ 17   ┆ 66     │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ Xexto          ┆ NA   ┆ 122    │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ ...            ┆ ...  ┆ ...    │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ Chagrian       ┆ NA   ┆ 196    │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ Cerean         ┆ 82   ┆ 198    │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ Besalisk       ┆ 102  ┆ 198    │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ Aleena         ┆ 15   ┆ 79     │
└────────────────┴──────┴────────┘

#say ~$obj.Dan-DataFrame;       # coerce to a vanilla Dan::DataFrame (e.g. to say all rows)
#say ~$obj.to-dataset;          # or export to a dataset for ad hoc data munging

Datasets are used by the raku Data::... modules.

Synopsis

Dan::Polars is a specialization of raku Dan. Check out the Dan synopsis for base Series and DataFrame operations. The following covers the additional features that are specific to Dan::Polars.

  • Each Dan::Polars object (Series or DataFrame) contains a pointer to its Rust Polars "shadow".

  • Polars does not implement indexes, so any attempt to set a row index will be ignored.

  • Dan::Polars only exposes the Polars lazy API and quietly calls .lazy and .collect as needed.

use Dan;
use Dan::Polars;

my \df = DataFrame.new;
df.read_csv("../dan/src/iris.csv");

# ---------------------------------------

my $se = df.column("sepal.length");
$se.head;

# a Series...
shape: (5,)
Series: 'sepal.length' [f64]
[
	5.1
	4.9
	4.7
	4.6
	5.0
]

# ---------------------------------------

df.select([col("sepal.length"), col("variety")]).head;

# a DataFrame...
shape: (5, 2)
┌──────────────┬─────────┐
│ sepal.length ┆ variety │
│ ---          ┆ ---     │
│ f64          ┆ str     │
╞══════════════╪═════════╡
│ 5.1          ┆ Setosa  │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ 4.9          ┆ Setosa  │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ 4.7          ┆ Setosa  │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ 4.6          ┆ Setosa  │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ 5.0          ┆ Setosa  │
└──────────────┴─────────┘

# ---------------------------------------

df.groupby(["variety"]).agg([col("petal.length").sum]).head;

# -- or --
my $expr;
$expr  = col("petal.length");
$expr .= sum;
df.groupby(["variety"]).agg([$expr]).head;

# An Expression takes the form Fn(Series --> Series) {} ...

# a DataFrame...
shape: (2, 2)
┌────────────┬──────────────┐
│ variety    ┆ petal.length │
│ ---        ┆ ---          │
│ str        ┆ f64          │
╞════════════╪══════════════╡
│ Versicolor ┆ 141.4        │
├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ Setosa     ┆ 73.1         │
└────────────┴──────────────┘

# ---------------------------------------
# Here are some unary expressions; the .alias method can be used to rename cols...

my @exprs;
@exprs.push: col("petal.length").sum;
#@exprs.push: col("sepal.length").mean;
#@exprs.push: col("sepal.length").min;
#@exprs.push: col("sepal.length").max;
#@exprs.push: col("sepal.length").first;
#@exprs.push: col("sepal.length").last;
#@exprs.push: col("sepal.length").unique;
#@exprs.push: col("sepal.length").count;
#@exprs.push: col("sepal.length").forward_fill;
#@exprs.push: col("sepal.length").backward_fill;
@exprs.push: col("sepal.length").reverse;
@exprs.push: col("sepal.length").std.alias("std");
#@exprs.push: col("sepal.length").var;
df.groupby(["variety"]).agg(@exprs).head;

shape: (2, 4)
┌────────────┬──────────────┬─────────────────────┬──────────┐
│ variety    ┆ petal.length ┆ sepal.length        ┆ std      │
│ ---        ┆ ---          ┆ ---                 ┆ ---      │
│ str        ┆ f64          ┆ list[f64]           ┆ f64      │
╞════════════╪══════════════╪═════════════════════╪══════════╡
│ Versicolor ┆ 141.4        ┆ [5.8, 5.5, ... 7.0] ┆ 0.539255 │
├╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ Setosa     ┆ 73.1         ┆ [5.0, 5.3, ... 5.1] ┆ 0.3524   │
└────────────┴──────────────┴─────────────────────┴──────────┘

# ---------------------------------------
# use col("*") to select all...

df.select([col("*").exclude(["sepal.width"])]).head;
df.select([col("*").sum]).head;

# ---------------------------------------
# Can do Expression math...

df.select([
    col("sepal.length"),
    col("petal.length"),
    (col("petal.length") + (col("sepal.length"))).alias("add"),
    (col("petal.length") - (col("sepal.length"))).alias("sub"),
    (col("petal.length") * (col("sepal.length"))).alias("mul"),
    (col("petal.length") / (col("sepal.length"))).alias("div"),
    (col("petal.length") % (col("sepal.length"))).alias("mod"),
    (col("petal.length") div (col("sepal.length"))).alias("floordiv"),
]).head;

shape: (5, 8)
┌──────────────┬──────────────┬─────┬──────┬──────┬──────────┬─────┬──────────┐
│ sepal.length ┆ petal.length ┆ add ┆ sub  ┆ mul  ┆ div      ┆ mod ┆ floordiv │
│ ---          ┆ ---          ┆ --- ┆ ---  ┆ ---  ┆ ---      ┆ --- ┆ ---      │
│ f64          ┆ f64          ┆ f64 ┆ f64  ┆ f64  ┆ f64      ┆ f64 ┆ f64      │
╞══════════════╪══════════════╪═════╪══════╪══════╪══════════╪═════╪══════════╡
│ 5.1          ┆ 1.4          ┆ 6.5 ┆ -3.7 ┆ 7.14 ┆ 0.2745   ┆ 1.4 ┆ 0.2745   │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 4.9          ┆ 1.4          ┆ 6.3 ┆ -3.5 ┆ 6.86 ┆ 0.285714 ┆ 1.4 ┆ 0.285714 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 4.7          ┆ 1.3          ┆ 6.0 ┆ -3.4 ┆ 6.11 ┆ 0.276596 ┆ 1.3 ┆ 0.276596 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 4.6          ┆ 1.5          ┆ 6.1 ┆ -3.1 ┆ 6.9  ┆ 0.326087 ┆ 1.5 ┆ 0.326087 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 5.0          ┆ 1.4          ┆ 6.4 ┆ -3.6 ┆ 7.0  ┆ 0.28     ┆ 1.4 ┆ 0.28     │
└──────────────┴──────────────┴─────┴──────┴──────┴──────────┴─────┴──────────┘

# ---------------------------------------
# And use literals...

df.select([
    col("sepal.length"),
    col("petal.length"),
    (col("petal.length") + 7).alias("add7"),
    (7 - col("petal.length")).alias("sub7"),
    (col("petal.length") * 2.2).alias("mul"),
    (2.2 / (col("sepal.length"))).alias("div"),
    (col("sepal.length") % 2).alias("mod"),
    (col("sepal.length") div 0.1).alias("floordiv"),
]).head;

shape: (5, 8)
┌──────────────┬──────────────┬──────┬──────┬──────┬──────────┬─────┬──────────┐
│ sepal.length ┆ petal.length ┆ add7 ┆ sub7 ┆ mul  ┆ div      ┆ mod ┆ floordiv │
│ ---          ┆ ---          ┆ ---  ┆ ---  ┆ ---  ┆ ---      ┆ --- ┆ ---      │
│ f64          ┆ f64          ┆ f64  ┆ f64  ┆ f64  ┆ f64      ┆ f64 ┆ f64      │
╞══════════════╪══════════════╪══════╪══════╪══════╪══════════╪═════╪══════════╡
│ 5.1          ┆ 1.4          ┆ 8.4  ┆ 5.6  ┆ 3.08 ┆ 0.431373 ┆ 1.1 ┆ 51.0     │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 4.9          ┆ 1.4          ┆ 8.4  ┆ 5.6  ┆ 3.08 ┆ 0.4489   ┆ 0.9 ┆ 49.0     │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 4.7          ┆ 1.3          ┆ 8.3  ┆ 5.7  ┆ 2.86 ┆ 0.468085 ┆ 0.7 ┆ 47.0     │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 4.6          ┆ 1.5          ┆ 8.5  ┆ 5.5  ┆ 3.3  ┆ 0.478261 ┆ 0.6 ┆ 46.0     │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 5.0          ┆ 1.4          ┆ 8.4  ┆ 5.6  ┆ 3.08 ┆ 0.44     ┆ 1.0 ┆ 50.0     │
└──────────────┴──────────────┴──────┴──────┴──────┴──────────┴─────┴──────────┘

# ---------------------------------------
# There is a variant of with_column (for Series) and with_columns (for Expressions)

df.with_column($se.rename("newcol")).head;
df.with_columns([col("variety").alias("newnew")]).head;

Notes:

  • Most methods such as queries on the Raku object are applied to the shadow and the data remains on the Rust side for performance reasons. Exceptions are accessors and map operations which require the data to be synched manually to the Raku side using the .flood method and, when done, to be sent back Rustwards with .flush. Sort & grep methods (in their current incarnations) also implicitly use .flood/.flush to sync.

  • On reflection, the vanilla Dan splice & concat methods are not a good fit for Polars which has the simpler Series.append, DataFrame.vstack|.hstack and DataFrame.join methods. The new plan is to implement these Polars methods here (v0.2) and then to replace Dan and Dan::Pandas splice & concat with the Polars equivalent in the forthcoming As::Query role.

  • To avoid synching for say ~df operations, the .show method applies Rust println! to STDOUT.

  • For import and export, the se.Dan-Series and df.Dan-DataFrame methods will coerce to the raku-only Dan equivalent. To go the other way, use Series.new(Dan::Series:D --> Dan::Polars::Series) and DataFrame.new(Dan::DataFrame:D --> Dan::Polars::DataFrame).

  • The method df.to-dataset is provided to make a dataset for compatibility with the various raku Data:: libraries.
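A minimal sketch of the export notes above, using only calls that already appear in this README (.show, .Dan-DataFrame, .to-dataset and DataFrame.new). The file path is the one from the Synopsis, the variable names are illustrative, and no output is shown because it has not been verified here.

```raku
use Dan;
use Dan::Polars;

my \df = DataFrame.new;
df.read_csv("../dan/src/iris.csv");   # same file as in the Synopsis

df.show;                              # printed on the Rust side, no sync needed

my $dan = df.Dan-DataFrame;           # coerce to a raku-only Dan::DataFrame
say ~$dan;                            # stringify on the Raku side

my $ds = df.to-dataset;               # export for the raku Data:: modules

my \back = DataFrame.new($dan);       # and back to a Dan::Polars::DataFrame
back.show;
```

This needs the Dan and Dan::Polars modules installed, for example inside the Docker image from the Installation section.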

Design Notes

  1. lazy

Polars implements both lazy and eager APIs, which are functionally similar. For simplicity, Dan::Polars offers only the more efficient of the two, the lazy API, which has better query optimisation with low additional overhead.

  2. auto-lazy

In Rust & Python Polars, lazy must be explicitly requested with .lazy .. .collect methods around expressions. In contrast, Dan::Polars auto-generates the .lazy .. .collect quietly for concise syntax.
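One way to picture auto-lazy is a toy model in plain Raku. LazyPlan and Frame below are hypothetical classes written for this illustration only. They are not part of Dan::Polars or Polars, and the real module does this work on the Rust side.

```raku
# Toy model only: LazyPlan and Frame are hypothetical names.

class LazyPlan {
    has @.rows;     # source data
    has @.steps;    # queued operations; nothing runs until .collect

    method grep(&f) { @!steps.push({ .grep(&f) }); self }
    method map(&f)  { @!steps.push({ .map(&f)  }); self }

    method collect {
        my $data = @!rows.List;
        $data = $_($data).List for @!steps;   # run the queue in order
        $data
    }
}

class Frame {
    has @.rows;

    # Explicit style: the caller asks for .lazy and later .collect.
    method lazy { LazyPlan.new(rows => @!rows) }

    # Auto-lazy style: the wrapper adds .lazy and .collect for the caller.
    method grep(&f) { self.lazy.grep(&f).collect }
    method map(&f)  { self.lazy.map(&f).collect  }
}

my $frame = Frame.new(rows => [1, 2, 3, 4, 5, 6]);

say $frame.lazy.grep(* %% 2).map(* ** 2).collect;   # (4 16 36)
say $frame.map(* ** 2);                             # (1 4 9 16 25 36)
```

The first call spells out .lazy and .collect. The second hides them behind an ordinary-looking method, which is the style this note describes.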

  3. pure

A Polars Expression is a function mapping from a Series to a Series (mathematically, Fn(Series) -> Series). Because an expression takes a Series as input and returns a Series as output, it is straightforward to chain expressions into a pipeline.
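As a rough illustration of why Series-in, Series-out functions chain so easily, here is a toy model in plain Raku. A "Series" is just a List and an "expression" is any Callable from List to List. The names times-two, total and pipeline are made up for this sketch and are not Dan::Polars or Polars API.

```raku
# Toy model only: none of these names come from Dan::Polars or Polars.

my &times-two = -> @s { @s.map(* * 2).List };   # element-wise expression
my &total     = -> @s { (@s.sum,) };            # aggregating expression

# The composition operator `o` runs right to left:
# times-two is applied first, then total.
my &pipeline = &total o &times-two;

say pipeline((1, 2, 3));   # (12)
```

Because every stage has the same shape, stages can be added or reordered without changing how the pipeline is called.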

  4. opaque

In general each raku object (Dan::Polars::Series, Dan::Polars::DataFrame) maintains a unique pointer to a Rust container (SeriesC, DataFrameC), which in turn contains a shadow Rust Polars Struct. Methods invoked on the raku object are then proxied over to the Rust Polars shadow.

  5. dynamic lib.so

A connection is made via Raku NativeCall to Rust FFI using a lib.so dynamic library or equivalent.
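For readers new to NativeCall, the sketch below shows the two building blocks described here: binding a function from a shared library, and holding an opaque pointer. It binds to the C standard library rather than to the Dan::Polars library, so it should run on a typical Linux or macOS system without extra setup. Handle is a hypothetical class name, and malloc and free merely stand in for the constructor and destructor that a Rust library would export.

```raku
use NativeCall;

# With no library name, `is native` resolves the symbol in the running
# process, which normally includes the C standard library.
sub strlen(Str --> size_t) is native { * }
say strlen("polars");    # 6

# An opaque handle: Raku holds only the address, never the contents.
class Handle is repr('CPointer') { }
sub malloc(size_t --> Handle) is native { * }
sub free(Handle)              is native { * }

my $h = malloc(16);
say $h.defined;          # True when the allocation succeeded
free($h);
```

A binding to a Rust library follows the same pattern, with the library path passed to `is native(...)` and the Rust functions exported with a C ABI.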

  6. data transfer

Usually no data needs to be transferred from Raku to Rust (or vice versa). For example, a raku script can command a Rust Polars DataFrame to be read from a csv file, apply expressions and output the result. The data items all remain on the Rust side of the connection.

TODOs

v0.1

  1. Dan API

    • Dan::Series base methods

    • Dan::DataFrame base methods

    • Dan Accessors

    • Dan sort & grep (s3)

  2. Polars Structs / Modules

    • Polars::Series base methods

    • Polars::DataFrame base methods

    • .push/.pull (set-new/get-data)

    • better value return

  3. Polars Exprs (s2)

    • unary exprs

    • operators

  4. Documentation

    • synopsis

    • refactor /bin

  5. Test

This will then provide a basis for ...

v0.2

  • Dan splice & concat (s1) as hstack, vstack, join

  • drop col

  • non-null, etc.

This will then provide a basis for the design of Dan::As::Query v0.1 for Dan and Dan::Pandas, and for a review of the Dan API (slice & concat, immutability, refactor...).

v0.3...

  • datetime

  • unique_stable

  • expr arity > 1

  • 'over' expr

  • clone (then retest h2o-par)

  • immutability

  • reset @data after load rc (also to Pandas)

v0.4...

  • strip / fold Index

  • cross join (aka cross product)

  • pivot / cross-tabulate

  • map & apply (jit DSL style)

  • apply over multiple cols

  • ternary if-then-else (Dan::As::Ternary)

  • str operations (Dan::As::Str)

  • chunked transfer

  • serde

(c) Henley Cloud Consulting Ltd.

Dan::Polars v0.0.1

Authors

  • p6steve

License

Artistic-2.0

Dependencies

DanTimer

Test Dependencies

Provides

  • Dan::Polars
  • Dan::Polars::Containers
