ParaSeq
NAME
ParaSeq - Parallel execution of Iterables
SYNOPSIS
use ParaSeq;
DESCRIPTION
ParaSeq provides the functional equivalent of hyper and race, re-implemented from scratch with all of the experience gained from the initial implementation of hyper and race in 2014, and using features that have since been added to the Raku Programming Language.
As such it exports two subroutines, hyperize and racify, to make them plug-in compatible with the hyperize distribution.
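A rough sketch of the drop-in usage this implies. It assumes the module is installed and that hyperize and racify accept the same optional positional batch size and degree as their counterparts in the hyperize distribution; the values 64 and 4 are arbitrary illustrations.

```raku
use ParaSeq;   # assumption: the distribution is installed

# hyperize keeps the results in source order
my @doubled = (1..1000).&hyperize.map(* * 2);
say @doubled.head(3);    # (2 4 6)

# racify may deliver results in any order, so only the count is checked here.
# The positional batch size (64) and degree (4) are assumed to follow the
# calling convention of the hyperize distribution.
my @raced = (1..1000).&racify(64, 4).map(* * 2);
say @raced.elems;        # 1000
```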
IMPROVEMENTS
Automatic batch size adaptation
One of the main issues with the current implementation of .hyper and .race in the Rakudo core is that the batch size is fixed. Worse, there is no way to dynamically adapt the batch size depending on the load.
Batch sizes that are too big tend not to use all of the CPUs, because they eat all of the source items too soon, removing the chance to start up more threads.
Batch sizes that are too small tend to have their useful work drowned out by the overhead of batching and dispatching to threads.
This implementation aims to adapt batch sizes from the originally (implicitly) specified one for better throughput and resource usage.
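For comparison, the core methods take the batch size as a named argument that stays fixed for the whole run. A minimal sketch using only core Raku; the batch sizes 16 and 4096 and the degree 4 are arbitrary choices for illustration.

```raku
my @source = 1..10_000;

# Small fixed batch: 625 batches, so more dispatch overhead per item
my @small = @source.hyper(batch => 16, degree => 4).map(*.is-prime);

# Large fixed batch: only 3 batches, so at most 3 workers ever get work
my @large = @source.hyper(batch => 4096, degree => 4).map(*.is-prime);

# Both runs produce the same results; only throughput can differ
say @small.sum;                 # 1229 (the number of primes below 10_000)
say @small.sum == @large.sum;   # True
```

Neither call can change its batch size once it has started, which is the limitation described above.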
Unnecessary parallelization
If the degree specified is 1, then there is no point in batching or parallelization. In that case, this implementation will take itself completely out of the flow.
Alternatively, if the initial batch size is large enough to exhaust the source, it is clearly too large. Parallelization makes no sense in that case either, so this implementation will not parallelize.
Note that the default initial batch size is 10, rather than the 64 used by the current implementation of .hyper and .race, making it less likely that parallelization is abandoned too soon.
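The two cases above could look like this in practice. This is only a sketch: it assumes the module is installed and that the optional positional arguments are the batch size followed by the degree, as in the hyperize distribution. The results are the same as for a serial run either way.

```raku
use ParaSeq;   # assumption: the distribution is installed

# Five items fit inside the default initial batch, so the first batch
# exhausts the source and there is nothing left to parallelize
say (1..5).&hyperize.map(* + 1).List;         # (2 3 4 5 6)

# A degree of 1 (assumed to be the second positional argument) also
# leaves nothing to parallelize
say (1..5).&hyperize(2, 1).map(* + 1).List;   # (2 3 4 5 6)
```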
Infectiousness
The .serial method or .Seq coercer can typically be used to "unhyper" a hypered sequence. However, many other interface methods do the same in the current implementation of .hyper and .race, giving the impression that the flow is still parallelized when in fact it no longer is.
Also, hyperized sequences in the current implementation are considered to be non-lazy, even if the source is lazy.
This implementation aims to make all interface methods pass on the hypered nature and laziness of the sequence.
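The behaviour of the core implementation can be inspected with a few lines of plain Raku. A small sketch that only uses documented core methods:

```raku
my $hypered = (1..100).hyper.map(* + 1);
say $hypered.WHAT;                    # (HyperSeq)

# .serial explicitly turns the hypered sequence back into an ordinary Seq
say $hypered.serial.WHAT;             # (Seq)

# A lazy source is reported as non-lazy once it has been hypered
say (1..100).lazy.is-lazy;            # True
say (1..100).lazy.hyper.is-lazy;      # False
```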
Loop control statements
Some loop control statements may affect the final result. Specifically, the last statement does. In the current implementation of .hyper and .race, this will only affect the batch in which it occurs.
This implementation aims to make last stop any processing of current batches and not create any more batches.
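As a point of reference, this is how last affects the final result of a serial .map in core Raku. The sketch below is only a serial baseline, not a demonstration of the parallel behaviour.

```raku
# Serial baseline: last ends the whole map, so only 1..24 are produced
my @serial = (1..1000).map: { last if $_ == 25; $_ }
say @serial.elems;    # 24
say @serial.tail;     # 24
```

When the same block is run in batches, a last that only ends its own batch leaves the other batches free to contribute results, which is the situation described above.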
Support more interface methods
Only the .map and .grep methods are completely supported by the current implementation of .hyper and .race. This implementation will also support other methods, such as .first.
Use of phasers
When an interface method takes a Callable, that Callable can contain phasers that may need to be called (or not called) depending on the situation. The current implementation of .hyper and .race does not allow phasers at all.
This implementation aims to support phasers in a sensible manner:
ENTER
Called before each iteration.
FIRST
Called on the first iteration in the first batch.
NEXT
Called at the end of each iteration.
LAST
Called on the last iteration in the last batch. Note that this can be short-circuited with a last control statement.
LEAVE
Called after each iteration.
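The list above mirrors how often each phaser fires in a serial .map. The following sketch counts the firings for three iterations using only core Raku; the %fired hash is a name introduced purely for this illustration. How the counts carry over to batched execution is the stated goal, not something this snippet verifies.

```raku
my %fired;   # illustrative counter: how often each phaser fired

my @result = (1..3).map: {
    ENTER %fired<ENTER>++;   # before each iteration
    FIRST %fired<FIRST>++;   # first iteration only
    NEXT  %fired<NEXT>++;    # at the end of each iteration
    LAST  %fired<LAST>++;    # last iteration only
    LEAVE %fired<LEAVE>++;   # after each iteration
    $_ * 2
}

say @result;                               # [2 4 6]
say %fired<ENTER FIRST NEXT LAST LEAVE>;   # (3 1 3 1 3)
```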
AUTHOR
Elizabeth Mattijsen
Source can be located at: https://github.com/lizmat/ParaSeq . Comments and Pull Requests are welcome.
If you like this module, or what I'm doing more generally, committing to a small sponsorship would mean a great deal to me!
COPYRIGHT AND LICENSE
Copyright 2024 Elizabeth Mattijsen
This library is free software; you can redistribute it and/or modify it under the Artistic License 2.0.