Abbreviations
NAME
Abbreviations - Provides abbreviations for an input set of one or more words
Note: This version uses API 2 and is not compatible with previous versions.
SYNOPSIS
use Abbreviations;
my $words = 'A ab Abcde';
# The main exported routine:
my %abbrevs = abbreviations $words;
There are two shorter routine name abbreviations one can use that are always exported:
abbrevs
abbrev
In the sprit of the module, one can use Abbreviations :ALL;
and have these additional short forms available:
abbre
abbr
abb
ab
a
Each of those is individually available by adding its name as an adverb, e.g.:
use Abbreviations :abb;
my %abb = abb $words;
DESCRIPTION
Abbreviations is a module with one automatically exported subroutine, abbreviations
, which takes as input a set of words and returns the original set with added unique abbreviations for the set. (Note the input words are also abbreviations in the context of this module.)
A word satisfies the Raku regex $word ~~ /\S+/
which is quite loose. Using programs can of course further restrict that if need be. For example, for use with module Opt::Handler words must satisfy this regex: $word ~~ /<ident>/
.
The input word set can be in one of two forms: a list (recommended) or a string containing the words separated by spaces. Duplicate words will be automatically and quietly eliminated. An empty word set will cause an exception.
Note the input word set will not be modified unless the :lower-case
option is used. In that case, all characters will be transformed to lower-case.
One will normally get the result as a hash, but the return type can be specified via an enum
if desired by selecting one of the :out-type
options: AH
(AbbrevHash), AL
(AbbrevList), H
(Hash), L
(List), or S
(String). For example,
my %abbrevs = abbrevs @words, :out-type(AH);
Output types by enum Out-type
H
(Hash)
The default Hash (H
) returned will have input words as keys whose value will be a sorted list of one or more valid abbreviations (sorted by length, shortest first).
AH
(AbbrevHash)
An AbbrevHash (AH
) is keyed by all of the valid abbreviations for the input word list and whose values are the word from which that abbreviation is defined.
AL
(AbbrevList)
An AbbrevList (AL
) is special in that the returned list is the one, shortest abbreviation for each of the input words in input order. For example,
my @w = <Monday Tuesday Wednesday Thursday Friday Saturday Sunday>;
my @abb = abbrevs @w, :lower-case, :out-type(AL);
say @abb; # OUTPUT: Ā«[m tu w th f sa su]ā¤Ā»
L
(List)
A List (L
) contains all of the valid abbreviations for the input word list, including the words themselves, sorted first by the default Raku sort and then by length (shortest first).
S
(String)
A String (S
) is the string formed by joining the List by a single space between words.
Improved abbreviation search
The abbreviation algorithm has been improved in the following way: The input word set is first formed into subgroups based on the the first character of each word, next the subgroups have their abbreviation sets formed, then all those sets are combined into one set. The result will be a larger number of available abbeviations in many cases than were available under the original API.
For example, given an input set consisting of the words A ab Abcde
, the min-abbrev-len is one or two for each subgroup and the default output hash of abbreviations (with the original words as keys) is now
A => ['A'],
ab => ['a', 'ab'],
Abcde => ['Ab', 'Abc', 'Abcd', 'Abcde'],
In contrast, without the initial subgrouping, the min-abbrev-len is three for the entire set and the result will be:
A => ['A'],
ab => ['ab'], # <- one less abbreviation
Abcde => ['Abc', 'Abcd', 'Abcde'], # <- one less abbreviation
If the :lower-case
option is used, we get a slightly different result since we no longer have any subgroups and the min-abbrev-len is again three.
my $words = 'A ab Abcde':
my %abbr = abbrevs $words, :lower-case;
The result is
a => ['a'],
ab => ['ab],
abcde => ['abc', 'abcd', 'abcde'],
Notice the input word ab now has only one abbreviation and abcde has only three.
Other exported symbols
sub sort-list
sub sort-list(@list, :longest-first --> List) is export(:sort) {...}
This routine sorts the input list first by the default Raku sort and then by word length. The order by length is by shortest abbreviation first unless the :longest-first
option is used. This is the routine used for all the lists produced as output in this module except for the AbbrevList (AL
) which keeps the original word set order.
enum Out-type
The enum Out-type
is exported automatically as it is required for use of sub abbreviations
.
sub auto-abbreviate
sub auto-abbreviate(@words) is export(:auto) {...}
This routine, slightly modified, is taken from the Rosetta Code website. Given a string consisting of space-separated words, it returns the minimum number of characters to abbreviate the set. It will fail on either an empty word list or one with duplicate words, so the user is fore-warned.
AUTHOR
Tom Browder [email protected]
CREDITS
Leon Timmermans (aka @Leont) for inspiration from his Raku module
Getopt::Long
.@thundergnat, the original author of the Raku
auto-abbreviate
algorithm on Rosetta Code.The Raku community for help with subroutine signatures.
COPYRIGHT and LICENSE
Copyright Ā© 2020-2021 Tom Browder
This library is free software; you may redistribute or modify it under the Artistic License 2.0.