There’s a relatively common pattern I see with people writing code that counts… say, DNA bases in a string:
my %counts;
%counts{$_}++ for 'AGTCAGTCAGTCTTTCCCAAAAT'.comb;
say %counts<A T G C>; # (7 7 3 6)
Make a Hash. For each thing you want to count, ++ that key in that Hash. So what’s the problem?
Perl 6 actually has specialized types that are more appropriate for this operation; for example, the Bag:
'AGTCAGTCAGTCTTTCCCAAAAT'.comb.Bag<A T G C>.say; # (7 7 3 6)
Let’s talk about these types and all the fancy operators that come with them!
A Note on Unicode
I’ll be using fancy-pants Unicode versions of operators and symbols in this post, because they look purty. However, all of them have what we call “Texas” equivalents you can use instead.
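If you’d rather stay in ASCII, here’s a small sketch of the Texas spellings for a few of the operators used in this post. The variable name and sample data are made up for illustration:

```raku
# ASCII ("Texas") spellings of some set operators.
my $peeps = set <babydrop iBakeCake Zoffix viki>;

say 'Zoffix' (elem) $peeps;             # same as ∈;  True
say (<a b c> (&) <b c d>).keys.sort;    # same as ∩;  (b c)
say (<a b c> (-) <b c d>).keys.sort;    # same as ∖;  (a)
say (<a a b> (+) <a c>)<a>;             # same as ⊎;  3
```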
Ready. Set. Go.
The simplest of these types is a Set. It will keep exactly one of each item, so if you have multiple objects that are the same, the duplicates will be discarded:
say set 1, 2, 2, "foo", "a", "a", "a", "a", "b";
# OUTPUT: set(a, foo, b, 1, 2)
As you can see, the result has only one a and only one 2. We can use the ∈, U+2208 ELEMENT OF, set membership operator to check if an item is in a set:
my $mah-peeps = set <babydrop iBakeCake Zoffix viki>;
say 'Weeee \o/' if 'Zoffix' ∈ $mah-peeps;
# OUTPUT: Weeee \o/
The set operators are coercive, so we don’t need to explicitly create a set; they’ll do it for us:
say 'Weeee \o/' if 'Zoffix' ∈ <babydrop iBakeCake Zoffix viki>;
# OUTPUT: Weeee \o/
But pay attention when using allomorphs:
say 'Weeee \o/' if 42 ∈ <1 42 72>;
# No output

say 'Weeee \o/' if 42 ∈ +«<1 42 72>; # coerce allomorphs to Numeric
# OUTPUT: Weeee \o/
The angle brackets create allomorphs for numerics, so in the first case above, our set contains a bunch of IntStr objects, while the left hand side of the operator has a regular Int, and so the comparison fails. In the second case, we coerce allomorphs to their numeric component with a hyper operator and the test succeeds.
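To see the allomorph gotcha in isolation, here’s a minimal sketch. The @words array is a made-up name, and .map(+*) is just one alternative to the hyper operator for numifying each element:

```raku
my @words = <1 42 72>;

say @words[1].^name;          # IntStr: an allomorph, not a plain Int
say 42 ∈ @words;              # False: an Int is not the same item as an IntStr
say 42 ∈ @words.map(+*);      # True: elements numified to Int first
say '42' ∈ @words.map(~*);    # True: elements stringified to Str first
```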
While testing membership is super exciting, we can do more with our sets! How about some intersections?
my $admins = set <Zoffix mst [Coke] lizmat>;
my $live-in-North-America = set <Zoffix [Coke] TimToady huggable>;

my $North-American-admins = $admins ∩ $live-in-North-America;
say $North-American-admins;
# OUTPUT: set(Zoffix, [Coke])
We intersected two sets with the ∩, U+2229 INTERSECTION, intersection operator and received a set that contains only the elements present in both original sets. You can chain these operations too, so membership will be checked in all of the provided sets in the chain:
say <Zoffix lizmat> ∩ <huggable Zoffix> ∩ <TimToady huggable Zoffix>;
# OUTPUT: set(Zoffix)
Another handy operator is the set difference operator, whose Unicode look I find somewhat annoying: ∖. No, it’s not a backslash (\), but a U+2216 SET MINUS character (luckily, you can use the much more obvious (-) Texas version).
The usefulness of the operator compensates for its shoddy looks:
my @spammers = <spammety@sam.com spam@in-a-can.com yum@spam.com>;
my @senders  = <perl6@perl6.org spammety@sam.com good@guy.com>;

for keys @senders ∖ @spammers -> $non-spammer {
    say "Message from $non-spammer";
}

# OUTPUT:
# Message from perl6@perl6.org
# Message from good@guy.com
We have two arrays: one contains a list of spammers’ addresses and another contains a list of senders. How do we get a list of senders without any spammers in it? Just use the ∖ (fine, fine, the (-)) operator!
We then use the for loop to iterate over the results, and as you can see from the output, all spammers were omitted… But why is keys there?
The reason is that Setty and Mixy types are a lot like hashes, in the sense that they have keys and values for those keys. Set types always have True as values, and since we don’t care about iterating over Pair objects in our loop, we use keys to get just the keys of the set: the email addresses.
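A quick sketch of that hash-like view of a Set, using a throwaway two-element set (the names are only for illustration):

```raku
my $s = set <x y>;

say $s.keys.sort;     # (x y)
say $s.values;        # (True True)
say $s.pairs.sort;    # (x => True y => True)
say $s<x>;            # True
say $s<nope>;         # False: missing keys simply look up as False
```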
However, hash-like semantics aren’t useless on Sets. For example, we can take a slice, and with the :k adverb return just the elements that the set contains:
my $meows = set <
    Abyssinian Aegean Manx Siamese Siberian Snowshoe
    Sokoke Sphynx Suphalak Thai
>;
say $meows<Sphynx Raas Ragamuffin Thai>:k;
# OUTPUT: (Sphynx Thai)
But what happens if we try to delete an item from a set?
say $meows<Siamese>:delete;
# Cannot call 'DELETE-KEY' on an immutable 'Set'
# in block <unit> at z.p6 line 6
We can’t! The Set type is immutable. However, just like the Map type has a mutable version, Hash, the Set type has a mutable version too: the SetHash. There isn’t a cutesy helper sub to create one, so we’ll use the constructor instead:
my $s = SetHash.new: <a a a b c d>;
say $s;
$s<a d>:delete;
say $s;

# SetHash.new(a, c, b, d)
# SetHash.new(c, b)
Voilà! We successfully deleted a slice. So, what other goodies does Santa have in his… bag?
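One more thing before we leave sets: besides :delete, a SetHash also lets you add and remove keys by plain assignment of True or False. A minimal sketch, with $online being a made-up name:

```raku
my $online = SetHash.new: <Zoffix lizmat>;

$online<TimToady> = True;     # assigning True adds the key
$online<lizmat>   = False;    # assigning False removes the key

say $online.keys.sort;        # (TimToady Zoffix)
```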
Gag ’em ‘n’ Bag ’em
Related to Sets is another type: a Bag, and yes, it’s also immutable, with the corresponding mutable type being BagHash. We already saw at the start of this article that we can use a Bag to count stuff, and just like a Set, a Bag is hash-like, which is why we could view a slice of the four DNA bases:
'AGTCAGTCAGTCTTTCCCAAAAT'.comb.Bag<A T G C>.say; # (7 7 3 6)
While a Set has all values set to True, a Bag’s values are integer weights. If you put two things that are the same into a Bag there’ll be just one key for them, but the value will be 2:
my $recipe = bag 'egg', 'egg', 'cup of milk', 'cup of flour';
say $recipe;
# OUTPUT: bag(cup of flour, egg(2), cup of milk)
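Here’s a short sketch poking at those weights, plus the mutable BagHash doing the ++ counting from the start of the post. The $counts name and the shortened DNA string are just for illustration:

```raku
my $recipe = bag 'egg', 'egg', 'cup of milk', 'cup of flour';

say $recipe<egg>;     # 2
say $recipe<salt>;    # 0: missing keys have a weight of zero
say $recipe.elems;    # 3: number of distinct keys
say $recipe.total;    # 4: sum of all weights

# The mutable BagHash supports the ++ idiom directly.
my $counts = BagHash.new;
$counts{$_}++ for 'AGTCAGTC'.comb;
say $counts<A T G C>; # (2 2 2 2)
```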
And of course, we can use our handy operators to combine bags! Here, we’ll be using the ⊎, U+228E MULTISET UNION, operator, which looks a lot clearer in its Texas version: (+)
my $pancakes = bag 'egg', 'egg', 'cup of milk', 'cup of flour';
my $omelette = bag 'egg', 'egg', 'egg', 'cup of milk';

my $shopping-bag = $pancakes ⊎ $omelette ⊎ <gum chocolate>;
say $shopping-bag;
# bag(gum, cup of flour, egg(5), cup of milk(2), chocolate)
We used two of our Bags along with a 2-item list, which got correctly coerced for us, so we didn’t have to do anything.
A more impressive operator is ≼, U+227C PRECEDES OR EQUAL TO, and its mirror ≽, U+227D SUCCEEDS OR EQUAL TO, which tell whether a Baggy on the narrow side of the operator is a subset of the Baggy on the other side; meaning all the objects in the smaller Baggy are present in the larger one and their weights are at most as big.
Here’s a challenge: we have some materials and some stuff we want to build. Problem is, we don’t have enough materials to build all the stuff, so what we want to know is what combinations of stuff we can build. Let’s use some Bags!
my $materials = bag 'wood' xx 300, 'glass' xx 100, 'brick' xx 3000;
my @wanted =
    bag('wood' xx 200, 'glass' xx 50, 'brick' xx 3000) but 'house',
    bag('wood' xx 100, 'glass' xx 50) but 'shed',
    bag('wood' xx 50) but 'dog-house';

say 'We can build...';
.put for @wanted.combinations.grep: { $materials ≽ [⊎] |$^stuff-we-want };

# OUTPUT:
# We can build...
#
# house
# shed
# dog-house
# house shed
# house dog-house
# shed dog-house
The $materials is a Bag with our materials. We used the xx repetition operator to indicate quantities of each. Then we have a @wanted Array with three Bags in it: that’s the stuff we want to build. We’ve also used the but operator on them to mix in names that override what those bags will .put out as at the end.
Now for the interesting part! We call .combinations on our list of stuff we want, and just as the name suggests, we get all the possible combinations of stuff we can build. Then, we .grep over the result, looking for any combination that takes at most all of the materials we have (that’s the ≽ operator). On its fatter end, we have our $materials Bag and on its narrower end, we have the ⊎ operator that adds the bags of each combination of our stuff we want together, except we use it as a metaoperator [⊎], which is the same as putting that operator between each item of $^stuff-we-want. In case it’s new to you: the $^ twigil on $^stuff-we-want creates an implicit signature on our .grep block and we can name that variable anything we want.
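If the moving parts in that one-liner are unfamiliar, here’s a sketch of each on its own, using small made-up values. Note that .combinations also yields the empty combination, which is where the blank line in the output above comes from:

```raku
# Reduction metaoperator: puts the operator between each item.
say [+] 1, 2, 3;                          # 6, same as 1 + 2 + 3

my $sum = [⊎] bag(<a a>), bag(<a b>);     # same as bag(<a a>) ⊎ bag(<a b>)
say $sum<a>;                              # 3

# The $^ twigil gives the block an implicit parameter.
say (1, 2, 3).map({ $^n * 2 });           # (2 4 6)

# All combinations of a list, including the empty one.
say <a b>.combinations;                   # (() (a) (b) (a b))
```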
And there we have it! The output of the program shows we can build any combination of stuff, except the one that contains all three items. I guess we just can’t have it all…
…But wait! There’s more!
Mixing it Up
Let’s look back at our recipe code. There’s something not quite perfect about it:
my $recipe = bag 'egg', 'egg', 'cup of milk', 'cup of flour';
say $recipe;
# OUTPUT: bag(cup of flour, egg(2), cup of milk)
What if a recipe calls for half a cup of milk instead of a whole one? How do we represent a quarter of a teaspoon of salt, if Bags can only ever have integer weights?
The answer to that is the Mix type (with the corresponding mutable version, MixHash). Unlike a Bag, a Mix supports all Real weights, including negative weights. Thus, our recipe is best modeled with a Mix.
my $recipe = Mix.new-from-pairs: 'egg' => 2, 'cup of milk' => ½,
    'cup of flour' => ¾, 'salt' => ¼;
say $recipe;
# mix(salt(0.25), cup of flour(0.75), egg(2), cup of milk(0.5))
Be sure to quote your keys and don’t use colonpair form (:42a, or :a(42)), since those are treated as named arguments. There’s also a mix routine, but it doesn’t take weights and functions just like the bag routine, except it returns a Mix. And, of course, you can use a .Mix coercer on a hash or a list of pairs.
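A small sketch of those alternatives side by side. The %amounts hash and the variable names are made up for illustration:

```raku
# .Mix on a hash: keys become items, values become weights.
my %amounts = egg => 2, 'cup of milk' => 0.5, salt => 0.25;
my $from-hash = %amounts.Mix;
say $from-hash<salt>;             # 0.25

# .Mix on a list of pairs works the same way.
my $from-pairs = (egg => 2, 'cup of milk' => 0.5, salt => 0.25).Mix;
say $from-pairs<egg>;             # 2

# The mix routine only counts occurrences, like bag does.
say mix(<egg egg salt>)<egg>;     # 2
```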
Less-Than-Awesome creation aside, let’s make something with mixes! Say, you’re an alchemist. You want to make a bunch of awesome potions and you need to know the total amount of ingredients you’ll need. However, you realize that some of the ingredients needed by some reactions are actually produced as a byproduct by other reactions you’re making. So, what’s the most efficient amount of stuff you’ll need? Mixes to the rescue!
my %potions =
    immortality  => (:oxium(6.98), :morphics(123.3),  :unobtainium(2)   ).Mix,
    invisibility => (:forma(9.85), :rubidium(56.3),   :unobtainium(−0.3)).Mix,
    strength     => (:forma(9.15), :rubidium(−30.3),  :kuva(0.3)        ).Mix,
    speed        => (:forma(1.35), :nano-spores(1.3), :kuva(1.3)        ).Mix;

say [⊎] %potions.values;
# OUTPUT: mix(unobtainium(1.7), nano-spores(1.3), morphics(123.3),
#             forma(20.35), oxium(6.98), rubidium(26), kuva(1.6))
For convenience, we set up a Hash, with keys being names of potions and values being Mixes with quantities of ingredients. For reactions that produce one of the ingredients we seek, we’ve used negative weights, indicating the amount produced.
Then, we used the same ⊎ set addition operator we saw earlier, in its meta form: [⊎]. We supply it the .values of our Hash that are our Mixes, and it happily adds up all of our ingredients, which we see in the output.
Look at unobtainium and rubidium: the set operator correctly accounted for the quantities produced by reactions where those ingredients have negative weights!
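The same bookkeeping on a tiny scale, as a sketch with made-up ingredients a, b, and c, where one reaction produces exactly as much b as the other consumes:

```raku
my $makes-b = (a => 2, b => -1.5).Mix;   # consumes a, produces b
my $needs-b = (b => 1.5, c => 1).Mix;    # consumes b and c

my $total = $makes-b ⊎ $needs-b;
say $total<a>;    # 2
say $total<b>;    # 0: production and consumption cancel out
say $total<c>;    # 1
```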
With immortality potion successfully mixed, all we need to do now is figure out what to do for the next few millennia… How about coding some Perl 6?