ldami: Hash key order : beware of implicit assumptions

Saturday, December 1, 2012

Hash key order : beware of implicit assumptions

Perl hashes are not ordered, so one is not supposed to make assumptions about the key order. I thought I did not ... but Perl 5.17.6 showed me that I was wrong !

About two weeks ago I started receiving report about test failures which were totally incomprehensible to me. Since I work on Windows, I had no bleadperl environment, so it was hard to guess what was wrong just from the test reports. Andreas König kindly opened a ticket in which he spotted that the problem was related to a recent change in bleadperl : now Perl not only makes no guarantee about the key order, it even guarantees that the key order will be different through several runs of the same program!

This convinced me of investing some time to get a bleadperl environment on my Windows machine : VMware player + a virtual Unbutu + perlbrew did the job perfectly. Now I could start working on the bug.

The point were I was making an implicit assumption was a bit nasty, so I thought it was worth writing this post to share it : the code in SQL::Abstract::More more or less went like this :

my $ops = join "|", map quotemeta, keys %hash;
my $regex = qr/^($ops)?($rest_of_regex)/;

See the problem ? The regex starts with an alternation derived from the hash keys. At first glance one would think that the order of members in the alternation is not important ... except when one member is a prefix of the other, because the first member wins. For example, matching "absurd" against qr/^(a|ab|abc)?(.*)/ is not the same as qr/^(abc|ab|a)?(.*)/ : in one case $1 will contain 'a', in the other case it will contain 'ab'.

To fix the problem, the code above was rewritten to put the longest keys first, and everything is fine again.

6 comments:

AnonymousDecember 3, 2012 at 1:08 PM
Does keys() and values() use the same order? Because I frequently use this rto generate reverse lookup tables:

my %someLUT = ( foo => 1, bar => 2, baz => 3 );
my %reverseLUT;
@reverseLUT{ values %someLUT } = keys %someLUT; # ( 1 => 'foo', 2 => 'bar', 3 => 'baz' )

Will that still work?
ReplyDelete
Replies
Ruslan ZakirovDecember 3, 2012 at 2:04 PM
That will work. Order of things returned by keys and values builtins is matching as long as hash is not changed between the calls.
ReplyDelete
Replies
Maxim VuetsDecember 3, 2012 at 2:17 PM
I would not count on that though. It may be easier to build a reverse lookup table like this:

my %reverseLUT = reverse %someLUT;

Or more explicitly (for someone):

my %reverseLUT;
while (my ($k, $v) = each %someLUT) {
$reverseLUT{$v} = $k;
}

or...

my %reverseLUT;
for my $k (keys %someLUT) {
my $v = $someLUT{$k};
$reverseLUT{$v} = $k;
}
ReplyDelete
Replies
Olivier Mengué (DOLMEN)April 16, 2014 at 4:34 PM
Better than PerlBrew: https://github.com/tokuhirom/plenv
ReplyDelete
Replies

Add comment

ldami

Saturday, December 1, 2012

Hash key order : beware of implicit assumptions

6 comments:

Followers

Blog Archive

About Me