Saturday, December 1, 2012

Hash key order : beware of implicit assumptions

Perl hashes are not ordered, so one is not supposed to make assumptions about the key order. I thought I did not ... but Perl 5.17.6 showed me that I was wrong !

About two weeks ago I started receiving report about test failures which were totally incomprehensible to me. Since I work on Windows, I had no bleadperl environment, so it was hard to guess what was wrong just from the test reports. Andreas König kindly opened a ticket in which he spotted that the problem was related to a recent change in bleadperl : now Perl not only makes no guarantee about the key order, it even guarantees that the key order will be different through several runs of the same program!

This convinced me of investing some time to get a bleadperl environment on my Windows machine : VMware player + a virtual Unbutu + perlbrew did the job perfectly. Now I could start working on the bug.

The point were I was making an implicit assumption was a bit nasty, so I thought it was worth writing this post to share it : the code in SQL::Abstract::More more or less went like this :

  my $ops   = join "|", map quotemeta, keys %hash;
  my $regex = qr/^($ops)?($rest_of_regex)/;

See the problem ? The regex starts with an alternation derived from the hash keys. At first glance one would think that the order of members in the alternation is not important ... except when one member is a prefix of the other, because the first member wins. For example, matching "absurd" against qr/^(a|ab|abc)?(.*)/ is not the same as qr/^(abc|ab|a)?(.*)/ : in one case $1 will contain 'a', in the other case it will contain 'ab'.

To fix the problem, the code above was rewritten to put the longest keys first, and everything is fine again.

6 comments:

  1. Does keys() and values() use the same order? Because I frequently use this rto generate reverse lookup tables:

    my %someLUT = ( foo => 1, bar => 2, baz => 3 );
    my %reverseLUT;
    @reverseLUT{ values %someLUT } = keys %someLUT; # ( 1 => 'foo', 2 => 'bar', 3 => 'baz' )

    Will that still work?

    ReplyDelete
  2. That will work. Order of things returned by keys and values builtins is matching as long as hash is not changed between the calls.

    ReplyDelete
  3. I would not count on that though. It may be easier to build a reverse lookup table like this:

    my %reverseLUT = reverse %someLUT;

    Or more explicitly (for someone):

    my %reverseLUT;
    while (my ($k, $v) = each %someLUT) {
    $reverseLUT{$v} = $k;
    }

    or...

    my %reverseLUT;
    for my $k (keys %someLUT) {
    my $v = $someLUT{$k};
    $reverseLUT{$v} = $k;
    }

    ReplyDelete
    Replies
    1. Nice, thanks! I didn't think of that (reversing the key, value, key, value, ... list). Cheers!

      Delete
    2. Some relevant perldocs for you, Anonymous:

      Keys:

      The keys of a hash are returned in an apparently random order. The actual random order is subject to change in future versions of Perl, but it is guaranteed to be the same order as either the `values` or `each` function produces (given that the hash has not been modified)

      Reverse:

      This operator is also handy for inverting a hash, although there are some caveats. If a value is duplicated in the original hash, only one of those can be represented as a key in the inverted hash. Also, this has to unwind one hash and build a whole new one, which may take some time on a large hash

      Delete
  4. Better than PerlBrew: https://github.com/tokuhirom/plenv

    ReplyDelete