Thursday, August 27, 2009

Emacs/Perl integration

In answer to YAPC::EU::2009's call for volunteers for the beginners' track, I offered a presentation on Perl development under Emacs. Part of the motivation for doing that talk was to force myself to have a look at various Emacs extensions (so far I had been using just the standard cperl-mode that comes with Emacs, which is already very rich in features). So here are my conclusions.

[Disclaimer: this was a quick, informal survey; maybe I misunderstood some points, or maybe some of the difficulties encountered were due to my specific environment (Emacs on Win32); so I hope I'm not doing any injustice to these modules.]
  • Emacs::EPL looks like it was a very nice piece of integration work. For example it could redirect things like STDIN/STDOUT or %ENV variables for Perl scripts running under Emacs, so that user interaction would go through Emacs mechanisms (e.g. the minibuffer). It also allowed people to write Emacs extensions in Perl instead of elisp! This sounds really interesting, but unfortunately it hasn't been updated since 2001 and I couldn't get it to work. I really wish that somebody would resurrect that module.
  • Devel::PerlySense is an ambitious project with lots of features; it runs a Perl process in the background to run/debug code, perform static analysis, etc. However, it was so CPU-intensive that I found it hardly usable for daily work.
  • Sepia is an active project (latest release July 2009), also running an inferior Perl process, and also quite rich in features (it goes as far as analyzing Perl opcodes to deduce information about packages/methods/etc.). Here I had another problem: it would only recognize core modules and functions, and I couldn't figure out how to make it aware of the code under perl/site/lib!
  • Emacs::PDE installed smoothly and provides a number of nice features, like perldoc/perltidy/perlcritic integration, tree views of sources/documentation, an interactive Perl shell, etc.

So in the coming weeks I plan to learn more about Emacs::PDE and maybe add it to my collection of tools for daily work.

Sunday, August 23, 2009

Be fair to ActiveState

Recently I heard several people making strong comments against ActiveState, the company that first published a Perl distribution for Windows. The tone of those comments was about the same as that of people crusading against Microsoft.

I feel this is not fair.

True enough, Windows users now have an alternative solution with Strawberry Perl, which is more similar to Perl environments on Unix boxes. Strawberry Perl is progressing very well, which is heartening, and is likely to have a positive impact on the number of Perl installations in the Windows world. Personally I haven't switched to Strawberry yet, but it's likely to happen in the near future, especially because I'm interested in the combination with Perl::Dist. Together these should make it possible to package a customized distribution for our organization that would install everything we need in a single click ... but I still need a little bit of time to experiment with all that.

However, this is not a reason for despising ActivePerl, which continues to work well, and for quite a number of years has been able to work with the MinGW gcc compiler for installing CPAN modules with C extensions. Personally I owe quite a lot to ActivePerl, which almost 10 years ago provided me with an initial Perl environment to get some work done on our Win32 workstations; without that, our projects at Geneva's law courts would probably never have started.

ActivePerl also did quite a lot to integrate Perl with various Windows components (PerlScript, OLE, etc.). This provides a very interesting environment for Win32 automation and has been tremendously useful for some migration tasks in our organization. Unfortunately, this usage of Perl is not widespread enough; it seems that only a few people have realized the power of Perl for Win32 administration, probably because many Windows sysadmins are so used to point-and-click that they just don't have the culture of automating common repetitive tasks.

Finally, ActivePerl's presentation of documentation in HTML with tree navigation was a real blessing for getting acquainted with the vast body of Perl and CPAN literature. I liked it so much that it became a major source of inspiration for developing the Pod::POM::Web HTTP documentation server.

As a whole, the Perl community greatly benefited from ActiveState's work, so it seems to me that people disregarding that company are just caught in vrittis (patterns of mind) that lead them to disregard any company whatsoever.

Thanks, ActiveState, and all the best for your future business plans.

Friday, August 14, 2009

Object-oriented accessors considered (sometimes) harmful

One of the big principles of object-oriented (OO) programming is encapsulation, which says that data inside an object should be hidden from the outside world, and all accesses to the data should go through methods. That's where OO accessor/mutator methods come from.

Encapsulation brings a number of advantages, like being able to change the internal representation of an object, validating any attempt to change state, blurring the difference between stored values and computed values, avoiding the use of object storage for action at a distance, etc. All of this has been discussed for many years in numerous books and papers, and is wonderfully implemented in Moose (see its Attributes manpage).

However, encapsulation also brings a number of disadvantages, especially in Perl which, contrary to Java or Eiffel, is a multi-paradigm language. Below I discuss those disadvantages, not with the intention of throwing away encapsulation, but rather to issue a warning: don't blindly adopt encapsulation everywhere; think and weigh where it is a gain and where it is a hindrance.

Good Perl idioms are no longer available


The canonical example in OO literature is the "point object" (even in the Moose synopsis). Now, contrary to objects like a device driver or a database handle, which have lots of internal data to hide, a point object contains nothing but public information (its x and y coordinates). If we have direct access to the stored values, we can use a number of convenient Perl idioms, while doing the same through getter/setter methods is much more cumbersome. Consider the following examples:

# string interpolation
print "point is at coordinates $point->{x} / $point->{y}\n";

# symmetry transform
($point->{x}, $point->{y}) = ($point->{y}, $point->{x});

# zoom
$point->{$_} *= $zoom_factor for qw/x y/;

# temporary push aside
{ local $point->{x} = $point->{x} + $far_away;
  do_something_without_that_point_in_the_way();
} # point automatically comes back to its previous location

# nested datastructures and possibly auto-vivification
push @{$point->{styles}}, qw/square big/;
$point->{menu}{options}{color} = 'green';


Translating such idioms to getter/setter methods is left as an exercise for the reader...
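
To spoil the exercise for just one of them, here is a minimal sketch of the symmetry transform rewritten against accessors, assuming a hypothetical Point class with Moose-style read-write accessors x and y:

# symmetry transform, through getter/setter methods
my ($old_x, $old_y) = ($point->x, $point->y);
$point->x($old_y);
$point->y($old_x);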

No obvious distinction between "setter" methods and other methods

When I read some unknown code and find something like
  $obj->{foo} = $x;
  $obj->bar($y);
I immediately know that the state of the object changes at line 1, while I have no clue about line 2: I must go to the documentation of method bar (or even worse, to its source code) to know whether it is printing something out, modifying a database record, or just setting a bar attribute in memory inside object $obj.

Hard to debug

Sometimes the best way to find a bug is to go step by step in the debugger and examine which values sit in memory. However, if we use chained accessors to traverse a datastructure, the number of steps to go through becomes a nightmare. For example, consider the following hypothetical line in a Catalyst controller:
  if ($c->request->body->length < $self->min_body) {
If I come across that line while debugging, I know that there is a chance that the bug sits in the min_body method, so I want to step into that method; but before getting there I'll have to step through all the accessor methods for request, body and length, which is totally uninteresting for my debugging purposes.

Non-scalar attributes must either copy or reinvent an API

If an object contains an arrayref attribute, directly accessible and publicly documented as such, then every client can happily push, shift, reverse, grep, etc. on that arrayref using common Perl syntax.

If, on the contrary, that arrayref is considered private, and clients must use getter/setter methods to access it, then clients have no choice but to copy the entire array back and forth between the remote object and the local handling code. Needless to say, this has an impact on performance. To avoid it, the object could also implement additional methods to insert an item into the array, remove the last one, etc. ... but this is reinventing a list API, and it will never be as rich and flexible as the collection of builtin Perl constructs for dealing with lists!
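
As an illustration, here is a rough sketch of the difference, assuming a hypothetical styles attribute holding an arrayref, and accessors that pass copies of the list around:

# direct access to a documented arrayref attribute
push @{$point->{styles}}, 'dotted';

# same operation through copying accessors (hypothetical API)
my @styles = @{$point->styles};   # copy out
push @styles, 'dotted';
$point->styles(\@styles);         # copy back in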

The same reasoning goes of course for hashref attributes.

No generic data traversal modules

Data traversal modules like Data::Visitor, or import/export modules like JSON or YAML, cannot make any decision when they come to an opaque object that one is not supposed to peek into. This means that generic traversal tools are just inapplicable, or that every object has to implement some support for traversal (like a visit_value method); but writing such methods for every new class is kind of tedious.
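
For example, with the JSON module, blessed objects are serialized only if they cooperate by providing a TO_JSON method; a minimal sketch with a hypothetical Point class:

use JSON;

package Point;
sub new {my ($class, %args) = @_; return bless {%args}, $class}
sub x   {my $self = shift; return $self->{x}}
sub y   {my $self = shift; return $self->{y}}
# traversal support has to be written by hand for every such class
sub TO_JSON {my $self = shift; return {x => $self->x, y => $self->y}}

package main;
my $point = Point->new(x => 1, y => 2);

# convert_blessed calls TO_JSON on each blessed object; allow_blessed
# encodes objects without a TO_JSON method as null instead of croaking
my $coder = JSON->new->convert_blessed->allow_blessed;
print $coder->encode($point), "\n";    # {"x":1,"y":2}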

It may be that the meta-information managed by Moose could help in traversal algorithms (asking the metaclass which methods link to "subobjects" of the current one), but I don't know whether anybody has already worked on that.

Methods are slower than direct access to attributes

Not much to expand here: obviously method calls have a cost, especially in Perl, which is more dynamic than other OO languages, so the compiler has less information available to optimize the method calls. So when traversing deep datastructures through nested accessor calls, there is a definite impact on performance.
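
For the curious, here is a minimal benchmark sketch using the core Benchmark module; MyPoint is just a hypothetical hash-based class with a hand-written accessor:

use strict;
use warnings;
use Benchmark qw(cmpthese);

package MyPoint;
sub new {my ($class, %args) = @_; return bless {%args}, $class}
sub x   {my $self = shift; $self->{x} = shift if @_; return $self->{x}}

package main;
my $point = MyPoint->new(x => 1, y => 2);

# compare accessor call vs. direct hash access, for at least 2 CPU seconds each
cmpthese(-2, {
  accessor      => sub {my $v = $point->x},
  direct_access => sub {my $v = $point->{x}},
});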

Conclusion

In Perl, fully encapsulated objects are sometimes the best solution, sometimes not; weigh these considerations before taking strong design decisions.

An interesting design is that of DBI: objects (handles) are totally encapsulated, yet they exploit the power of tie to expose their attributes through a conventional hashref API, instead of OO getter and setter methods. This is a very clever compromise.
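
For instance (a small sketch using DBD::SQLite; the table is made up), both database and statement handles expose their attributes through plain hash syntax:

use DBI;

my $dbh = DBI->connect('dbi:SQLite:dbname=test.db', '', '',
                       {RaiseError => 1});
$dbh->do('CREATE TABLE IF NOT EXISTS some_table (id INTEGER, name TEXT)');

$dbh->{AutoCommit} = 0;                 # "setter" through the tied-hash API
my $sth = $dbh->prepare('SELECT * FROM some_table');
$sth->execute;
print "columns: @{$sth->{NAME}}\n";     # "getter" through the tied-hash API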

As far as I am concerned, I deliberately designed DBIx::DataModel to fully exploit the dual nature of Perl objects, having both the OO API for executing methods and the hashref API for accessing column values. I wouldn't necessarily do that everywhere, but for row objects in a database, which are very open in nature, this just seemed an appropriate solution.

Thursday, August 13, 2009

Extending DBD::SQLite functionalities

About one year ago I started my first project involving XS programming: I needed specific collation sequences in SQLite databases. The underlying SQLite C library had support for user-defined collations, but at that time the Perl API in DBD::SQLite did not give access to it ... so I had to write the glue myself.
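
The resulting API (nowadays documented in DBD::SQLite as sqlite_create_collation) looks roughly like the following sketch; the people table and the Unicode::Collate settings are just for illustration:

use DBI;
use Unicode::Collate;

my $dbh = DBI->connect('dbi:SQLite:dbname=test.db', '', '',
                       {RaiseError => 1});
$dbh->do('CREATE TABLE IF NOT EXISTS people (name TEXT)');

# level 1 comparisons ignore case and diacritics
my $collator = Unicode::Collate->new(level => 1);
$dbh->sqlite_create_collation(no_accents => sub {$collator->cmp(@_)});

# the collation can then be used directly in SQL
my $names = $dbh->selectcol_arrayref(
  'SELECT name FROM people ORDER BY name COLLATE no_accents'
);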

Learning XS was kind of fun : the C structures, the dozens of macros, the stack manipulation functions and the "mortalization" concept, all of this is pretty well documented, so once you have a real project to get motivated, you can enter that field without too much trouble.

Once done, I sent the patch to the DBD::SQLite author and got no answer, which was less fun. Fortunately, Audrey Tang incorporated the patch into her DBD::SQLite::Amalgamation version of the driver.

Nowadays the situation is very different: Adam Kennedy has taken over maintenance of DBD::SQLite, there is an open source repository and a mailing list, and a responsive Kenichi Ishigaki takes care of release management, so it's much more pleasant to participate.

What I did recently was to study the SQLite API and add support for the remaining native functions that were not yet exposed through the Perl DBD::SQLite driver -- not that I really need them at the moment, but just for completeness' sake. Such functions include, for example, hooks to be run whenever a commit, rollback or update occurs; this means that if your transaction does some work outside of the database -- like creating files or directories -- you can register a rollback hook that will 'undo' those changes in case of failure.
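
Here is a sketch of the idea; the sqlite_-prefixed method name is the one documented in the current DBD::SQLite, and the directory path is made up:

use DBI;
use File::Path qw(rmtree);

my $dbh = DBI->connect('dbi:SQLite:dbname=test.db', '', '',
                       {RaiseError => 1, AutoCommit => 0});

# suppose the transaction also creates a directory on the filesystem
my $new_dir = 'new_attachments';
mkdir $new_dir or die "mkdir $new_dir: $!";

# undo that side effect if the transaction ends up being rolled back
$dbh->sqlite_rollback_hook(sub {
  rmtree($new_dir) if -d $new_dir;
});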

Another interesting feature is the set_authorizer hook, which can decide whether a given SQL operation is authorized on that database handle. It could be used, for example, to specify that only administrators are allowed to update or insert data, while any user can read.
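
A minimal sketch of such an authorizer, assuming an $is_admin flag computed elsewhere and the action-code constants as documented in DBD::SQLite:

use DBI;
use DBD::SQLite;   # also loads the action-code and return-code constants

my $dbh = DBI->connect('dbi:SQLite:dbname=test.db', '', '',
                       {RaiseError => 1});
my $is_admin = 0;  # hypothetical flag, computed elsewhere

$dbh->sqlite_set_authorizer(sub {
  my ($action_code, @details) = @_;
  my %write_action = map {$_ => 1} (DBD::SQLite::INSERT,
                                    DBD::SQLite::UPDATE,
                                    DBD::SQLite::DELETE);
  return DBD::SQLite::OK unless $write_action{$action_code};
  return $is_admin ? DBD::SQLite::OK : DBD::SQLite::DENY;
});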

If anybody is interested, that stuff is already available on CPAN in the current developer release of DBD::SQLite ... enjoy!

Wednesday, August 12, 2009

YAPC::EU::09 paper on managing Geneva's law courts

This year's YAPC::EU::2009 conference had printed proceedings.

For this occasion I wrote a paper about my talk Managing Geneva's law courts, from Cobol to Perl. An HTML copy of the paper is now online. The slides are available on slideshare.

Tuesday, August 11, 2009

The Perl community is exceptional

This year I went to the YAPC::EU::2009 conference in Lisbon. I missed the one from last year, so, being a little less accustomed to those gatherings, I experienced again the same feeling I had at my first YAPC conference in Birmingham in 2006: this community is like nothing else I've seen before.

I had been to quite a lot of computer science conferences in my previous life, where the climate was very different. In the academic world, people mostly think about the papers: if your paper is accepted, you go to the conference to add one more line to your publication list; if it is not accepted, maybe you go nevertheless to see which ideas are in fashion, and to do a bit of networking that may help you get accepted the following year, or build an international circle that will give more credibility to your next research proposal. In that world, it's nice to see the community again, but meanwhile there is always a feeling of competition, because the person sitting next to you might well be that anonymous expert who turned down your last paper submission. Question sessions can be quite harsh, too, because people in the audience are sometimes more interested in showing off their own knowledge than in genuinely trying to clarify what the speaker was talking about.

YAPC conferences are very different. People come because they love it, and some of them even come on their own time and money. They love the fun, of course, the special outfits, the jokes, the culture; but they also love to join in shaping new ideas, new tools, new constructs that will be helpful to the whole community. They don't come to mark their territory; they come out of passion, and they know that by working together, everybody can help Perl improve and expand.

I don't think I've ever heard an aggressive question at a YAPC: rather, the audience sometimes helpfully points to resources that the speaker might have overlooked.

All of this makes YAPC very enjoyable, so I'll definitely try to attend next year in Pisa.

Saturday, August 8, 2009

Hello, world

OK, time to go blogging.

For a couple of months, it had been one of those things I knew I wanted to do sooner or later ... but preferred to leave until later. Somehow there was always something more urgent, a bug to fix, a release to prepare, a doc to write, an analysis to discuss again, a new musical piece to rehearse, so I never got around to really doing it.

Matt Trout had already tickled me a couple of weeks ago in his commentary about DBIx::DataModel, but his vehement exhortations at YAPC::EU::2009 finally convinced me that there is no excuse to delay it any further: I have to join that Ironman competition, a grand idea for getting people to write and propagate thoughts and facts about Perl development.

Perl blogging had been around for a while, in particular in use.perl journals, but the Ironman competition really gave it a new thrust; and the information that circulates through this new medium is really valuable, because people share thoughts, considerations and design issues as soon as they emerge, even when only partially shaped, and that's good enough to get readers to also start thinking about those issues. Blogging is less intrusive than mailing lists: you don't throw something out saying "hey, folks, this is for you"; rather, you leave some dangling thought somewhere, and whoever comes across it can grab it if interested; so even if your idea is vague or controversial, it doesn't matter.

Once the decision to blog was taken, I had to decide about the how and where. I already had a use.perl journal, not very much used so far, but good enough. Many people complain about the use.perl site, and true enough, it has a number of inconveniences and hasn't evolved since it was launched a couple of years ago; but it has the merit of existing and is still a valuable resource for the Perl community. However, I felt that blogging on use.perl would somehow miss the point of Ironman, which is to get the world to hear about Perl outside of the Perl echo chamber.

The other solution would be to do like real geeks, who have their own Internet domains, run software they have chosen or written themselves, and take care of configuration and maintenance issues ... however I haven't found the time for that in the last 6 months. So finally here I am, hosted by blogger.com, and I know that everything written here will definitely be indexed, so the goal of propagating Perl culture should be attained...

Hello, world, please come back and stay in tune!