Tuesday, 10 June 2014

Processing command line options in Ruby

The story so far

I've been setting up a scripting environment at work, as I've mentioned previously. After considering Perl, Python, and Ruby, we chose Ruby.
Ruby itself was installed from RPMs by the sysadmins. I then installed Ruby Gems locally, and all the gems are installed locally as well. This minimizes our dependency on the sysadmins, i.e., we don't have to open a request every time we need a new gem installed. The server has no internet connection, so we download the gems and run a local install, which means sorting out dependencies manually. So far, it has been going smoothly, and I'm quite impressed by the way gem management is handled.
One of our top goals is working with Informix DBs, and I'm happy to report that the Ruby Informix gem installed without a single hiccup (did I already mention I'm impressed?), and worked right out of the box (because I'd like to make that quite clear - I'm impressed).
This move to Ruby was caused, in part, by a server migration, where we decided to take the opportunity to simplify our batch environment. We currently have a mix of Java and shell scripts, with a liberal sprinkling of awk and some Perl. We'll keep the Java processes as-is, but we're planning on moving the scripts to Ruby.
The first goal is migrating everything to the new servers, making only the necessary changes in order to keep things working with a new set of system requirements (e.g., replacing FTP with SFTP). With time, we'll be converting all the scripting logic to Ruby.
So, I've been diving into Ruby, in a mix of planning and prototyping, trying to get a grasp of the language's basics, and creating some building blocks for future scripts.

Ruby & Command line options - GetoptLong

I'm currently working on a component to process command line arguments, similar (in goal, if not in design) to what I've done here.
As I looked at how this is done in Ruby, the first option I found was GetoptLong. I set up the options and began writing the code to do something with them. And I immediately thought "I need something else". Why?
Well, when a class lacks a method to get an option by name, forcing you to loop through all the options with a switch/case that says "If the current option is this, then do that", I have to wonder - does anyone actually think this is elegant?
Yes, I know. I could build such a method myself. After all, Ruby allows me to add methods to existing classes, so I could add a getter that took a name as argument, and returned the value of the option with that name or nil/exception if it didn't exist. But I couldn't believe there was no alternative. I couldn't be the only one thinking "This is sub-par design"; there had to be a better way.
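For what it's worth, here is a minimal sketch of that "build it myself" idea. Instead of adding a method to GetoptLong, it drains the parser once into a Hash, which gives the same lookup by name. The helper name getopt_to_hash and the option names are made up for illustration, and the command line is simulated so the sketch runs on its own.

```ruby
require 'getoptlong'

# Hypothetical helper (not part of GetoptLong): drain the parser once and
# keep every option it yields in a Hash keyed by the option's first name.
def getopt_to_hash(parser)
  values = {}
  parser.each { |name, argument| values[name] = argument }
  values
end

# Simulated command line, so the sketch runs without real arguments.
ARGV.replace(['--delimiter', ',', '-t', 'data.csv'])

parser = GetoptLong.new(
  ['--delimiter', '-d', GetoptLong::REQUIRED_ARGUMENT],
  ['--title', '-t', GetoptLong::NO_ARGUMENT]
)
parsed = getopt_to_hash(parser)

puts parsed['--delimiter']         # the value given on the command line
puts parsed.key?('--title')        # flags show up with an empty string as value
puts parsed['--settings'].inspect  # nil, since the option was never given
```

The lookup returns nil for options that were not given; swapping the Hash access for fetch would give the "exception if it didn't exist" variant.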

Ruby & Command line options - OptionParser

There is. It's called OptionParser. It's definitely better, and it seems quite powerful and complete. However, I've yet to find an example/tutorial that shows me how to do a couple of things that are dead simple in, e.g., Boost.Program_options:
The first is setting up a required option, making the option processing throw an exception if it's not present. Notice how easy it is with Boost:
("fich,f", po::value<string>(&fileName)->required(), "File to process")
OptionParser has a concept called "mandatory", but it means "If you include this option in the command line, you must specify its value". However, if you don't include it, no exception is thrown.
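To make the difference concrete, here is a small sketch of both cases. The function name parse_settings and the --settings option are only illustrative, and the explicit check at the end is my own workaround, not something OptionParser offers.

```ruby
require 'optparse'

# Hypothetical example: '--settings FILE' is "mandatory" only in the sense
# that the option, when present, must be followed by a value.
def parse_settings(argv)
  options = { settings: nil }
  parser = OptionParser.new
  parser.on('-s', '--settings FILE', 'Settings file') do |file|
    options[:settings] = file
  end
  parser.parse(argv) # raises OptionParser::MissingArgument for a bare '-s'

  # OptionParser stays silent when the option is absent altogether,
  # so a truly required option has to be checked by hand.
  raise ArgumentError, 'missing required option: --settings' if options[:settings].nil?

  options
end

p parse_settings(['-s', 'batch.conf'])
```

Calling parse_settings(['-s']) fails inside OptionParser, while parse_settings([]) only fails because of the manual check.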
The second is setting up a default value directly in the options definition, instead of having to do it in some other code, even if it's local to the options definition. Once again, courtesy of Boost:
("validnc,c", po::bool_switch(&validateFieldNr)->default_value(false),
    "Validates the number of fields per line and exits")
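The closest thing I know of in OptionParser is a sketch like the one below: keep the defaults in a Hash and let parse! write into it through the into: keyword. As far as I can tell, into: only arrived with Ruby 2.4, so treat this as an assumption about your Ruby version. The function name parse_with_defaults is made up for the example.

```ruby
require 'optparse'

# Hypothetical example: defaults live in one Hash, and parse! overwrites
# only the entries for options that actually appear on the command line.
def parse_with_defaults(argv)
  options = { delimiter: ';', title: false } # the defaults
  parser = OptionParser.new do |opts|
    opts.on('-d', '--delimiter DELIM', 'Delimiter used in the data file')
    opts.on('-t', '--title', 'Data file has a title header')
  end
  # into: stores each value under the long option name, as a Symbol.
  parser.parse!(argv.dup, into: options)
  options
end

p parse_with_defaults([])
p parse_with_defaults(['-d', ',', '--title'])
```

The defaults still sit outside the option definitions, but at least they are all in one place and no blocks are needed.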

And I already have a pet peeve with OptionParser. Suppose you have this:

options[:delimiter] = ';'
opts.on('-d', '--delimiter', 'Delimiter used in the data file') do |delim|
  options[:delimiter] = delim
end

Looking at this, you'd think "I have a default delimiter, ';', and I have an option to specify a different delimiter on the command line". Seems logical, right? Well, not quite. If you specify this option on the command line, options[:delimiter] will be set to true, and your delimiter will be left in ARGV, i.e., it won't be consumed by OptionParser.
Here's what you need, instead:
options[:delimiter] = ';'
opts.on('-d', '--delimiter delim', 'Delimiter used in the data file') do |delim|
  options[:delimiter] = delim
end
By the way, you don't need to use "delim" in '--delimiter delim'; you can use anything, as long as it's there. Fortunately, I wasn't the first one to get bitten by this little detail, and I quickly found someone else wondering why all his values were true or false.
While I can't help but wonder at the reasoning that led to "This looks like a good idea", my ignorance of the language/class means I'll just live with it, since I don't feel competent to do better.
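If you want to see the difference for yourself, a small self-contained sketch along these lines should do it. The function name parse_delimiter is made up; it takes the long form string as a parameter so both variants can be run side by side.

```ruby
require 'optparse'

# Hypothetical helper: parse argv with a given long form for -d and return
# the resulting delimiter together with whatever OptionParser left unparsed.
def parse_delimiter(argv, long_form)
  options = { delimiter: ';' }
  parser = OptionParser.new
  parser.on('-d', long_form, 'Delimiter used in the data file') do |delim|
    options[:delimiter] = delim
  end
  leftover = parser.parse(argv) # non-destructive; returns the unparsed arguments
  [options[:delimiter], leftover]
end

p parse_delimiter(['-d', ','], '--delimiter')       # => [true, [","]]
p parse_delimiter(['-d', ','], '--delimiter delim') # => [",", []]
```

In the first call the option is a plain switch, so the block receives true and the comma is left over; in the second the comma is consumed as the option's value.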

Pet Peeve - Documentation

Yes, this is my pet peeve. Docs. OptionParser's minimal example is too minimal to be useful, and the complete example is so crowded that details like this get lost in the noise. Not that there's anything wrong with that; that's what a complete example is for. It's the minimal example that shouldn't be quite so minimal.
I believe a rule like "If you want your options to have a value, you have to put something - anything - following it on the long form string" should be made more visible in the documentation. Actually, most of the Ruby documentation I've found so far is quite minimalistic/optimistic. Tutorials like Boost's (yes, I'm using Boost as an example of good documentation; no, I never thought that day would come), or troubleshooting sections are absent from most of the docs I've found.
E.g., take a look at make_switch and see what you can make of it. Then, see the complete example. Finally, look at this tutorial. It's only after I read this tutorial that I understood the semantics of "mandatory" and "optional" in this context, and gained a better understanding of how OptionParser works.
And going back to the "optimistic" bit I mentioned above, here's what I mean - sentences like these (which are from the tutorial) should be in the class's docs:
  • Switches that take a parameter only need to state the parameter name in the long form of the switch.
  • While your option string can define the parameter to be called "a,b,c", OptionParser will blindly allow any number of elements in the list. So, if you need a specific number of elements, be sure to check the array length yourself.
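The second point can be sketched like this, assuming a made-up --fields option and a made-up function name parse_fields. Passing Array as the argument type makes OptionParser split the value on commas, and the length check is done by hand afterwards.

```ruby
require 'optparse'

# Hypothetical example: accept a comma-separated list and verify its
# length ourselves, since OptionParser accepts any number of elements.
def parse_fields(argv, expected_count)
  fields = []
  parser = OptionParser.new
  parser.on('-f', '--fields a,b,c', Array, 'Field names') do |list|
    fields = list
  end
  parser.parse(argv)

  unless fields.length == expected_count
    raise OptionParser::InvalidArgument,
          "expected #{expected_count} fields, got #{fields.length}"
  end
  fields
end

p parse_fields(['--fields', 'id,name,date'], 3)
```

Note that the "a,b,c" in the long form is only a placeholder; it does nothing to limit the list to three elements.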

Another example - if you go to rubygems.org, you have the installation instructions, which end with this:
For more details and other options, see:  
ruby setup.rb --help
And, after a few hours of dealing with issues on Ruby Gems setup, I had to ask - why doesn't it end with this:
For more details and other options, see <this link for online docs, where we explain said details and options with more depth than we ever will in a help message>

Then, we have the guides. Filled with useful info, no denying that. However, when I needed to learn about things like GEM_HOME and RUBYLIB, I had to go elsewhere. While we could argue the latter is not actually Gem specific, I still believe it should be there, close to the sections regarding Ruby Gems installation and setup. Not all of us can just go gem update or ruby setup.rb with "you may need admin/root". Some of us don't even have an internet connection to begin with.


As I dive into Ruby, I like most of what I see, especially after having learnt to appreciate duck typing, available with C++ templates, and the total decoupling it provides.
So far, I've met a few design choices that baffle me, and the docs are definitely a weak point. Fortunately, there's this internet thingie where people are more than willing to share their knowledge.
As for processing command line arguments, I've settled for this "pattern":
require 'optparse'

options = {}
opts = OptionParser.new  # opts is an OptionParser instance

options[:delimiter] = ';'
opts.on('-d', '--delimiter delim',
        'Delimiter used in the data file') do |delim|
  options[:delimiter] = delim
end
options[:title] = false
opts.on('-t', '--title',
        'Indicates whether the data file has a title header') do
  options[:title] = true
end
options[:settings] = nil
opts.on('-s', '--settings sf', 'Settings file') do |sf|
  options[:settings] = sf
end

opts.parse!  # consumes the options from ARGV

# OptionParser won't complain if --settings is absent, so check by hand
if options[:settings].nil?
  puts('You must specify the file to process')
end


  1. Hi Paulo!

    Great to see you moving from pre-historic languages to more modern ones (please don't take that too seriously :D)
    Your post is well written and shows awareness of the common problems with that kind of switch: different concepts, lack of official documentation, and so on. But as you noted, there is the internet, with people sharing knowledge in a multitude of ways (blog posts, GitHub repos, etc.), and that, I believe, is the strength of Ruby: a great community. I suggest you check the Puppet project, since it is related to IT automation and based on Ruby; maybe you'll find some ideas and code there that you can reuse.

  2. Eh eh eh. Don't worry, I won't :) Actually, that's the great thing about how C++ has evolved - it can be as pre-historic or as cutting-edge as you wish it to be.

    I've checked on puppet, but everything I found seems aimed at a lower level than the one we're working on now. I believe it could be useful for the sysadmins who manage the machines, but I found nothing aimed at our team's particular needs, which are more on the app side. E.g., right now, we're working on building blocks for working with Informix DBs; since Ruby Informix doesn't support the "load" statement, we're creating our own component to do it.

    Oh, and I have a second pet peeve - return should never be optional :)

    Thanks for taking the time to comment.