Saturday, 27 December 2014

'Tis the Season...

... to be jolly, and grateful, and hopeful.
 

On the personal front...

...this year has been very positive, the only clouds on the horizon being some health issues. If I had to choose the most important points of 2014, I'd go with these:
 
  • The level of complicity and understanding between me and my wife has increased tremendously. I believe we achieved this by undergoing a similar change, which takes us right to the next point...
  • We both began going at life with a more relaxed attitude. It's a daily exercise to keep from slipping, but it's quite worth it. Obviously, it occasionally slips, but not only have we gotten better at identifying these slips, both on ourselves and on each other, we are also quicker to defuse these situations, often by sharing a good laugh at ourselves.
The kids are working their way through college, on what are the first steps of their own journey, their own adventure. It's time they take full control of the pen and start writing their story; that's their part. Our part is hoping everything they've taken in through the years will serve them well in getting their bearings as they set out. More than hopeful, we're confident it will.
 
I'm a somewhat spiritual guy, although I don't often stop to think about it. However, looking at 2014, I feel blessed; I already knew I'd found someone more understanding of my failings than I ever deserved, but this year took it to a new height. If I have so much to write about in the following point is because I was fortunate enough to find someone as understanding as my wife.
 
If you don't care much for mildly technical stuff, you can jump straight to the end of the post.
 

On the professional front...

...this has been a year where I continued a trend that began in mid-2013, namely, generalization.
 
When it all began in 2011, I had picked up C#. Then, because I felt I wasn't learning anything other than the language, I've added C++. Actually, my plan was adding C++, but I ended up switching to C++, and C# was left behind. The learning experience was a lot more intense, not just about the language, but also about the whole system - processor, OS, environment, external libs, debugging, assembly, and a whole lot of etc.

Then, I've accidentally set out on a task for which I found little help on the web. So, for much of what I was doing, I was on my own. While I did manage to get a working result (which is still a top-hitter on google.com and bing.com), I wasn't totally happy with it, and I suspected I'd have been even less happy if my knowledge about what I was doing wasn't so lacking. So, I've stepped back and went back to basics. And I've quickly learned that, indeed, the potential for "improvement" (i.e., correction) in what I'd done was much bigger than I had anticipated.
 
Then, as I began taking on more technical tasks at work, the generalization began - certificates, network, DNS, managing Linux, setting up environments according to specific constraints, managing web servers, managing Weblogic server, picking up new languages (e.g., Ruby), revisiting familiar languages (e.g., Java).
 
At the same time, I've taken the learning experience provided by C++ deeper - assembly debugging, system traces, building gcc from source and installing it without touching the system gcc, doing the same with perl. And I've also began getting my feet wet with Javascript (bootstrap and query.js) and Android development.
 
And before I noticed, I had not only changed my course, but I was quite satisfied with that change. This is where I see myself going, becoming a generalist. I love solving problems, and you need a widespread knowledge to do it; you don't always find the cause of a problem at the same level, and I don't like getting stuck for lack of knowledge. I also don't like getting stuck for lack of system access, but that's the way things are when you work in a large-ish corporation.
 
So, here are my New Years resolutions, a.k.a., goals for 2015:
  • Redesign my libssh2 + asio components, incorporating what I've learned in the meantime. And hoping that two years from now I may look at what I've done, say "What I was thinking??!!", and repeat this goal again, as I keep learning more.
  • Pick up a functional language, probably Haskell or Erlang. It's about time I get my feet wet on functional programming. I love what I've learned about generic programming in C++ (actually, what I'm still learning), but it's time to add something more to the mix.
  • Continue my exposure to Javascript and Android.
  • Deepen my knowledge of systems/network administration.
  • Increase my knowledge of low-level tracing/debugging. It's time to begin some serious experiments with loading up core dumps and getting my bearings, and getting more mileage out of stuff like Windows Performance Analyzer.
Too ambitious, you say? Yes, I know. I won't manage to do all of this? Probably not. Which is a good thing, otherwise 2016 would be a very boring year.
 

At the end of the day...

 
 
...our family wishes you and your loved ones a Merry Christmas and a Happy 2015, filled with love, joy, and peace.
 
 

Sunday, 7 December 2014

A simple class to read files

Yes, I'm olde enough to know "simple" is a misnomer.
 
I have a new addition to my bluesy github repo.
 
Part of what I do at work involves reading through user-un-friendly log files. How user-un-friendly? Just to give a small example, the cases where a "log record" is on a single line are the exception (I wish it was just the Exceptions), and there's usually no easy way to correlate the beginning/end of operations with a single grep. This means I spend a lot of time creating scripts/small programs to process those log files.
 
I'm not an enthusiast of shell scripting. Oh, I can persuade it to do some complex stuff, yes, but it always seems to obey me reluctantly, and I'm left with the nagging feeling that, somehow, it's laughing behind my back. Not to mention that the resulting scripts are very... spaghetti-like. I find it hard to think modularly when writing shell scripts. I had a similar issue with Perl. It's still there with Ruby, but being modular with Ruby is a lot easier for me.
 
However, I don't have this difficulty with C++. Or Java, or C#, or Delphi/Lazarus, for that matter. It must be some sort of mental block, because while I couldn't create a modular design in shell scripting to save my life, that comes as second nature when I'm working in C++.
 
So, when I have to create some kind of custom tool to process these files, most of the time I turn to C++. Thanks to C++1y/Boost, I don't take much longer to write it than I would with any other tool/language, and I definitely appreciate that when I need to change something a few weeks/months later, I can find my bearings much more quickly.
 
And, after creating a few very similar programs, all driven by a file-reading loop, I've figured I had enough use-cases to create...
 

FileLineReader

A class... actually, a class template, that reads lines from a file. Surprising, heh?
 
template <typename LineMatcher = SimpleLineMatcher,
    typename LineCounter = SimpleLineCounter<unsigned long>>
class FileLineReader : private LineMatcher, private LineCounter
 
The concept is simple - read a line, which becomes the current line, and give the caller a way to get it (or copy it). Then, we add some nuggets of convenience:
  • Keep a counter of read lines. Implemented by LineCounter.
  • Supply matching operations, that not only perform matching on the current line, but also allow skipping lines based on matching/non-matching. The matching is implemented by LineMatcher; which is then used by FileLineReader to implement the skipping.

Why inheritance, instead of composition? Because there will be cases where LineMatcher and LineCounter have no state, and a data member is a bit of waste (yes, a tiny little bit of a waste). Can this be abused? Absolutely, but you know - protect against Murphy, not Machavelli.
 

Skipping lines 

The first line skipping functions I introduced were SkipMatchingLines(), which skips lines while there's a match, and SkipLinesUntilMatch(), which skips lines until it finds a match.
 
These functions share a similar trait - their stop condition is met after reading the line that triggers the stopping condition. Suppose we have this file:

[2014-01-01 00:00:00.000] match-1 This is line 0
[2014-01-01 00:00:00.100] match-1 This is line 1
[2014-01-01 00:00:00.200] match-1 This is line 2
[2014-01-01 00:00:00.300] match-2 This is line 3
[2014-01-01 00:00:00.400] match-2 This is line 4
 
Something like this

FileLineReader<> flr{kFileName};
flr.SkipMatchingLines("match-1");
// Process lines
 
can only stop when the line "... line 3" is read, because there's no way we can perform a match against a line that hasn't been read yet. This means that when we reach "// Process lines", the current line will be the first line to process, so we should "process-then-read" (do-while), rather than the more intuitive "read-then-process" (while).

I've entertained the notion of using tellg()/seekg() to rewind the file (IOW, un-read the line), but I didn't even get started, after reading about how it behaves in text mode. So, I'll stick with "process-then-read", for now.

Another common scenario I encounter is skipping a certain number of lines, usually a header. So, I've added this:
   
void SkipNumberLines(unsigned int number_lines);

Because we're skipping lines independently of any matching, it didn't make sense to implement this in LineMatcher; so, I've implemented it straight in FileLineReader. I'm not completely happy with this solution, but I figured it's better than no solution.
 

Skipping dilemmas

I also thought it would make sense to keep semantic coherence between all the line-skipping functions. So, SkipNumberLines() should stop on the first line to process, not on the last line skipped (even though it could, because we're skipping a known number of lines), just like the other functions.
 
No biggie, we just perform an extra read - so, for SkipNumberLines(3), we actually read 4 lines. Lovely, everything's coherent, little ponies are happy, and fairies are spreading pixie dust and sneezes throughout the realm.
 
The thing is... nobody expects the Spanish Inquisition... their chief weapon is surprise and SkipNumberLines(0)... their two chief weapons are surprise, an almost fanatical devotion to the Pope, and SkipNumberLines(0)... OK, among their weaponry we can find elements as diverse as... blah, blah, blah, and SkipNumberLines(0).

Yep, SkipNumberLines(0).

Let's say you have a function that calculates how many lines to skip, based on, e.g., an input argument - it could be the type of file being processed, or the phase of the moon; then, you could use it like this:

flr.SkipNumberLines(
    CalculateNumberLinesToSkip(some_relevant_argument));
 
You're aware CalculateNumberLinesToSkip() could return 0 (you could, say, be processing a file without a header). And I'd venture a guess that you would probably expect that SkipNumberLines(0) would skip, you know, a number of lines kinda equal to... zero. As in, more than -1, but definitely less than 1.

Which left me with two alternatives:
  1. Treat 0 as a special case. Which would force upon the caller the "specialness" of this case.
  2. Indulge in a fanatical devotion to coherence - read one line on SkipNumberLines(0), and consequences be damned.
  3. Forget coherence, and accept that even though they're all line-skipping functions, they can have different semantics/post-conditions.

OK, enough Spanish Inquisition jokes.
 
I've settled for alternative 3. I'm not entirely sure it's the best option, and as I get more use out of this class, I expect to find new scenarios and patterns, which will provide me with more data to revisit this design later. But, for now, it seems to be the safest choice.

And there you have it. Quite modest, yes, but it has been saving me some boilerplate code in these last few weeks.

I don't know yet how it will evolve. I can feel an itch with regards to a text mode rewind, but I'll have to dive into streams and buffers much more deeply than I feel like, at the moment.

And right now I have other challenges awaiting... impatiently, I might add. A crash course in some obscure aspects of Weblogic administration to better diagnose socket problems without resorting to strace.