Sunday 6 October 2013

g++ optimization mysteries

It's been a long time. Work has gone into overdrive, courtesy of a huge scope project. And not only has my spare time taken a hit, but, more importantly, there have been too many days when I get home and all I can do is pick up a book, or pick up my guitar, or watch a nice TV show with my wife (we're quite partial to Jamie Oliver shows on 24 Kitchen).

Still, I haven't abandoned C++, quite the opposite. I've been working on a few utilities to process files, and I've been experimenting with different options, and measuring each option's performance.

The last one I've tried left me with an interesting - and, so far, unsolved - puzzle.

We have a file where each line contains a field in a form; a whole form can be eight or nine lines (depending on whether all the form fields were filled). I've created a program that reads this file and outputs each form's fields on one line; something like this:

while (getline(...))
{
    // TOKENIZE THE LINE
    // GET THE FIELD'S DESCRIPTION AND VALUE
    // USE THE FIELD'S DESCRIPTION AS A MAP INDEX AND 
    //     STORE THE VALUE IN THE MAP
    // IF WE'VE READ ALL THE FIELDS, PRINT THE FORM
}


"Printing the form" means outputting the map to cout, which I did like this on my first version:

void OutputForm(CobForm& form)
{
    cout << form["dist"] << ";" << form["conc"] << ";" << 
        form["tlm"] << ";" << form["freg"] << ";" << 
        form["status"] << ";" << form["ID"] << ";" << 
        form["dt"] << ";" << form["result"] << ";" << 
        form["tipo"] << endl;
}


Actually, my first version had a CobForm const&, which the compiler prompty rejected, reminding me - probably in a slightly amused tone - that map's operator[] isn't const.

So, I ran this against the 3.8M lines input file, and it took +/- 1:32 on my PC; I ran it 10 times, and the variation was minimal, less than 1 second.

Then, I've decided to change it to pass-by-value:

void OutputForm(CobForm form)
{
    cout << form["dist"] << ";" << form["conc"] << ";" << 
        form["tlm"] << ";" << form["freg"] << ";" << 
        form["status"] << ";" << form["ID"] << ";" << 
        form["dt"] << ";" << form["result"] << ";" << 
        form["tipo"] << endl;
}

Another 10 runs, and no noticeable difference, +/- 1:33, again with variation below 1 second between runs.

At first, I thought maybe this function was inlined. However, nm disagreed:


00401b70 T OutputForm(std::map<std::string, std::string, 
    std::less<std::string>,
    std::allocator<std::pair<std::string const, 
    std::string> > >)


Yes, I was building in release mode. I added this to my Qt Creator project file, to keep the linker from removing the symbols:

QMAKE_LFLAGS_RELEASE -= -Wl,-s
 

Next, I've decided to look at the generated assembly. I'm far from an expert, but I thought "No harm in taking a look, right?"

First, pass by reference. This is what the compiler generated:


119             OutputForm(form);
0x40588e  <+0x19cd>         movl   $0x8,-0x1e8(%ebp)
0x405898  <+0x19d7>         lea    -0x150(%ebp),%ecx
0x40589e  <+0x19dd>         mov    %ecx,(%esp)
0x4058a1  <+0x19e0>         call   0x401b70  
                                <OutputForm(std::map<std::string, 
                                std::string, std::less<std::string>,
                                std::allocator<std::pair<
                                std::string const, std::string> > >&)>


With the following addresses:


form: 0x28fcf8
ebp:  0x28fe48
esp:  0x28fc10


So, we start by moving 8 into 0x28fc60 (-0x1e8(%ebp)), although I couldn't figure out why. Then, we move form's address into ecx, move that into the address pointed by esp (setting up the call argument), and we call OutputForm(). In OutputForm(), I was able to verify that form's address was 0x28fcf8.

Next, I did the same thing for the pass by value version. Which got me this:


119             OutputForm(form);
0x40593d  <+0x1a5c>         movl   $0x9,-0x218(%ebp)
0x405947  <+0x1a66>         lea    -0x150(%ebp),%ecx
0x40594d  <+0x1a6c>         mov    %ecx,(%esp)
0x405950  <+0x1a6f>         call   0x401b70 
                                <OutputForm(std::map<std::string, 
                                std::string, std::less<std::string>,
                                std::allocator<std::pair<
                                std::string const, std::string> > >)>

form: 0x28fcc8
ebp:  0x28fe48
esp:  0x28fbe0



Now, first thing I noticed is - no copy ctor. We're passing by value, but there's no copy ctor. Just to make sure, I made a debug build, and there it was, in all its splendor:


0x402262  <+0x00f6>         call   0x40a3e8 <std::map<std::string, 
                                std::string, std::less<std::string>,
                                std::allocator<std::pair< 
                                std::string const, std::string> > >::
                                map(std::map<std::string, std::string, 
                                std::less<std::string>,
                                std::allocator<std::pair< 
                                std::string const, std::string> > > const&)>


Then, I thought - maybe the compiler's decided it can perform a "sort-of-pass-by-ref" optimization. However, form's address was 0x28fcc8, and the address set up as an argument was -0x150(%ebp), i.e., 0x28fcf8. And, as soon as I stepped into OutputForm(), I was able to confirm that form's address in its scope was, indeed, 0x28fcf8.

Another thought occured me - perhaps it's some sort of alias. Maybe there's been no string allocation at all. So, I went back to the previous scope, and checked the addresses of form[0]:


form[0].first:  0x783618
form[0].second: 0x78361c


Then, I went into OutputForm() again, and the addresses were different:


form[0].first:  0x783c18
form[0].second: 0x783c1c

So, even though I can find no trace of a copy ctor, the fact is the addresses are different at each scope. And, for now, I have no idea what exactly is going on here.

However, this just shows, once again, the importance of measuring before "optimizing".