Saturday 20 July 2013

Measuring execution time - A tale of two systems

As I mentioned previously, I've been following Alexander Stepanov's A9 lectures.

After solving my previous problems, I was ready to run some tests. Before testing my version of minmax_element(), I decided to test both the lecture's version and std's version.

The test compares two strategies, reporting the number of comparisons performed and the time elapsed for each:
  • Using a simple approach, function minmax_element_simple(), shown below.
  • Using minmax_element().

// The naive strategy: two full passes over the range (one for the
// minimum, one for the maximum), roughly 2n comparisons in total.
template <typename I, typename Compare> // I is ForwardIterator
inline
std::pair<I, I> minmax_element_simple(I first, I last, Compare comp)
{
    return std::make_pair(course::min_element(first, last, comp),
        course::max_element(first, last, comp));
}


Let's just say the results were not exactly as expected. minmax_element() performed 25% fewer comparisons. However, with uint64_t elements it always took longer; how much longer depended on the compiler (I tested with GCC 4.7 and VC 11); sometimes it took 35%-45% longer.
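
The drop in comparisons makes sense: the naive version makes two full passes (about 2n comparisons), while minmax_element() processes elements in pairs, spending 3 comparisons per pair instead of 4, for roughly 3n/2 in total, i.e. 25% fewer. Here's a simplified sketch of the pairwise idea (my own, not the lecture's exact code; for brevity it assumes a non-empty range with an odd number of elements, so that after the first one the rest split evenly into pairs):

#include <utility>

template <typename I, typename Compare> // I is ForwardIterator
std::pair<I, I> minmax_element_pairwise(I first, I last, Compare comp)
{
    I min_it = first;
    I max_it = first;
    ++first;
    while (first != last) {
        I a = first++;
        I b = first++;
        if (comp(*b, *a)) std::swap(a, b); // 1 comparison orders the pair
        if (comp(*a, *min_it)) min_it = a; // 1 comparison vs running min
        if (comp(*max_it, *b)) max_it = b; // 1 comparison vs running max
    }
    return std::make_pair(min_it, max_it);
}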

Unable to figure out what exactly was going on, I then sent Alexander Stepanov an e-mail. He was kind enough to answer, and told me it had to do with the cost of comparisons. As the cost of comparisons rises, the implementation that performs more comparisons will be penalized.

And that reminded me I had just the right class to run some tests in my SSH code. Namely, this:

struct SSHSessionKey
{
    explicit SSHSessionKey(std::string const& pHost = "NO_HOST",
        std::string const& pUser = "NO_USER",
        std::string const& pPort = "22")
    : host(pHost), user(pUser), port(pPort) {}

#ifndef _MSC_VER
    // VC 11 doesn't support defaulted special members, hence the guard.
    ~SSHSessionKey() = default;
    SSHSessionKey(SSHSessionKey&&) = default;
    SSHSessionKey& operator=(SSHSessionKey&&) = default;
    SSHSessionKey(SSHSessionKey const&) = default;
    SSHSessionKey& operator=(SSHSessionKey const&) = default;
#endif

    bool operator<(SSHSessionKey const& other) const;

    std::string host;
    std::string user;
    std::string port;
};
 

// The comparison gives each field a weight: host > port > user.
inline
bool SSHSessionKey::operator<(SSHSessionKey const& other) const
{
    // Lexicographic: only fall through to a lower-weight field when the
    // higher-weight fields are equal; a plain || chain of < comparisons
    // would not be a strict weak ordering.
    if (host != other.host) return host < other.host;
    if (port != other.port) return port < other.port;
    return user < other.user;
}


Definitely a higher cost than comparing two integers.
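
As an aside, since we're on C++11 anyway, the same lexicographic ordering can be written with std::tie, which makes that kind of ordering mistake harder to commit. This is just an alternative sketch; the comparison cost stays the same:

#include <tuple>

inline
bool SSHSessionKey::operator<(SSHSessionKey const& other) const
{
    // std::tie builds tuples of references; tuple's operator< compares
    // lexicographically, falling through on equality.
    return std::tie(host, port, user)
         < std::tie(other.host, other.port, other.user);
}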

So, after changing the code (and setting a lower limit, because 128 * 1024 * 1024 elements proved to be a bit too much for my system), I was able to verify that, indeed, for non-trivial cases minmax_element() performs better. In this particular case, the average improvement was around 33%.
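
For reference, the changed test looked roughly like this. The element count, seed, and key-generation scheme are hypothetical, just to show the shape of it, and it assumes the SSHSessionKey definition above:

#include <algorithm>
#include <iostream>
#include <random>
#include <string>
#include <vector>

int main()
{
    std::size_t const n = 4 * 1024 * 1024; // the lowered limit (hypothetical)
    std::mt19937 gen(42);
    std::uniform_int_distribution<int> digit('0', '9');

    // Random host/user strings, so each comparison does real string work.
    std::vector<SSHSessionKey> keys;
    keys.reserve(n);
    for (std::size_t i = 0; i != n; ++i) {
        std::string suffix(8, '0');
        for (auto& c : suffix) c = static_cast<char>(digit(gen));
        keys.push_back(SSHSessionKey("host_" + suffix, "user_" + suffix));
    }

    auto const result = std::minmax_element(keys.begin(), keys.end(),
        [](SSHSessionKey const& a, SSHSessionKey const& b) { return a < b; });
    // ... same counting/timing as before, for both strategies.
    std::cout << "min host: " << result.first->host << "\n";
}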

As Alexander Stepanov wrote in his reply, "Timing is tricky". On his system, for example, he got results different from mine, even with uint64_t. And that drove home something I'd been considering for some time.

You should always have a non-production system that's identical to your production system. And whenever you need to measure the efficiency of your software, that's where you should do it. Compile, deploy, and run it just as if it were the real thing. Yes, I know, instrumentation code may have an impact and may introduce differences. Nothing's perfect. But while we have to live with the imperfections we can't remove, I see no reason why we should put up with those we can remove.

Given the way compilers, OSs, and processors affect (and rearrange) our code's execution, we have to pay attention to the system where we measure performance; small differences between systems may translate into huge surprises when we finally run our code in production.
