I've restarted watching Alex Stepanov's A9 Lectures, "Efficient Programming with Components", and if there's one thing I find essential in what he says is that, despite all the "programming recipes" we find and use, it's important that we think. I've recently been reminded of this need to think, concerning optimization.
We all have our recipes. As far as optimization is concerned, my recipe is:
- Go for readability first. Optimize as needed.
- "Using C++ == you should extract every single ounce of performance available, because otherwise you'd be more productive in any other programming language" is not a dogma. And, as far as productivity goes, it's not even necessarily true.
A simple scenario
Suppose we have a list of
char[]
and we want to find duplicates on said list. In its simplest form, testing two arrays for equality means traversing both, comparing each element until we reach the end of either array, or find a difference.
We can consider several additions to this algorithm, in order to optimise it.
Test for self
We could start by testing for self-comparison. Stepanov has a point when he says this is not an optimization, because most of the time we will not be comparing an element (in our case, a
char[]
) to itself. This brings us to one of those cases where we need to think. Maybe not on equality, since that is trivial; but let's consider assignment.
When defining assignment, should we test for self-assignment? All over the web, the recipes sound a very loud "YES". And yet, most of the time, that test will evaluate to false. The question we should ask ourselves is - if I perform self-assignment, what will happen?
Will I release/duplicate resources? Will I invalidate a class's invariants? Then, self-assignment must be prevented, and the test is required. Not because of optimization, but because a particular class requires it. If not, will it be terribly expensive? In this case, it may actually be an optimization. Will we be self-assigning some floats? How expensive is that? OTOH, how expensive is the test? Is it even worth thinking about, or should we just follow the recipe and put it in there by default?
As with everything concerning optimization, there's no clear-cut answer. Even though I'm inclined to agree with Stepanov on this, I'd probably include the test for self-assignment by default, and consider removing it if, after profiling my code, I could make the case that removing it would benefit performance.
Test for size
Back to our quest for duplicates on a list, we could test for different sizes. Suppose we store each of our elements in a simple structure like this:
struct SimpleArray
{
size_t size;
char* arr;
};
Testing for size means not having to traverse the array, so that's a good thing. Not for the traversing itself, that would be trivial, unless our arrays were longer than the processor's cache lines, but because we'd avoid going to memory again to read
*arr
(arr
itself should be on the processor's cache). That is a good thing.
Except when it isn't.
What if you know your data set is composed of arrays that have fixed length? Again, you may measure and conclude the impact of this test is negligible. Or it may mean the difference between respecting a performance requirement or not. In this particular case, if I had to optimise, I'd skip the structure and work directly on the array, because the size would be irrelevant.
A simple scenario - what we don't know...
We need to think, in order to make decisions. And for that, we need data.
- What is our process doing? If we're invoking services across a network, I doubt either a test for self-comparison or a size test on fixed size arrays will have much of an impact.
- What is the actual weight of these trivial operations? Under our expected workload? Under our worst-case scenario? Under 4x our worst-case scenario? When another process on our machine suddenly goes haywire and gobbles up resources like crazy?
- Is our process critical? Does it have strict performance requirements, with associated penalties?
The big picture
We sometimes have to deal with performance issues, where all the data we have to perform an analysis is "This damn thing is too slow". Based on this treasure trove of data, we have to consider: Our application; the application servers; the web servers; the database servers; the middleware servers; the servers hosting the services we consume (usually, via the middleware servers); and all the myriad of network equipment allowing all these nodes to communicate.
And I don't need more than two hands to count the cases where performance-related issues were caused by application code. And, most of those cases were caused by poor design, rather than by poor implementation - such as an app we received where an operation that failed was placed back into the queue for retrying; but a) instead of applying a delay before retrying, it was placed in the queue for immediate reprocessing; and b) it was placed back into the queue even if the error was not recoverable (e.g., a telephone number with 5 digits). As you might imagine, much fun (and quasi-DOS attacks) ensued.
So, let's get back to our title...
What is performance?
When we think of performance, we envision CPU cycles and cache hits/misses; or that we're processing X lines per second from this file, and we need to process 4X; or sending data via the network, and can we really afford to use SOAP, or should we choose a light-weight protocol (or even roll our own).
Suppose we have Platform A, which invokes a service on Platform B, via a Middleware Server. The service on PlatB sends back something which we then use to call another service on PlatB, which gives us the actual result.
The straightforward implementation looks like this:
PlatA (invoke) -> MWare (invoke) -> PlatB -> (return) -> MWare (return) -> Plat A
PlatA (invoke) -> MWare (invoke) -> PlatB -> (return) -> MWare (return) -> Plat A
PlatA (invoke) -> MWare (invoke) -> PlatB -> (return) -> MWare (return) -> Plat A
On invocation #1, PlatA gets the something required to get the result, and on invocation #2 gets the actual result.
Now, looking at this, we have the obvious optimization:
PlatA (invoke) -> MWare (invoke) -> PlatB -> (return) -> MWare
MWare (invoke) -> PlatB -> (return) -> MWare (return) -> Plat A
MWare (invoke) -> PlatB -> (return) -> MWare (return) -> Plat A
In fact, not only have we just saved a few seconds from the whole process, but we also abstracted PlatA from PlatB's two-call design. Brilliant, right?
Well...
Let's suppose we're responsible for PlatA and we receive a complaint from a costumer, saying his data is all wrong and he won't pay his invoice until we sort it out. Our analysis concludes we're getting wrong data from PlatB. We contact their support team, but in order to perform their analysis they require the something they sent us for that particular invocation. So, now we have to contact the MWare support team, ask for that piece of data, and then send it to PlatB's team. And, during all this time, the costumer's waiting; and our invoice is not getting paid.
"Hold on!" you say "That's not related to process performance, that's an operational problem". Yes, it is. And that's irrelevant, unless you're developing in some sort of hermit's vacuum with no contact with the organization for which you develop. Is that the case? I didn't think so.
Yes, the fact that we may have to wait 1-2 hours before we even begin to actually analyse the cause of the costumer's complaint is an operational problem. But that doesn't make it - and its consequences - any less real. And the potential of this operational problem has been fulfilled by our optimization.
Let's look at the gains and losses of this optimization:
- We've cut a few seconds off each invocation. Let's be optimistic and assume few == 10.
- We've added 1-2 hours to the response time, in case of costumer's complaints.
If we look at the "recipe" that says "don't optimize for the uncommon case", everything looks fine. However, different cases have different weights, and if there's one particular case where you want to be as fast as you can is when you've got an unsatisfied customer waiting for an answer (and saying he won't pay you until he gets one).
So much for recipes...
So, all things considered, what is performance? I don't have a clear-cut answer to this question, but I believe it goes beyond "how many requests per second can we handle".