We are often faced with the practical situation in which we need to return the entry from a certain column which corresponds to the maximum numerical value from another column subject to one or more conditions.
For example, from the table below:
we may wish to return the date (column C) which corresponds to the latest version (column B) for a given order number (column A), where by “latest” we mean “largest numerically”.
(You can download the workbook here.)
Usually we would have little choice but to do this with an array formula**, viz:
Readers should note the necessary double-inclusion of the IF clause here, not just in the construction for generating the lookup_value, but also in that for generating the lookup_array. Many’s the time I’ve witnessed attempted solutions to this scenario which omit this important point, i.e. which instead employ simply:
Whilst this construction could still return the correct value, it is not at all guaranteed to do so. The reason being that there is nothing to say that the maximum value which corresponds to that criterion is not also found within the range B2:B10 but for a different Order Number. And if that case happens to occur in a row which precedes the one we were hoping to return, then MATCH will evidently – and undesirably – return the date corresponding to that value instead.
In fact, here it can easily be verified that the above resolves to:
and if I just insert the actual values from the range B2:B10, this is:
and, since we have taken no measures to first conditionally amend the range in which we are performing this MATCH, and since we are unfortunate both that there is more than one occurrence of our maximum value (i.e. 4) within this range, and that the first occurrence of that value does not in fact correspond to our required order number, we will thus obtain incorrect results, the above resolving to:
i.e.: 21/02/2013, which is clearly not correct (being of course the date corresponding to the latest version for order 12345, and not for order 54321 as was our aim).
If we now take a look at the correct version of this array formula, i.e.:
we see that this time we have:
and this time, having first taken appropriate measures with regards the lookup_array, we can see that there is only one occurrence of our maximum value (4) within this array (even if there had been more than one, we would nevertheless be certain that they all related to the order number in question, and not to a different one). The above is thus:
i.e. 07/10/2014, this time as desired.
Whilst there is nothing whatsoever wrong with this array formula set-up, I happen to feel that the following non-array construction is a touch more elegant (and potentially less resource-heavy), i.e.:
Before I begin a deconstruction of this formula, readers wondering why, if non-CSE is something which is being pursued, we cannot employ an AGGREGATE-version of the above MAX/IF construction, suffice to say that it is, again, precisely this need to repeat our conditional statement within MATCH’s lookup_array which makes such a solution problematic.
To clarify, we need something like:
in which we have successfully replaced our MAX/IF construction in the CSE version with an appropriate one employing AGGREGATE.
However, the above formula still requires CSE in order to give correct results. The reason being that the lookup_array consists of an IF statement which is attempting to return our conditional array (as we saw previously). And of course this IF clause will not, without CSE, resolve to an array of returns.
Indeed, if we wish to use AGGREGATE here (and without CSE), we have to abandon the IF clause and accept a certain degree of convolutedness in order to force a coercion of that second array. For example:
where the MMULT here acts to coerce the necessary array from the product:
which would otherwise fail to resolve as such.
All of which, I hope, will convince readers of the justification for the alternative construction which I would now like to demonstrate, which recall is:
Certainly, and especially at first sight, a rather non-standard-looking expression. Though also, as can be seen, surprisingly concise, and of course non-array.
So let’s have a look at how this construction operates. This part:
will return an array consisting of entries from column B where the entry in the corresponding row of column A is equal to the value in F1, since we have:
Now, here comes the ingenious part. (And here I would like to thank Lori again, whose use of a similar construction was the inspiration behind this solution. In fact, all I have done is to tweak that construction ever-so-slightly so that it can handle error values, and so be employed in such cases as this.)
In order to find the position of the maximum value within this array, rather than use some combination along the lines of MATCH(MAX,…, etc., which would in any case require repetition of the clause which generated the above array, we can proceed as follows:
We first add a small value to each of the values in the above array. This is necessary since we will next be reciprocating the values in this array with unity, and, if we did not take this action, the zeroes in our array would result in #DIV/0! errors, an inconvenience which would complicate matters when we come to passing this array to FREQUENCY.
which is then:
The crux of this set-up is that we now pass this array to FREQUENCY as the bins_array parameter, with zero as the data_array. i.e. we construct:
which here is:
The logical reasoning behind our prior reciprocation now becomes evident, since it means that the largest value in the array prior to reciprocation (which of course is what we are seeking to determine) will necessarily be the smallest value in the new, post-reciprocation array.
And, if we pass an array of values, all between 0 and 1 (which of course must be the case for positive values which have been reciprocated with unity), as the bins_array to FREQUENCY, with 0 as the data_array, then that zero will be assigned to the smallest value from our bins_array; the remaining bins of course will be empty, or zero.
Since the smallest value is here 0.2, and since it occurs in the 7th position within that array, the above will resolve to:
(Note that, even though our range – A2:A10 – consists of only 9 cells, the above array in fact contains 10 elements. Most readers will know that this is due to the property of FREQUENCY of returning an array with one more entry than the bins_array passed to it.)
Having obtained this array, we simply need to now return the entry from column C from the position corresponding to the one occurrence of a non-zero entry (i.e. 1) in this array.
Naturally there are several ways in which we could do this. An INDEX/MATCH combination would be one, though I tend to have a slight preference for using LOOKUP in these situations, so that:
and which is 07/10/2014, as desired.
Another post to follow shortly. Watch this space!