Return Entry Corresponding to Maximum Value Based on Conditions 39

We are often faced with the practical situation in which we need to return the entry from a certain column which corresponds to the maximum numerical value from another column subject to one or more conditions.

For example, from the table below:

INDEX_MAX_IF Non-array Alternative

we may wish to return the date (column C) which corresponds to the latest version (column B) for a given order number (column A), where by “latest” we mean “largest numerically”.

(You can download the workbook here.)

Usually we would have little choice but to do this with an array formula**, viz:

=INDEX(C2:C10,MATCH(MAX(IF(A2:A10=F1,B2:B10)),IF(A2:A10=F1,B2:B10),0))

Readers should note the necessary double-inclusion of the IF clause here, not just in the construction for generating the lookup_value, but also in that for generating the lookup_array. Many’s the time I’ve witnessed attempted solutions to this scenario which omit this important point, i.e. which instead employ simply:

=INDEX(C2:C10,MATCH(MAX(IF(A2:A10=F1,B2:B10)),B2:B10,0))

Whilst this construction could still return the correct value, it is not at all guaranteed to do so. The reason being that there is nothing to say that the maximum value which corresponds to that criterion is not also found within the range B2:B10 but for a different Order Number. And if that case happens to occur in a row which precedes the one we were hoping to return, then MATCH will evidently – and undesirably – return the date corresponding to that value instead.

In fact, here it can easily be verified that the above resolves to:

=INDEX(C2:C10,MATCH(MAX(IF({FALSE;FALSE;FALSE;FALSE;FALSE;TRUE;TRUE;TRUE;TRUE},{4;2;5;3;1;3;4;1;2})),B2:B10,0))

which is:

=INDEX(C2:C10,MATCH(MAX({FALSE;FALSE;FALSE;FALSE;FALSE;3;4;1;2}),B2:B10,0))

i.e.:

=INDEX(C2:C10,MATCH(4,B2:B10,0))

and if I just insert the actual values from the range B2:B10, this is:

=INDEX(C2:C10,MATCH(4,{4;2;5;3;1;3;4;1;2},0))

and, since we have taken no measures to first conditionally amend the range in which we are performing this MATCH, and since we are unfortunate both that there is more than one occurrence of our maximum value (i.e. 4) within this range, and that the first occurrence of that value does not in fact correspond to our required order number, we will thus obtain incorrect results, the above resolving to:

=INDEX(C2:C10,1)

i.e.: 21/02/2013, which is clearly not correct (being of course the date corresponding to the latest version for order 12345, and not for order 54321 as was our aim).

If we now take a look at the correct version of this array formula, i.e.:

=INDEX(C2:C10,MATCH(MAX(IF(A2:A10=F1,B2:B10)),IF(A2:A10=F1,B2:B10),0))

we see that this time we have:

=INDEX(C2:C10,MATCH(4,IF(A2:A10=F1,B2:B10),0))

which is:

=INDEX(C2:C10,MATCH(4,IF({FALSE;FALSE;FALSE;FALSE;FALSE;TRUE;TRUE;TRUE;TRUE},{4;2;5;3;1;3;4;1;2}),0))

i.e.:

=INDEX(C2:C10,MATCH(4,{FALSE;FALSE;FALSE;FALSE;FALSE;3;4;1;2},0))

and this time, having first taken appropriate measures with regards the lookup_array, we can see that there is only one occurrence of our maximum value (4) within this array (even if there had been more than one, we would nevertheless be certain that they all related to the order number in question, and not to a different one). The above is thus:

=INDEX(C2:C10,7)

i.e. 07/10/2014, this time as desired.

Whilst there is nothing whatsoever wrong with this array formula set-up, I happen to feel that the following non-array construction is a touch more elegant (and potentially less resource-heavy), i.e.:

=LOOKUP(1,0/FREQUENCY(0,1/(1+(A2:A10=F1)*B2:B10)),C2:C10)

Before I begin a deconstruction of this formula, readers wondering why, if non-CSE is something which is being pursued, we cannot employ an AGGREGATE-version of the above MAX/IF construction, suffice to say that it is, again, precisely this need to repeat our conditional statement within MATCH’s lookup_array which makes such a solution problematic.

To clarify, we need something like:

=INDEX(C2:C10,MATCH(AGGREGATE(14,6,B2:B10/(A2:A10=F1),1),IF(A2:A10=F1,B2:B10),0))

in which we have successfully replaced our MAX/IF construction in the CSE version with an appropriate one employing AGGREGATE.

However, the above formula still requires CSE in order to give correct results. The reason being that the lookup_array consists of an IF statement which is attempting to return our conditional array (as we saw previously). And of course this IF clause will not, without CSE, resolve to an array of returns.

Indeed, if we wish to use AGGREGATE here (and without CSE), we have to abandon the IF clause and accept a certain degree of convolutedness in order to force a coercion of that second array. For example:

=INDEX(C2:C10,MATCH(AGGREGATE(14,6,B2:B10/(A2:A10=F1),1),MMULT((A2:A10=F1)*B2:B10,1),0))

where the MMULT here acts to coerce the necessary array from the product:

(A2:A10=F1)*B2:B10

which would otherwise fail to resolve as such.

All of which, I hope, will convince readers of the justification for the alternative construction which I would now like to demonstrate, which recall is:

=LOOKUP(1,0/FREQUENCY(0,1/(1+(A2:A10=F1)*B2:B10)),C2:C10)

Certainly, and especially at first sight, a rather non-standard-looking expression. Though also, as can be seen, surprisingly concise, and of course non-array.

So let’s have a look at how this construction operates. This part:

(A2:A10=F1)*B2:B10

will return an array consisting of entries from column B where the entry in the corresponding row of column A is equal to the value in F1, since we have:

({12345;12345;12345;12345;12345;54321;54321;54321;54321}=54321)*B2:B10

which is:

({FALSE;FALSE;FALSE;FALSE;FALSE;TRUE;TRUE;TRUE;TRUE})*B2:B10

i.e.:

{0;0;0;0;0;3;4;1;2}

Now, here comes the ingenious part. (And here I would like to thank Lori again, whose use of a similar construction was the inspiration behind this solution. In fact, all I have done is to tweak that construction ever-so-slightly so that it can handle error values, and so be employed in such cases as this.)

In order to find the position of the maximum value within this array, rather than use some combination along the lines of MATCH(MAX,…, etc., which would in any case require repetition of the clause which generated the above array, we can proceed as follows:

We first add a small value to each of the values in the above array. This is necessary since we will next be reciprocating the values in this array with unity, and, if we did not take this action, the zeroes in our array would result in #DIV/0! errors, an inconvenience which would complicate matters when we come to passing this array to FREQUENCY.

Hence:

1/(1+(A2:A10=F1)*B2:B10)

becomes:

1/(1+{0;0;0;0;0;3;4;1;2})

which is then:

{1;1;1;1;1;0.25;0.2;0.5;0.333333333333333}

The crux of this set-up is that we now pass this array to FREQUENCY as the bins_array parameter, with zero as the data_array. i.e. we construct:

FREQUENCY(0,1/(1+(A2:A10=F1)*B2:B10))

which here is:

FREQUENCY(0,{1;1;1;1;1;0.25;0.2;0.5;0.333333333333333})

The logical reasoning behind our prior reciprocation now becomes evident, since it means that the largest value in the array prior to reciprocation (which of course is what we are seeking to determine) will necessarily be the smallest value in the new, post-reciprocation array.

And, if we pass an array of values, all between 0 and 1 (which of course must be the case for positive values which have been reciprocated with unity), as the bins_array to FREQUENCY, with 0 as the data_array, then that zero will be assigned to the smallest value from our bins_array; the remaining bins of course will be empty, or zero.

Since the smallest value is here 0.2, and since it occurs in the 7th position within that array, the above will resolve to:

{0;0;0;0;0;0;1;0;0;0}

(Note that, even though our range – A2:A10 – consists of only 9 cells, the above array in fact contains 10 elements. Most readers will know that this is due to the property of FREQUENCY of returning an array with one more entry than the bins_array passed to it.)

Having obtained this array, we simply need to now return the entry from column C from the position corresponding to the one occurrence of a non-zero entry (i.e. 1) in this array.

Naturally there are several ways in which we could do this. An INDEX/MATCH combination would be one, though I tend to have a slight preference for using LOOKUP in these situations, so that:

=LOOKUP(1,0/FREQUENCY(0,1/(1+(A2:A10=F1)*B2:B10)),C2:C10)

which is:

=LOOKUP(1,0/{0;0;0;0;0;0;1;0;0;0},C2:C10)

i.e.:

=LOOKUP(1,{#DIV/0!;#DIV/0!;#DIV/0!;#DIV/0!;#DIV/0!;#DIV/0!;0;#DIV/0!;#DIV/0!;#DIV/0!},C2:C10)

and which is 07/10/2014, as desired.

Another post to follow shortly. Watch this space!

39 comments

  1. Pingback: formula to extract the records with oldest date and another criteriia - Page 3

  2. Woow!! You maintained you promises. I asked you in Mr.Excel.com forum to explain me the formula that I posted and here we are, excellent explanation….
    Thank you 😊 Mr.XOR LX. You are top one excel expert… Waiting for next formula thank you

  3. Actually that formula above would only make sense if there is actually a match in column A, but I wanted to show off the SORT function anyways XD

  4. @Isai

    Indeed! And having read your articles on that function, I can testify to how good they are. πŸ™‚ And that’s coming from someone who doesn’t even use Sheets!

  5. Very good article. I have one question nevertheless: could you explain the reason of the “0/FREQUENCY” part?

    Thanks

  6. @Pierre

    Thanks! πŸ™‚

    Re your question: this is a standard technique with regards this construction using LOOKUP as an alternative to the INDEX/MATCH set-up I mentioned in the article.

    The reciprocation of the array prior to passing to LOOKUP guarantees that any zeroes in that array become #DIV/0! errors – and so are ignored by LOOKUP. The one non-zero value (1 in this case) will still be 1 after reciprocation – in fact, this will be the only non-error value within the array. As such, it will be that which LOOKUP uses in its calculations.

    If we did not first reciprocate the array, we would not at all guarantee correct results: if we instead try simply:

    =LOOKUP(1,{0;0;0;0;0;0;1;0;0;0},C2:C10)

    then are several reasons why this will not work. (This formula will in fact always return the value in C11, which is not even in our original table!)

    If you need a further technical explanation as to how LOOKUP operates in this way then let me know.

    Regards

  7. I love a the LOOKUP and FREQUENCY function together you have been Absolutely clear, it was nice i you make tutorial of this Advanced formula.. Like YouTube channel excelinfun..but of course your formulas are very advance so a tutorial would be much more intersting

  8. Fantastic site! I would suggest that your example would break if the version number were -1, causing a div/0 error in one of the array items. You could fix this by replacing “1+” in your formula with a different value, but I do have an alternative solution:

    =SUMPRODUCT(N(A2:A10=F1),N(B2:B10=MAX((A2:A10=F1)*B2:B10)),C2:C10)

    The first term checks if the order number matches, the second term checks if the version number is the latest version for the matching order, and the third term returns the date. The advantage is that you can easily look for the minimum number by changing “max” to “min”, etc.

    If there is a possibility of duplicates, you’d need to replace sumproduct with a different function or divide the whole function by:

    SUMPRODUCT(N(A2:A10=F1),N(B2:B10=MAX((A2:A10=F1)*B2:B10)))

    Which is just the count of matches.

    -Alex

  9. @Alex Groberman

    Hi! Glad you think so! Cheers! πŸ™‚

    And a very nice contribution as well. Yes – this construction is a good one, albeit one which – as you intimate – is more tailored for cases in which our desired return is a unique one (the amendment you propose for the case where this uniqueness is not something we can guarantee perhaps makes this set-up a little cumbersome).

    Also, have I missed something, or is your construction actually not as easily convertible to the equivalent MIN construction as you claim? Won’t any FALSE returns from the part:

    A2:A10=F1

    mean that:

    MIN((A2:A10=F1)*B2:B10)

    will always results in 0?

    Anyway, many thanks for an excellent contribution and I hope to hear more from you in the future!

    Regards

  10. @XOR LX

    Thanks for the explanation but I still cannot take the grasp of it 😦 Especially why, if we don’t do not reciprocate the array, the LOOKUP will return “0”. I feel that it is related to the fact that LOOKUP assume the array is sorted, but do not really understand how the formula will work. Thanks!

  11. @Pierre

    You are on the right lines with your comment re sorting.

    If the lookup_array is not sorted in ascending order, then, in general, the results of the LOOKUP function are not predictable, and not at all useful.

    However, there is one important point to make – and a very useful feature of LOOKUP – and that is that, if we choose a lookup_value which is guaranteed to be greater than all values within the lookup_array, then, with an unsorted lookup_array, LOOKUP will return the last (i.e. rightmost) numerical value in that array.

    Hence, in a construction such as:

    =LOOKUP(A1,{1,9,3,5,4,8,2})

    in which clearly the lookup_array is not sorted in ascending (numerical) order, if A1 is anything less than or equal to the largest value in that array, i.e. 9, then the results are sporadic to say the least. For example, if A1=2, then:

    =LOOKUP(2,{1,9,3,5,4,8,2})

    returns 1.

    And if A1=6 then:

    =LOOKUP(6,{1,9,3,5,4,8,2})

    returns 4, neither of which results has much meaning.

    However, if A1=10, then:

    =LOOKUP(10,{1,9,3,5,4,8,2})

    returns 2.

    In fact, if A1 is any value greater than 9 then the result will also be 2, since, as I mentioned, with an unsorted array and when the lookup_value is greater than all of the values in the lookup_array, LOOKUP returns the last numerical value within that array.

    And this is why you often see this method being employed for finding the last entry in a range which corresponds to certain conditions. Since LOOKUP also ignores error values, we can generally form a construction along the lines of:

    =LOOKUP(1,0/(criteria_range=criteria),result_vector)

    To give you an example using the table in this post, something like:

    =LOOKUP(1,0/(A2:A10=54321),B2:B10)

    will resolve to:

    =LOOKUP(1,0/({FALSE;FALSE;FALSE;FALSE;FALSE;TRUE;TRUE;TRUE;TRUE}),B2:B10)

    and taking advantage of the fact that:

    0/TRUE

    is 0 and:

    0/FALSE

    is #DIV/0!, we have:

    =LOOKUP(1,{#DIV/0!;#DIV/0!;#DIV/0!;#DIV/0!;#DIV/0!;0;0;0;0},B2:B10)

    and, since our choice of lookup_value was sufficiently large so as to always be greater than all of the values within our lookup_array (when we reciprocate an array of values – all equal to either 0 or 1 – with zero, that array can only ever contain values of 0 or #DIV/0!), we know that LOOKUP here will be looking for the last numerical within this array, i.e. the zero in the 9th position. It then returns the entry from the corresponding position in the range B2:B10, i.e. 2.

    If you find this method using reciprocation a little confusing, it doesn’t always have to be so. This property of LOOKUP crops up in a variety of practical situations: one example is searching for a list of keywords in a string and returning the entry corresponding to the found keyword from some table. Here we would have something like, e.g.:

    =LOOKUP(2^15,SEARCH(D1:D10,A1),E1:E10)

    For each of the values in D1:D10, if it is found within the string in A1 then SEARCH will return the numerical corresponding to the position within the string at which that value is found; if it is not found it will return a #VALUE! error.

    Hence we might have something like:

    =LOOKUP(2^15,{#VALUE!;#VALUE!;12;#VALUE!;#VALUE!;#VALUE!;#VALUE!;#VALUE!;#VALUE!;#VALUE!},E1:E10)

    The key point here is our choice of lookup_value. Clearly we have not done any prior reciprocation here, so of course 0 will not suffice (there is a value of 12 in our array). However, once you know, due to Excel’s limitations re this function, that the largest possible return from SEARCH is 32,767, then all we need to do is to ensure that our choice of lookup_value is greater than this figure.

    Any suitably large number will do. In fact, you will often see the use of the so-called BigNum here – 9.99999999999999E+307, the largest allowable positive value in Excel – though I tend to like the idea of using 2^15 (which just happens to be 32,768).

    Hope that helps.

  12. @XL XOR

    Great explanation! Now it is clear, thank you πŸ™‚

    Keep up the good work with your website, I enjoy the didactic style. It will hopefully help the average excel user that I am become more than average πŸ™‚

  13. @XL XOR

    You did it again … !!!

    Many many thanks for your de-construction explanation which is truly brilliant …!!!

  14. @XOR LX

    You’re absolutely right! Here’s my revision:

    =SUMPRODUCT(N(A2:A10=F1),N(B2:B10=MIN(IF(A2:A10=F1,B2:B10))),C2:C10)

    note that it’s now cse, but as you well know there are ways around that πŸ™‚

    With the new formula, max, min, or whatever can be used.

    -Alex

  15. @Alex

    Sure, but now you’re slightly defeating the point of this post, which was to present a non-array, concise alternative to the standard CSE construction involving INDEX/MATCH/MAX(MIN).

    True – your CSE set-up is a nice variation, though recall that it has the drawback that it will not work if there are any duplicates of the value in question.

    What’s more, please don’t use SUMPRODUCT within a CSE formula! πŸ™‚ SUMPRODUCT, in my opinion, should be seen as an alternative to array formulas, not a complement. Not that there’s anything wrong with what you’ve done, though the use of SUMPRODUCT is of course superfluous; we may as well just use SUM.

    Regards

  16. @XOR LX

    You’re absolutely right on both points πŸ™‚

    Somewhat unrelated, but have you noticed something funny about the FREQUENCY function? The help text has two interesting points:

    1) “If data_array contains no values, FREQUENCY returns an array of zeros.”

    2) “If bins_array contains no values, FREQUENCY returns the number of elements in data_array.”

    In practice, the only way to actually cause result #1 is to pass either a blank or text cell or an array only containing text, such as {“”}, to data_array. Passing zero, an error, or “” without the array won’t trigger the array of zeros.

    And I’m pretty sure result #2 isn’t actually possible, as the function will always return an array with a minimum of two entries (or an error), regardless of what I feed bins_array.

    To summarize, it seems the data_array parameter is checked by the function to see if it’s either only blank/text cells or an array containing only text, and if either is true it will return result #1. But thus far I haven’t figured out a way to coerce result #2 out of it.

    -Alex

  17. It is time that we meet here, in your wonderful site !

    Here are two more versions ,

    1)

    =AGGREGATE(14,6,(C2:C10)/(A2:A10=F1)/(B2:B10=AGGREGATE(14,6,(B2:B10)/(A2:A10=F1),1)),1)

    2)

    =MAX(IF(MMULT((A2:A10=F1)*(B2:B10=AGGREGATE(14,6,(B2:B10)/(A2:A10=F1),1)),1),C2:C10))

    David

  18. Thanks, David!

    Yes, the “double-AGGREGATE” solution is certainly possible, though also definitely not the shortest! πŸ™‚

    And another nice use of MMULT in your second offering to coerce a return from the IF statement without CSE – very nice!

    Good to see you here and look forward to more contributions from you in future.

    Cheers

  19. @XOR LX

    Hi and good day,

    Glad to see your site, it is amazing. I found there are lot of stuff need to be explore.

    I haven’t used double IF for both lookup value and array construction, this is truly amazing.

    Thank you so much for the explanation.

    Take Care.

    Khalid

  20. @Khalid

    Good to see you here! πŸ™‚

    Thank you for your kind words, and I hope that you find some material of interest here.

    Regards

  21. @Elias

    Unfortunately this suffers from the same drawback as the MAX/IF set-up in which the conditional statement is not also repeated within MATCH’s lookup_array.

    For the data as posted, try changing the value in A2 to 54321 and that in A8 to 12345 and see what your formula gives.

    Or, if your objection to that is that this would be to make it so the Version numbers are no longer grouped in order, simply put all the 54321 data at the top of the table and all the 12345 data at the bottom and, again, you’ll find that your formula does not give the correct result.

    Regards

  22. This may work then,

    =LOOKUP(1,1/(MAX(B2:B10*(A2:A10=F1))=MAX(B2:B10*(A2:A10=F1))),C2:C10)

    Regards

  23. Pingback: Index Match with MAX IFs

  24. Pingback: Shortest Formula Challenge #6: Results and Discussion « EXCELXOR

  25. What if i had to run this equation for 80k+ unique ID entries, for which I want to return the adjacent value to the maximum date associated with multiple IDs in a dataset where each ID has multiple entries?

    The calculations are bogging down my excel pretty badly

  26. @Su

    What’s the current formula you’re using? And when you refer to “80k+ unique ID entries”, is that the range being queried, or the number of entries for which you are seeking a solution (if the latter, what actually is the size of the range being queried?).

    Regards

  27. Hi XOR LX,

    You helped me over in SO on Friday and I was able to replicate the formula correctly when I was at home. Now that I am back at work, I am expanding my bin-array. The next step is the FREQUENCY function and ‘Evaluate Formula’ which returns many results larger than zero and in turn, when divided by zero, I am getting too many 0 instead of #DIV/0. When I take the formula and use it on a segment of the overall range it seems to work but when I expand it to 108 rows it is incorrect, any ideas?

  28. Hi XOR,

    You showed me this formula on SO on Friday and I was able to replicate it many times at home but when I got to work today I seem to be having trouble. Following the evaluate formula option the quandary seems to be at the FREQUENCY step which is return far more values above zero than it should. That in turn creates more 0 results than #DIV/0. Any ideas why this might be the case? Could external Add-ins or formulas be causing this problem?

  29. @Dan

    Can you let me know the precise construction you’re using which fails? When (i.e. for which ranges) does it cross the border between working and not working? Even better, are you able to share the corresponding file via some fileshare site (sensitive/confidential data replaced with dummy data if necessary)?

    Cheers

  30. XOR, it’s amazing how this post is two years old and still helping people. Here is a link to my excel spreadsheet

    https://drive.google.com/file/d/0Byp2zy_psqnEUmNHT2dBQTMtQ1U/view?usp=sharing

    I hope that works. It turns out that the issue is not with Add-ons. I have changed all the data to static and still come up with the same issue.

    In this example, each restaurant has a certain line of credit and a parent company (parent companies have unique IDs as well if that helps). I want to know the restaurant with the highest risk (largest line of credit) under each parent. However, if two restaurants have the same credit amount, the restaurant with the worse (higher) credit rating should show. The snag is that sometimes the restaurant with the overall worst (highest) rating of all the restaurants is showing up under parents that it does not belong to. The thing is, the formula,

    =LOOKUP(0,0/FREQUENCY(0,1/(D$15:D$19+(E$15:E$19=E15)*C$15:C$199)),B$15:B$19)

    works great (restaurants under Parent Japanese Co.) Although, once I try to cover all the restaurants with this formula,

    =LOOKUP(0,0/FREQUENCY(0,1/(D$2:D$29+(E$2:E$29=E2)*C$2:C$29)),B$2:B$29)

    it starts to act up.

    In the example, you will see “Stirred” showing multiple times throughout Column G. If you change Stirred’s rating to something low (10 for example) then the next highest, Subway in this case, replaces it.

    The issues seems to be with FREQUENCY and it’s very difficult to find information online regarding frequency that doesn’t require CSE, hence your blog post I guess.

    Any idea why this is happening? When I try to evaluate the formula, i can’t seem to understand why it acts the way it does.

  31. Ah, yes. Apologies. We need to make sure that the mitigating factor in ties is sufficiently small so as not to potentially interfere with results.Try:

    =LOOKUP(1,0/FREQUENCY(0,1/(D$2:D$29/10^6+(E$2:E$29=E2)*C$2:C$29)),B$2:B$29)

    By the way, I noticed that the formulas you posted begin:

    =LOOKUP(0,

    as opposed to:

    =LOOKUP(1,

    Your approach could technically fail, I believe, so best to use 1 for this value.

    Cheers

  32. Hi XOR, I must have pasted the wrong `LOOKUP(_,` as I was experimenting with solutions. I normally kept the 1 in. However, that solves my issue. Great solution.

    I figured I should mention that if the Restaurant is a stand-alone entity (the only one under the parent) AND has a Credit Limit of 0, then the restaurant with the highest highest credit rating reports as the Determining Entity. However, that is an easy work around, I just thought it should be noted. Thanks again, I appreciate it very much and learned a lot along the way!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s