We are often faced with the practical situation in which we need to return the entry from a certain column which corresponds to the maximum numerical value from another column subject to one or more conditions.

For example, from the table below:

we may wish to return the date (column C) which corresponds to the latest version (column B) for a given order number (column A), where by “latest” we mean “largest numerically”.

(You can download the workbook here.)

Usually we would have little choice but to do this with an **array formula****, viz:

=INDEX(C2:C10,MATCH(MAX(IF(A2:A10=F1,B2:B10)),IF(A2:A10=F1,B2:B10),0))

Readers should note the necessary double-inclusion of the IF clause here, not just in the construction for generating the *lookup_value*, **but also** in that for generating the *lookup_array*. Many’s the time I’ve witnessed attempted solutions to this scenario which omit this important point, i.e. which instead employ simply:

=INDEX(C2:C10,MATCH(MAX(IF(A2:A10=F1,B2:B10)),B2:B10,0))

Whilst this construction *could* still return the correct value, it is not at all *guaranteed* to do so. The reason being that there is nothing to say that the maximum value which corresponds to that criterion is not **also** found within the range B2:B10 but **for a different Order Number**. And if that case happens to occur in a row which precedes the one we were hoping to return, then MATCH will evidently – and undesirably – return the date corresponding to that value instead.

In fact, here it can easily be verified that the above resolves to:

=INDEX(C2:C10,MATCH(MAX(IF({FALSE;FALSE;FALSE;FALSE;FALSE;TRUE;TRUE;TRUE;TRUE},{4;2;5;3;1;3;4;1;2})),B2:B10,0))

which is:

=INDEX(C2:C10,MATCH(MAX({FALSE;FALSE;FALSE;FALSE;FALSE;3;4;1;2}),B2:B10,0))

i.e.:

=INDEX(C2:C10,MATCH(4,B2:B10,0))

and if I just insert the actual values from the range B2:B10, this is:

=INDEX(C2:C10,MATCH(4,{4;2;5;3;1;3;4;1;2},0))

and, since we have taken no measures to first conditionally amend the range in which we are performing this MATCH, and since we are unfortunate both that there is more than one occurrence of our maximum value (i.e. 4) within this range, and that the first occurrence of that value does *not* in fact correspond to our required order number, we will thus obtain incorrect results, the above resolving to:

=INDEX(C2:C10,1)

i.e.: **21/02/2013**, which is clearly not correct (being of course the date corresponding to the latest version for order **12345**, and not for order **54321** as was our aim).

If we now take a look at the correct version of this array formula, i.e.:

=INDEX(C2:C10,MATCH(MAX(IF(A2:A10=F1,B2:B10)),IF(A2:A10=F1,B2:B10),0))

we see that this time we have:

=INDEX(C2:C10,MATCH(4,IF(A2:A10=F1,B2:B10),0))

which is:

=INDEX(C2:C10,MATCH(4,IF({FALSE;FALSE;FALSE;FALSE;FALSE;TRUE;TRUE;TRUE;TRUE},{4;2;5;3;1;3;4;1;2}),0))

i.e.:

=INDEX(C2:C10,MATCH(4,{FALSE;FALSE;FALSE;FALSE;FALSE;3;4;1;2},0))

and this time, having first taken appropriate measures with regards the *lookup_array*, we can see that there is only one occurrence of our maximum value (4) within this array (even if there had been more than one, we would nevertheless be certain that they all related to the order number in question, and not to a different one). The above is thus:

=INDEX(C2:C10,7)

i.e. **07/10/2014**, this time as desired.

Whilst there is nothing whatsoever wrong with this array formula set-up, I happen to feel that the following non-array construction is a touch more elegant (and potentially less resource-heavy), i.e.:

=LOOKUP(1,0/FREQUENCY(0,1/(1+(A2:A10=F1)*B2:B10)),C2:C10)

Before I begin a deconstruction of this formula, readers wondering why, if non-CSE is something which is being pursued, we cannot employ an AGGREGATE-version of the above MAX/IF construction, suffice to say that it is, again, precisely this need to repeat our conditional statement within MATCH’s *lookup_array* which makes such a solution problematic.

To clarify, we need something like:

=INDEX(C2:C10,MATCH(AGGREGATE(14,6,B2:B10/(A2:A10=F1),1),IF(A2:A10=F1,B2:B10),0))

in which we have successfully replaced our MAX/IF construction in the CSE version with an appropriate one employing AGGREGATE.

However, the above formula still requires CSE in order to give correct results. The reason being that the *lookup_array* consists of an IF statement which is attempting to return our conditional array (as we saw previously). And of course this IF clause will not, without CSE, resolve to an array of returns.

Indeed, if we wish to use AGGREGATE here (and without CSE), we have to abandon the IF clause and accept a certain degree of convolutedness in order to force a coercion of that second array. For example:

=INDEX(C2:C10,MATCH(AGGREGATE(14,6,B2:B10/(A2:A10=F1),1),MMULT((A2:A10=F1)*B2:B10,1),0))

where the MMULT here acts to coerce the necessary array from the product:

(A2:A10=F1)*B2:B10

which would otherwise fail to resolve as such.

All of which, I hope, will convince readers of the justification for the alternative construction which I would now like to demonstrate, which recall is:

=LOOKUP(1,0/FREQUENCY(0,1/(1+(A2:A10=F1)*B2:B10)),C2:C10)

Certainly, and especially at first sight, a rather non-standard-looking expression. Though also, as can be seen, surprisingly concise, and of course non-array.

So let’s have a look at how this construction operates. This part:

(A2:A10=F1)*B2:B10

will return an array consisting of entries from column B where the entry in the corresponding row of column A is equal to the value in F1, since we have:

({12345;12345;12345;12345;12345;54321;54321;54321;54321}=54321)*B2:B10

which is:

({FALSE;FALSE;FALSE;FALSE;FALSE;TRUE;TRUE;TRUE;TRUE})*B2:B10

i.e.:

{0;0;0;0;0;3;4;1;2}

Now, here comes the ingenious part. (And here I would like to thank **Lori** again, whose use of a similar construction was the inspiration behind this solution. In fact, all I have done is to tweak that construction ever-so-slightly so that it can handle error values, and so be employed in such cases as this.)

In order to find the position of the maximum value within this array, rather than use some combination along the lines of MATCH(MAX,…, etc., which would in any case require repetition of the clause which generated the above array, we can proceed as follows:

We first add a small value to each of the values in the above array. This is necessary since we will next be reciprocating the values in this array with unity, and, if we did not take this action, the zeroes in our array would result in #DIV/0! errors, an inconvenience which would complicate matters when we come to passing this array to FREQUENCY.

Hence:

1/(1+(A2:A10=F1)*B2:B10)

becomes:

1/(1+{0;0;0;0;0;3;4;1;2})

which is then:

{1;1;1;1;1;0.25;0.2;0.5;0.333333333333333}

The crux of this set-up is that we now pass this array to FREQUENCY as the *bins_array* parameter, with zero as the *data_array*. i.e. we construct:

FREQUENCY(0,1/(1+(A2:A10=F1)*B2:B10))

which here is:

FREQUENCY(0,{1;1;1;1;1;0.25;0.2;0.5;0.333333333333333})

The logical reasoning behind our prior reciprocation now becomes evident, since it means that the largest value in the array *prior to reciprocation* (which of course is what we are seeking to determine) will necessarily be the *smallest* value in the new, *post-reciprocation* array.

And, if we pass an array of values, all between 0 and 1 (which of course must be the case for positive values which have been reciprocated with unity), as the *bins_array* to FREQUENCY, with 0 as the *data_array*, then that zero will be assigned to the smallest value from our *bins_array*; the remaining bins of course will be empty, or zero.

Since the smallest value is here 0.2, and since it occurs in the 7th position within that array, the above will resolve to:

{0;0;0;0;0;0;1;0;0;0}

(Note that, even though our range – A2:A10 – consists of only 9 cells, the above array in fact contains 10 elements. Most readers will know that this is due to the property of FREQUENCY of returning an array with one more entry than the *bins_array* passed to it.)

Having obtained this array, we simply need to now return the entry from column C from the position corresponding to the one occurrence of a non-zero entry (i.e. 1) in this array.

Naturally there are several ways in which we could do this. An INDEX/MATCH combination would be one, though I tend to have a slight preference for using LOOKUP in these situations, so that:

=LOOKUP(1,0/FREQUENCY(0,1/(1+(A2:A10=F1)*B2:B10)),C2:C10)

which is:

=LOOKUP(1,0/{0;0;0;0;0;0;1;0;0;0},C2:C10)

i.e.:

=LOOKUP(1,{#DIV/0!;#DIV/0!;#DIV/0!;#DIV/0!;#DIV/0!;#DIV/0!;0;#DIV/0!;#DIV/0!;#DIV/0!},C2:C10)

and which is **07/10/2014**, as desired.

Another post to follow shortly. Watch this space!

Pingback: formula to extract the records with oldest date and another criteriia - Page 3

Woow!! You maintained you promises. I asked you in Mr.Excel.com forum to explain me the formula that I posted and here we are, excellent explanation….

Thank you π Mr.XOR LX. You are top one excel expert… Waiting for next formula thank you

In Google Sheets, one way to do this would be to use the SORT function:

Spreadsheet: https://docs.google.com/spreadsheets/d/10MOdb-KqPO9UcebaUqYF22jq2Ktsg97hjOv3olyIS5U/edit?usp=sharing

Actually that formula above would only make sense if there is actually a match in column A, but I wanted to show off the SORT function anyways π

@GerryZYou’re very welcome! And thanks for your kind words. π

@IsaiIndeed! And having read your articles on that function, I can testify to how good they are. π And that’s coming from someone who doesn’t even use Sheets!

Very good article. I have one question nevertheless: could you explain the reason of the “0/FREQUENCY” part?

Thanks

@PierreThanks! π

Re your question: this is a standard technique with regards this construction using LOOKUP as an alternative to the INDEX/MATCH set-up I mentioned in the article.

The reciprocation of the array prior to passing to LOOKUP guarantees that any zeroes in that array become #DIV/0! errors – and so are ignored by LOOKUP. The one non-zero value (1 in this case) will still be 1 after reciprocation – in fact, this will be the only non-error value within the array. As such, it will be that which LOOKUP uses in its calculations.

If we did

notfirst reciprocate the array, we would not at all guarantee correct results: if we instead try simply:then are several reasons why this will not work. (This formula will in fact always return the value in C11, which is not even in our original table!)

If you need a further technical explanation as to how LOOKUP operates in this way then let me know.

Regards

I love a the LOOKUP and FREQUENCY function together you have been Absolutely clear, it was nice i you make tutorial of this Advanced formula.. Like YouTube channel excelinfun..but of course your formulas are very advance so a tutorial would be much more intersting

Fantastic site! I would suggest that your example would break if the version number were -1, causing a div/0 error in one of the array items. You could fix this by replacing “1+” in your formula with a different value, but I do have an alternative solution:

The first term checks if the order number matches, the second term checks if the version number is the latest version for the matching order, and the third term returns the date. The advantage is that you can easily look for the minimum number by changing “max” to “min”, etc.

If there is a possibility of duplicates, you’d need to replace sumproduct with a different function or divide the whole function by:

Which is just the count of matches.

-Alex

@Alex GrobermanHi! Glad you think so! Cheers! π

And a very nice contribution as well. Yes – this construction is a good one, albeit one which – as you intimate – is more tailored for cases in which our desired return is a

uniqueone (the amendment you propose for the case where this uniqueness is not something we can guarantee perhaps makes this set-up a little cumbersome).Also, have I missed something, or is your construction actually not as easily convertible to the equivalent MIN construction as you claim? Won’t any FALSE returns from the part:

mean that:

will always results in 0?

Anyway, many thanks for an excellent contribution and I hope to hear more from you in the future!

Regards

@XOR LX

Thanks for the explanation but I still cannot take the grasp of it π¦ Especially why, if we don’t do not reciprocate the array, the LOOKUP will return “0”. I feel that it is related to the fact that LOOKUP assume the array is sorted, but do not really understand how the formula will work. Thanks!

@PierreYou are on the right lines with your comment re sorting.

If the

lookup_arrayis not sorted in ascending order, then, in general, the results of the LOOKUP function are not predictable, and not at all useful.However, there is one important point to make – and a very useful feature of LOOKUP – and that is that, if we choose a

lookup_valuewhich isguaranteed to be greater thanall values within thelookup_array, then, with an unsortedlookup_array, LOOKUP will return the last (i.e. rightmost) numerical value in that array.Hence, in a construction such as:

in which clearly the

lookup_arrayis not sorted in ascending (numerical) order, if A1 is anything less than or equal to the largest value in that array, i.e. 9, then the results are sporadic to say the least. For example, if A1=2, then:returns 1.

And if A1=6 then:

returns 4, neither of which results has much meaning.

However, if A1=10, then:

returns 2.

In fact, if A1 is

anyvalue greater than 9 then the result will also be 2, since, as I mentioned, with an unsorted array and when thelookup_valueis greater than all of the values in thelookup_array, LOOKUP returns the last numerical value within that array.And this is why you often see this method being employed for finding the

lastentry in a range which corresponds to certain conditions. Since LOOKUP also ignores error values, we can generally form a construction along the lines of:To give you an example using the table in this post, something like:

will resolve to:

and taking advantage of the fact that:

is 0 and:

is #DIV/0!, we have:

and, since our choice of

lookup_valuewas sufficiently large so as to always be greater than all of the values within ourlookup_array(when we reciprocate an array of values – all equal to either 0 or 1 – with zero, that array can only ever contain values of 0 or #DIV/0!), we know that LOOKUP here will be looking for the last numerical within this array, i.e. the zero in the 9th position. It then returns the entry from the corresponding position in the range B2:B10, i.e. 2.If you find this method using reciprocation a little confusing, it doesn’t always have to be so. This property of LOOKUP crops up in a variety of practical situations: one example is searching for a list of keywords in a string and returning the entry corresponding to the found keyword from some table. Here we would have something like, e.g.:

For each of the values in D1:D10, if it is found within the string in A1 then SEARCH will return the numerical corresponding to the position within the string at which that value is found; if it is not found it will return a #VALUE! error.

Hence we might have something like:

The key point here is our choice of

lookup_value. Clearly we have not done any prior reciprocation here, so of course 0 will not suffice (there is a value of 12 in our array). However, once you know, due to Excel’s limitations re this function, that the largest possible return from SEARCH is 32,767, then all we need to do is to ensure that our choice oflookup_valueis greater than this figure.Any suitably large number will do. In fact, you will often see the use of the so-called

BigNumhere – 9.99999999999999E+307, the largest allowable positive value in Excel – though I tend to like the idea of using 2^15 (which just happens to be 32,768).Hope that helps.

@XL XOR

Great explanation! Now it is clear, thank you π

Keep up the good work with your website, I enjoy the didactic style. It will hopefully help the average excel user that I am become more than average π

@PierreGood! Glad to hear it! And thanks for your kind words. π

@XL XOR

You did it again … !!!

Many many thanks for your de-construction explanation which is truly brilliant …!!!

@XOR LX

You’re absolutely right! Here’s my revision:

note that it’s now cse, but as you well know there are ways around that π

With the new formula, max, min, or whatever can be used.

-Alex

@JamesYou’re very welcome! Glad you found it of use! π

Cheers!

@AlexSure, but now you’re slightly defeating the point of this post, which was to present a non-array, concise alternative to the standard CSE construction involving INDEX/MATCH/MAX(MIN).

True – your CSE set-up is a nice variation, though recall that it has the drawback that it will not work if there are any duplicates of the value in question.

What’s more, please

don’tuse SUMPRODUCT within a CSE formula! π SUMPRODUCT, in my opinion, should be seen as analternativeto array formulas, not a complement. Not that there’s anything wrong with what you’ve done, though the use of SUMPRODUCT is of course superfluous; we may as well just use SUM.Regards

@XOR LX

You’re absolutely right on both points π

Somewhat unrelated, but have you noticed something funny about the FREQUENCY function? The help text has two interesting points:

1) “If data_array contains no values, FREQUENCY returns an array of zeros.”

2) “If bins_array contains no values, FREQUENCY returns the number of elements in data_array.”

In practice, the only way to actually cause result #1 is to pass either a blank or text cell or an array only containing text, such as {“”}, to data_array. Passing zero, an error, or “” without the array won’t trigger the array of zeros.

And I’m pretty sure result #2 isn’t actually possible, as the function will always return an array with a minimum of two entries (or an error), regardless of what I feed bins_array.

To summarize, it seems the data_array parameter is checked by the function to see if it’s either only blank/text cells or an array containing only text, and if either is true it will return result #1. But thus far I haven’t figured out a way to coerce result #2 out of it.

-Alex

It is time that we meet here, in your wonderful site !

Here are two more versions ,

1)

2)

David

Thanks, David!

Yes, the “double-AGGREGATE” solution is certainly possible, though also definitely not the shortest! π

And another nice use of MMULT in your second offering to coerce a return from the IF statement without CSE – very nice!

Good to see you here and look forward to more contributions from you in future.

Cheers

@XOR LX

Hi and good day,

Glad to see your site, it is amazing. I found there are lot of stuff need to be explore.

I haven’t used double IF for both lookup value and array construction, this is truly amazing.

Thank you so much for the explanation.

Take Care.

Khalid

@KhalidGood to see you here! π

Thank you for your kind words, and I hope that you find some material of interest here.

Regards

What about this,

Regards

@EliasUnfortunately this suffers from the same drawback as the MAX/IF set-up in which the conditional statement is not also repeated within MATCH’s

lookup_array.For the data as posted, try changing the value in A2 to 54321 and that in A8 to 12345 and see what your formula gives.

Or, if your objection to that is that this would be to make it so the

Versionnumbers are no longer grouped in order, simply put all the 54321 data at the top of the table and all the 12345 data at the bottom and, again, you’ll find that your formula does not give the correct result.Regards

This may work then,

Regards

Nevermind it looks like it is not working either.

Regards

Pingback: Index Match with MAX IFs

Pingback: Shortest Formula Challenge #6: Results and Discussion « EXCELXOR

What if i had to run this equation for 80k+ unique ID entries, for which I want to return the adjacent value to the maximum date associated with multiple IDs in a dataset where each ID has multiple entries?

The calculations are bogging down my excel pretty badly

@SuWhat’s the current formula you’re using? And when you refer to “80k+ unique ID entries”, is that the range being queried, or the number of entries for which you are seeking a solution (if the latter, what actually

isthe size of the range being queried?).Regards

Hi XOR LX,

You helped me over in SO on Friday and I was able to replicate the formula correctly when I was at home. Now that I am back at work, I am expanding my bin-array. The next step is the FREQUENCY function and ‘Evaluate Formula’ which returns many results larger than zero and in turn, when divided by zero, I am getting too many 0 instead of #DIV/0. When I take the formula and use it on a segment of the overall range it seems to work but when I expand it to 108 rows it is incorrect, any ideas?

Hi XOR,

You showed me this formula on SO on Friday and I was able to replicate it many times at home but when I got to work today I seem to be having trouble. Following the evaluate formula option the quandary seems to be at the FREQUENCY step which is return far more values above zero than it should. That in turn creates more 0 results than #DIV/0. Any ideas why this might be the case? Could external Add-ins or formulas be causing this problem?

@DanCan you let me know the precise construction you’re using which fails? When (i.e. for which ranges) does it cross the border between working and not working? Even better, are you able to share the corresponding file via some fileshare site (sensitive/confidential data replaced with dummy data if necessary)?

Cheers

XOR, it’s amazing how this post is two years old and still helping people. Here is a link to my excel spreadsheet

https://drive.google.com/file/d/0Byp2zy_psqnEUmNHT2dBQTMtQ1U/view?usp=sharing

I hope that works. It turns out that the issue is not with Add-ons. I have changed all the data to static and still come up with the same issue.

In this example, each restaurant has a certain line of credit and a parent company (parent companies have unique IDs as well if that helps). I want to know the restaurant with the highest risk (largest line of credit) under each parent. However, if two restaurants have the same credit amount, the restaurant with the worse (higher) credit rating should show. The snag is that sometimes the restaurant with the overall worst (highest) rating of all the restaurants is showing up under parents that it does not belong to. The thing is, the formula,

works great (restaurants under Parent Japanese Co.) Although, once I try to cover all the restaurants with this formula,

it starts to act up.

In the example, you will see “Stirred” showing multiple times throughout Column G. If you change Stirred’s rating to something low (10 for example) then the next highest, Subway in this case, replaces it.

The issues seems to be with FREQUENCY and it’s very difficult to find information online regarding frequency that doesn’t require CSE, hence your blog post I guess.

Any idea why this is happening? When I try to evaluate the formula, i can’t seem to understand why it acts the way it does.

Ah, yes. Apologies. We need to make sure that the mitigating factor in ties is sufficiently small so as not to potentially interfere with results.Try:

By the way, I noticed that the formulas you posted begin:

as opposed to:

Your approach could technically fail, I believe, so best to use 1 for this value.

Cheers

Hi XOR, I must have pasted the wrong `LOOKUP(_,` as I was experimenting with solutions. I normally kept the 1 in. However, that solves my issue. Great solution.

I figured I should mention that if the Restaurant is a stand-alone entity (the only one under the parent) AND has a Credit Limit of 0, then the restaurant with the highest highest credit rating reports as the Determining Entity. However, that is an easy work around, I just thought it should be noted. Thanks again, I appreciate it very much and learned a lot along the way!

Glad to hear that we got there in the end! Cheers and until the next time!