Last week I set readers the challenge which can be found here.

This one turns out to be a good deal more complex than it at first appears, and so perhaps not surprisingly no correct results were received..

**GreasySpot** at first thought that *Advanced Filter* would be a viable solution, but quickly realised that it wasn’t actually appropriate here. Besides, as I mentioned, the idea of this (and of all these challenges in fact) is to try to achieve the results using worksheet formulas alone.

So how can we achieve our desired results?

In the end I came with up two viable, different solutions, neither of which is what I would call particularly straightforward. In fact, I’m still of the opinion that there must exist a more succinct, elegant solution: it’s just that as yet it’s managed to elude me! As such I would still very much welcome any suggestions from readers which improve upon the solutions I give below.

I will dissect just one of these two solutions here, though if any readers request it I shall happily do so for the other in a future post. However, for anyone interested, the alternative solution, a FREQUENCY-based approach to the problem, is the following **array formula**:

=IF(ROWS($1:1)>$D$1,"",INDEX(List1,MATCH(TRUE,IF(Arry5,MMULT(0+(Arry4>=TRANSPOSE(Arry4)),IF(Arry5,COUNTIF(List1,List1)-COUNTIF(List2,List1),0)),0)>=ROWS($1:1),0)))

where:

*Arry4* is defined as:

=ROW(INDIRECT("1:"&1+ROWS(List1)))

and *Arry5* as:

=FREQUENCY(0+List1,0+List1)

I happen to think that the above is a touch more complex than the approach I am about to present, however, and perhaps also not as intuitive.

As readers who have attempted to solve this challenge may be aware, one of the stumbling blocks which prevents itself is the fact that the values in our lists are not *unique*. In fact, if this *were* the case, it would be a relatively straightforward matter of constructing an array of values which return FALSE to some form of MATCH query between *List1* and *List2*.

Things are not so simple, however, and so, if we want to use an approach such as this, we must first ensure that our values *are* unique, though at the same time being careful that they can somehow still be related back to the original values.

We do this as follows:

First go to *Name Manager* and define $A$2:$A$12 and $B$2:$B$12 as *List1* and *List2* respectively, as well as defining:

*Arry1* as:

=ROW(List1)-MIN(ROW(List1))

and *Arry2* as:

=ROWS(List1)-ROW(List1)+MIN(ROW(List1))

The formula in D1, used to determine the number of expected returns, is simply:

=COUNT(List1)-COUNT(List2)

and the **array formula** in C2 is then:

=IF(ROWS($1:1)>$D$1,"",SMALL(IF(1-ISNUMBER(MATCH(List1+(COUNTIF(OFFSET(INDEX(List1,1,1),Arry1,,Arry2,),List1)/10^6),List2+(COUNTIF(OFFSET(INDEX(List2,1,1),Arry1,,Arry2,),List2)/10^6),0)),List1),ROWS($1:1)))

**How does it work?**

I’ll be using the values in *List1* and *List2* as given in the original post last week, so readers wishing to follow the breakdown may wish to remind themselves of those values.

The initial IF clause is, as usual, quite straightforward. The formula in D1 will generate our expected number of returns, i.e. 7 in this case, and so, in formulas beyond the seventh row to which we copy the formula we know that we will return a blank, as required.

Let’s then deconstruct the main (FALSE) clause in this solution, and I’ll begin by examining the COUNTIF part, which forms the bulk of this set-up and indeed appears twice within it. That part is:

COUNTIF(OFFSET(INDEX(List1,1,1),Arry1,,Arry2,),List1)

This part:

INDEX(List1,1,1)

is quite straightforward, and simply returns a reference to the first cell in *List1*, i.e. A2. Of course, I could simply have hard-coded this reference in the formula itself, though I personally prefer to generalize my solutions where possible.

The *rows* and *height* parameters of our OFFSET function are given by *Arry1* and *Arry2* respectively, so let’s look at each of these. Recall that *Arry1* was defined as:

=ROW(List1)-MIN(ROW(List1))

which resolves to:

{2;3;4;5;6;7;8;9;10;11;12}-MIN({2;3;4;5;6;7;8;9;10;11;12})

which is:

{2;3;4;5;6;7;8;9;10;11;12}-2

i.e.:

{0;1;2;3;4;5;6;7;8;9;10}

Looking at *Arry2* now, which we defined as:

=ROWS(List1)-ROW(List1)+MIN(ROW(List1))

we see that this resolves to:

11-{2;3;4;5;6;7;8;9;10;11;12}+MIN({2;3;4;5;6;7;8;9;10;11;12})

which is:

11-{2;3;4;5;6;7;8;9;10;11;12}+2

i.e.:

{11;10;9;8;7;6;5;4;3;2;1}

And if we now look at our expanded COUNTIF construction, we have:

COUNTIF(OFFSET(A2,{0;1;2;3;4;5;6;7;8;9;10},,{11;10;9;8;7;6;5;4;3;2;1},),List1)

and so we see that we will be passing two arrays of 11 values each as our *rows* and *height* parameters to OFFSET, which will effectively mean that we will be passing 11 separate ranges to COUNTIF.

The first of these ranges will be defined via offsetting cell A2 by 0 rows and having a height of 11, i.e. A2:A12. The second of these ranges will be obtained by offsetting A2 by 1 row and having a height of 10, i.e. A3:A12, the third such range will be A4:A12, the fourth A5:A12, and so on and so on until we get to our eleventh range which will consist of a single cell, i.e. A12.

What’s more, for each of these 11 ranges, the *criteria* parameter being passed to COUNTIF will be the entry in the *corresponding* position from the array of 11 elements in *Arry1*. Effectively, then, the 11 COUNTIFs we will be performing in a single construction will be equivalent to calculating each of:

=COUNTIF(A2:A12,A2)

=COUNTIF(A3:A12,A3)

=COUNTIF(A4:A12,A4)

…

=COUNTIF(A12:A12,A12)

So why have we gone to all the effort of generating such a construction?

Well, resolving our COUNTIF/OFFSET construction gives the array:

{2;1;1;3;2;1;2;1;1;2;1}

(Readers can easily verify that these would also be the results of the 11 individual formulas that I laid out above.)

What we have effectively done here is to generate a series of additional indices which will help differentiate those entries in *List1* which share the same value.

By taking these counts over an ever-decreasing range (first A2:A12, then A3:A12, then A4:A12, etc.) we guarantee that the count associated with each entry in *List1* having the same value will be unique.

Things might become clearer when we see how this array is being treated in the main formula, since we can see that we are first dividing each element within this array by a very large value (10^6), and then adding the resulting array to the array *List1*. Hence:

List1+(COUNTIF(OFFSET(INDEX(List1,1,1),Arry1,,Arry2,),List1)/10^6)

becomes:

{1;1;2;3;3;3;4;4;5;6;6}+({0.000002;0.000001;0.000001;0.000003;0.000002;0.000001;0.000002;0.000001;0.000001;0.000002;0.000001})

which is then:

{1.000002;1.000001;2.000001;3.000003;3.000002;3.000001;4.000002;4.000001;5.000001;6.000002;6.000001}

And so readers can hopefully see that the addition of these small increments, obtained from our array of counts, has resulted in an array in which each entry is now *unique*.

The three entries of 3 from our original *List1* array have now become 3.000001, 3.000002 and 3.000003. Our 2 values of 6 in the original array are now 6.000001 and 6.000002.

The important thing to bear in mind here is that the small increments which now differentiate these entries are not simply random: rather, they attribute a count to each of those entries. If, for example, the largest value with an integer part of 3 in the generated array is 3.000003, we know that there must be precisely three entries of 3 in *List1*. Similarly, the fact that the largest value with an integer part of 2 in that array is 2.000001 tells us that there is only a single entry of 2 in *List1*.

The point of all this is that we can now perform a similar operation on the entries in *List2*, and this near-identical construction is seem in our main formula as the *lookup_array* for the MATCH function, i.e.:

List2+(COUNTIF(OFFSET(INDEX(List2,1,1),Arry1,,Arry2,),List2)/10^6)

Substituting in the actual entries from *List2* and also the arrays *Arry1* and *Arry2*, calculated above, gives:

{1;3;3;6;0;0;0;0;0;0;0}+(COUNTIF(OFFSET($B$2,{0;1;2;3;4;5;6;7;8;9;10},,{11;10;9;8;7;6;5;4;3;2;1},),{1;3;3;6;0;0;0;0;0;0;0})/10^6)

and, by a similar logical process as that outlined above, we can easily verify that this will resolve to:

{1;3;3;6;0;0;0;0;0;0;0}+({1;2;1;1;0;0;0;0;0;0;0}/10^6)

which is finally:

{1.000001;3.000002;3.000001;6.000001;0;0;0;0;0;0;0}

Just as with *List1*, then, we have differentiated our entries in *List2* so that they are now effectively unique.

As such, we now have a means by which we can compare the two arrays via a simple MATCH statement. And this was precisely what we wished to do (and which some readers may have attempted) from the very start, but of course were unable to whilst our lists contained duplicate values.

If we now substitute these expanded arrays into our main formula, we have:

SMALL(IF(1-ISNUMBER(MATCH({1.000002;1.000001;2.000001;3.000003;3.000002;3.000001;4.000002;4.000001;5.000001;6.000002;6.000001},{1.000001;3.000002;3.000001;6.000001;0;0;0;0;0;0;0},0)),List1),ROWS($1:1))

and resolving the MATCH part gives:

SMALL(IF(1-ISNUMBER({#N/A;1;#N/A;#N/A;2;3;#N/A;#N/A;#N/A;#N/A;4}),List1),ROWS($1:1))

which is:

SMALL(IF(1-{FALSE;TRUE;FALSE;FALSE;TRUE;TRUE;FALSE;FALSE;FALSE;FALSE;TRUE},List1),ROWS($1:1))

i.e.:

SMALL(IF({1;0;1;1;0;0;1;1;1;1;0},List1),ROWS($1:1))

Expanding *List1*:

SMALL(IF({1;0;1;1;0;0;1;1;1;1;0},{1;1;2;3;3;3;4;4;5;6;6}),ROWS($1:1))

and resolving the IF function:

SMALL({1;FALSE;2;3;FALSE;FALSE;4;4;5;6;FALSE},ROWS($1:1))

And you can see that the numerical values within this array are precisely those that we wish to return.

An interesting challenge, then, and one whose apparent simplicity would certainly seem to suggest a more concise solution than that which I’ve just outlined. However, I hope that readers have at least taken something of value from this deconstruction. The technique for allocating counts to each value within a range, and so effectively making them unique, is a quite useful one.

Another challenge to follow shortly (Monday morning this week – bit later than usual). Watch this space!