Last week I set readers the challenge which can be found here.
This one turns out to be a good deal more complex than it at first appears, and so perhaps not surprisingly no correct results were received..
GreasySpot at first thought that Advanced Filter would be a viable solution, but quickly realised that it wasn’t actually appropriate here. Besides, as I mentioned, the idea of this (and of all these challenges in fact) is to try to achieve the results using worksheet formulas alone.
So how can we achieve our desired results?
In the end I came with up two viable, different solutions, neither of which is what I would call particularly straightforward. In fact, I’m still of the opinion that there must exist a more succinct, elegant solution: it’s just that as yet it’s managed to elude me! As such I would still very much welcome any suggestions from readers which improve upon the solutions I give below.
I will dissect just one of these two solutions here, though if any readers request it I shall happily do so for the other in a future post. However, for anyone interested, the alternative solution, a FREQUENCY-based approach to the problem, is the following array formula:
Arry4 is defined as:
and Arry5 as:
I happen to think that the above is a touch more complex than the approach I am about to present, however, and perhaps also not as intuitive.
As readers who have attempted to solve this challenge may be aware, one of the stumbling blocks which prevents itself is the fact that the values in our lists are not unique. In fact, if this were the case, it would be a relatively straightforward matter of constructing an array of values which return FALSE to some form of MATCH query between List1 and List2.
Things are not so simple, however, and so, if we want to use an approach such as this, we must first ensure that our values are unique, though at the same time being careful that they can somehow still be related back to the original values.
We do this as follows:
First go to Name Manager and define $A$2:$A$12 and $B$2:$B$12 as List1 and List2 respectively, as well as defining:
and Arry2 as:
The formula in D1, used to determine the number of expected returns, is simply:
and the array formula in C2 is then:
How does it work?
I’ll be using the values in List1 and List2 as given in the original post last week, so readers wishing to follow the breakdown may wish to remind themselves of those values.
The initial IF clause is, as usual, quite straightforward. The formula in D1 will generate our expected number of returns, i.e. 7 in this case, and so, in formulas beyond the seventh row to which we copy the formula we know that we will return a blank, as required.
Let’s then deconstruct the main (FALSE) clause in this solution, and I’ll begin by examining the COUNTIF part, which forms the bulk of this set-up and indeed appears twice within it. That part is:
is quite straightforward, and simply returns a reference to the first cell in List1, i.e. A2. Of course, I could simply have hard-coded this reference in the formula itself, though I personally prefer to generalize my solutions where possible.
The rows and height parameters of our OFFSET function are given by Arry1 and Arry2 respectively, so let’s look at each of these. Recall that Arry1 was defined as:
which resolves to:
Looking at Arry2 now, which we defined as:
we see that this resolves to:
And if we now look at our expanded COUNTIF construction, we have:
and so we see that we will be passing two arrays of 11 values each as our rows and height parameters to OFFSET, which will effectively mean that we will be passing 11 separate ranges to COUNTIF.
The first of these ranges will be defined via offsetting cell A2 by 0 rows and having a height of 11, i.e. A2:A12. The second of these ranges will be obtained by offsetting A2 by 1 row and having a height of 10, i.e. A3:A12, the third such range will be A4:A12, the fourth A5:A12, and so on and so on until we get to our eleventh range which will consist of a single cell, i.e. A12.
What’s more, for each of these 11 ranges, the criteria parameter being passed to COUNTIF will be the entry in the corresponding position from the array of 11 elements in Arry1. Effectively, then, the 11 COUNTIFs we will be performing in a single construction will be equivalent to calculating each of:
So why have we gone to all the effort of generating such a construction?
Well, resolving our COUNTIF/OFFSET construction gives the array:
(Readers can easily verify that these would also be the results of the 11 individual formulas that I laid out above.)
What we have effectively done here is to generate a series of additional indices which will help differentiate those entries in List1 which share the same value.
By taking these counts over an ever-decreasing range (first A2:A12, then A3:A12, then A4:A12, etc.) we guarantee that the count associated with each entry in List1 having the same value will be unique.
Things might become clearer when we see how this array is being treated in the main formula, since we can see that we are first dividing each element within this array by a very large value (10^6), and then adding the resulting array to the array List1. Hence:
which is then:
And so readers can hopefully see that the addition of these small increments, obtained from our array of counts, has resulted in an array in which each entry is now unique.
The three entries of 3 from our original List1 array have now become 3.000001, 3.000002 and 3.000003. Our 2 values of 6 in the original array are now 6.000001 and 6.000002.
The important thing to bear in mind here is that the small increments which now differentiate these entries are not simply random: rather, they attribute a count to each of those entries. If, for example, the largest value with an integer part of 3 in the generated array is 3.000003, we know that there must be precisely three entries of 3 in List1. Similarly, the fact that the largest value with an integer part of 2 in that array is 2.000001 tells us that there is only a single entry of 2 in List1.
The point of all this is that we can now perform a similar operation on the entries in List2, and this near-identical construction is seem in our main formula as the lookup_array for the MATCH function, i.e.:
Substituting in the actual entries from List2 and also the arrays Arry1 and Arry2, calculated above, gives:
and, by a similar logical process as that outlined above, we can easily verify that this will resolve to:
which is finally:
Just as with List1, then, we have differentiated our entries in List2 so that they are now effectively unique.
As such, we now have a means by which we can compare the two arrays via a simple MATCH statement. And this was precisely what we wished to do (and which some readers may have attempted) from the very start, but of course were unable to whilst our lists contained duplicate values.
If we now substitute these expanded arrays into our main formula, we have:
and resolving the MATCH part gives:
and resolving the IF function:
And you can see that the numerical values within this array are precisely those that we wish to return.
An interesting challenge, then, and one whose apparent simplicity would certainly seem to suggest a more concise solution than that which I’ve just outlined. However, I hope that readers have at least taken something of value from this deconstruction. The technique for allocating counts to each value within a range, and so effectively making them unique, is a quite useful one.
Another challenge to follow shortly (Monday morning this week – bit later than usual). Watch this space!