Advanced Formula Challenge #6: Results and Discussion Reply

Last week I set readers the challenge which can be found here.

This one was perhaps a little less complex than ones I’d set in previous weeks, though of course it would still, in my opinion, fall within the boundaries of what I would deem “advanced Excel”.

It also demonstrates some techniques which we can apply to solving problems involving non-contiguous ranges, and in particular tell us which functions may be applicable to such set-ups.

Two good solutions received from John Jairo V and cyrilbrd (and Bill‘s was practically there as well, but for a small amendment – and the fact that I didn’t structure the question in full to begin with – sorry!).

This week I will use the solution as given by John by means of explanation. He actually offered two solutions, the second being a non-array AGGREGATE version of the array formula.

In fact, although I’m aware that not all readers may have Excel 2010 or later, I would like to choose that one for my deconstruction, since, as I mentioned to John, it is nice to see such a formula being used, and readers not familiar with it (or who have little experience with it) may benefit from an example of how this function operates as an alternative to CSE constructions.

Besides, the set-up with AGGREGATE does not differ greatly in its syntax from the array version, once you understand the workings of that formula, and so it should not be too difficult for readers wishing to construct the equivalent array formula from this AGGREGATE solution to do so.

John’s set-up was as follows:

First, we go to Name Manager and define RNG as:

=$C$2:$C$7,$E$2:$E$4,$E$6:$E$9,$G$3:$G$11,$J$1:$J$9,$L$5:$L$6,$N$3:$N$7,$N$9:$N$10,$N$12

i.e. a range union consisting of several non-contiguous ranges.

Readers who weren’t aware that it was possible to define such a range should take note. Such range unions cannot be passed to all worksheet functions, though there are several functions which can handle such ranges.

The formula in A1, used to determine the number of non-blanks returns, was:

=SUMPRODUCT(--(FREQUENCY(RNG,ROW(INDIRECT("1:"&MAX(RNG)+1))-1)=1))

And the main formula in A2 then:

=IF(ROWS(A$2:A2)>$A$1,"",AGGREGATE(15,6,(ROW(INDIRECT("1:"&MAX(RNG)+1))-1)/(FREQUENCY(RNG,ROW(INDIRECT("1:"&MAX(RNG)+1))-1)=1),ROWS(A$2:A2)))

How does it work?

First let’s look at that formula in A1, which gives us our expected number of returns, i.e.:

=SUMPRODUCT(--(FREQUENCY(RNG,ROW(INDIRECT("1:"&MAX(RNG)+1))-1)=1))

In fact, the basic gist of this set-up is the use of the FREQUENCY function, which is an extremely useful function, even more so as it is capable of operating over such non-contiguous ranges.

Another such function is MAX, which also happily operates over multiple, non-contiguous ranges. Hence, this part:

MAX(RNG)

is equivalent to taking the maximum value from across all the ranges which make up RNG. This function also has the nice property that it ignores any logical and text values within the range(s) passed to it, which is obviously very important here.

Hence, in our example, this maximum value is 9, and so:

ROW(INDIRECT("1:"&MAX(RNG)+1))-1

becomes:

ROW(INDIRECT("1:"&9+1))-1

which is:

ROW(INDIRECT("1:"&10))-1

i.e.:

{1;2;3;4;5;6;7;8;9;10}-1

which is finally:

{0;1;2;3;4;5;6;7;8;9}

And so we see that we have created an array of values from 0 up to the maximum value across our ranges, which will be used as the bins_array parameter for the FREQUENCY function.

In his solution, Bill used instead a hard-coded range equivalent to the above, though that approach would only work if the values in RNG were all single-digit numbers.

Even though all those in the example I gave were just that, I gave no explicit indication that this could be assumed to always be the case, and so we should look to more flexible, dynamic set-ups such as that here, which, by first determining the maximum number in our range, guarantees a sufficient range of values for our bins_array.

Hence, this part:

FREQUENCY(RNG,ROW(INDIRECT("1:"&MAX(RNG)+1))-1)

now becomes:

FREQUENCY(RNG,{0;1;2;3;4;5;6;7;8;9})

which is:

{0;1;1;0;0;5;2;2;0;1;0}

(For readers not sure of how FREQUENCY operates, suffice to say that this array can be interpreted as saying that, across our range union of discontinuous ranges, there are, in total, no occurrences of the first value in our bins_array, i.e. 0, one occurrence of the second value in our bins_array, i.e. 1, etc., etc.)

Since the premise of the challenge is to return those values which occur precisely once, we are clearly interested in the 1s from this array, and so:

=SUMPRODUCT(--(FREQUENCY(RNG,ROW(INDIRECT("1:"&MAX(RNG)+1))-1)=1))

becomes:

=SUMPRODUCT(--({0;1;1;0;0;5;2;2;0;1;0}=1))

i.e.:

=SUMPRODUCT(--({FALSE;TRUE;TRUE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;TRUE;FALSE}))

which is evidently 3.

And so we know that, in rows to which we copy this formula, beyond the third such row we will return blanks, as required.

Let’s now look at the main formula, then, which recall is:

=IF(ROWS(A$2:A2)>$A$1,"",AGGREGATE(15,6,(ROW(INDIRECT("1:"&MAX(RNG)+1))-1)/(FREQUENCY(RNG,ROW(INDIRECT("1:"&MAX(RNG)+1))-1)=1),ROWS(A$2:A2)))

The initial IF clause we have just explained, and we can immediately substitute in both the results of the FREQUENCY function which we have just calculated, and also the ROW/INDIRECT clause, both of which appear again in this construction. We will therefore have:

AGGREGATE(15,6,({0;1;2;3;4;5;6;7;8;9})/({FALSE;TRUE;TRUE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;TRUE;FALSE}),ROWS(A$2:A2))

And this is the key technique to note when using AGGREGATE in this way. Since, if we set the second parameter in this function, options, to 6, i.e. “Ignore error values”, then it will do precisely that.

And, just like if we were using an array formula with an IF statement, here we can ensure that our TRUE returns from the FREQUENCY function return their corresponding values from the first array whilst also ensuring that the FALSEs are not considered by employing the neat trick of calculating the quotient of the two arrays.

The reason this works is that, since any numerical value divided by zero (which is the equivalent of Boolean FALSE) results in an error, the above will resolve to:

AGGREGATE(15,6,{#DIV/0!;1;2;#DIV/0!;#DIV/0!;#DIV/0!;#DIV/0!;#DIV/0!;#DIV/0!;9;#N/A},ROWS(A$2:A2))

The initial function_num parameter is here set to 15, which is equivalent to performing the worksheet function SMALL on the array, and, similar to that function, we also have a k parameter (here AGGREGATE’s fourth parameter).

Readers may want to note that setting the first parameter to 5, i.e. MIN, would not here be appropriate. In fact, of all the available options for AGGREGATE’s first parameter, only those from 14 to 19 may be employed in cases where the array being passed to it is not an actual worksheet range, but, for example (and as here), the array resulting from some operation(s) carried out by other functions nested within AGGREGATE. With function_num parameters 1-13, the array passed to AGGREGATE must be an actual worksheet range.

Our k parameter here is clearly 1, since, for the formula in A2, we obtain this via ROWS(A$2:A2). And so, the net result is that the above AGGREGATE construction resolves to 1, the smallest numerical value within that array.

Hence, putting it all together, we have:

=IF(ROWS(A$2:A2)>$A$1,"",1)

i.e.:

=IF(1>3,"",1)

which is 1, as required.

Many thanks to John again, and indeed to Cyril and Bill for their contributions.

Another challenge to follow shortly. Watch this space!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s