Advanced Formula Challenge #3: Results and Discussion 3

Last week I posed readers the challenge which can be found here.

One solution was received, again from Bill, and this time it was not only correct, but a very good solution indeed. So congratulations again to Bill!

In fact, rather than dissect my own solution this week (which in any case differs only in minor details from Bill’s), I would like to present a breakdown of the solution given by Bill, as follows:

He first created two Defined Names:

Name: first
Refers to:

=LEFT(SUBSTITUTE($A$1:$A$6,"-",REPT(" ",5)),5)

Name: last
Refers to:

=RIGHT(SUBSTITUTE($A$1:$A$6,"-",REPT(" ",5)),5)

His array formula was then:

=IF(ROWS($D$1:$D1)>SUM(last-first+1),"",SMALL(IF(first+TRANSPOSE(ROW(INDIRECT("1:"&MAX(last-first)+1))-1)>--last,"",first+TRANSPOSE(ROW(INDIRECT("1:"&MAX(last-first)+1))-1)),ROWS($D$1:$D1)))

How does it work?

First let’s have a look at those two Defined Names. The first one, first:

=LEFT(SUBSTITUTE($A$1:$A$6,"-",REPT(" ",5)),5)

will resolve to (not on its own, of course, though recall that this function is being passed to an array formula, which will give the necessary coercion):

LEFT(SUBSTITUTE({"1-2";"4-6";"9";"10-11";"13-16";"21"},"-",REPT(" ",5)),5)

which is, resolving the REPT function:

=LEFT(SUBSTITUTE({"1-2";"4-6";"9";"10-11";"13-16";"21"},"-","     "),5)

As Bill himself stated, the choice of 5 for both the number_times parameter of REPT and the num_chars parameter of LEFT was arbitrary. We might (and should) of course choose a larger value for the 2nd parameter of the LEFT function, so as to be certain that our solution is valid for cases where the values to the left of the hyphen are much larger than those present here, e.g. 999,999.

However, looking at the data in question, I imagine Bill presumed that this choice would be more than sufficient, a pragmatic decision which I will not contest here!

This then becomes:

=LEFT({"1     2";"4     6";"9";"10     11";"13     16";"21"},5)

i.e.:

{"1    ";"4    ";"9";"10   ";"13   ";"21"}

and so we see that Bill has created an array of (untrimmed) strings containing the values to the left of the hyphen for each of the strings in A1:A6. By a similar deconstruction we can easily verify that the second Defined Name, last, will resolve to the following array:

{"    2";"    6";"9";"   11";"   16";"21"}

So, looking now at the initial IF statement in his solution:

IF(ROWS($D$1:$D1)>SUM(last-first+1),""

we can now substitute in our Defined Names, i.e.:

IF(ROWS($D$1:$D1)>SUM({"    2";"    6";"9";"   11";"   16";"21"}-{"1    ";"4    ";"9";"10   ";"13   ";"21"}+1),""

which resolves to:

IF(ROWS($D$1:$D1)>SUM({2;3;1;2;4;1}),""

Readers can see here that applying a TRIM function to the two arrays first and last was not strictly necessary: the mathematical operation (subtraction) which first coerces the strings into numerical values is able to ignore any extra spacing preceding (or following) the values in those strings.

And so we can see that the elements in this array being passed to SUM correspond to the number of returns we will generate from each of the entries in A1:A6: the string “1-2” will give us 2 values, the string “4-6” 3 values, the string “9” 1 value, etc., etc.

Hence, the sum of this array will provide us with the total number of expected returns, and so, resolving the above:

IF(ROWS($D$1:$D1)>13,""

we know that, beyond the 13th row to which this formula is copied, blanks will result from the formulas in those cells, as required.

Let’s now look at the main clause of the formula, which is:

SMALL(IF(first+TRANSPOSE(ROW(INDIRECT("1:"&MAX(last-first)+1))-1)>--last,"",first+TRANSPOSE(ROW(INDIRECT("1:"&MAX(last-first)+1))-1)),ROWS($D$1:$D1))

We can see that the core of this construction, and indeed which is repeated twice, is the part:

first+TRANSPOSE(ROW(INDIRECT("1:"&MAX(last-first)+1))-1)

so let’s look in detail at how this operates.

Well, similarly to the above analysis, we can easily verify that the part:

last-first

resolves to the array:

{1;2;0;1;3;0}

and so taking the maximum of this will give 3. We then add 1 to this value to obtain 4.

In effect, what this value represents is the maximum number of returns we expect from any of the strings in A1:A6. You can see that it corresponds to the string “13-16”, from which we naturally expect to see 4 returns (13, 14, 15 and 16). The reason for going to the effort to obtain this maximum value will become clear soon.

Hence, our construction now looks like:

=first+TRANSPOSE(ROW(INDIRECT("1:"&4))-1)

which becomes:

first+TRANSPOSE({1;2;3;4}-1)

i.e.:

first+{0,1,2,3}

And this is the crux of Bill’s solution then, and a nice bit of technique. He first generates a single-column array consisting of values from 0 up to a value one less than the maximum number of expected returns, i.e. {0;1;2;3}, and then transposes it so that it becomes a single-row array, i.e. {0,1,2,3}.

Next, he adds this array to the array of strings defined by first, i.e. {“1 “;”4 “;”9”;”10 “;”13 “;”21”}, which as you can see is a single-column array.

Because the two arrays in this addition are orthogonal – one a 6-row-by-1-column array, the other a 1-row-by-4-column array – we know that the result of this addition will be a 6-row-by-4-column array the entries of which will consist of the 24 values obtained by adding each of the 4 values in the array {0,1,2,3} in turn to the six values in the array {“1 “;”4 “;”9”;”10 “;”13 “;”21”}.

Hence:

first+TRANSPOSE({1;2;3;4}-1)

which is:

{"1    ";"4    ";"9";"10   ";"13   ";"21"}+{0,1,2,3}

will become:

{1,2,3,4;4,5,6,7;9,10,11,12;10,11,12,13;13,14,15,16;21,22,23,24}

Each of the six rows in this matrix is the result of adding 0, 1, 2 and 3 to the strings in first. For example, the 2nd row in this matrix – {4,5,6,7} – corresponds to the values obtained from adding 0, 1, 2 and 3 to the string “4 “. The 6th row in this matrix – {21,22,23,24} – corresponds to the values obtained from adding 0, 1, 2 and 3 to the string “21”.

You can probably see that we have, in essence, our desired list of returns. The only slight issue is that our array:

{1,2,3,4;4,5,6,7;9,10,11,12;10,11,12,13;13,14,15,16;21,22,23,24}

consists of not only our desired values, but also some which we do not want to return. Of course, this was something which had to be accepted. The reason for which Bill devised a construction to generate the value of 4 previously was to ensure that we added on a sufficient number of integers such that all values from the string with the largest expected number of returns, i.e. “13-16”, would be accounted for.

The only downside to this approach is that it meant adding this array of 4 values – {0,1,2,3} – to every one of the strings in first, and this resulted in the 6-row-by-4-column matrix containing some extraneous, undesired values. The sixth row in that matrix – {21,22,23,24} – contains three values (22, 23 and 24) which do not form part of our original list in A1:A6 and which we do not wish to return. Similarly for the the fourth row – {10,11,12,13} – which contains the two additional, unwanted values of 12 and 13.

Bill resolved this issue with an IF clause which, on the face of it, appears quite straightforward, simply comparing each of the values obtained above against the array last, which remember is an array consisting of, effectively, the “upper bounds” on each of the strings in A1:A6. If, for example, we can compare the values from the fourth row in the above matrix, i.e. {10,11,12,13} against the upper bound for our fourth row, i.e. 11, the fourth element in last, then we have the means with which we can then reject the 12 and 13 from that array, keeping only our desired 10 and 11.

In fact, constructing this comparison is not as straightforward as it might at first seem. Indeed, it requires a knowledge of the subtleties of array manipulation, and Bill is to be commended for having demonstrated such a knowledge. To explain, on seeing that:

IF(first+TRANSPOSE(ROW(INDIRECT("1:"&MAX(last-first)+1))-1)>--last,""

becomes:

IF({1,2,3,4;4,5,6,7;9,10,11,12;10,11,12,13;13,14,15,16;21,22,23,24}>{2;6;9;11;16;21},""

readers who are not too familiar with the means by which Excel resolves arrays of differing dimensions might at first sight be perplexed as to how we can perform an IF statement in which the two arrays being compared are of such evident dissimilarity.

But the way in which this comparison is resolved is in essence no different to the way in which the addition we saw previously was resolved, in which you’ll recall that the two arrays being added differed both in their size and displacement. The fact that the operation was addition in that case and a comparison (greater than) here is irrelevant: Excel resolves the two according to the same logic.

Effectively, when we perform the comparison:

IF({1,2,3,4;4,5,6,7;9,10,11,12;10,11,12,13;13,14,15,16;21,22,23,24}>{2;6;9;11;16;21}

Excel interprets this as an array of comparisons, in which the comparisons are made between elements from each array having the same row value.

For example, the values in the 2nd row of the first, larger array (which recall is a 6-row-by-4-column array), i.e. {4,5,6,7}, will each be tested against the condition as to whether they are greater than the value from the 2nd row of the second array, i.e. 6. Similarly, the values in the fifth row of the large array, i.e. {13,14,15,16} will each be compared to the value in the fifth row of the second array, i.e. 16. And so on.

In essence, then:

IF({1,2,3,4;4,5,6,7;9,10,11,12;10,11,12,13;13,14,15,16;21,22,23,24}>{2;6;9;11;16;21}

will resolve to:

IF({FALSE,FALSE,TRUE,TRUE;FALSE,FALSE,FALSE,TRUE;FALSE,TRUE,TRUE,TRUE;FALSE,FALSE,TRUE,TRUE;FALSE,FALSE,FALSE,FALSE;FALSE,TRUE,TRUE,TRUE}

(Readers may wish to verify some of these values against the logic I outlined above.)

Resolving our IF clause then means that:

IF({FALSE,FALSE,TRUE,TRUE;FALSE,FALSE,FALSE,TRUE;FALSE,TRUE,TRUE,TRUE;FALSE,FALSE,TRUE,TRUE;FALSE,FALSE,FALSE,FALSE;FALSE,TRUE,TRUE,TRUE},"",{1,2,3,4;4,5,6,7;9,10,11,12;10,11,12,13;13,14,15,16;21,22,23,24})

becomes:

{1,2,"","";4,5,6,"";9,"","","";10,11,"","";13,14,15,16;21,"","",""}

and we can now see that this array, apart from a few blanks, consists of precisely those values we wish to return.

I will leave it to the reader to complete the resolution of Bill’s construction, which consists of passing this array to SMALL, the k parameter for which is obtained via a simple ROWS construction.

An impressive piece of work. Thanks once more to Bill. Another challenge to follow shortly. Watch this space!

3 comments

  1. What do you think about

    =TRIM(LEFT(SUBSTITUTE(Sheet1!$A$1:$A$6,"-",REPT(" ",LEN(Sheet1!$A$1:$A$6))),LEN(Sheet1!$A$1:$A$6)))

    as Named Range “First” and

    =SUBSTITUTE(SUBSTITUTE($A$1:$A$6,First,""),"-","")

    as Last?

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s