Shortest Formula Challenge #6: Results and Discussion Reply

A couple of weeks ago I set readers the challenge which can be found here.

Once again, some truly excellent responses and a noticeably collaborative attempt towards obtaining our final, minimal-length solution. So many thanks to all who contributed: Alex, John Jairo, Lori, Snakehips and Will!

And that solution, at 108 characters, is:

=LOOKUP(,0/FREQUENCY(0,2^-(LEN(Q5:Q77)>4)/MMULT(SUMIF(U3:U28,MID(Q5:Q77,COLUMN(A1:G1),1),V3),1^V3:V9)),Q5:Q6)

How does it work?

Let’s break this rather unusual-looking beast down bit-by-bit, and we’ll begin with this part:

MID(Q5:Q77,COLUMN(A1:G1),1)

Before beginning the dissection, the first thing I should point out is that certain liberties have been taken in the process of this challenge towards achieving the goal of obtaining the shortest possible solution. And this means that, although the resulting solution is perfectly sound, it may nevertheless contain constructions which, for the sake of rigour and clarity, we would (and indeed should) normally avoid.

Here, for example (though by no means our worst “culprit”), the part:

COLUMN(A1:G1)

although quite valid, should, by virtue of its being static and of relatively short length, be substituted with:

{1,2,3,4,5,6,7}

though of course here we are swayed by the fact that the former comprises fewer characters than the latter.

In any case, we now see that:

MID(Q5:Q77,COLUMN(A1:G1),1)

becomes:

MID(Q5:Q77,{1,2,3,4,5,6,7},1)

and, at the risk of repeating (at least to my regular readers) an oft-cited statement of mine, here again can be seen the importance of ensuring that the arrays being passed to MID for its text and start_num parameters are orthogonal. Since the range Q5:Q77 is clearly a column-vector, we must necessarily construct the second parameter as a row-vector.

Hence the need for:

{1,2,3,4,5,6,7}

or, analogously:

COLUMN(A1:G1)

and not:

{1;2;3;4;5;6;7}

an equivalent (among several) of which is:

ROW(A1:A7)

though of course:

=TRANSPOSE(COLUMN(A1:G1))

would amount to the same thing, albeit being somewhat unnecessary here.

Having ensured the orthogonality of these two arrays, we can be satisfied that the result of our MID construction, i.e.:

MID(Q5:Q77,COLUMN(A1:G1),1)

will comprise a 73-row-by-7-column matrix, the 7 entries in each of those 73 rows being the individual letters which make up the corresponding entry from the list in Q5:Q77.

In full, for those interested, it will be:

{"A","R","G","O","T","","";"G","A","T","O","R","","";"G","R","O","A","T","","";"T","A","R","D","O","","";"D","A","G","O","","","";"D","A","R","T","","","";"D","A","T","O","","","";"D","O","A","T","","","";"D","R","A","G","","","";"D","R","A","T","","","";"G","O","A","D","","","";"G","O","A","T","","","";"G","R","A","D","","","";"G","R","A","T","","","";"G","R","O","T","","","";"O","R","A","D","","","";"R","A","T","O","","","";"R","O","A","D","","","";"R","O","T","A","","","";"T","A","R","O","","","";"T","O","A","D","","","";"T","O","G","A","","","";"T","O","R","A","","","";"T","R","A","D","","","";"T","R","O","D","","","";"T","Z","A","R","","","";"A","D","O","","","","";"A","D","Z","","","","";"A","G","O","","","","";"A","R","T","","","","";"A","Z","O","","","","";"D","A","G","","","","";"D","O","G","","","","";"D","O","R","","","","";"D","O","T","","","","";"G","A","D","","","","";"G","A","R","","","","";"G","A","T","","","","";"G","O","A","","","","";"G","O","D","","","","";"G","O","R","","","","";"G","O","T","","","","";"O","A","R","","","","";"O","A","T","","","","";"O","D","A","","","","";"O","R","A","","","","";"O","R","T","","","","";"R","A","D","","","","";"R","A","G","","","","";"R","A","T","","","","";"R","O","D","","","","";"R","O","T","","","","";"T","A","D","","","","";"T","A","G","","","","";"T","A","O","","","","";"T","A","R","","","","";"T","O","D","","","","";"T","O","G","","","","";"T","O","R","","","","";"Z","A","G","","","","";"Z","O","A","","","","";"A","D","","","","","";"A","G","","","","","";"A","R","","","","","";"A","T","","","","","";"D","A","","","","","";"D","O","","","","","";"G","O","","","","","";"O","D","","","","","";"O","R","","","","","";"T","A","","","","","";"T","O","","","","","";"Z","A","","","","",""}

We now determine respective scores for each of the entries within this large matrix using the table provided.

The most obvious (and arguably natural) choice here would be to use LOOKUP (the table in question comprises a lookup_vector in ascending alphabetical order, so the use of this function is appropriate here), as indeed was the choice of some readers.

Again, though, this being a shortest formula challenge, it is often worth seeking – hopefully shorter – alternatives, one of which, in this case, is SUMIF.

It might at first sight appear a touch odd to employ such a function; after all, each letter appears precisely once within the table, so summing all (i.e. one!) values pertaining to a given letter could be construed as overkill.

Once we get over our initial distaste for such an ‘incorrect’ use of SUMIF, however, we realise that (provided of course we are returning numerics) it is actually a perfectly valid alternative. Indeed, although I haven’t tested this, it may even turn out that the use of SUMIF in this way performs more efficiently than does e.g. LOOKUP.

Anyway, now to our next ‘abuse’ for the sake of formula-brevity, viz:

SUMIF(U3:U28,MID(Q5:Q77,COLUMN(A1:G1),1),V3)

the culprit in which may initially take some time to spot, until you look again and see that the range being passed as SUMIF’s sum_range, i.e. V3, appears to flout the requirement that the dimensions of the range (here U3:U28) and sum_range be equal.

I won’t go into details in this post, but readers can see here for a full explanation of this property of SUMIF (interestingly not shared by its siblings SUMIFS, COUNTIF and COUNTIFS).

As such, then, and trusting me that:

SUMIF(U3:U28,MID(Q5:Q77,COLUMN(A1:G1),1),V3)

is equivalent to:

SUMIF(U3:U28,MID(Q5:Q77,COLUMN(A1:G1),1),V3:V28)

(apart from the former being volatile, that is), it is not too difficult to see that this will resolve to another 73-row-by-7-column array, though, whereas in the previous step containing individual letters, this time comprising their corresponding values as given by our table.

Now, in order to obtain a total score for each of our 73 words, we pass this array to MMULT, for which, in light of the fact that this first array comprises 7 columns, must naturally involve a second array comprising that number of rows. As such, we require it to be:

{1;1;1;1;1;1;1}

which static array we should – in general and where appropriate – prefer to alternative constructions whose resulting output is that same array.

Here, as we already know, we are seeking to reduce formula-length at all costs. Hence the decision of many (including myself) to instead employ:

ROW(A1:A7)^0

which, I imagine, all of us believed couldn’t be bettered until Lori demonstrated a nice (and shorter by one character) option using indexing, viz:

1^N(+A1:A7)

which Alex then refined even further by taking advantage of our knowledge of the contents of certain cells within the worksheet:

1^V3:V9

which nearly halves the length of the ROW set-up!

Having obtained our necessary second array, we can now calculate:

MMULT(SUMIF(U3:U28,MID(Q5:Q77,COLUMN(A1:G1),1),V3),1^V3:V9)

which will result in a single column-vector comprising the scores for each of our 73 words in Q5:Q77, i.e.:

{6;6;6;6;6;5;5;5;6;5;6;5;6;5;5;5;4;5;4;4;5;5;4;5;5;13;4;13;4;3;12;5;5;4;4;5;4;4;4;5;4;4;3;3;4;3;3;4;4;3;4;3;4;4;3;3;4;4;3;13;12;3;3;2;2;3;3;3;3;2;2;2;11}

We now introduce a clause which, in line with the stipulations of the challenge, will double the score for any word which happens to cover one of the ‘double-word’ squares. It doesn’t take much effort to see that this will apply to any word of length 5 or greater, so we could construct something like:

1+(LEN(Q5:Q77)>4)

which will result in the necessary array of 1s and 2s and with which we can multiply our above array of word-scores.

But then Lori again demonstrated that, since we will in any case be reciprocating this resulting array in the next step prior to passing to FREQUENCY, a little manipulation here can save us a character or two, and so we construct:

2^-(LEN(Q5:Q77)>4)/MMULT(SUMIF(U3:U28,MID(Q5:Q77,COLUMN(A1:G1),1),V3),1^V3:V9)

which is:

{0.5;0.5;0.5;0.5;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1;1}/{6;6;6;6;6;5;5;5;6;5;6;5;6;5;5;5;4;5;4;4;5;5;4;5;5;13;4;13;4;3;12;5;5;4;4;5;4;4;4;5;4;4;3;3;4;3;3;4;4;3;4;3;4;4;3;3;4;4;3;13;12;3;3;2;2;3;3;3;3;2;2;2;11}

the point being that the score of any word of more than 4 letters is effectively halved (which, in light of the way in which we are constructing our array to pass to FREQUENCY in order to obtain the maximum, is effectively equivalent to being doubled).

We now have:

LOOKUP(,0/FREQUENCY(0,{0.0833333333333333;0.0833333333333333;0.0833333333333333;0.0833333333333333;0.166666666666667;0.2;0.2;0.2;0.166666666666667;0.2;0.166666666666667;0.2;0.166666666666667;0.2;0.2;0.2;0.25;0.2;0.25;0.25;0.2;0.2;0.25;0.2;0.2;0.0769230769230769;0.25;0.0769230769230769;0.25;0.333333333333333;0.0833333333333333;0.2;0.2;0.25;0.25;0.2;0.25;0.25;0.25;0.2;0.25;0.25;0.333333333333333;0.333333333333333;0.25;0.333333333333333;0.333333333333333;0.25;0.25;0.333333333333333;0.25;0.333333333333333;0.25;0.25;0.333333333333333;0.333333333333333;0.25;0.25;0.333333333333333;0.0769230769230769;0.0833333333333333;0.333333333333333;0.333333333333333;0.5;0.5;0.333333333333333;0.333333333333333;0.333333333333333;0.333333333333333;0.5;0.5;0.5;0.0909090909090909}),Q5:Q6)

which results in “TZAR”, as required (though “ADZ” or “ZAG” are also acceptable returns, scoring equally 13).

Readers are referred to this post for an explanation as to how this LOOKUP/FREQUENCY set-up functions for returning a maximum.

Suffice it to say that the version here incorporates the last of our ‘liberties’ for the sake of formula-brevity, since the range we are passing as LOOKUP’s result_vector, which should of course ‘properly’ be Q5:Q77, is here the rather dubious-looking Q5:Q6:

LOOKUP(,0/FREQUENCY(0,2^-(LEN(Q5:Q77)>4)/MMULT(SUMIF(U3:U28,MID(Q5:Q77,COLUMN(A1:G1),1),V3),1^V3:V9)),Q5:Q6)

Again, such a practice is not to be recommended outside the scope of such challenges. Nevertheless, our saving of a single character is here perfectly appropriate, the technical justification for which is similar to that given earlier with regard to the reduced range we passed as SUMIF’s sum_range.

One interesting difference is that, whereas a sum_range comprising just a single cell, e.g. V3, will be redimensioned within a SUMIF construction to match the dimensions of the range, to achieve the appropriate redimensioning with LOOKUP it is necessary that the result_vector comprise a range of at least two cells. That is, the following attempt to reduce our formula-length even further:

=LOOKUP(,0/FREQUENCY(0,2^-(LEN(Q5:Q77)>4)/MMULT(SUMIF(U3:U28,MID(Q5:Q77,COLUMN(A1:G1),1),V3),1^V3:V9)),Q5)

does not give the desired result. In fact, it returns 0.

But why 0? Whence does this value come?

It turns out that this leads to the rather wonderful and esoteric discovery that, if a single cell is passed to LOOKUP for its result_vector, then this array is again redimensioned so that it is of the same size as the lookup_vector, though this time its vector-type is – by default – a row-vector, i.e. irrespective of the vector-type of the lookup_vector.

If readers aren’t quite sure what is meant here, using the workbook provided in the original challenge, enter “XXX” in cell AP5 and evaluate:

=LOOKUP(,0/FREQUENCY(0,2^-(LEN(Q5:Q77)>4)/MMULT(SUMIF(U3:U28,MID(Q5:Q77,COLUMN(A1:G1),1),V3),1^V3:V9)),Q5)

The result? “XXX”!

This time, the single-cell range Q5 has been redimensioned to Q5:CK5, i.e. of the same size as Q5:Q78 (the result of redimensioning Q5:Q6 here), though also orthogonal to it. What a difference a cell makes!

And I say “by default – a row-vector”, since it can readily be shown that it is not the case, as might at first be hypothesized, that the result_vector is always redimensioned such that it is orthogonal to the lookup_vector.

For example, with:

=LOOKUP(A1,A3:A12,B3)

it is indeed true that the result_vector is here orthogonal to the lookup_vector, the above being equivalent to:

=LOOKUP(A1,A3:A12,B3:K3)

though this is as a result of the aforementioned default behaviour, and not due to any implicit vector transformation, as can be seen when we investigate e.g.:

=LOOKUP(A1,B2:K2,B3)

which is not equivalent to:

=LOOKUP(A1,B2:K2,B3:B12)

but rather to:

=LOOKUP(A1,B2:K2,B3:K3)

Strange behaviour indeed! Though certainly worthy of further investigation, and potentially of practical use.

Thanks again to all who contributed. Another challenge to follow shortly. Watch this space!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s