Shortest Formula Challenge #2: Results and Discussion 5

Last week I set readers the challenge which can be found here.

5 correct solutions received, courtesy of John Jairo V, GreasySpot, Bill Szysz, James and ChrisBM (who actually missed off a final parenthesis in his formula, though I will be lenient here!). So well done to all!

As to whose was the shortest, excluding the offering from Isai Alvarado, who beat everyone with his 51-character (excluding the equals sign) Google Sheets construction (well done Isai!), that accolade is shared by John and Bill, both of whose solutions came in at 56 characters, which is quite a remarkable coincidence when you consider that each used a completely different construction! So congratulations to John and Bill!

Those two solutions (both requiring CSE) are:

John:

=COUNT(MATCH(A3:A12&B3:B12&C3:C12,E3:E12&F3:F12&G3:G12,))

Bill:

=SUM(COUNTIFS(A3:A12,E3:E12,B3:B12,F3:F12,C3:C12,G3:G12))

Not that I’m going to, but if I was forced to pick an outright winner then I guess it would have to be Bill’s: not only is it, in my opinion, the more natural of the two set-ups, but, technically, by not including some form of separator as part of the concatenation of the column entries, it’s theoretically possible that John’s MATCH construction could give incorrect results; for example, if we had entries in our results table of both FranceParisEuro and FrancEpariseUro (not that there exist any countries, cities or currencies named thus!).

However, certainly in this case I imagine the probability of such an occurrence to be extremely close to zero. Nevertheless, it is worth making the general point that our data may not always be such that the chance of there being a “coincidence” such as the above is insignificant. As such, to be rigorous, we should in general prefer something like:

=COUNT(MATCH(A3:A12&"|"&B3:B12&"|"&C3:C12,E3:E12&"|"&F3:F12&"|"&G3:G12,))

or some other suitable separator in place of “|”.

How do they work?

Let’s look at each of these solutions in turn, beginning with John’s, i.e.:

=COUNT(MATCH(A3:A12&B3:B12&C3:C12,E3:E12&F3:F12&G3:G12,))

On the face of it this is quite a simple, elegant solution. But of course that does not mean it is at all easy to find, and, for those seeing it for the first time, it is certainly a technique worth remembering.

What both these solutions have in common is that they offer us a means to obtain the desired count without, for example, using a series of VLOOKUPs or INDEX/MATCH combinations, as might be our first thoughts when confronted with such a challenge.

Here, by first forming concatenated strings of the entries in both our tables, we give ourselves the means by which we can perform a single series of matches over ten strings (subject to the caveat re there being no examples as I gave above, of course). Hence, resolving these concatenations gives:

=COUNT(MATCH({"FranceParisManat";"SerbiaBelgradeDinar";"LithuaniaVilniusZłoty";"HungaryBudapestForint";"AndorraChisinauManat";"UkraineSofiaHryvnia";"ArmeniaYerevanDram";"RomaniaRomeLeu";"BulgariaSofiaLev";"CroatiaBerlinLira"},{"AndorraAndorra la VellaEuro";"ArmeniaYerevanDram";"BulgariaSofiaLev";"CroatiaZagrebKuna";"FranceParisEuro";"HungaryBudapestForint";"LithuaniaVilniusLitas";"RomaniaBucharestLeu";"SerbiaBelgradeDinar";"UkraineKievHryvnia"},))

which is:

=COUNT({#N/A;9;#N/A;6;#N/A;#N/A;2;#N/A;3;#N/A})

Note here that – and this is where John managed to better the solution given by GreasySpot by a single character – if the match_type parameter is omitted, Excel interprets this as being identical to the equivalent construction with a match_type parameter of zero.

In fact, wherever appropriate, Excel will interpret an omitted parameter as zero. And this feature is not exclusive to the parameters of the MATCH function. For example, we often see this technique employed in constructions involving INDEX, such as:

=SUM(INDEX(A1:B10,,2))

which is identical to (and which form some prefer):

=SUM(INDEX(A1:B10,0,2))

(See here for more on how INDEX interprets a row_num or column_num parameter of zero.)

Note that I said “wherever appropriate”: an omitted parameter is not always equivalent to a parameter of zero. For example, another formula in which such abbreviations are common is OFFSET, for example:

=OFFSET(A1,,1,,)

though it should not be thought that this is equivalent to:

=OFFSET(A1,0,1,0,0)

which will in fact result in a #REF! error, since a value of zero for the height or width parameter is, understandably, not permitted. Here, then:

=OFFSET(A1,,1,,)

is in fact equivalent to:

=OFFSET(A1,0,1,1,1)

which serves to highlight the fact that the statement that an omitted parameter is identical to a parameter of zero is not to be taken as universal.

We can even experiment with such cases as for:

=MATCH(0,A1:A10,0)

which gives the relative position of the first occurrence of a zero within the range A1:A10. Technically, this could be replaced with:

=MATCH(,A1:A10,)

though – unless we’re involved in some sort of Shortest Formula Challenge, of course 🙂 – most would argue that the small abbreviation achieved here is outweighed by the loss in terms of comprehensibility, not to mention the potential for error inherent in using such shortcuts. Likewise for:

=MIN(,1,2,3)

which will return 0, though again, although of theoretical interest there is little in terms of practicality to recommend the use of such a syntax.

Returning to our deconstruction, we can take advantage of the fact that the COUNT function ignores errors within the range passed to it (although James’ solution using ISERROR is perfectly correct, this property of COUNT means that, in this case, it is not strictly necessary to include an error clause).

And so:

=COUNT({#N/A;9;#N/A;6;#N/A;#N/A;2;#N/A;3;#N/A})

is simply 4, as desired.

Let’s now take a look at the alternative, slightly more rigorous solution given by Bill, which recall is:

=SUM(COUNTIFS(A3:A12,E3:E12,B3:B12,F3:F12,C3:C12,G3:G12))

I should first say that I find it a touch surprising that Bill’s was the only one of five solutions which employed COUNTIFS. Indeed, on the face of it, what other function is more naturally suited to such a task?

I think it’s perhaps indicative of the majority of Excel users’ understanding of the group of functions comprising COUNTIF(S)/SUMIF(S) that four out of five people chose what is, arguably, a more complex and less intuitive construction, one which necessitates first generating concatenated strings and then passing these to a set-up using COUNT and MATCH.

And yet this alternative both requires no prior manipulation and also consists of nothing more than a straightforward COUNTIFS wrapped in SUM.

Perhaps the difficulty arises not so much from a lack of understanding of how these functions operate in their “normal” usage (i.e. non-array, with single values for the criteria parameters), but from a failure to appreciate how the output of such functions is affected when we pass arrays (in this case ranges consisting of multiple cells) to one or more of the criteria parameters and commit the formula with CSE.

Perhaps it would help if I first express, in words, what the construction:

=SUM(COUNTIFS(A3:A12,E3:E12,B3:B12,F3:F12,C3:C12,G3:G12))

when array-entered, is actually calculating.

The point to note is that, when one of the criteria parameters consists of more than one element, Excel – when properly coerced, i.e. with some external function acting on the COUNTIFS and with CSE if appropriate – will calculate the COUNTIFS separately over each element within that array.

What’s more, if, as here, we have more than one criteria parameter consisting of multiple entries (E3:E12, F3:F12 and G3:G12), Excel again performs a series of separate COUNTIFS calculations, the set of criteria parameters for each being equivalent to corresponding elements (i.e. from the same row) from our criteria parameters.

To put it another way, the above expression is identical to performing each of the following ten separate calculations:

=COUNTIFS(A3:A12,E3,B3:B12,F3,C3:C12,G3)
=COUNTIFS(A3:A12,E4,B3:B12,F4,C3:C12,G4)
=COUNTIFS(A3:A12,E5,B3:B12,F5,C3:C12,G5)

=COUNTIFS(A3:A12,E12,B3:B12,F12,C3:C12,G12)

and then summing the results.

Put this way, it’s not difficult to see why this construction gives us the desired result, since the above are evidently equal to, respectively: 0 (the number of rows in our answer table for which the entry in column A is “Andorra” and the corresponding entry in column B is “Andorra la Vella” and the corresponding entry in column C is “Euro”), 1 (the number of rows for which column A = “Armenia” and column B = “Yerevan” and column C is “Dram”), 1 (“Bulgaria”/”Sofia”/”Lev”), etc., etc.

All of which is precisely why, if we use the Evaluate Formula tool on:

=SUM(COUNTIFS(A3:A12,E3:E12,B3:B12,F3:F12,C3:C12,G3:G12))

we see that it resolves to:

=SUM({0;1;1;0;0;1;0;0;1;0})

which, again, is 4, as required.

Another challenge to follow shortly. Watch this space!

5 comments

  1. Hello XORLX,

    It appears to me that the formula designed by Bill Szysz … is 55 characters … 1 character shorter …and, in addition, his formula does not require CSE …

    Why is it he is not picked as the Winner …???

  2. @XOR LX

    From what I gathered, the formula goes:

    =SUM(--(MMULT(--(COUNTIFS(A3:C12,E3:G12)>0),{1;1;1})=3))

    and the len() is 55 … is it not the way to compute it …?

  3. But as I explained to Bill in comments to that post, that formula is not valid. The fact that it gives the correct answer is pure coincidence.

  4. @XOR LX

    Thanks …

    I will go back to your comments … to understand why it is a pure coincidence …

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s