Counting Rows Where Condition Is Met In At Least One Column 42

In this post I would like to present a solution to the situation in which we wish to count the number of rows for which a stipulated condition is met in at least one of several columns.

To illustrate what is meant by this, consider the extract below:

Counting Rows Where At Least One Condition Is Met

which details levels of scrap nickel exports for various countries and for various years (you can download the workbook here).

First, let’s suppose that we wish to determine the number of countries whose 2004 figure was greater than or equal to 1000. I imagine all readers will be perfectly capable of achieving this with a simple construction, viz:

=COUNTIF(B2:B14,">=1000")

or:

=SUMPRODUCT(N(B2:B14>=1000))

We might also wish to consider e.g. how many countries’ export total was greater than this threshold for both 2004 and 2005, for which we would use:

=COUNTIFS(B2:B14,">=1000",C2:C14,">=1000")

or:

=SUMPRODUCT(N(B2:B14>=1000),N(C2:C14>=1000))

But what if we wish to count the number of countries for which at least one of their 2004 and 2005 figures satisfies this criterion?

I have highlighted the rows which meet this new specification in the below:

Counting Rows Where At Least One Condition Is Met 2

(In fact, Italy is the only example here for which that threshold was not achieved in both years.)

Clearly a “standard” COUNTIF(S) construction will not suffice here, since we must be sure to not double-count. Indeed, most people tend to switch to SUMPRODUCT in this situation, viz:

=SUMPRODUCT(N((B2:B14>=1000)+(C2:C14>=1000)>0))

However, we can, if we choose, construct a solution using COUNTIFS, which may well be worthwhile in certain circumstances given the advantages of that function over SUMPRODUCT (more efficient in general, ability to reference entire columns without detriment).

Of course, this set-up is only valid when considering a mere two columns for the criteria, though that is perhaps a common-enough occurrence to justify such a solution. In any case, as the number of columns to be considered increases, as does the inappropriateness of both a COUNTIFS solution and the SUMPRODUCT construction given above. Indeed, there we must turn to our old friend MMULT for help, of which more presently.

Anyway, back to the case for just two columns, then. Here it will help to reiterate what we are trying to achieve, i.e. count the number of countries for which at least one of their 2004 and 2005 figures satisfies this criterion.

If we think about it, then, put another way, this effectively means considering the three mutually-exclusive cases:

1) The 2004 figure is >=1000 and the 2005 figure is <1000
2) The 2004 figure is <1000 and the 2005 figure is >=1000
3) The 2004 figure is >=1000 and the 2005 figure is >=1000

and then summing the results of each.

Or, translated into Excel:

=COUNTIFS(B2:B14,">=1000",C2:C14,">=1000")+COUNTIFS(B2:B14,">=1000",C2:C14,"<1000")+COUNTIFS(B2:B14,"<1000",C2:C14,">=1000")

Neither particularly elegant nor concise. However, when we realise that we can abbreviate this as:

=SUM(COUNTIFS(B2:B14,{">=",">=","<"}&1000,C2:C14,{">=","<",">="}&1000))

then things are not so bad, and we have succeeded in achieving a COUNTIFS-based solution to our problem in place of the ubiquitously-recommended SUMPRODUCT set-up.

Let’s now consider the situation in which we wish to derive this result, but where the number of columns to be considered is not just two, but several, potentially many. For example, let’s say we want to determine the number of countries for which at least one of their yearly figures from 2004 to 2012 is greater than or equal to 1000.

We can see from the below screenshot that the result here should be 11, since for all but two countries – Poland and Turkey – this criterion is satisfied.

Counting Rows Where At Least One Condition Is Met 3

Here we simply cannot construct a COUNTIFS construction along the lines of the above, and the equivalent SUMPRODUCT solution:

=SUMPRODUCT(N((B2:B14>=1000)+(C2:C14>=1000)+(D2:D14>=1000)+(E2:E14>=1000)+(F2:F14>=1000)+(G2:G14>=1000)+(H2:H14>=1000)+(I2:I14>=1000)+(J2:J14>=1000)>0))

although correct, is certainly starting to look unwieldy (and would be even more so if the number of columns to be considered was not 9 but, say, 20 or 30).

Fortunately, since our range of columns is contiguous, we can query the entire range (B2:J14) in a single expression, provided that we then manipulate the resulting array appropriately.

The required solution here is the following array formula**:

=SUM(N(MMULT(N(B2:J14>1000),TRANSPOSE(COLUMN(B2:J14)^0))>0))

How does it work?

Let’s first look at the array resulting from our comparison as to whether each of the entries within that array is greater than or equal to 1000 or not, i.e. that containing the Boolean results of:

B2:J14>1000

which, for our example, will be:

{TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE;FALSE,FALSE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE;FALSE,FALSE,TRUE,TRUE,FALSE,TRUE,TRUE,TRUE,TRUE;FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,TRUE,TRUE,FALSE;TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE;TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE;FALSE,TRUE,FALSE,FALSE,FALSE,FALSE,TRUE,FALSE,TRUE;TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE;FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE;TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE;TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,FALSE,TRUE;FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE;TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE}

As always, coercing these Booleans to their numerical equivalents will help visualize matters, such that:

N(B2:J14>1000)

gives:

{1,1,1,1,1,1,1,1,1;0,0,1,1,1,1,1,1,1;0,0,1,1,0,1,1,1,1;0,0,0,0,0,1,1,1,0;1,1,1,1,1,1,1,1,1;1,1,1,1,1,1,1,1,1;0,1,0,0,0,0,1,0,1;1,1,1,1,1,1,1,1,1;0,0,0,0,0,0,0,0,0;1,1,1,1,1,1,1,1,1;1,1,1,1,1,1,1,0,1;0,0,0,0,0,0,0,0,0;1,1,1,1,1,1,1,1,1}

Now, in order to calculate the number of 1s within each row, we use MMULT. And, since the above array (a 13-row-by-9-column array) contains 9 columns, the matrix with which we form the product must consist of this number of rows (for readers who weren’t already aware, this is a condition for matrix multiplication, i.e. that the number of columns in the first matrix be equal to the number of rows in the second). And naturally each of these 9 entries must be unity.

Hence, we form, as our second matrix:

TRANSPOSE(COLUMN(B2:J14)^0)

which works since:

COLUMN(B2:J14)

will give us an array consisting of a number of elements equivalent to the number of columns within our range, i.e. 9, our construction becoming:

TRANSPOSE({2,3,4,5,6,7,8,9,10}^0)

Raising to the power of 0 is a nice trick which ensures that our array consists of nothing but 1s, since, for those who didn’t know, any non-zero numerical value raised to the power of zero is unity.

We thus have:

TRANSPOSE({1,1,1,1,1,1,1,1,1})

We must now ensure that this array of 9 1s is one consisting of 9 rows, not 9 columns, so that it is of a valid form to pass to MMULT.

And this we achieve of course by transposing it, so that the above becomes:

{1;1;1;1;1;1;1;1;1}

We now have our two arrays to pass to MMULT, so that:

MMULT(N(B2:J14>1000),TRANSPOSE(COLUMN(B2:J14)^0))

has now become:

MMULT({1,1,1,1,1,1,1,1,1;0,0,1,1,1,1,1,1,1;0,0,1,1,0,1,1,1,1;0,0,0,0,0,1,1,1,0;1,1,1,1,1,1,1,1,1;1,1,1,1,1,1,1,1,1;0,1,0,0,0,0,1,0,1;1,1,1,1,1,1,1,1,1;0,0,0,0,0,0,0,0,0;1,1,1,1,1,1,1,1,1;1,1,1,1,1,1,1,0,1;0,0,0,0,0,0,0,0,0;1,1,1,1,1,1,1,1,1},{1;1;1;1;1;1;1;1;1})

which is:

{9;7;6;3;9;9;3;9;0;9;8;0;9}

Readers may wish to confirm that the values in this array are indeed precisely the number of entries within each of the rows in our range which are >=1000 (the 3 in position 4, for example, corresponding to the fact that Denmark achieved that threshold in just 3 years: 2009, 2010 and 2011).

We then query this array as to which elements are at least one or greater, and then sum the results, such that:

=SUM(N(MMULT(N(B2:J14>1000),TRANSPOSE(COLUMN(B2:J14)^0))>0))

which is:

=SUM(N({9;7;6;3;9;9;3;9;0;9;8;0;9}>0))

becomes:

=SUM(N({TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;FALSE;TRUE;TRUE;FALSE;TRUE}))

which is:

=SUM({1;1;1;1;1;1;1;1;0;1;1;0;1})

i.e. 11, as required.

Before concluding this post, it may be worth pointing out a slight variation on this solution which avoids the need for array-entry, viz:

=SUM(N(MMULT(N(B2:J14>1000),ROW(INDEX(A:A,1):INDEX(A:A,COLUMNS(B2:J14)))^0)>0))

which readers who have a particular aversion to CSE formulas may wish to investigate further.

Another post to follow shortly. Watch this space!

42 comments

  1. Why not sum the result of an AND() or OR() boolean of column results

    as in: sum(and(B2:B14>1000,C2:C14,<,,= and such) and value outside the formula for interactivity.

  2. @ikkeman

    Sorry – not sure I understand what you mean here, and it isn’t clear to me what the formula you posted is in its entirety and where it should be placed. Perhaps you can clarify?

    I’m also not quite sure what you mean by “for interactivity”?

    Regards

  3. Just for fun:

    =SUM(N(SUBTOTAL(4,OFFSET(B2:J2,ROW(B2:B14)-ROW(B2),0))>1000))
  4. Or:

    =SUM(N(SUBTOTAL(4,OFFSET(B2:J2,N(INDEX(ROW(B2:B14)-ROW(B2),)),0))>1000))

    without CSE

  5. The MMULT with CSE example shown here is used often in Google Sheets. However, there are also two alternatives for this specific example.

    The first alternative uses the COUNTUNIQUE function, which counts the number of unique values. It’s perfect for this example.

    =ArrayFormula(countunique(if(B2:J14>1000,row(2:14),)))

    Note: ArrayFormula() in Google Sheets is similar to CSE in Excel (you press Control+Shift+Enter for it to appear).

    The second alternative uses QUERY and COUNTIF.

    =ArrayFormula(countif(query(transpose(B2:J14>1000),,50000),"*TRUE*"))

    The advantage of using QUERY over MMULT in a situation like this is that you only have to reference the range a single time.

    See the following spreadsheet with the formulas:

    https://docs.google.com/spreadsheets/d/1vXq_HlhYsOcJBQ2OUb7DtIDT1TOW8XAcCaYePQ8i0MA/edit?usp=sharing

  6. Mea Culpa, the booleans won’t work, since they return a single value.

    By interactivity I mean to get the hard coded values (1000 in your example) out of the formula.

    type 1000 in M2 and change your first formula (L2) to

    =SUMPRODUCT(N((B2:B14>=M2)+(C2:C14>=M2)>0))

    This lets users other than yourself to change the criteria and try some different inputs. Couple that M2 input to your conditional format and you get give additional insight.

    Or you can add a row of required limits per year in B16:J16 and use the formula from L8 slightly modified:

    =SUM(N(MMULT(N(B2:J14>B16:J16),TRANSPOSE(COLUMN(B2:J14)^0))>0))

    As a general rule I try never to put fixed arbitrary values inside formulas

  7. @Alex Groberman

    Many thanks. Yes – the SUBTOTAL/OFFSET combination is of course an alternative and perfectly valid set-up here. I just tend to avoid it if I can due to the volatility.

    Regards

  8. @isai

    Many thanks as always.

    I like the look of this countunique function, I have to say. Tell me – is there also a function which will return an array consisting of those unique values?

    This is something which, in general, requires a combination of several functions in Excel. For example, given the array:

    {1;2;2;3;3;3}

    we might wish to generate the array:

    {1;2;3}

    which we could do in several ways, e.g.:

    MODE.MULT(IF({1,1},IF(FREQUENCY(A1:A6,A1:A6),A1:A6)))

    Does Google Sheets perhaps have an in-built function for such a task?

    Regards

  9. @ikkeman

    Yes. You are correct that taking such values outside the formula is always to be preferred. I was merely wishing to demonstrate the techniques here, though of course it was still rather lazy to leave this value within the formula. 🙂

    Many thanks.

  10. Hi XOR LX,

    It is also possible that,

    =SUM(N(MMULT(N(B2:J14>1000),N(ROW(1:9)>0))>0))

    Another option,like isai ,

    =COUNTDIFF(IF(B2:J14>1000,ROW(2:14)),,FALSE)

    Regards

  11. @David

    Perhaps, though this is not a very dynamic set-up, as the part:

    ROW(1:9)

    is static and would require manually amending should the number of the columns within the range change.

    Dynamism and flexibility are two things which I consider important features of any solution.

    Regards

  12. Hi XOR LX,

    The UNIQUE function returns unique rows based on a range/array, so

    =unique({1;2;2;3;3;3})

    would generate

    {1;2;3}
  13. @Isai

    But does that function with regards to rows only? What happens if the array passed is two-dimensional?

    Regards

  14. Hi XOR LX,

    If the array passed is two-dimensional, only unique rows are returned.

    =unique({"Isai","Alvarado";"XOR","LX";"Isai","Alvarado";"XOR","LX"})

    (four rows and two columns)

    would output

    {"Isai","Alvarado";"XOR","LX"}

    (two rows and two columns)

    If you wanted a single column of unique values based on an array that is multiple columns wide, taking each individual cell’s value into account, regardless of whether they are in the same row or not, you would have to either vertically merge ranges into a single column first

    =unique({A1:A10;B1:B10;C1:C10;D1:D10})

    or use a formula like

    =ArrayFormula(unique(transpose(split(concatenate(A1:D10&char(9)),char(9)))))

    Note: In Google Sheets, the CONCATENATE function can concatenate an entire range together at once.

  15. After shortest formula #4, I’ve revisited this problem:

    =SUM(N(FREQUENCY(IFERROR((B2:J14>1000)^0*ROW(A1:A13),""),ROW(A1:A13))>0))

    Whaddaya think?

    -Alex

  16. In fact, for the “row” part, the actual buckets are arbitrary since they just need to be distinct, so a slightly prettier version would be:

    =SUM(N(FREQUENCY(IFERROR((B2:J14>1000)^0*ROW(A2:A14),""),ROW(A2:A14))>0))
  17. Hi! Very interesting solution, thanks!

    You do not need to ^0 there in your formula. When you made N(B2:J14>1000) there are already array of 1s and 0s, why to convert columns numbers to 1s? We need second part of MMULT only for collapse matrix to one column, so, if in first part there are 0s and 1s, doesnt matter what numbers exactly in second – we’ll get 0s or numbers.

    But, we can use ^0 in other place with small COUNT trick:

    =COUNT(MMULT(N(B2:J14>1000);TRANSPOSE(COLUMN(B2:J14)))^0)
  18. I came across your site back in January and subscribed immediately. Thank you very much for all the great content. Even if I am not able to participate in the weekly challenges I learn a lot from your subsequent explanations.

    I would very much like more articles like the above on the use of MMULT and TRANSPOSE as I find it a bit difficult to get useful, educational information on these functions elsewhere.

  19. @Alex, inspired by your formula:

    =SUM(N(FREQUENCY((B2:J14>1000)*ROW(A2:A14);ROW(A1:A13))>0))-1
  20. @Alex

    Nice alternative! I generally prefer to avoid IFERROR set-ups if possible, though it’s still nice to see a completely different approach to the problem using FREQUENCY.

    Cheers

  21. @Maxim

    Good point re my unnecessary use of ^0 here – many thanks. I’m so used to including it in these types of constructions that I completely overlooked that it’s not at all necessary here.

    Re your revision to Alex’s FREQUENCY solution, I’m not sure that the logic here is completely correct. What happens, for example, if every single entry in the range is, say, 2000?

    Thanks a lot

  22. @Hans Knudsen

    Welcome to the site! And thank you for your kind words.

    And don’t worry – those two functions feature in abundance here, so you’re certain to see more uses of them in future posts! I may also attempt to post a ‘guide-to’-type post on MMULT at some point, as you’re not the first person to have shown interest in an explanation of that function.

    Many thanks and look forward to hearing more from you in future.

    Regards

  23. OK, I see that I’ve missed something…

    There is another one suggestion (not mine, from one of my friends):

    =COUNT(1/MMULT(-(B2:J14>1000);-TRANSPOSE(B1:J1>0)))

    What do you think? It works fine if header row is numeric or string, just not empty ))

  24. @Maxim

    It’s certainly a possibility, yes, and it might be something that I’d consider using myself if I was certain that it would always be rigorous for the given dataset (which, as you say, it should be here). However, I’d perhaps also be hesitant to recommend it as part of a generic solution, given that it’s not completely foolproof.

    Certainly one to remember for the next Shortest Formula Challenge, though!

    Thanks a lot

  25. Hi XOR LX,

    Is there a way to throw in … an additional condition …unique strings in column A …???

    Cheers

    James

  26. @James

    I’m sure there is!

    But what do you mean by “unique” here? That only strings which occur precisely once within column A are considered?

    Cheers

  27. @ XOR LX,

    Yes, absolutely … you got it right …

    Based on your sample worksheet, should France and Italy be listed twice within column A … they should only be counted once…

    Cheers

  28. @ XOR LX,

    Should France sastisfy the preset condition once or more times … France should only be counted once … as a distinct country …

  29. @James

    But that’s what I don’t understand. I assume you’re talking about a formula which considers some of the data from columns B to J in the above table?

    So if France occupied, say, two rows in that table, which row’s data do I ignore when it comes to the figures in columns B to J?

    Or am I misunderstanding you here?

    Cheers

  30. @ XOR LX,

    Sorry … I am not expressing myself very clearly …

    There is no row to be ignored … but each country should only be counted once … even if condition is fulfilled in several rows …

    Hope this clarifies …

    Cheers

  31. @James

    I’m completely lost!

    How can no rows be ignored if each country is only to be counted once?

    Can you give me a small hypothetical example, perhaps using the data table in this post and e.g. imagining that “France” occupied 2 rows?

    Cheers

  32. @ XOR LX,

    Let me try to re-phrase it in a much better English :

    “If a given country appears more than once, and that for more than one of those occurrences the conditions are met, then it should nevertheless be counted as one”

    Hope this clarifies …

    Cheers

  33. @James

    Not at all. You explained it well. It’s simply my inability to understand simple statements!

    Interesting query, and I don’t believe there’s a more efficient set-up than:

    =SUM(N(FREQUENCY(IF(MMULT(N(B2:J14>1000),TRANSPOSE(COLUMN(B2:J14))),MATCH(A2:A14,A2:A14,0)),ROW(A2:A14)-MIN(ROW(A2:A14))+1)>0))

    though I may be missing something!

    Thanks a lot. Thought-provoking question!

    Cheers

  34. @ XOR LX

    Absolutely Fabulous …!!!

    As usual, an incredibly brilliant solution …!!!

    Many many thanks

    Cheers

  35. Hi XOR LX,

    I am trying to understand the COUNTIF and SUMPRODUCT functions in depth. Please can you let me know why you used N? as in your sumproduct formula

    =SUMPRODUCT(N((B2:B14>=1000)+(C2:C14>=1000)>0))
    
  36. @Baber Beg

    Sure, though I should first point out that N is by no means the only choice we have here; any other function/mathematical operation which converts Boolean TRUE/FALSE results into their numerical equivalents (TRUE=1, FALSE=0) would also suffice.

    The point is that, in a construction such as:

    =SUMPRODUCT(N((B2:B14>=1000)+(C2:C14>=1000)>0))

    the two queries, i.e.:

    B2:B14>=1000

    and:

    C2:C14>=1000

    will resolve, for each cell being queried, to either TRUE or FALSE, and so we will have (using the data provided):

    =SUMPRODUCT(N(({TRUE;FALSE;FALSE;FALSE;TRUE;TRUE;FALSE;TRUE;FALSE;TRUE;TRUE;FALSE;TRUE})+({TRUE;FALSE;FALSE;FALSE;TRUE;TRUE;TRUE;TRUE;FALSE;TRUE;TRUE;FALSE;TRUE})>0))

    As you may know, when we then add the corresponding elements in these two arrays, the Boolean TRUE/FALSEs are converted into their numerical equivalents, such that:

    TRUE+TRUE=1+1=2
    FALSE+FALSE=0+0=0
    FALSE+TRUE=0+1=1
    TRUE+FALSE=1+0=1

    (This is the point of the construction: any rows for which at least one of the entries from column B and column C satisfies our condition(s) – i.e. the result of the above is 1 or 2 – will be considered.)

    And so the above formula resolves to:

    =SUMPRODUCT(N({2;0;0;0;2;2;1;2;0;2;2;0;2}>0))

    As mentioned, we are here interested only in those rows for which the entry in the above array is a 1 or a 2 (or equivalently, non-zero), and so the test for >0 resolves to:

    =SUMPRODUCT(N({TRUE;FALSE;FALSE;FALSE;TRUE;TRUE;TRUE;TRUE;FALSE;TRUE;TRUE;FALSE;TRUE}))

    Finally to your query, then. Unfortunately, without a little further manipulation, SUMPRODUCT is not able to add up the TRUEs and FALSEs in the above to give our desired count. Instead we must first convert them to their numerical equivalents.

    And one way to do this (amongst many) is to use the N function, since:

    =N(FALSE)

    is equal to 0, and:

    =N(TRUE)

    is equal to 1.

    And so, finally:

    =SUMPRODUCT(N({TRUE;FALSE;FALSE;FALSE;TRUE;TRUE;TRUE;TRUE;FALSE;TRUE;TRUE;FALSE;TRUE}))

    becomes:

    =SUMPRODUCT({1;0;0;0;1;1;1;1;0;1;1;0;1})

    and now we can add up those numerics, giving 8, as desired.

    As I said, N is not the only choice here. In fact, to be honest I rarely use it myself for Boolean conversions, instead using a suitable mathematical operation. Hence, we could also use any of:

    =SUMPRODUCT(0+((B2:B14>=1000)+(C2:C14>=1000)>0))
    =SUMPRODUCT(1*((B2:B14>=1000)+(C2:C14>=1000)>0))
    =SUMPRODUCT(((B2:B14>=1000)+(C2:C14>=1000)>0)/1)

    the point being that in all three cases a mathematical operation (addition of zero, multiplication by one and division by one) is employed which simultaneously converts Boolean TRUE/FALSE values to their numerical equivalents whilst also leaving those mathematical values unchanged.

    Hope that helps!

  37. This is crazy… how did I not know you could do this…. ?!?!? 🙂

    =SUM(COUNTIFS(B2:B14,{">=",">=","=","="}&1000))

    Not documented on the MS site AFAIK…. Great tip.

    Ger

  38. Pingback: #Excel Super Links #94 – shared by David Hager | Excel For You

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s