In this post I would like to present a solution to the situation in which we wish to count the number of rows for which a stipulated condition is met in *at least one* of several columns.

To illustrate what is meant by this, consider the extract below:

which details levels of scrap nickel exports for various countries and for various years (you can download the workbook here).

First, let’s suppose that we wish to determine the number of countries whose 2004 figure was greater than or equal to 1000. I imagine all readers will be perfectly capable of achieving this with a simple construction, viz:

=COUNTIF(B2:B14,">=1000")

or:

=SUMPRODUCT(N(B2:B14>=1000))

We might also wish to consider e.g. how many countries’ export total was greater than this threshold for *both* 2004 *and* 2005, for which we would use:

=COUNTIFS(B2:B14,">=1000",C2:C14,">=1000")

or:

=SUMPRODUCT(N(B2:B14>=1000),N(C2:C14>=1000))

But what if we wish to count the number of countries for which *at least one* of their 2004 and 2005 figures satisfies this criterion?

I have highlighted the rows which meet this new specification in the below:

(In fact, Italy is the only example here for which that threshold was not achieved in *both* years.)

Clearly a “standard” COUNTIF(S) construction will not suffice here, since we must be sure to not double-count. Indeed, most people tend to switch to SUMPRODUCT in this situation, viz:

=SUMPRODUCT(N((B2:B14>=1000)+(C2:C14>=1000)>0))

However, we can, if we choose, construct a solution using COUNTIFS, which may well be worthwhile in certain circumstances given the advantages of that function over SUMPRODUCT (more efficient in general, ability to reference entire columns without detriment).

Of course, this set-up is only valid when considering a mere two columns for the criteria, though that is perhaps a common-enough occurrence to justify such a solution. In any case, as the number of columns to be considered increases, as does the inappropriateness of both a COUNTIFS solution and the SUMPRODUCT construction given above. Indeed, there we must turn to our old friend MMULT for help, of which more presently.

Anyway, back to the case for just two columns, then. Here it will help to reiterate what we are trying to achieve, i.e. count the number of countries for which *at least one* of their 2004 and 2005 figures satisfies this criterion.

If we think about it, then, put another way, this effectively means considering the three mutually-exclusive cases:

1) The 2004 figure is >=1000 and the 2005 figure is <1000

2) The 2004 figure is <1000 and the 2005 figure is >=1000

3) The 2004 figure is >=1000 and the 2005 figure is >=1000

and then summing the results of each.

Or, translated into Excel:

=COUNTIFS(B2:B14,">=1000",C2:C14,">=1000")+COUNTIFS(B2:B14,">=1000",C2:C14,"<1000")+COUNTIFS(B2:B14,"<1000",C2:C14,">=1000")

Neither particularly elegant nor concise. However, when we realise that we can abbreviate this as:

=SUM(COUNTIFS(B2:B14,{">=",">=","<"}&1000,C2:C14,{">=","<",">="}&1000))

then things are not so bad, and we have succeeded in achieving a COUNTIFS-based solution to our problem in place of the ubiquitously-recommended SUMPRODUCT set-up.

Let’s now consider the situation in which we wish to derive this result, but where the number of columns to be considered is not just two, but several, potentially many. For example, let’s say we want to determine the number of countries for which *at least one* of their yearly figures from 2004 to 2012 is greater than or equal to 1000.

We can see from the below screenshot that the result here should be 11, since for all but two countries – Poland and Turkey – this criterion is satisfied.

Here we simply cannot construct a COUNTIFS construction along the lines of the above, and the equivalent SUMPRODUCT solution:

=SUMPRODUCT(N((B2:B14>=1000)+(C2:C14>=1000)+(D2:D14>=1000)+(E2:E14>=1000)+(F2:F14>=1000)+(G2:G14>=1000)+(H2:H14>=1000)+(I2:I14>=1000)+(J2:J14>=1000)>0))

although correct, is certainly starting to look unwieldy (and would be even more so if the number of columns to be considered was not 9 but, say, 20 or 30).

Fortunately, since our range of columns is contiguous, we can query the entire range (B2:J14) in a single expression, provided that we then manipulate the resulting array appropriately.

The required solution here is the following **array formula****:

=SUM(N(MMULT(N(B2:J14>1000),TRANSPOSE(COLUMN(B2:J14)^0))>0))

**How does it work?**

Let’s first look at the array resulting from our comparison as to whether each of the entries within that array is greater than or equal to 1000 or not, i.e. that containing the Boolean results of:

B2:J14>1000

which, for our example, will be:

{TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE;FALSE,FALSE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE;FALSE,FALSE,TRUE,TRUE,FALSE,TRUE,TRUE,TRUE,TRUE;FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,TRUE,TRUE,FALSE;TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE;TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE;FALSE,TRUE,FALSE,FALSE,FALSE,FALSE,TRUE,FALSE,TRUE;TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE;FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE;TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE;TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,FALSE,TRUE;FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE;TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE}

As always, coercing these Booleans to their numerical equivalents will help visualize matters, such that:

N(B2:J14>1000)

gives:

{1,1,1,1,1,1,1,1,1;0,0,1,1,1,1,1,1,1;0,0,1,1,0,1,1,1,1;0,0,0,0,0,1,1,1,0;1,1,1,1,1,1,1,1,1;1,1,1,1,1,1,1,1,1;0,1,0,0,0,0,1,0,1;1,1,1,1,1,1,1,1,1;0,0,0,0,0,0,0,0,0;1,1,1,1,1,1,1,1,1;1,1,1,1,1,1,1,0,1;0,0,0,0,0,0,0,0,0;1,1,1,1,1,1,1,1,1}

Now, in order to calculate the number of 1s within each row, we use MMULT. And, since the above array (a 13-row-by-9-column array) contains 9 columns, the matrix with which we form the product must consist of this number of rows (for readers who weren’t already aware, this is a condition for matrix multiplication, i.e. that the number of columns in the first matrix be equal to the number of rows in the second). And naturally each of these 9 entries must be unity.

Hence, we form, as our second matrix:

TRANSPOSE(COLUMN(B2:J14)^0)

which works since:

COLUMN(B2:J14)

will give us an array consisting of a number of elements equivalent to the number of columns within our range, i.e. 9, our construction becoming:

TRANSPOSE({2,3,4,5,6,7,8,9,10}^0)

Raising to the power of 0 is a nice trick which ensures that our array consists of nothing but 1s, since, for those who didn’t know, any non-zero numerical value raised to the power of zero is unity.

We thus have:

TRANSPOSE({1,1,1,1,1,1,1,1,1})

We must now ensure that this array of 9 1s is one consisting of 9 rows, not 9 columns, so that it is of a valid form to pass to MMULT.

And this we achieve of course by transposing it, so that the above becomes:

{1;1;1;1;1;1;1;1;1}

We now have our two arrays to pass to MMULT, so that:

MMULT(N(B2:J14>1000),TRANSPOSE(COLUMN(B2:J14)^0))

has now become:

MMULT({1,1,1,1,1,1,1,1,1;0,0,1,1,1,1,1,1,1;0,0,1,1,0,1,1,1,1;0,0,0,0,0,1,1,1,0;1,1,1,1,1,1,1,1,1;1,1,1,1,1,1,1,1,1;0,1,0,0,0,0,1,0,1;1,1,1,1,1,1,1,1,1;0,0,0,0,0,0,0,0,0;1,1,1,1,1,1,1,1,1;1,1,1,1,1,1,1,0,1;0,0,0,0,0,0,0,0,0;1,1,1,1,1,1,1,1,1},{1;1;1;1;1;1;1;1;1})

which is:

{9;7;6;3;9;9;3;9;0;9;8;0;9}

Readers may wish to confirm that the values in this array are indeed precisely the number of entries within each of the rows in our range which are >=1000 (the 3 in position 4, for example, corresponding to the fact that Denmark achieved that threshold in just 3 years: 2009, 2010 and 2011).

We then query this array as to which elements are at least one or greater, and then sum the results, such that:

=SUM(N(MMULT(N(B2:J14>1000),TRANSPOSE(COLUMN(B2:J14)^0))>0))

which is:

=SUM(N({9;7;6;3;9;9;3;9;0;9;8;0;9}>0))

becomes:

=SUM(N({TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;TRUE;FALSE;TRUE;TRUE;FALSE;TRUE}))

which is:

=SUM({1;1;1;1;1;1;1;1;0;1;1;0;1})

i.e. **11**, as required.

Before concluding this post, it may be worth pointing out a slight variation on this solution which avoids the need for array-entry, viz:

=SUM(N(MMULT(N(B2:J14>1000),ROW(INDEX(A:A,1):INDEX(A:A,COLUMNS(B2:J14)))^0)>0))

which readers who have a particular aversion to CSE formulas may wish to investigate further.

Another post to follow shortly. Watch this space!

Why not sum the result of an AND() or OR() boolean of column results

as in: sum(and(B2:B14>1000,C2:C14,<,,= and such) and value outside the formula for interactivity.

@ikkemanSorry – not sure I understand what you mean here, and it isn’t clear to me what the formula you posted is in its entirety and where it should be placed. Perhaps you can clarify?

I’m also not quite sure what you mean by “for interactivity”?

Regards

Just for fun:

^ The above enclosed in curly brackets (CSE)

Or:

without CSE

The MMULT with CSE example shown here is used often in Google Sheets. However, there are also two alternatives for this specific example.

The first alternative uses the COUNTUNIQUE function, which counts the number of unique values. It’s perfect for this example.

Note: ArrayFormula() in Google Sheets is similar to CSE in Excel (you press Control+Shift+Enter for it to appear).

The second alternative uses QUERY and COUNTIF.

The advantage of using QUERY over MMULT in a situation like this is that you only have to reference the range a single time.

See the following spreadsheet with the formulas:

https://docs.google.com/spreadsheets/d/1vXq_HlhYsOcJBQ2OUb7DtIDT1TOW8XAcCaYePQ8i0MA/edit?usp=sharing

Mea Culpa, the booleans won’t work, since they return a single value.

By interactivity I mean to get the hard coded values (1000 in your example) out of the formula.

type 1000 in M2 and change your first formula (L2) to

This lets users other than yourself to change the criteria and try some different inputs. Couple that M2 input to your conditional format and you get give additional insight.

Or you can add a row of required limits per year in B16:J16 and use the formula from L8 slightly modified:

As a general rule I try never to put fixed arbitrary values inside formulas

@Alex GrobermanMany thanks. Yes – the SUBTOTAL/OFFSET combination is of course an alternative and perfectly valid set-up here. I just tend to avoid it if I can due to the volatility.

Regards

@isaiMany thanks as always.

I like the look of this

countuniquefunction, I have to say. Tell me – is there also a function which will return an array consisting of those unique values?This is something which, in general, requires a combination of several functions in Excel. For example, given the array:

we might wish to generate the array:

which we could do in several ways, e.g.:

Does Google Sheets perhaps have an in-built function for such a task?

Regards

@ikkemanYes. You are correct that taking such values outside the formula is always to be preferred. I was merely wishing to demonstrate the techniques here, though of course it was still rather lazy to leave this value within the formula. 🙂

Many thanks.

Hi XOR LX,

It is also possible that,

Another option,like isai ,

Regards

@DavidPerhaps, though this is not a very dynamic set-up, as the part:

is static and would require manually amending should the number of the columns within the range change.

Dynamism and flexibility are two things which I consider important features of any solution.

Regards

Hi XOR LX,

The UNIQUE function returns unique rows based on a range/array, so

would generate

@IsaiBut does that function with regards to rows only? What happens if the array passed is two-dimensional?

Regards

Hi XOR LX,

If the array passed is two-dimensional, only unique rows are returned.

(four rows and two columns)

would output

(two rows and two columns)

If you wanted a single column of unique values based on an array that is multiple columns wide, taking each individual cell’s value into account, regardless of whether they are in the same row or not, you would have to either vertically merge ranges into a single column first

or use a formula like

Note: In Google Sheets, the CONCATENATE function can concatenate an entire range together at once.

After shortest formula #4, I’ve revisited this problem:

Whaddaya think?

-Alex

In fact, for the “row” part, the actual buckets are arbitrary since they just need to be distinct, so a slightly prettier version would be:

Hi! Very interesting solution, thanks!

You do not need to ^0 there in your formula. When you made N(B2:J14>1000) there are already array of 1s and 0s, why to convert columns numbers to 1s? We need second part of MMULT only for collapse matrix to one column, so, if in first part there are 0s and 1s, doesnt matter what numbers exactly in second – we’ll get 0s or numbers.

But, we can use ^0 in other place with small COUNT trick:

I came across your site back in January and subscribed immediately. Thank you very much for all the great content. Even if I am not able to participate in the weekly challenges I learn a lot from your subsequent explanations.

I would very much like more articles like the above on the use of MMULT and TRANSPOSE as I find it a bit difficult to get useful, educational information on these functions elsewhere.

@Alex, inspired by your formula:

@AlexNice alternative! I generally prefer to avoid IFERROR set-ups if possible, though it’s still nice to see a completely different approach to the problem using FREQUENCY.

Cheers

@

MaximGood point re my unnecessary use of ^0 here – many thanks. I’m so used to including it in these types of constructions that I completely overlooked that it’s not at all necessary here.

Re your revision to Alex’s FREQUENCY solution, I’m not sure that the logic here is completely correct. What happens, for example, if every single entry in the range is, say, 2000?

Thanks a lot

@Hans KnudsenWelcome to the site! And thank you for your kind words.

And don’t worry – those two functions feature in abundance here, so you’re certain to see more uses of them in future posts! I may also attempt to post a ‘guide-to’-type post on MMULT at some point, as you’re not the first person to have shown interest in an explanation of that function.

Many thanks and look forward to hearing more from you in future.

Regards

OK, I see that I’ve missed something…

There is another one suggestion (not mine, from one of my friends):

What do you think? It works fine if header row is numeric or string, just not empty ))

@MaximIt’s certainly a possibility, yes, and it might be something that I’d consider using myself if I was certain that it would always be rigorous for the given dataset (which, as you say, it should be here). However, I’d perhaps also be hesitant to recommend it as part of a generic solution, given that it’s not completely foolproof.

Certainly one to remember for the next Shortest Formula Challenge, though!

Thanks a lot

Hi XOR LX,

Is there a way to throw in … an additional condition …unique strings in column A …???

Cheers

James

@JamesI’m sure there is!

But what do you mean by “unique” here? That only strings which occur precisely once within column A are considered?

Cheers

@ XOR LX,

Yes, absolutely … you got it right …

Based on your sample worksheet, should France and Italy be listed twice within column A … they should only be counted once…

Cheers

@JamesSure, but how do you decide which row’s data would be ignored and which considered in such cases?

Cheers

@ XOR LX,

Should France sastisfy the preset condition once or more times … France should only be counted once … as a distinct country …

@JamesBut that’s what I don’t understand. I assume you’re talking about a formula which considers some of the data from columns B to J in the above table?

So if France occupied, say, two rows in that table, which row’s data do I ignore when it comes to the figures in columns B to J?

Or am I misunderstanding you here?

Cheers

@ XOR LX,

Sorry … I am not expressing myself very clearly …

There is no row to be ignored … but each country should only be counted once … even if condition is fulfilled in several rows …

Hope this clarifies …

Cheers

@JamesI’m completely lost!

How can no rows be ignored if each country is only to be counted once?

Can you give me a small hypothetical example, perhaps using the data table in this post and e.g. imagining that “France” occupied 2 rows?

Cheers

@ XOR LX,

Let me try to re-phrase it in a much better English :

“If a given country appears more than once, and that for more than one of those occurrences the conditions are met, then it should nevertheless be counted as one”

Hope this clarifies …

Cheers

@JamesNot at all. You explained it well. It’s simply my inability to understand simple statements!

Interesting query, and I don’t believe there’s a more efficient set-up than:

though I may be missing something!

Thanks a lot. Thought-provoking question!

Cheers

@ XOR LX

Absolutely Fabulous …!!!

As usual, an incredibly brilliant solution …!!!

Many many thanks

Cheers

@JamesYou’re welcome, as always! 🙂

Good to see you again. Cheers.

Hi XOR LX,

I am trying to understand the COUNTIF and SUMPRODUCT functions in depth. Please can you let me know why you used N? as in your sumproduct formula

@Baber BegSure, though I should first point out that N is by no means the only choice we have here; any other function/mathematical operation which converts Boolean TRUE/FALSE results into their numerical equivalents (TRUE=1, FALSE=0) would also suffice.

The point is that, in a construction such as:

the two queries, i.e.:

and:

will resolve, for each cell being queried, to either TRUE or FALSE, and so we will have (using the data provided):

As you may know, when we then add the corresponding elements in these two arrays, the Boolean TRUE/FALSEs are converted into their numerical equivalents, such that:

TRUE+TRUE=1+1=2

FALSE+FALSE=0+0=0

FALSE+TRUE=0+1=1

TRUE+FALSE=1+0=1

(This is the point of the construction: any rows for which

at least oneof the entries from column B and column C satisfies our condition(s) – i.e. the result of the above is 1 or 2 – will be considered.)And so the above formula resolves to:

As mentioned, we are here interested only in those rows for which the entry in the above array is a 1 or a 2 (or equivalently, non-zero), and so the test for >0 resolves to:

Finally to your query, then. Unfortunately, without a little further manipulation, SUMPRODUCT is not able to add up the TRUEs and FALSEs in the above to give our desired count. Instead we must first convert them to their numerical equivalents.

And one way to do this (amongst many) is to use the N function, since:

is equal to 0, and:

is equal to 1.

And so, finally:

becomes:

and now we can add up those numerics, giving 8, as desired.

As I said, N is not the only choice here. In fact, to be honest I rarely use it myself for Boolean conversions, instead using a suitable mathematical operation. Hence, we could also use any of:

the point being that in all three cases a mathematical operation (addition of zero, multiplication by one and division by one) is employed which simultaneously converts Boolean TRUE/FALSE values to their numerical equivalents whilst also leaving those mathematical values unchanged.

Hope that helps!

This is crazy… how did I not know you could do this…. ?!?!? 🙂

Not documented on the MS site AFAIK…. Great tip.

Ger

@Ger“Crazy” is what I try to do here! 🙂

Pingback: #Excel Super Links #94 – shared by David Hager | Excel For You

Hi XOR LX,

Hope you are doing fine …

Using your MMULT – TRANSPOSE combination, what would be the exact reverse formula to ‘ Counting-Columns-where-at-least-one condition-is-met ‘…

Thanks in advance for your insight

Cheers

Hi James,

What exactly do you mean by “exact reverse”? Can you give a small example?

Cheers

Hi XOR LX,

Currently, I am using following Array Formula :

Against a range where only cells A10, B4 to B10 contain a value, the above formula produces the correct result of 2 …

How could this formula ‘ converted ‘ into your MMULT – TRANSPOSE combination ?

Cheers

@James

You mean perhaps:

Cheers

@ XOR LX

Absolutely Perfect …!!!

As Usual ..

Cheers

I must admit I am a bit loss at the solutions in the comment however using the formula below achieved what I wanted, almost.

RepsData is basically 4 columns that specify the name of the people on either side of a sale. 2 allowed for each side, and a person can be on both sides of the sale.

In this formula it tells me how many times the name store in E17 occurs at least once. Perfect!

I want to do this monthly so I have to restrict the array to the date value in named range TraderReleaseDate that is between for example May 1 and May 31.

Can I somehow insert a condition that will allow this? I would like this reporting sheet to be dynamic and not rely on me defining the monthly array.

Hi Mark,

That looks like a Google Sheets formula to me. I’m afraid I’m not familiar with Sheets, so unless you want to work in Excel and then translate any solutions I give you back into the necessary equivalents, I’m afraid I won’t be able to help you.

Regards

I am usually able to implement the solution in google sheets.

The question is …. can I filter the results by month?

I guess it is easy if you can dynamically name a range and then apply this logic to the range.

@MarkOk, so can you just first clarify why you can’t simply replace the COUNTIF in your construction with COUNTIFS, incorporating the additional date criteria as appropriate?

Regards

It sounds like a simple solution.

A quick try came back saying the array arguments are of different size.

The countifs I use for the data range is

Where C21 and D21 are the start and end dates.

Any help ?

I can’t help unless I see the full formula. What happened to the RepsData=$E$17 part? Weren’t you transposing this originally?

Please post any updated formulas in their Excel versions, not Sheets.

Regards

Yes … the formula is

RepsData is 4 columns of data.

$E$17 is the RepName I am counting.

I don’t understand the Query(Transpose …. part of this formula

The date condition I want to add is TraderReleaseDate,”>=”&$C21,TraderReleaseDate,”<="&$D21 which is looking at dates in a single column of data.

I believe all versions of these formulas don't like the difference in ranges.

When I work with one column of the Reps data everything works. I could just add the 4 columns of Reps data together but then I count the cases where the rep appears more than once on one line.

Ex

TradeRelease Date A1 A2 B1 B2

Jan Mark Bob Jim Mark

Jan John Bob Jim Mark

Feb John Bob Jim Dave

Mark should be 2

Jim should be 3

John should be 2

Bob should be 3

Dave should be 1

The formula above works on the entire data set but if I just want Jan I don't know how to filter it. As I said, if there is a way to dynamically define the range that would work but I am unaware if that is possible. I am trying to create live data reporting that doesn't require me to define these ranges.

To add one more complication, I will then want to add up other columns of data that will include Revenue $.

Ah, sorry, I should’ve been clearer, at least on one part. That’s the exact same formula that you gave me in your original post; I was hoping that you could show be your revision of that formula (using COUNTIFS, not COUNTIF), which includes your attempt at incorporating the date criteria.

Also, I’m afraid you’re still posting Google Sheets formulas. I’d really have to insist on the Excel versions, if that’s ok.

Regards

Sorry …

I realize this is Google sheets.

It is more the concept around the different sized arrays. I can usually work through the syntax.

In excel … can you define a range dynamically in some way? Perhaps a a macro that can run when you open the sheet that will dynamically Define a range "May Sales" for example where all dates in TraderReleaseDate has a May date value.

Ok, thanks.

In Excel then you’d adapt the solution I give in this post thus (committed as an array formula):

Regards

Excellent. The syntax is pretty much the same. I was having trouble getting it to work so I saved in excel and opened in google sheets. I don’t think it changed at all.

One issue. It only works if Repsdata is a single column. Repsdata is over 4 columns.

The idea of this is that within the date range it only counts the number of rows where $E$17 appears at least once in the data range (Repsdata)

I tested with Repsdata being 4 columns, not 1. Can you please post a very reduced mock dataset with your expected results plus the (erroneous) results my formula gives?

Let’s make it the Excel version, if that’s ok.

Regards

Since there is no upload I will post in here … hopefully what you want.

RepData (5 x 1 col) RepsData (5×4)

May 5 2019 Mark Mark Bob Jim

May 7 2019 Mark Bob Jim Sue

May 14 2019 Sue Bob Jim Mark

June 1 2019 Sue Bob Fred Doug

June 15 2019 John Steve Mark Sean

Date Range

May 1 2019 May 31 2019 Formula would be here> Expect

Jun2 2019 June 30 2019 result of 4 for Mark if it works

$F$14 = Trader Name

ex Mark

Your formula using Repsdata – 5 rows x 4 cols

Your formula using 5 rows x 1 column of RepData

Although I think I had this working ok with 1 column of data in Excel this morning I did not save it.

Now I get #N/A for both formulas and when evaluate the formula it gets values in part but 1/2 of it shows #N/A

Thanks a lot. Apologies for the lack of an upload option.

Unfortunately it seems as if the HTML editor has truncated your formulas, so I’m not sure what they would be in their entirety. Would you be willing to send me a file via email (xorlx99@gmail.com)?

Regards

For sure … will send it to you

Pingback: Count number of rows that have X in cell A or B or C - Boot Panic