In this post I would like to expatiate on a technique which has been hinted at previously (and so which may already be familiar to regular readers), though which I’d like to make explicit, and, what’s more, within the context of a post which outlines one of the most practical uses for that technique.

As most readers will no doubt know, the vast majority of Excel functions are able to ignore Booleans (and sometimes, where appropriate, other non-numerics) within the range passed. As such, they effectively operate over a reduced range which comprises the non-Booleans (or numerics) only, allowing us to include conditional statements (generally using IF) within our function so as to restrict which values are – ultimately – processed by our construction.

For example, and using the workbook provided (download here), we can derive such constructions as:

=AVERAGE(IF(A2:A11="X",B2:B11))

the resolution of which – at the risk of pointing out the obvious – is:

=AVERAGE(IF({FALSE;TRUE;FALSE;TRUE;FALSE;TRUE;FALSE;TRUE;TRUE;TRUE},B2:B11))

which is:

=AVERAGE({FALSE;58;FALSE;23;FALSE;93;FALSE;35;27;40})

and since – like the vast majority of functions – AVERAGE ignores Booleans (i.e. the FALSE entries here) within the range passed, the above is effectively equivalent to:

=AVERAGE({58;23;93;35;27;40})

i.e. 46.

And naturally we can replace AVERAGE here with a whole host of functions and achieve equivalent results; the point being that in almost all cases any Booleans are similarly disregarded.

However, there exist a number of Excel functions which are not able to process (i.e. ignore) certain datatypes within the array passed. Some of the more common statistical functions – GROWTH, LINEST, LOGEST and TREND being good examples – will not accept arrays other than those which consist entirely of numerics. So, for example, replacing AVERAGE in the previous formula with LINEST, i.e.:

=LINEST(IF(A2:A11="X",B2:B11))

whose resolution – up until the final step – is likewise:

=LINEST({FALSE;58;FALSE;23;FALSE;93;FALSE;35;27;40})

unfortunately falls down at the next (and final) step, viz:

#VALUE!

though clearly we were hoping rather for:

-3.88571428571429

i.e. the evaluation of:

=LINEST({58;23;93;35;27;40})

Replace LINEST with GROWTH, LOGEST or TREND and you will meet a similar fate.

What to do then? How can we obtain our desired result (without having to drastically rearrange our source data, that is)?

Well, the solution is to make sure that the array we are passing consists of just those values which meet our criteria, and nothing else (i.e. none of those pesky Boolean FALSEs).

It should be pointed out that this issue has not previously gone unsolved. However, past attempts at generating the reduced array appear to consist of rather heavy and volatile set-ups of the OFFSET/ROW/INDIRECT-type.

However, such approaches are unnecessary, and we can instead employ the rather more elegant:

=LINEST(INDEX(B:B,N(IF(1,MODE.MULT(IF(A2:A11={"X","X"},ROW(A2:A11)))))))

**How does it work?**

Firstly, I should point out that the above set-up is not, in general, to be recommended. Not only is it not the most flexible construction available (it makes use of a static, in-formula criterion ({“X”,”X”}) not readily replaceable with an actual range reference (e.g. D1)), but also means that our choice of conditional statement is somewhat limited.

I will present a refinement a little further down this post, as well as a variant in which our criterion is perhaps not the verification of a given entry (“X”) in some other column but is, for example, a test for the numericalness of the entries in the range to be passed. I will also consider the case where more than one range must first be reduced (“de-Booleanised”) prior to processing; for example, if we were to also include a *known_x’s* parameter subject to conditions in the above LINEST construction.

For now, however, the above is sufficient to demonstrate the workings of the technique. And let’s start with the somewhat radical-looking construction:

IF(A2:A11={"X","X"},ROW(A2:A11))

which one might at first be tempted to claim is a typo on the intended:

IF(A2:A11="X",ROW(A2:A11))

until, that is, we look a little further and see that the result of this construction is being passed to MODE.MULT.

And since one of the properties of that function is that, if no one value within the array being passed occurs *at least twice*, an error (#N/A) is returned, this would be precisely the issue we would encounter were we to attempt the ‘normal’:

MODE.MULT(IF(A2:A11="X",ROW(A2:A11)))

since this is:

MODE.MULT({FALSE;3;FALSE;5;FALSE;7;FALSE;9;10;11})

and indeed, each distinct (numeric) value within this array occurs just once.

By using:

MODE.MULT(IF(A2:A11={"X","X"},ROW(A2:A11)))

however, in which we have made the minor yet significant amendment to artificially include the conditional value (“X”) a second time, we now have:

MODE.MULT(IF({FALSE,FALSE;TRUE,TRUE;FALSE,FALSE;TRUE,TRUE;FALSE,FALSE;TRUE,TRUE;FALSE,FALSE;TRUE,TRUE;TRUE,TRUE;TRUE,TRUE},ROW(A2:A11)))

which this time is:

MODE.MULT({FALSE,FALSE;3,3;FALSE,FALSE;5,5;FALSE,FALSE;7,7;FALSE,FALSE;9,9;10,10;11,11})

and, having ensured that each of our returns from the ROW function occurs precisely twice, MODE.MULT is this time happy to process the above array, viz:

{3;5;7;9;10;11}

as desired.

Note that, in order to achieve this duplication of each of the values from the ROW function, we effectively convert (or redimension) the one-dimensional return from that function into a two-dimensional one. In other words, we take the result of:

ROW(A2:A11)

i.e.:

{2;3;4;5;6;7;8;9;10;11}

i.e. a 10-row-by-1-column array, and, by including a suitably-dimensioned array in IF’s *logical_test*, i.e.:

{"X","X"}

i.e. a 2-column-by-1-row array, we guarantee that the resolution of this IF clause – a clause which is now being forced to evaluate a 10-row-by-1-column array with respect to a 2-column-by-1-row array – will comprise a 2-column-by-10-row array, viz:

{FALSE,FALSE;TRUE,TRUE;FALSE,FALSE;TRUE,TRUE;FALSE,FALSE;TRUE,TRUE;FALSE,FALSE;TRUE,TRUE;TRUE,TRUE;TRUE,TRUE}

And this is an important technical point to note. If, for example, we were in fact dealing with a *horizontal* range as opposed to our vertical one here, we would be advised to take care when constructing our solution. If the range in question was not A2:B11 but, for example, A2:J3, then, as well as naturally having to replace our ROW portion with a suitable COLUMN one, we would also need to address this matter of orthogonality appropriately.

As such, the following attempt:

=LINEST(INDEX(3:3,N(IF(1,MODE.MULT(IF(A2:J2={"X","X"},COLUMN(A2:J2)))))))

although correctly referencing the range 3:3 (in place of B:B) and rightly employing COLUMN (in place of ROW), would nevertheless be erroneous, since the dimensions of the array resulting from our IF statement, which is here resolving:

A2:J2

i.e. a 1-row-by-10-column array, with respect to:

{"X","X"}

i.e. a 1-row-by-2-column array, will not be as required, the resolution of that statement being:

{FALSE,TRUE,#N/A,#N/A,#N/A,#N/A,#N/A,#N/A,#N/A,#N/A}

the reason for which is that we carelessly neglected to ensure that the arrays either side of the equality in the statement being passed as IF’s *logical_test*:

A2:J2={"X","X"}

were orthogonal to each other.

Correct, of course, is this time to use:

{"X";"X"}

since this array is indeed orthogonal to the range (A2:J2) being considered.

In full, then, we would use the following:

=LINEST(INDEX(3:3,N(IF(1,MODE.MULT(IF(A2:J2={"X";"X"},COLUMN(A2:J2)))))))

when considering the equivalent horizontal set-up.

So far, so good, then. And, just before moving on to the more rigorous set-up I mentioned, it is worth pointing out that we can of course replace our conditional statement here with others of our choosing.

For example, to exclude blanks from the range A2:A11 (which here just happens to be equivalent to our current set-up), we could employ:

<>{"",""}

in place of:

={"X","X"}

But what if our criterion is not entered as a static constant within the formula, but rather (and more dynamically) within an actual worksheet cell? If, for example, we enter “X” in cell D1, then the attempt at reproducing the necessary doubling:

=LINEST(INDEX(B:B,N(IF(1,MODE.MULT(IF(A2:A11=D1:D1,ROW(A2:A11)))))))

unfortunately fails to impress Excel, who unsportingly insists on interpreting our creative attempt at generating a two-element array as one of just a single element. (In any case, even if we could convince it to accept D1:D1 as a two-element array, there’s no a priori reason why that array should be a row-vector and not a column-vector.)

Fortunately, the amended set-up:

=LINEST(INDEX(B:B,N(IF(1,MODE.MULT(IF(A2:A11=D1,{1,1}*ROW(A2:A11)))))))

is not only sound, but also more rigorous and more dynamic.

Indeed, it is this variation which I would recommend to readers of this post. Having removed the burden of responsibility for duplicating the values within our array from the IF clause, we can now do pretty much as we wish with that part, one random – and practical – example being:

=LINEST(INDEX(B:B,N(IF(1,MODE.MULT(IF(ISNUMBER(B2:B11),{1,1}*ROW(B2:B11)))))))

which considers only the numerical values within the range B2:B11.

And if, as mentioned, we require that a second parameter be similarly restricted, we simply repeat this technique with the necessary small amendments. For example, and continuing to use LINEST as our example, we might often face the situation in which we must similarly restrict the range being passed as that function’s second parameter, *known_x’s*.

As such, we might construct something like:

=LINEST(INDEX(B:B,N(IF(1,MODE.MULT(IF(A2:A11="X",{1,1}*ROW(B2:B11)))))),INDEX(C:C,N(IF(1,MODE.MULT(IF(A2:A11="X",{1,1}*ROW(C2:C11)))))))

amongst other possibilities.

Another post to follow shortly. Watch this space!

Pingback: IF Statement Within different Rows

@XORLX

Another way of not hard-coding “X” in the below

but instead referring it to a cell say D1 is

Is anyone still checking this post 4 years later?

Hi Chris,

Yes, I am! Why, have I somehow missed a previous comment of yours re this post?

Regards

What if you have multiple X variables and not just one?

Can you give a more precise definition of what you mean, perhaps using the example given in this post?

Hi,

I tried to use this formula with multiple regression but it’s error.

I tried to run regression Y, X1, X2 ,X3 series with criteria that column Z have value in between 0.5 to 0.7.

Thank you.

Hi,

Can you be a bit more specific about your requirements? If it helps matters, you can send me a workbook: xorlx99@gmail.com.

Regards