This is the fourth in a series of discussions on the techniques available for extracting numbers from an alphanumeric string.

In the first instalment in this series (which can be found here) I looked at extracting consecutive numbers which appear at the *start* of the string, e.g. **123**ABC456.

In the second instalment (here) I looked at extracting consecutive numbers which appear at the *end* of the string, e.g. 123ABC**456**.

In the third instalment (here) I looked at extracting all numbers from a string where each of those numbers was to be returned to a *separate* cell. For example, given the string **81**;**8.75**>@**5279**@**4**.=**45**>A?A; we extracted **81**, **8.75**, **5279**, **4** and **45** into individual cells.

In this post I will look at a technique for extracting *all* numbers from a string, but where those numbers are to be returned as a *single number* in a *single* cell.

I will use the same string as in the previous post, i.e. **81**;**8.75**>@**5279**@**4**.=**45**>A?A; though this time the desired resuilt will be:

**81****875****5279****4****45**

Note that, due to the fact that Excel’s worksheet functions do not provide a native means by which we can pass arrays of strings to a concatenating function, the only viable means by which we will be able to arrive at our result will be to generate a proper numerical return.

As such, unless we repeat the method outlined over several iterations, and then concatenate the results, our solution will fail if the original string contains more than 15 digits, since a number of such length would be beyond Excel’s limitations.

There are two main solutions to this problem and, although I shall present the formulas for both, I will restrict my deconstruction to just one of these. In fact, the one I shall dissect is by far the shorter and more efficient of the two, as well as being, as far as I know, still not that well-known. In fact, where questions of this nature arise on the various Excel forums around the internet, it is almost always the other solution which is presented.

True, the alternative construction is not at all obvious, and in fact most readers seeing it for the first time will probably feel a mixture of incredulity and awe. If what I’ve read on the internet is correct, it appears to have been first thought of (at least, a very slight variation of the version I will present) by someone called Lori.

Just before I come to that one, though, here’s the usual method in such cases, which is the following **array formula**:

=SUM(MID(A1,LARGE(IF(ISNUMBER(0+MID(A1,Arry1,1)),Arry1),ROW(INDIRECT("1:"&COUNT(0+MID(A1,Arry1,1))))),1)*10^(ROW(INDIRECT("1:"&COUNT(0+MID(A1,Arry1,1))))-1))

where *Arry1* is a row-relative Defined Name given by:

=ROW(INDIRECT("1:"&LEN($A1)))

Quite a lengthy and resource-heavy construction, then, for what appears on the face of it a reasonably simple task. So let’s have a look at the alternative without further ado, and that is the following **array formula**:

=NPV(-0.9,IFERROR(MID(A1,1+LEN(A1)-ROW(INDIRECT("1:"&LEN(A1))),1)/10,""))

**How does it work?**

We should hopefully be quite familiar with the ROW/INDIRECT construction:

ROW(INDIRECT("1:"&LEN(A1)))

which simply generates a list of integers from 1 up to the length of the string in A1, which is 24 in this case, i.e.:

{1;2;3;4;5;6;7;8;9;10;11;12;13;14;15;16;17;18;19;20;21;22;23;24}

We subtract this array of values from the result of 1+LEN(A1), i.e. 25, so that:

1+LEN(A1)-{1;2;3;4;5;6;7;8;9;10;11;12;13;14;15;16;17;18;19;20;21;22;23;24}

which is:

25-{1;2;3;4;5;6;7;8;9;10;11;12;13;14;15;16;17;18;19;20;21;22;23;24}

becomes:

{24;23;22;21;20;19;18;17;16;15;14;13;12;11;10;9;8;7;6;5;4;3;2;1}

And so we are passing this array of 24 values to MID as its *start_num* parameter, which means that:

MID(A1,1+LEN(A1)-ROW(INDIRECT("1:"&LEN(A1))),1)

becomes:

MID("81;8.75>@5279@4.=45>A?A;",{24;23;22;21;20;19;18;17;16;15;14;13;12;11;10;9;8;7;6;5;4;3;2;1},1)

which is:

{";";"A";"?";"A";">";"5";"4";"=";".";"4";"@";"9";"7";"2";"5";"@";">";"5";"7";".";"8";";";"1";"8"}

as can easily be verified.

Dividing this array of values by 10 gives:

{#VALUE!;#VALUE!;#VALUE!;#VALUE!;#VALUE!;0.5;0.4;#VALUE!;#VALUE!;0.4;#VALUE!;0.9;0.7;0.2;0.5;#VALUE!;#VALUE!;0.5;0.7;#VALUE!;0.8;#VALUE!;0.1;0.8}

And applying the IFERROR results in:

{"";"";"";"";"";0.5;0.4;"";"";0.4;"";0.9;0.7;0.2;0.5;"";"";0.5;0.7;"";0.8;"";0.1;0.8}

And we are now finally in a position to process the NPV function over this array. Of course, I should at first explain precisely how this function operates, and why we have the rather strange-looking initial parameter of -0.9.

The first point to mention is that NPV has the nice property that it ignores any blanks within the range passed to it and operates only on any numerical values within that range, and in order from left to right.

I won’t go into too much of an analysis of the financial reasoning behind this function, since it is not particularly relevant to our discussion. Suffice to say that this function, whose syntax is:

NPV(rate,value1,value2,value3,,,)

is equivalent to calculating the following summation:

=value1/(1+rate)^1+value2/(1+rate)^2+value3/(1+rate)^3+...

Now, with this in mind, and knowing that, in order to generate our desired result, we need to multiply the entries in our array by successive powers of 10 and then sum the result, we can see that, if we choose a suitable value for the *rate* parameter, this formula will give us precisely that.

Hence the choice of -0.9, since not only is 1-0.9 obviously 0.1, but also, when we take successive powers of 0.1, beginning with an index of 1, we get:

0.1

0.01

0.001

0.0001

…

and when these values are reciprocated with unity, we get:

10

100

1000

10000

…

Hence, in our case, the first non-blank entry from our array, i.e. 0.5, will be multiplied by 10, the second (0.4) by 100, the third (0.4) by 1000, and so and and so on.

We then sum these results and so, finally:

=NPV(-0.9,IFERROR(MID(A1,1+LEN(A1)-ROW(INDIRECT("1:"&LEN(A1))),1)/10,""))

results in **818755279445**, as required.

Actually, it appears that this set-up may lead to rounding issues in some cases. Indeed, when I formatted the cell contents appropriately for this result, Excel displayed:

**818755279445.002**

As such, it’s probably wise to wrap this construction in e.g. INT, so as to ensure our output is an integer:

=INT(NPV(-0.9,IFERROR(MID(A1,1+LEN(A1)-ROW(INDIRECT("1:"&LEN(A1))),1)/10,"")))

Another instalment to follow shortly!

Thanks, very interesting, learned something here.

Great. Glad to hear it!

Thank you for this. I have seen this formula before but never seen the explanation presented in the minute detail you are notable for.

Thank you again.