Extracting numbers from a string 4: All numbers to a single cell 2

This is the fourth in a series of discussions on the techniques available for extracting numbers from an alphanumeric string.

In the first instalment in this series (which can be found here) I looked at extracting consecutive numbers which appear at the start of the string, e.g. 123ABC456.

In the second instalment (here) I looked at extracting consecutive numbers which appear at the end of the string, e.g. 123ABC456.

In the third instalment (here) I looked at extracting all numbers from a string where each of those numbers was to be returned to a separate cell. For example, given the string 81;8.75>@5279@4.=45>A?A; we extracted 81, 8.75, 5279, 4 and 45 into individual cells.

In this post I will look at a technique for extracting all numbers from a string, but where those numbers are to be returned as a single number in a single cell.

I will use the same string as in the previous post, i.e. 81;8.75>@5279@4.=45>A?A; though this time the desired resuilt will be:

818755279445

Note that, due to the fact that Excel’s worksheet functions do not provide a native means by which we can pass arrays of strings to a concatenating function, the only viable means by which we will be able to arrive at our result will be to generate a proper numerical return.

As such, unless we repeat the method outlined over several iterations, and then concatenate the results, our solution will fail if the original string contains more than 15 digits, since a number of such length would be beyond Excel’s limitations.

There are two main solutions to this problem and, although I shall present the formulas for both, I will restrict my deconstruction to just one of these. In fact, the one I shall dissect is by far the shorter and more efficient of the two, as well as being, as far as I know, still not that well-known. In fact, where questions of this nature arise on the various Excel forums around the internet, it is almost always the other solution which is presented.

True, the alternative construction is not at all obvious, and in fact most readers seeing it for the first time will probably feel a mixture of incredulity and awe. If what I’ve read on the internet is correct, it appears to have been first thought of (at least, a very slight variation of the version I will present) by someone called Lori.

Just before I come to that one, though, here’s the usual method in such cases, which is the following array formula:

=SUM(MID(A1,LARGE(IF(ISNUMBER(0+MID(A1,Arry1,1)),Arry1),ROW(INDIRECT("1:"&COUNT(0+MID(A1,Arry1,1))))),1)*10^(ROW(INDIRECT("1:"&COUNT(0+MID(A1,Arry1,1))))-1))

where Arry1 is a row-relative Defined Name given by:

=ROW(INDIRECT("1:"&LEN($A1)))

Quite a lengthy and resource-heavy construction, then, for what appears on the face of it a reasonably simple task. So let’s have a look at the alternative without further ado, and that is the following array formula:

=NPV(-0.9,IFERROR(MID(A1,1+LEN(A1)-ROW(INDIRECT("1:"&LEN(A1))),1)/10,""))

How does it work?

We should hopefully be quite familiar with the ROW/INDIRECT construction:

ROW(INDIRECT("1:"&LEN(A1)))

which simply generates a list of integers from 1 up to the length of the string in A1, which is 24 in this case, i.e.:

{1;2;3;4;5;6;7;8;9;10;11;12;13;14;15;16;17;18;19;20;21;22;23;24}

We subtract this array of values from the result of 1+LEN(A1), i.e. 25, so that:

1+LEN(A1)-{1;2;3;4;5;6;7;8;9;10;11;12;13;14;15;16;17;18;19;20;21;22;23;24}

which is:

25-{1;2;3;4;5;6;7;8;9;10;11;12;13;14;15;16;17;18;19;20;21;22;23;24}

becomes:

{24;23;22;21;20;19;18;17;16;15;14;13;12;11;10;9;8;7;6;5;4;3;2;1}

And so we are passing this array of 24 values to MID as its start_num parameter, which means that:

MID(A1,1+LEN(A1)-ROW(INDIRECT("1:"&LEN(A1))),1)

becomes:

MID("81;8.75>@5279@4.=45>A?A;",{24;23;22;21;20;19;18;17;16;15;14;13;12;11;10;9;8;7;6;5;4;3;2;1},1)

which is:

{";";"A";"?";"A";">";"5";"4";"=";".";"4";"@";"9";"7";"2";"5";"@";">";"5";"7";".";"8";";";"1";"8"}

as can easily be verified.

Dividing this array of values by 10 gives:

{#VALUE!;#VALUE!;#VALUE!;#VALUE!;#VALUE!;0.5;0.4;#VALUE!;#VALUE!;0.4;#VALUE!;0.9;0.7;0.2;0.5;#VALUE!;#VALUE!;0.5;0.7;#VALUE!;0.8;#VALUE!;0.1;0.8}

And applying the IFERROR results in:

{"";"";"";"";"";0.5;0.4;"";"";0.4;"";0.9;0.7;0.2;0.5;"";"";0.5;0.7;"";0.8;"";0.1;0.8}

And we are now finally in a position to process the NPV function over this array. Of course, I should at first explain precisely how this function operates, and why we have the rather strange-looking initial parameter of -0.9.

The first point to mention is that NPV has the nice property that it ignores any blanks within the range passed to it and operates only on any numerical values within that range, and in order from left to right.

I won’t go into too much of an analysis of the financial reasoning behind this function, since it is not particularly relevant to our discussion. Suffice to say that this function, whose syntax is:

NPV(rate,value1,value2,value3,,,)

is equivalent to calculating the following summation:

=value1/(1+rate)^1+value2/(1+rate)^2+value3/(1+rate)^3+...

Now, with this in mind, and knowing that, in order to generate our desired result, we need to multiply the entries in our array by successive powers of 10 and then sum the result, we can see that, if we choose a suitable value for the rate parameter, this formula will give us precisely that.

Hence the choice of -0.9, since not only is 1-0.9 obviously 0.1, but also, when we take successive powers of 0.1, beginning with an index of 1, we get:

0.1
0.01
0.001
0.0001

and when these values are reciprocated with unity, we get:

10
100
1000
10000

Hence, in our case, the first non-blank entry from our array, i.e. 0.5, will be multiplied by 10, the second (0.4) by 100, the third (0.4) by 1000, and so and and so on.

We then sum these results and so, finally:

=NPV(-0.9,IFERROR(MID(A1,1+LEN(A1)-ROW(INDIRECT("1:"&LEN(A1))),1)/10,""))

results in 818755279445, as required.

Actually, it appears that this set-up may lead to rounding issues in some cases. Indeed, when I formatted the cell contents appropriately for this result, Excel displayed:

818755279445.002

As such, it’s probably wise to wrap this construction in e.g. INT, so as to ensure our output is an integer:

=INT(NPV(-0.9,IFERROR(MID(A1,1+LEN(A1)-ROW(INDIRECT("1:"&LEN(A1))),1)/10,"")))

Another instalment to follow shortly!

2 comments

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s