Extracting Numbers of Set Length Only from Alphanumeric Strings 20

In this post I would like to present a solution to the practical problem of extracting a number of defined length from an alphanumeric string which may contain several numbers of varied lengths.

Indeed, the inspiration behind this post is in part derived from having personally witnessed many such requests on the various Excel forums, most of which involve the extraction of e.g. an account number of fixed length, 6 digits, say, from a longish string containing many other numbers.

As an example, given the following string:

20/04/15 - VAT Reg: 1234567: Please send 123456 against Order #98765, Customer Code A123XY, £125.00

we may wish to extract the one occurrence of a 6-digit number (123456) from that string.

We need to be aware that any solution which involves searching for 6 consecutive numerical characters may not, on its own, be sufficient. Although such an approach could give correct results, it is not at all guaranteed to do so: for the above string, for example, this would most likely lead to an (incorrect) return of 1234567, and not 123456.

The only means by which we can be certain we have correctly located the one 6-digit number within our string is by also considering the character immediately preceding and that immediately following any string of 6 consecutive numerics, and verifying that both of these are not themselves numerics.

I will present two variations on the solution here, the first of which is static, i.e. contains elements which are fixed and particular to the length of the number being extracted, the second of which is dynamic and allows for this length to be considered a variable.

The deconstruction will here concentrate on the static set-up, with the dynamic solution presented – without explanation – at the very end of this post. Indeed, given the choice between a dynamic and a static solution, it is not always the case that we should opt for the former in practice: the use of array constants offers us greater efficiency in terms of workbook calculation, something which may well be a factor if we imagine such extractions being required for several thousand strings.

What’s more, it is hoped that readers of this post, wishing to apply this technique but perhaps for numbers of a length different than that given here (i.e. 6), will be able to make the necessary amendments to the static set-up. In any case I shall explicitly mention which are the value-dependent parts within this construction.

And that construction is the following (non-array) formula:


How does it work?

Firstly, this part:


is a construction no doubt familiar to many, and simply generates an array of integers from 1 up to a value 5 less than the length of the string in A1, which is here 99, i.e.:


The choice of this value (5 here) must necessarily be one less than the length of the number we are seeking to extract, since it means that, when we now pass this array to MID for its start_num parameter, we ensure that we will consider all substrings of length 8 from the string in A1.

First, however – and the reason we are using 5 here and not 7 – we must ensure that we first catenate a single non-numeric character to the start and end of our string.

The reasoning is that, since the crux of this solution effectively involves testing all substrings of length 8 characters and verifying which, if any, consist of a non-numeric followed by 6 numerics followed by a non-numeric, then, were our 6-digit number to fall at the very start – or very end – of the string, then this strategy would fail if we did not first make these small amendments.

Which means that:


which here is:

MID("ζ20/04/15 - VAT Reg: 1234567: Please send 123456 against Order #98765, Customer Code A123XY, £125.00ζ",{1;2;3;4;5;6;7;8;9;10;11;12;13;14;15;16;17;18;19;20;21;22;23;24;25;26;27;28;29;30;31;32;33;34;35;36;37;38;39;40;41;42;43;44;45;46;47;48;49;50;51;52;53;54;55;56;57;58;59;60;61;62;63;64;65;66;67;68;69;70;71;72;73;74;75;76;77;78;79;80;81;82;83;84;85;86;87;88;89;90;91;92;93;94},8)


{"ζ20/04/1";"20/04/15";"0/04/15 ";"/04/15 -";"04/15 - ";"4/15 - V";"/15 - VA";"15 - VAT";"5 - VAT ";" - VAT R";"- VAT Re";" VAT Reg";"VAT Reg:";"AT Reg: ";"T Reg: 1";" Reg: 12";"Reg: 123";"eg: 1234";"g: 12345";": 123456";" 1234567";"1234567:";"234567: ";"34567: P";"4567: Pl";"567: Ple";"67: Plea";"7: Pleas";": Please";" Please ";"Please s";"lease se";"ease sen";"ase send";"se send ";"e send 1";" send 12";"send 123";"end 1234";"nd 12345";"d 123456";" 123456 ";"123456 a";"23456 ag";"3456 aga";"456 agai";"56 again";"6 agains";" against";"against ";"gainst O";"ainst Or";"inst Ord";"nst Orde";"st Order";"t Order ";" Order #";"Order #9";"rder #98";"der #987";"er #9876";"r #98765";" #98765,";"#98765, ";"98765, C";"8765, Cu";"765, Cus";"65, Cust";"5, Custo";", Custom";" Custome";"Customer";"ustomer ";"stomer C";"tomer Co";"omer Cod";"mer Code";"er Code ";"r Code A";" Code A1";"Code A12";"ode A123";"de A123X";"e A123XY";" A123XY,";"A123XY, ";"123XY, £";"23XY, £1";"3XY, £12";"XY, £125";"Y, £125.";", £125.0";" £125.00";"£125.00ζ"}

We now pass each of these strings to a further MID construction, so that we may analyse the 8 characters within each, so that:



{"ζ","2","0","/","0","4","/","1";"2","0","/","0","4","/","1","5";"0","/","0","4","/","1","5"," ";"/","0","4","/","1","5"," ","-";"0","4","/","1","5"," ","-"," ";"4","/","1","5"," ","-"," ","V";"/","1","5"," ","-"," ","V","A";"1","5"," ","-"," ","V","A","T";"5"," ","-"," ","V","A","T"," ";" ","-"," ","V","A","T"," ","R";"-"," ","V","A","T"," ","R","e";" ","V","A","T"," ","R","e","g";"V","A","T"," ","R","e","g",":";"A","T"," ","R","e","g",":"," ";"T"," ","R","e","g",":"," ","1";" ","R","e","g",":"," ","1","2";"R","e","g",":"," ","1","2","3";"e","g",":"," ","1","2","3","4";"g",":"," ","1","2","3","4","5";":"," ","1","2","3","4","5","6";" ","1","2","3","4","5","6","7";"1","2","3","4","5","6","7",":";"2","3","4","5","6","7",":"," ";"3","4","5","6","7",":"," ","P";"4","5","6","7",":"," ","P","l";"5","6","7",":"," ","P","l","e";"6","7",":"," ","P","l","e","a";"7",":"," ","P","l","e","a","s";":"," ","P","l","e","a","s","e";" ","P","l","e","a","s","e"," ";"P","l","e","a","s","e"," ","s";"l","e","a","s","e"," ","s","e";"e","a","s","e"," ","s","e","n";"a","s","e"," ","s","e","n","d";"s","e"," ","s","e","n","d"," ";"e"," ","s","e","n","d"," ","1";" ","s","e","n","d"," ","1","2";"s","e","n","d"," ","1","2","3";"e","n","d"," ","1","2","3","4";"n","d"," ","1","2","3","4","5";"d"," ","1","2","3","4","5","6";" ","1","2","3","4","5","6"," ";"1","2","3","4","5","6"," ","a";"2","3","4","5","6"," ","a","g";"3","4","5","6"," ","a","g","a";"4","5","6"," ","a","g","a","i";"5","6"," ","a","g","a","i","n";"6"," ","a","g","a","i","n","s";" ","a","g","a","i","n","s","t";"a","g","a","i","n","s","t"," ";"g","a","i","n","s","t"," ","O";"a","i","n","s","t"," ","O","r";"i","n","s","t"," ","O","r","d";"n","s","t"," ","O","r","d","e";"s","t"," ","O","r","d","e","r";"t"," ","O","r","d","e","r"," ";" ","O","r","d","e","r"," ","#";"O","r","d","e","r"," ","#","9";"r","d","e","r"," ","#","9","8";"d","e","r"," ","#","9","8","7";"e","r"," ","#","9","8","7","6";"r"," ","#","9","8","7","6","5";" ","#","9","8","7","6","5",",";"#","9","8","7","6","5",","," ";"9","8","7","6","5",","," ","C";"8","7","6","5",","," ","C","u";"7","6","5",","," ","C","u","s";"6","5",","," ","C","u","s","t";"5",","," ","C","u","s","t","o";","," ","C","u","s","t","o","m";" ","C","u","s","t","o","m","e";"C","u","s","t","o","m","e","r";"u","s","t","o","m","e","r"," ";"s","t","o","m","e","r"," ","C";"t","o","m","e","r"," ","C","o";"o","m","e","r"," ","C","o","d";"m","e","r"," ","C","o","d","e";"e","r"," ","C","o","d","e"," ";"r"," ","C","o","d","e"," ","A";" ","C","o","d","e"," ","A","1";"C","o","d","e"," ","A","1","2";"o","d","e"," ","A","1","2","3";"d","e"," ","A","1","2","3","X";"e"," ","A","1","2","3","X","Y";" ","A","1","2","3","X","Y",",";"A","1","2","3","X","Y",","," ";"1","2","3","X","Y",","," ","£";"2","3","X","Y",","," ","£","1";"3","X","Y",","," ","£","1","2";"X","Y",","," ","£","1","2","5";"Y",","," ","£","1","2","5",".";","," ","£","1","2","5",".","0";" ","£","1","2","5",".","0","0";"£","1","2","5",".","0","0","ζ"}

Note that it is important that, this time, the array being passed as MID’s start_num parameter, i.e.:


be of a displacement orthogonal to that of the array being passed to MID as its text parameter. Since that array is a single-column array (a 94-row-by-1-column array, to be precise) we need to ensure that the array of 8 elements being passed as the start_num parameter is a single-row array. Readers not familiar with such matters may wish to evaluate the MID construction using instead:


for this parameter, and noting the results.

Having obtained our array consisting of each of the individual characters in each of our 8-character substrings, we need to now query each of these groups of 8 as to whether the first and last characters are non-numeric and the central six numeric.

We could use e.g. ISNUMBER for this purpose, though my choice of ISERR is equally good, since, when we attempt to coerce to numerics (with e.g. +0 as I use here), the only possible single-character strings which will not result in an error are the digits from 0-9.



will give (after also resolving the N function over the array of Boolean TRUE/FALSE returns from the ISERR clause):


Now, here comes the tricky part. Recall that we are searching for the one occurrence of six numerics surrounded by a single non-numeric at either end. In the above array, such an occurrence will be seen as the string:


(Highlighted in red.)

In order to identify the location of this sequence of 0s and 1s, we can use MMULT. And, since this will entail multiplying each of the 8 values within each of the 94 groups in the above matrix with the values in the corresponding position of whatever second matrix we choose, we can, by carefully selecting the values in that second matrix, ensure that the product of that matrix and our sought-after array above will be unique.

Because we know that each of the values in each of our 94 groups of 8 can only ever be either 0 or 1, we simply have to be sure, when choosing our second matrix for MMULT, that no other permutation of 1s and 0s will result in the same value as that obtained when using the array corresponding to our desired extraction, given above.

Naturally there are an infinite number of choices we can make for the values in this second array. I chose here:


for no particular reason other than because it works (perhaps also due to a subconscious fondness for the number 13?)

The justification is that, if we first look at the result of the single matrix multiplication comprising this array and that corresponding to our desired extraction, i.e. if we calculate:


the result is of course 26. For those not too sure of how matrix multiplication operates, the above is equivalent to:


Now, readers may want to investigate the veracity of my next claim, but, for all other permutations of 8 values – let’s call them a1, a2, a3, a4, a5, a6, a7 and a8 – where those values are all either 0 or 1, when we perform:


there are no such permutations such that the result of the above will also result in 26.

As I stated previously, the choice of this second matrix, i.e.:


and of its corresponding result of 26, are by no means the only such possibilities. Readers may wish to explore what other choices would suffice here.



which is here:


resolves to:


in which I have highlighted the one occurrence of 26.

We now pass this array to MATCH, with our value of 26 as the lookup_value, such that:


which is here:


returns 42.

Since this corresponds to the position within our string of the first of our 8 characters we were seeking, and recalling that the first of those 8 characters will here be a non-numeric, we simply shift along one place to the right, which will be the start of our desired six-digit numeric. And so, finally:


which is:

=0+MID("ζ20/04/15 - VAT Reg: 1234567: Please send 123456 against Order #98765, Customer Code A123XY, £125.00ζ",1+42,6)


=0+MID("ζ20/04/15 - VAT Reg: 1234567: Please send 123456 against Order #98765, Customer Code A123XY, £125.00ζ",43,6)



which is:


as desired.

To finish with, as promised here is a dynamic version of the formula which allows for the length of the number being sought to be a variable.

The only thing with which we need to be careful is that we construct the second array to be passed to MMULT such that, as with our example here, we ensure that the result of the matrix multiplication with that array and our “desired” array (which will always be that in which the first and last entries are unity and the remaining, central entries all zero) is unique.

Of course, the important difference with the dynamic set-up is that we cannot simply choose some arbitrary values for this second matrix which are sufficient for one particular case; instead, we must ensure that they will work whatever the length of the number being sought.

And so, if we assume that the length of the desired number to be extracted is in B1, the required construction is now:


the new parts in which I leave to the reader to investigate.

Another post to follow shortly. Watch this space!


  1. @Felipe

    Thanks! Glad you liked it!

    To be honest I haven’t worked out an ALT+ shortcut for that symbol. I simply insert it using Insert Symbol and then copy it.


  2. In Google Sheets, this can be accomplished with the REGEXEXTRACT function:

  3. wow…. I have to “borrow” the formula for my future reference. 🙂

    May I know the reason for using “ζ”? It seems the formula also works if we use some other symbols like “#”.

    For an easy way to input “ζ “, we may consider to put it into AutoCorrect, e.g. (z) changes to ζ. Only limitation is it applies to word, ppt, etc.


  4. For your {13;1;1;1;1;1;1;13} array I would suggest the following: {1;2;4;8;16;32;64;128}. This will give a unique result for all different combination of 0’s and 1’s in your search string.

    In your specific case, your result works, though the minimum values cold be (7;1…1;7}, since the middle values can only sum to 6, a 7 should do just as well as a 13 (and remove 2 digits from your formula lenght…)

    using my suggestion, you could re-purpose the same formula to any specific sequence of num/non-num characters.

  5. @MF

    No particular reason! I just have an aesthetic fondness for that Greek character, to be honest!

    And, yes, as I said, any non-numeric character will suffice.

    And thanks for the recommendation re autocorrect, though I guess we should really only consider going to such lengths if we’re going to be using such a symbol a lot in our formulas!

    Welcome to the site and thanks for your contribution!

  6. @Ikkeman

    Yes, that array would be a more logical construction. Thanks! And re the array I chose, as I said in the actual post, I only chose 13 for rather arbitrary reasons!

    And you’re correct that your array is also a more flexible choice, in that, as you say, it would allow us to search for any combination of numerics/non-numerics.

    Perhaps I will make that the subject of a follow-up post to this one at some point!

    Excellent contribution! Many thanks. 🙂

  7. Is there are solution for extracting number which may be with decimal point? For example, to extract 123.45 from “Parameter A is 123.45 on item #123456 since 01/01/15”?

    I think this formula from post won’t work, but cannot imagine solution right now…

  8. @Siva

    Thanks, but not sure I understand. What do you get with your formula when A1 contains the string I gave at the start of this post?


  9. =MID(A1,MATCH(1,INDEX(((TRIM(MID(A1,ROW(INDIRECT("1:"&LEN(A1))),8))+0)>99999)*1,0),0)+1,6)

    this is the formula I used not

  10. Ok, but I’m afraid it still doesn’t work in general, and only works by pure coincidence for the string I provided. Did you not test on other strings?

    Change the string in A1 to e.g.:

    20/04/15 – VAT Reg: 2345678: Please send 123456 against Order #98765, Customer Code A123XY, £125.00

    and your formula returns 234567, which is obviously not correct.


  11. =MID(A1,MATCH(6,FREQUENCY(ISNUMBER(MID(A1,ROW(1:256),1)+0)*ROW(1:256),ISERROR(MID(A1,ROW(1:256),1)+0)*ROW(1:256)),0)-6,6)
  12. Better, though not guaranteed to work in all circumstances. For example, using 256 within the ROW construction, if the number of numerics and non-numerics are 250 and 6 respectively, then you get incorrect results, e.g.:


    I admit that this is an unlikely scenario, though of course we should be rigorous.

    If you can iron out this issue then it will certainly make a nice alternative. Well done.


  13. =MID(A1,MATCH(6,MMULT(ISNUMBER(MID("#"&A1&"#",{0,1,2,3,4,5,6,7}+ROW(INDIRECT("1:"&LEN("#"&A1&"#")-7)),1)+0)*1,{7;1;1;1;1;1;1;7}),0),6)

    I hope the above will solve the problem

  14. @Siva

    But – apart from a few minor details – that’s precisely the same solution as mine!

    I like your idea of using a single MID with a static array ({0,1,2,3,4,5,6,7}), though, rather than my double-MID approach. Very nice!


  15. Pingback: Extracting Certain Numbers for a String - Page 2

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.