Advanced Formula Challenge #7: Asterisk and Obelisk 17

The challenge this week is as follows: given a string in A2, return the single-word substring which immediately precedes the only occurrence within that string of a substring of the form XabcdY, where X and Y are either an asterisk or an obelisk and a, b, c and d are integers from 0 to 9.

For example:

Asterisk and Obelisk

Notes:

  • The string is delimited by single spaces
  • Excluding the one substring of the form XabcdY, each of the other substrings may or may not have an asterisk or an obelisk at each end (providing of course that this will not result in more than one substring of the form XabcdY)

Solution next week. Best of luck!

17 comments

  1. Hi again!

    I Create a Defined Name Arr, with this formula:

    SEARCH({"~*","†"}&"????"&{"~*";"†"},$A2)

    And I create the “main” formula in B2 (CSE):

    =TRIM(MID(SUBSTITUTE(A2," ",REPT(" ",LEN(A2))),MIN(IF(ISNUMBER(Arr),IF(ISNUMBER(--MID(A2,Arr+1,4)),LEN(REPLACE(A2,Arr-1,LEN(A2)-Arr+2,""))-LEN(SUBSTITUTE(REPLACE(A2,Arr-1,LEN(A2)-Arr+2,"")," ","")))))*LEN(A2)+1,LEN(A2)))

    I mention that I’m from Cali, Colombia and I love this website because there are many interesting techniques to learn. The formulas I produce in the Spanish, and translate them into English to post here. Sorry if I make mistakes in language or in my way of expressing myself. Thank you for sharing your knowledge!

    Blessings, and please continue like this!

  2. @Isai Alvarado

    Welcome to the site!

    And many thanks for the file. I’ll try to take a look at it at some point soon. Perhaps in future you could post some of your solutions here so that everyone can see them?

    Also, I’m afraid I don’t use Googledocs, and when I try to download your workbook as a normal Microsoft Excel file then I get a workbook with just values, and no formulas. How can I convert it so that I can see your original formulas?

    Thanks again!

  3. @John Jairo V

    Hi again, John. I just tried your solution and I don’t get all the correct results for the examples I gave.

    For the string Alpha †Beta* *123† Gamma Delta *2014† Epsilon *12* I get a result of Alpha using your set-up, though the result should be Delta.

    For 2013 Kappa *Lambda† †Mu† 2000* *0001* your formula gives 2013, though the result should be 2000*.

    And for Nu 1999 *Xi* Omicron *2011* Pi Rho 1997 your formula gives Nu, though the result should be Omicron.

    Remember, the desired extraction always immediately precedes the one case of a string where the 1st and 6th characters are either an asterisk or an obelisk, and the 2nd, 3rd, 4th and 5th characters are numbers.

    Regards

  4. Ah. I understand what happened. Apologies on behalf of the comment editor here!

    I have corrected it, both above and in your original post. I will have a look now at your solution.

    Many thanks.

  5. @John Jairo V

    Nearly. Works on the examples I gave, though will not work on all strings, unfortunately.

    You need to be careful with this technique for identifying numbers. For example, what if the string in A2 was:

    Alpha *Jan3† †Beta* *123† Gamma Delta *2014† Epsilon *12*

    ?

    (Or whatever the equivalent is in Spanish for Jan3 – Ene3?)

    Or:

    *Beta† †2e10* †Zeta †2004† Eta Theta 2008 †Iota†

    The point being that dates and scientific notation will equally meet your criterion:

    ISNUMBER(--MID(A2,Arr+1,4))

    See here for a discussion on the drawbacks of this technique for identifying numbers in strings:

    https://excelxor.com/2014/09/14/extracting-numbers-from-a-string-1-consecutive-numbers-at-start/

    Regards

  6. Hi!

    I understood your point and is valid totally. I tried correct that part and (isnumber–…) and I couldn’t (it seems indirect not work well with arrays). Maybe you can show me how to do it with my construction… and later, show your answer (I imagine that your’s is better than mine).

    Blessings!

  7. @XOR LX

    Unfortunately, the only way to see the full formulas (you can see part of them by clicking on the cells) is to make a copy by going to File -> Make a copy (after signing into your Google account). The solutions use a few functions that may not be in Excel, so exporting it may result in errors. Here is my solution for Challenge 7.

    =regexextract(A2,”((?:†|\*)?\w+(?:†|\*)?)[^\w]*(?:†|\*)\d{4}(?:†|\*)”)

    The regexextract function extracts matching substrings with regular expressions. Here I extract one or more consecutive word characters, \w+ ,with or without an asterisk or obelisk on either side, ((?:†|\*)?\w+(?:†|\*)?) . Also, it must be before 0 or more non-word characters, [^\w]*, and also 4 digits with an asterisk or obelisk on both sides, (?:†|\*)\d{4}(?:†|\*).

  8. I try this (in the context of the formula that I post before:

    = … MIN(IF(ISNUMBER(Arr);IF(COUNT(–MID(A2,ROW(INDIRECT(Arr+1&”:”&Arr+4)),1))=4; … )))

    The indirect part is not working…

  9. @John Jairo V

    Actually, John, I’ve noticed a bigger problem with your set-up which I didn’t earlier. And that is that you’re using the SEARCH function which, as you probably know, returns the position of the first occurrence of the search string within the string.

    This means that, if the pattern, say *????*, occurs more than once in the string, and the one containing the numbers is not actually the first such occurrence, but the second or third, then your set-up will not give the desired result. For example, this string:

    2013 Kappa *Lambda† †Mu† 2000* *Beta* *0001*

    I like your idea of using SEARCH, though, and of first forming an array of search strings with this concatenation. I’m just not sure you can tweak it to overcome this issue?

    Regards

  10. @Isai Alvarado

    Thanks, Shame about the Googledocs issue. I never really considered that platform when I set up this site, and I think it will be difficult to now expand it to incorporate the additional functions offered in the Google version.

    So unless you can adapt your solutions there to also work for Microsoft Excel then I’m afraid I won’t be able to analyse/comment upon them. Which is a shame as I’m sure you’ve put a lot of hard work in to developing your answers.

    Also, re regex, are you referring to a VBA approach here? If so, then sadly I’m going to have to disappoint you a second time: I set up this site with the aim of presenting ideas/techniques using worksheet formulas alone. (I know, it doesn’t say that anywhere on this site – perhaps I should add something to the subtitle…)

    I’m fully aware that such a philosophy means that I am not necessarily giving the most “efficient” techniques for a given problem, and that there are many “better” ways of obtaining the solutions to the challenges I set, e.g. VBA, Pivot Tables, etc., though I guess I just want to see what is possible using formulas alone, to push the boundaries of what can be achieved with that as a caveat.

    And I happen to think that it’s a lot more than most people imagine, too.

    Anyway, sorry again! I hope I haven’t completely dissuaded you from visiting again! 🙂 Your thoughts and solutions are always welcome – if only they could be purely (Excel) formula-based!

    Regards

  11. @XOR LX

    Hi,

    Thanks for the responses. All my answers are pure worksheet formulas (no pivot tables/VBA/JavaScript/etc.). However, they use some functions not in Excel, so I am sorry about that. Also, it is ok if you do not comment haha. I will continue to have fun coming up with answers to your fun challenges, although I like sharing what Google Sheets can do. When it comes to extracting text, the Google Sheets functions REGEXEXTRACT and REGEXREPLACE are very useful.

  12. Ah, apologies then Isai!

    I really know very little about the Google version. I had always presumed that it was practically identical to Excel, but it sounds like there are some significant differences. It also sounds as if some of those differences may be “improvements” over the Excel functions, especially the two REGEX examples you give.

    Perhaps I will one day learn to use this version myself. In the meantime, I’m glad that you’re still going to be visiting here and enjoying the challenges!

    Cheers

  13. @John Jairo V

    No. The issue is not so much that INDIRECT cannot process arrays – it can. It’s more that you cannot use the “:” syntax to create an array of range references.

    I think also that you will find it difficult to further manipulate the Defined Name you have created, Arr, mainly because this is not a one-dimensional range, but a 2-row-by-2-column array, e.g. of the form: {a,b;c,d}.

    Of course, this could be first re-dimensioned so that it is a one-dimensional array, e.g. to {a;b;c;d}, a 4-row-by-1-column array, so that you can then manipulate it more easily.

    However, you would still not be able to employ the “:” syntax above. You could, however, after it has been re-dimensioned, apply a further MID function to the resulting array with an array of start_num parameters from 1 to 4 and test that all of these are numerical.

    However, you still have the larger issue that I outlined in my previous comment, since using SEARCH in this way limits you to considering the first occurrence only of each of the four string types *????*, †????*, *????† and †????†.

  14. Pingback: Advanced Formula Challenge #7: Results and Discussion | EXCELXOR

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s