Sorting a List Alphabetically (Without Filters) 17

In this post I would like to present a formula-based solution which returns an alphabetically-sorted list of the entries from a given range. Effectively, then, the formula gives equivalent results to those obtained using the in-built sort feature (though which, for whatever reasons, we may not be in a position to use).

For example, given the unsorted list in A2:A11 as below, we will return the ordered results as given in B2:B11.

Sorting a List Alphabetically (Without Filters)

You can download the workbook here.

(It is also worth pointing out that duplicated values within the range – e.g. if “Belinda” occupied both A2 and A8 – will not cause any issues with this set-up.)

Usually we might do this with some form of array formula, though I prefer the following non-array construction (in B2):

=LOOKUP(1,0/FREQUENCY(ROWS($1:1),COUNTIF($A$2:$A$11,"<="&$A$2:$A$11)),$A$2:$A$11)

How does it work?

Let’s take the formula in B8 as an example, which is, in that row:

=LOOKUP(1,0/FREQUENCY(ROWS($1:7),COUNTIF($A$2:$A$11,"<="&$A$2:$A$11)),$A$2:$A$11)

(The only part having changed naturally being the reference passed to the ROWS function.)

This part:

COUNTIF($A$2:$A$11,"<="&$A$2:$A$11)

returns an array of values corresponding to, for each of the strings within the range, the number of strings within the range which either precede or equal that string alphabetically. Hence, the above resolves to:

{3;4;9;5;7;2;1;6;10;8}

where, for example, the 7th element in this array (1), the result for “Belinda”, is due to the fact that only one string within our range either precedes or equals alphabetically that string, that string of course being “Belinda” itself.

Or, to take another example, the 2nd element in this array (4), corresponding to “Laquita”, is a result of there being four names which either precede or equal alphabetically the string “Laquita”, those being of course “Belinda”, “Bula”, “Cathy” and “Laquita” itself.

We now pass this result as the bins_array to FREQUENCY, with the relative row number of the row in question (here 7, as given by ROWS($1:7)) as the data_array.

(Hopefully none of my readers are still using the suboptimal-to-say-the-least ROW function in such cases. If you are, read here quickly!)

Hence:

FREQUENCY(ROWS($1:7),COUNTIF($A$2:$A$11,"<="&$A$2:$A$11))

which is:

FREQUENCY(7,{3;4;9;5;7;2;1;6;10;8})

gives:

{0;0;0;0;1;0;0;0;0;0;0}

We then reciprocate this array with a suitable choice (I chose 0; 1 would also suffice, though note that in that case it would be prudent to use a lookup_value of 2, not 1), such that:

0/FREQUENCY(ROWS($1:7),COUNTIF($A$2:$A$11,"<="&$A$2:$A$11))

gives:

{#DIV/0!;#DIV/0!;#DIV/0!;#DIV/0!;0;#DIV/0!;#DIV/0!;#DIV/0!;#DIV/0!;#DIV/0!;#DIV/0!}

and we are finally able to pass this array to LOOKUP, such that:

=LOOKUP(1,0/FREQUENCY(ROWS($1:7),COUNTIF($A$2:$A$11,"<="&$A$2:$A$11)),$A$2:$A$11)

which is now:

LOOKUP(1,{#DIV/0!;#DIV/0!;#DIV/0!;#DIV/0!;0;#DIV/0!;#DIV/0!;#DIV/0!;#DIV/0!;#DIV/0!;#DIV/0!},$A$2:$A$11)

gives, by virtue of the only numerical within this array occupying the 5th position within that array, “Raymonde”, as desired.

Some readers may be wondering why, after obtaining the array:

{0;0;0;0;1;0;0;0;0;0;0}

I did not first pass this to MATCH, with 1 as the lookup_value and a match_type parameter of zero, and then pass the result to INDEX, with $A$2:$A$11 as the array, i.e.:

=INDEX($A$2:$A$11,MATCH(1,FREQUENCY(ROWS($1:7),COUNTIF($A$2:$A$11,"<="&$A$2:$A$11)),0))

which, thanks to the FREQUENCY function providing the necessary array coercion, does not require CSE either.

The answer is simply down to a matter of taste: the above is a perfectly good alternative. Perhaps, though, and assuming that avoiding CSE is something we deem worthwhile, it might be borne in mind that we may not always be so fortunate as to have some coercing function such as FREQUENCY here, and that, in general, a construction involving MATCH is more likely to require CSE than the equivalent LOOKUP set-up. Still, I leave it to readers to decide for themselves which they prefer.

More to follow shortly!

17 comments

  1. Is the below not much more simpler

    =INDEX($A$2:$A$11,MATCH(ROW()-1,COUNTIF($A$2:$A$11,"<="&$A$2:$A$11),0))
  2. Hi sam and thanks for your comment.

    That depends what you mean by “simpler”. I take it your construction is an array formula?

    And how does your formula fare if there are duplicate names in there, for example if both A4 and A5 contain “Tomeka”?

    Also, I’m not sure about your use of ROW()-1 here, as I mentioned in the actual post. What if someone using your formula had their range not in A2:A11, but, for example, in D11:D20? Would they necessarily know that they have to make an amendment to the value being subtracted/added to this ROW construction? Besides, as I point out here, this is completely unecessary, as using ROWS does not require any thought as to such matters, amongst other benefits:

    https://excelxor.com/2014/08/25/row-vs-rows-for-consecutive-integer-generation/

    Having said that, apart from the issue re duplicates, your formula does of course give correct results. As I said in my preamble to the post, “Usually we might do this with some form of array formula, though I prefer the following non-array construction…“.

    Each to their own, I guess! ๐Ÿ™‚

    Cheers!

  3. Luckily, Google Sheets has a function just for this.

    The formula for this example would be =sort(A2:A11) or =sort(A2:A11,1,1)

  4. Great formula!

    I like that it is a short regular formula, not an array formula.

    The FREQUENCY function returns {0;0;0;0;1;0;0;0;0;0;0}, why canยดt LOOKUP use this array?

  5. @Oscar

    Thanks! Glad you like it!

    As to your question, unfortunately to use LOOKUP in this “regular” way requires that the array be sorted in ascending order, which we can never guarantee here.

    Basically, the way LOOKUP functions is that:

    1 If the values are sorted in ascending order, then:

    1.1 If the lookup_value is found within the lookup_vector, it will return the element in the result_vector from the position corresponding to that of the lookup_value in the lookup_vector.

    1.2 If the lookup_value is not found within the lookup_vector, it will return the element in the result_vector from the position corresponding to that of the last (from left to right) numerical value in the lookup_vector which is less than the lookup_value.

    2 If the values are not sorted in ascending order, then, providing that the lookup_value is greater than all of the values within the lookup_vector, it will return the element in the result_vector from the position corresponding to that of the last (from left to right) numerical value in the lookup_vector.

    For example:

    =LOOKUP(3,{1,2,3,4,5,6},{"A","B","C","D","E","F"})

    would return “C”, since the values in the lookup_vector are sorted in ascending order, but:

    =LOOKUP(10,{1,3,2,4,1,6},{"A","B","C","D","E","F"})

    would return “F”, since, although the lookup_vector is this time not sorted in ascending order, our lookup_value is greater than all of the values within that array, and so we look for the last numerical value within that array, i.e. 6, and return the element in the result_vector from the position corresponding to that value, i.e. the 6th.

    (Edit: comment amended from previous version due to misinformation.)

    Cheers

  6. XOR LX,

    thank you for explaining.

    Why is this formula then

    =LOOKUP(3,{5;1;4;1;6},{"A";"B";"C";"D";"E"})

    returning #N/A? It should return D, 1 is the last value in the array that is smaller (or equal) than 3?

  7. Sincere apologies.

    My example was a very poor one, since of course we must guarantee that our choice of lookup_value is greater than all of the values within the lookup_vector.

    If this is not the case, then the result of the function is quite unpredictable: it may be an #N/A error, or one of the values present in the array (though not necessarily the desired one).

    Very sorry for the confusion. I will amend my earlier post so that other readers are not also confused.

    Regards

  8. Hello. I have a large set of data that I need to organize where I need to get rid of every id number that does not repeat itself. How can I do that?

  9. @Lattice123

    Hi and welcome to the site!

    Can you give a vastly reduced example to illustrate what you mean, together with expected results?

    Regards

  10. Great formula. Note there’s a big ‘gotcha’: It will return incorrect results if your list contains a number. So for that reason, I’m going to suggest that this is perhaps a safer contstruct:

    =LOOKUP(1,0/FREQUENCY(ROWS($1:1),MMULT(--(data>=TRANSPOSE(data)),ROW(INDIRECT("A1:A"&ROWS(data)))^0)),data) 
  11. Hi XOR,

    =LOOKUP(1,0/FREQUENCY(ROWS($1:1),COUNTIF($A$2:$A$11,"<="&$A$2:$A$11)),$A$2:$A$11)

    Great formula.
    It save lot of time for me ๐Ÿ™‚

    I have seen so many interesting articles here.

    Thank you so much.

    Khalid

  12. I am late to the “party”, but had to comment.

    The benefits of a single value in the FREQUENCY data_array has never occurred to me.

    I can’t help but wonder if this might be a useful strategy elsewhere. I look forward to finding out.

    Thank you for another “think-outside-the-box” lesson.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s