In this post I would like to present a formula-based solution which returns an alphabetically-sorted list of the entries from a given range. Effectively, then, the formula gives equivalent results to those obtained using the in-built sort feature (though which, for whatever reasons, we may not be in a position to use).

For example, given the unsorted list in A2:A11 as below, we will return the ordered results as given in B2:B11.

You can download the workbook here.

(It is also worth pointing out that duplicated values within the range – e.g. if “Belinda” occupied both A2 and A8 – will not cause any issues with this set-up.)

Usually we might do this with some form of array formula, though I prefer the following non-array construction (in B2):

=LOOKUP(1,0/FREQUENCY(ROWS($1:1),COUNTIF($A$2:$A$11,"<="&$A$2:$A$11)),$A$2:$A$11)

**How does it work?**

Let’s take the formula in B8 as an example, which is, in that row:

=LOOKUP(1,0/FREQUENCY(ROWS($1:7),COUNTIF($A$2:$A$11,"<="&$A$2:$A$11)),$A$2:$A$11)

(The only part having changed naturally being the reference passed to the ROWS function.)

This part:

COUNTIF($A$2:$A$11,"<="&$A$2:$A$11)

returns an array of values corresponding to, for each of the strings within the range, the number of strings within the range which either precede or equal that string alphabetically. Hence, the above resolves to:

{3;4;9;5;7;2;1;6;10;8}

where, for example, the 7th element in this array (1), the result for “Belinda”, is due to the fact that only one string within our range either precedes or equals alphabetically that string, that string of course being “Belinda” itself.

Or, to take another example, the 2nd element in this array (4), corresponding to “Laquita”, is a result of there being four names which either precede or equal alphabetically the string “Laquita”, those being of course “Belinda”, “Bula”, “Cathy” and “Laquita” itself.

We now pass this result as the *bins_array* to FREQUENCY, with the relative row number of the row in question (here 7, as given by ROWS($1:7)) as the *data_array*.

(Hopefully none of my readers are still using the suboptimal-to-say-the-least ROW function in such cases. If you are, read here quickly!)

Hence:

FREQUENCY(ROWS($1:7),COUNTIF($A$2:$A$11,"<="&$A$2:$A$11))

which is:

FREQUENCY(7,{3;4;9;5;7;2;1;6;10;8})

gives:

{0;0;0;0;1;0;0;0;0;0;0}

We then reciprocate this array with a suitable choice (I chose 0; 1 would also suffice, though note that in that case it would be prudent to use a *lookup_value* of 2, not 1), such that:

0/FREQUENCY(ROWS($1:7),COUNTIF($A$2:$A$11,"<="&$A$2:$A$11))

gives:

{#DIV/0!;#DIV/0!;#DIV/0!;#DIV/0!;0;#DIV/0!;#DIV/0!;#DIV/0!;#DIV/0!;#DIV/0!;#DIV/0!}

and we are finally able to pass this array to LOOKUP, such that:

=LOOKUP(1,0/FREQUENCY(ROWS($1:7),COUNTIF($A$2:$A$11,"<="&$A$2:$A$11)),$A$2:$A$11)

which is now:

LOOKUP(1,{#DIV/0!;#DIV/0!;#DIV/0!;#DIV/0!;0;#DIV/0!;#DIV/0!;#DIV/0!;#DIV/0!;#DIV/0!;#DIV/0!},$A$2:$A$11)

gives, by virtue of the only numerical within this array occupying the 5th position within that array, “Raymonde”, as desired.

Some readers may be wondering why, after obtaining the array:

{0;0;0;0;1;0;0;0;0;0;0}

I did not first pass this to MATCH, with 1 as the *lookup_value* and a *match_type* parameter of zero, and then pass the result to INDEX, with $A$2:$A$11 as the *array*, i.e.:

=INDEX($A$2:$A$11,MATCH(1,FREQUENCY(ROWS($1:7),COUNTIF($A$2:$A$11,"<="&$A$2:$A$11)),0))

which, thanks to the FREQUENCY function providing the necessary array coercion, does not require CSE either.

The answer is simply down to a matter of taste: the above is a perfectly good alternative. Perhaps, though, and assuming that avoiding CSE is something we deem worthwhile, it might be borne in mind that we may not always be so fortunate as to have some coercing function such as FREQUENCY here, and that, in general, a construction involving MATCH is more likely to require CSE than the equivalent LOOKUP set-up. Still, I leave it to readers to decide for themselves which they prefer.

More to follow shortly!

Is the below not much more simpler

Hi sam and thanks for your comment.

That depends what you mean by “simpler”. I take it your construction is an array formula?

And how does your formula fare if there are duplicate names in there, for example if both A4 and A5 contain “Tomeka”?

Also, I’m not sure about your use of ROW()-1 here, as I mentioned in the actual post. What if someone using your formula had their range not in A2:A11, but, for example, in D11:D20? Would they necessarily know that they have to make an amendment to the value being subtracted/added to this ROW construction? Besides, as I point out here, this is completely unecessary, as using ROWS does not require any thought as to such matters, amongst other benefits:

https://excelxor.com/2014/08/25/row-vs-rows-for-consecutive-integer-generation/

Having said that, apart from the issue re duplicates, your formula does of course give correct results. As I said in my preamble to the post, “

Usually we might do this with some form of array formula, though I prefer the following non-array construction…“.Each to their own, I guess! ๐

Cheers!

Luckily, Google Sheets has a function just for this.

The formula for this example would be =sort(A2:A11) or =sort(A2:A11,1,1)

@isaiI should’ve guessed! ๐

Great Formula … which has already been widely adopted across the Channel…!!!

๐

Great formula!

I like that it is a short regular formula, not an array formula.

The FREQUENCY function returns {0;0;0;0;1;0;0;0;0;0;0}, why canยดt LOOKUP use this array?

@OscarThanks! Glad you like it!

As to your question, unfortunately to use LOOKUP in this “regular” way requires that the array be sorted in ascending order, which we can never guarantee here.

Basically, the way LOOKUP functions is that:

1If the values are sorted in ascending order, then:1.1If thelookup_valueis found within thelookup_vector, it will return the element in theresult_vectorfrom the position corresponding to that of thelookup_valuein thelookup_vector.1.2If thelookup_valueisnotfound within thelookup_vector, it will return the element in theresult_vectorfrom the position corresponding to that of thelast(from left to right) numerical value in thelookup_vectorwhich is less than thelookup_value.2If the values arenotsorted in ascending order, then,providing that the, it will return the element in thelookup_valueis greater than all of the values within thelookup_vectorresult_vectorfrom the position corresponding to that of thelast(from left to right) numerical value in thelookup_vector.For example:

would return “C”, since the values in the

lookup_vectorare sorted in ascending order, but:would return “F”, since, although the

lookup_vectoris this time not sorted in ascending order, ourlookup_valueis greater than all of the values within that array, and so we look for the last numerical value within that array, i.e. 6, and return the element in theresult_vectorfrom the position corresponding to that value, i.e. the 6th.(

Edit:comment amended from previous version due to misinformation.)Cheers

XOR LX,

thank you for explaining.

Why is this formula then

returning #N/A? It should return D, 1 is the last value in the array that is smaller (or equal) than 3?

Sincere apologies.

My example was a very poor one, since of course we must guarantee that our choice of

lookup_valueis greater thanallof the values within thelookup_vector.If this is not the case, then the result of the function is quite unpredictable: it may be an #N/A error, or one of the values present in the array (though not necessarily the desired one).

Very sorry for the confusion. I will amend my earlier post so that other readers are not also confused.

Regards

Hello. I have a large set of data that I need to organize where I need to get rid of every id number that does not repeat itself. How can I do that?

@Lattice123Hi and welcome to the site!

Can you give a vastly reduced example to illustrate what you mean, together with expected results?

Regards

Great formula. Note there’s a big ‘gotcha’: It will return incorrect results if your list contains a number. So for that reason, I’m going to suggest that this is perhaps a safer contstruct:

Hi XOR,

Great formula.

It save lot of time for me ๐

I have seen so many interesting articles here.

Thank you so much.

Khalid

@Khalid NGOVery kind of you to say so, my friend. ๐ Many thanks.

I am late to the “party”, but had to comment.

The benefits of a single value in the FREQUENCY data_array has never occurred to me.

I can’t help but wonder if this might be a useful strategy elsewhere. I look forward to finding out.

Thank you for another “think-outside-the-box” lesson.

@DaveThanks! There’s another practical use for this construction for finding the maximum:

Cheers