In this post I shall present a method for generating a unique, alphabetical list in a single column from data contained within a contiguous range comprising several columns.

For example, given the dataset below in A2:E5, we will return that list beginning in cell G1:

We do this as follows:

First go to Name Manager and define:

*Range1* as:

=$A$2:$E$5

*Arry1* as:

=ROW(INDIRECT("1:"&COLUMNS(Range1)*ROWS(Range1)))

*Arry2* as:

=1+INT((Arry1-1)/COLUMNS(Range1))

*Arry3* as:

=1+MOD(Arry1-1,COLUMNS(Range1))

*Arry4* as:

=INDEX(Range1,N(IF(1,Arry2)),N(IF(1,Arry3)))

The formula in H1, used to determine the number of returns, is:

=SUMPRODUCT((Range1<>"")/COUNTIF(Range1,Range1&""))

and the **array formula** in G1 is:

=IF(ROWS($1:1)>$H$1,"",INDEX(Arry4,MATCH(SMALL(IF(FREQUENCY(IF(Range1<>"",MATCH(Range1,Arry4,0)),Arry1),COUNTIF(Range1,"<"&Arry4)),ROWS($1:1)),IF(Arry4<>"",COUNTIF(Range1,"<"&Arry4)),0)))

**How does it work?**

The formula in H1 is reasonably straightforward, and a standard construction in such cases. We can see that:

=SUMPRODUCT((Range1<>"")/COUNTIF(Range1,Range1&""))

resolves to:

=SUMPRODUCT(({"Due","","Otto","","Otto";"","","","","Tre";"Sei","Cinque","","Quattro","Otto";"Due","","","Quattro","Otto"}<>"")/COUNTIF(Range1,Range1&""))

which is:

=SUMPRODUCT({TRUE,FALSE,TRUE,FALSE,TRUE;FALSE,FALSE,FALSE,FALSE,TRUE;TRUE,TRUE,FALSE,TRUE,TRUE;TRUE,FALSE,FALSE,TRUE,TRUE}/COUNTIF(Range1,Range1&""))

and then resolving the COUNTIF portion, which is simply counting how many times each of the entries in *Range1* occurs within that range, gives:

=SUMPRODUCT({TRUE,FALSE,TRUE,FALSE,TRUE;FALSE,FALSE,FALSE,FALSE,TRUE;TRUE,TRUE,FALSE,TRUE,TRUE;TRUE,FALSE,FALSE,TRUE,TRUE}/{2,9,4,9,4;9,9,9,9,1;1,1,9,2,4;2,9,9,2,4})

which becomes, after the division:

=SUMPRODUCT({0.5,0,0.25,0,0.25;0,0,0,0,1;1,1,0,0.5,0.25;0.5,0,0,0.5,0.25})

i.e. 6, as required.

And so, by referencing this value in the main formula as follows:

=IF(ROWS($1:1)>$H$1,"",

we know that, beyond the sixth row to which we copy this formula we will return a blank, as required.

So let’s now look at the main clause of that formula then, which recall is:

INDEX(Arry4,MATCH(SMALL(IF(FREQUENCY(IF(Range1<>"",MATCH(Range1,Arry4,0)),Arry1),COUNTIF(Range1,"<"&Arry4)),ROWS($1:1)),IF(Arry4<>"",COUNTIF(Range1,"<"&Arry4)),0))

This is actually a relatively standard construction for extracting unique, alphabetical values, the only (and important) difference being that our range is not a single-column, one-dimensional range, but rather a two-dimensional one.

However, in principle the technique is the same: what is first required is some “re-dimensioning” of our two-dimensional range into one of a single dimension, after which we can apply the usual constructions to obtain our desired results.

So let’s first look at how we achieve that then, which is given by *Arry4*, i.e.:

INDEX(Range1,N(IF(1,Arry2)),N(IF(1,Arry3)))

Here, we are simply going to INDEX each and every element in our two-dimensional range. However, we are going to do this in such a way so that the resulting array, although containing precisely the same elements as that two-dimensional range, will in fact comprise an array consisting of just one dimension, i.e. a single-column vector.

In order to deconstruct *Arry4*, we need to first look at *Arry2* and *Arry3*, which contribute to the *row_num* and *column_num* of the INDEX function respectively.

Each of these references *Arry1*, so let’s first look at that, which is:

=ROW(INDIRECT("1:"&COLUMNS(Range1)*ROWS(Range1)))

The number of columns and rows in *Range1* are 5 and 4 respectively, so this becomes:

ROW(INDIRECT("1:"&5*4))

which is simply:

{1;2;3;4;5;6;7;8;9;10;11;12;13;14;15;16;17;18;19;20}

i.e. an array of integers from 1 up to the product of the number of rows and number of columns in our original dataset.

We can now look at *Arry2*, which we defined as:

=1+INT((Arry1-1)/COLUMNS(Range1))

Substituing in *Arry1* and also the number of columns in *Range1*:

1+INT(({1;2;3;4;5;6;7;8;9;10;11;12;13;14;15;16;17;18;19;20}-1)/5)

which is:

1+INT({0;1;2;3;4;5;6;7;8;9;10;11;12;13;14;15;16;17;18;19}/5)

and resolving first the division:

1+INT({0;0.2;0.4;0.6;0.8;1;1.2;1.4;1.6;1.8;2;2.2;2.4;2.6;2.8;3;3.2;3.4;3.6;3.8})

and then taking integer values gives us:

1+{0;0;0;0;0;1;1;1;1;1;2;2;2;2;2;3;3;3;3;3}

i.e.:

{1;1;1;1;1;2;2;2;2;2;3;3;3;3;3;4;4;4;4;4}

Before I explain the reason for generating this array, let’s first look at *Arry3*, which is in any case a very similar construction, though using MOD in place of INT, i.e.:

=1+MOD(Arry1-1,COLUMNS(Range1))

And, similarly to the resolution of *Arry2*, this becomes:

1+MOD({0;1;2;3;4;5;6;7;8;9;10;11;12;13;14;15;16;17;18;19},5)

i.e.:

1+{0;1;2;3;4;0;1;2;3;4;0;1;2;3;4;0;1;2;3;4}

which is finally:

{1;2;3;4;5;1;2;3;4;5;1;2;3;4;5;1;2;3;4;5}

If we look at *Arry4* now, inserting these arrays for our INDEX parameters, we have:

INDEX(Range1,N(IF(1,{1;1;1;1;1;2;2;2;2;2;3;3;3;3;3;4;4;4;4;4})),N(IF(1,{1;2;3;4;5;1;2;3;4;5;1;2;3;4;5;1;2;3;4;5})))

Readers who have read my post here will be aware of this technique for coercing INDEX to return an array of values, and so the above becomes:

INDEX(Range1,{1;1;1;1;1;2;2;2;2;2;3;3;3;3;3;4;4;4;4;4},{1;2;3;4;5;1;2;3;4;5;1;2;3;4;5;1;2;3;4;5})

And hopefully the reason for generating these arrays for each of INDEX’s parameters now becomes clear since, if we look at pairs from these two arrays which we will be passing to INDEX, we see that the above construction is equivalent to performing each of:

INDEX(Range1,1,1)

INDEX(Range1,1,2)

INDEX(Range1,1,3)

INDEX(Range1,1,4)

INDEX(Range1,1,5)

INDEX(Range1,2,1)

INDEX(Range1,2,2)

…

all the way up to:

INDEX(Range1,4,5)

In effect, then, we are doing nothing other than the equivalent of INDEXing each and every element in our array precisely once.

Hence, the end result is that *Arry4* will be:

{"Due";"";"Otto";"";"Otto";"";"";"";"";"Tre";"Sei";"Cinque";"";"Quattro";"Otto";"Due";"";"";"Quattro";"Otto"}

Now, if I just temporarily present this alongside the Excel interpretation of *Range1*, which is:

{"Due","","Otto","","Otto";"","","","","Tre";"Sei","Cinque","","Quattro","Otto";"Due","","","Quattro","Otto"}

we can see that there is no difference whatsoever in the values within these two arrays. The only thing which *is* different is that, whereas *Range1* comprises a 4-row-by-5-column two-dimensional array, *Arry4*, which was derived by simply INDEXing each and every element in *Range1*, is actually a one-dimensional 20-row-by-1-column range.

Of course, we could equally have made it so that *Arry4* was 1-row-by-20-column range, but that’s a moot point: whichever we choose, the net result is that we have achieved the “re-dimensioning” of our range from one of two dimensions to one of just one.

This means that we are now at liberty to operate on this array with all the usual techniques at our disposal which are applicable to one-dimensional ranges though which fail when extended to arrays comprising e.g. multiple columns.

If we return to our main construction:

INDEX(Arry4,MATCH(SMALL(IF(FREQUENCY(IF(Range1<>"",MATCH(Range1,Arry4,0)),Arry1),COUNTIF(Range1,"<"&Arry4)),ROWS($1:1)),IF(Arry4<>"",COUNTIF(Range1,"<"&Arry4)),0))

we can now start to tackle it with no fear of our two-dimensional range being an issue.

Taking this part first:

IF(Range1<>"",MATCH(Range1,Arry4,0))

and first resolving the condition being passed to IF:

IF({TRUE,FALSE,TRUE,FALSE,TRUE;FALSE,FALSE,FALSE,FALSE,TRUE;TRUE,TRUE,FALSE,TRUE,TRUE;TRUE,FALSE,FALSE,TRUE,TRUE},MATCH(Range1,Arry4,0))

and then inserting our known arrays *Range1* and *Arry4* into the MATCH function:

IF({TRUE,FALSE,TRUE,FALSE,TRUE;FALSE,FALSE,FALSE,FALSE,TRUE;TRUE,TRUE,FALSE,TRUE,TRUE;TRUE,FALSE,FALSE,TRUE,TRUE},MATCH({"Due","","Otto","","Otto";"","","","","Tre";"Sei","Cinque","","Quattro","Otto";"Due","","","Quattro","Otto"},{"Due";"";"Otto";"";"Otto";"";"";"";"";"Tre";"Sei";"Cinque";"";"Quattro";"Otto";"Due";"";"";"Quattro";"Otto"},0))

Resolving this array of MATCH functions:

IF({TRUE,FALSE,TRUE,FALSE,TRUE;FALSE,FALSE,FALSE,FALSE,TRUE;TRUE,TRUE,FALSE,TRUE,TRUE;TRUE,FALSE,FALSE,TRUE,TRUE},{1,#N/A,3,#N/A,3;#N/A,#N/A,#N/A,#N/A,10;11,12,#N/A,14,3;1,#N/A,#N/A,14,3})

and then finally the IF function gives us the following array:

{1,FALSE,3,FALSE,3;FALSE,FALSE,FALSE,FALSE,10;11,12,FALSE,14,3;1,FALSE,FALSE,14,3}

This array is then passed to FREQUENCY as its first parameter, with *Arry1* as its second, so that:

FREQUENCY(IF(Range1<>"",MATCH(Range1,Arry4,0)),Arry1)

becomes:

FREQUENCY({1,FALSE,3,FALSE,3;FALSE,FALSE,FALSE,FALSE,10;11,12,FALSE,14,3;1,FALSE,FALSE,14,3},Arry1)

and putting in *Arry1* in full:

FREQUENCY({1,FALSE,3,FALSE,3;FALSE,FALSE,FALSE,FALSE,10;11,12,FALSE,14,3;1,FALSE,FALSE,14,3},{1;2;3;4;5;6;7;8;9;10;11;12;13;14;15;16;17;18;19;20})

and finally resolving this FREQUENCY construction produces the following array:

{2;0;4;0;0;0;0;0;0;1;1;1;0;2;0;0;0;0;0;0;0}

This is quite a standard technique we have used: the position of the non-zero values in the above array represent the first occurrence within that array of each distinct value in our range, and so gives us a means by which to return unique values only.

And so we use this array as the condition within an IF statement:

IF(FREQUENCY(IF(Range1<>"",MATCH(Range1,Arry4,0)),Arry1),COUNTIF(Range1,"<"&Arry4))

which is now:

IF({2;0;4;0;0;0;0;0;0;1;1;1;0;2;0;0;0;0;0;0;0},COUNTIF(Range1,"<"&Arry4))

The COUNTIF, which is going to be used to determine our alphabetic sorting, can now be resolved:

IF({2;0;4;0;0;0;0;0;0;1;1;1;0;2;0;0;0;0;0;0;0},{1;0;3;0;3;0;0;0;0;10;9;0;0;7;3;1;0;0;7;3})

which becomes:

{1;FALSE;3;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;10;9;0;FALSE;7;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE}

Our construction now looks like:

INDEX(Arry4,MATCH(SMALL({1;FALSE;3;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;10;9;0;FALSE;7;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE},ROWS($1:1)),IF(Arry4<>"",COUNTIF(Range1,"<"&Arry4)),0))

and, resolving the SMALL function, whose *k* parameter is here given by ROWS($1:1), i.e. 1:

INDEX(Arry4,MATCH(0,IF(Arry4<>"",COUNTIF(Range1,"<"&Arry4)),0))

The COUNTIF construction we have just dissected, so we can insert that here as well:

INDEX(Arry4,MATCH(0,IF(Arry4<>"",{1;0;3;0;3;0;0;0;0;10;9;0;0;7;3;1;0;0;7;3},0))

and resolving the IF statement:

INDEX(Arry4,MATCH(0,{1;FALSE;3;FALSE;3;FALSE;FALSE;FALSE;FALSE;10;9;0;FALSE;7;3;1;FALSE;FALSE;7;3},0))

which is clearly:

INDEX(Arry4,12)

And if we put in the values from *Arry4*:

INDEX({"Due";"";"Otto";"";"Otto";"";"";"";"";"Tre";"Sei";"Cinque";"";"Quattro";"Otto";"Due";"";"";"Quattro";"Otto"},12)

which is “Cinque”, as required.

In Google Sheets, you could perform this task with

You add char(9) to the end of all the cells, then bring them together with concatenate and split them to create one row of values. Then you transpose, remove duplicates with the unique function, and sort it. If you wanted to limit to 6, since this is text you could use

Thanks again, Isai!

And yet again, you leave me thinking that I must be missing out on something by not having made the transition/extension to Google Sheets – everything looks so effortless!

Even if I can’t fully appreciate the solutions you post here, I sincerely hope that there will be others who visit this site who can.

Many thanks and regards

Hello

If I translate this part …

MATCH(Range1,Arry4,0)

… into German and resolve it the result is:

{1.#NV.3.#NV.3;#NV.#NV.#NV.#NV.10;11.12.#NV.14.3;1.#NV.#NV.14.3}

If I replace Arry4 with the array of values …

{“Due”;0;”Otto”;0;”Otto”;0;0;0;0;”Tre”;”Sei”;”Cinque”;0;”Quattro”;”Otto”;”Due”;0;0;”Quattro”;”Otto”}

… then it resolves to:

{1.2.3.2.3;2.2.2.2.10;11.12.2.14.3;1.2.2.14.3}

. is for column

; is for row

Hi. I’m not quite sure what you mean here. Can you explain how, when you “resolved”:

you arrived at an array of:

I think the key question here is: exactly

howdid you resolve it?Regards

I used F9.

Can you post the “resolution” of the formula:

when you first resolve, using F9 within the formula bar, just the parts

Range1andArry4?Doing this, I get:

as I would expect. Further resolution (again using F9) gives:

Regards

If I resolve both Range1 and Arry4 I get:

=VERGLEICH({“Due”.0.”Otto”.0.”Otto”;0.0.0.0.”Tre”;”Sei”.”Cinque”.0.”Quattro”.”Otto”;”Due”.0.0.”Quattro”.”Otto”};{“Due”;0;”Otto”;0;”Otto”;0;0;0;0;”Tre”;”Sei”;”Cinque”;0;”Quattro”;”Otto”;”Due”;0;0;”Quattro”;”Otto”};0)

. is for column

; is for row

I’m afraid I’m not quite sure what you’re doing. Some issue with “resolving” “blanks” to “zeroes”. In any case, I take it you’re aware of the potential dangers of “resolving” formulas this way (i.e. using F9) and not via the

Evaluate Formulatool? See here for more:https://excelxor.com/2014/10/09/coercing-array-returns-from-cse-resistant-formulas/

Regards

My problem is that the formula gives a different result for cell G1.

So I tried to locate the problem.

You are right. Where you have a “” I get a 0. And it’s doesn’t matter whether I use F9 or Evaluate Formula tool.

When I replace the blank cells in Range1 with =”” my results are the same as yours.

Sincere apologies. You were right. I was still using “” for the blanks for testing, which of course is a completely different matter.

The correct formula should be:

I will have to amend the post and analysis to reflect this important difference.

And I can’t thank you enough for having brought it too my attention. And apologies once again for the confusion.

Regards

Hi XOR LX,

Thanks for welcoming me here. Google Sheets is cool depending on what you wanna do. Excel is much better when it comes to print options, format options, pivot table options, and usually when you want more options haha. However, Google Sheets does have a lot of useful functions that Excel does not have, which may make some type of formulas a little shorter and perhaps a little easier to make after some practice. However, I know of a couple useful things you can’t do in Google Sheets, like 3D-references or combining the subtotal function with other functions to do conditional subtotaling (I am sure there is more since I do not know Excel very well).

Hello XOR LX,

Just found out about this very very interesting example … Really Great …!!!

Is there a tweak which could allow to produce the same list … but sorted by descending Frequency ?

Cheers

James

Hi James.

As per my e-mail to you, currently writing up your request in an actual post. Watch this space!

Cheers

Hi XOR LX,

The same formula works for a list of values in one column as well. Do you have any simpler formula to list down the unique values from one column or one row?

Regards,

Prasad DN

Is the best I could get by replacing the defined names

@ Prasad DNSure!

Using

Range1for your vertical source range (the formula would be different if instead your range was a horizontal one), the formula in H1 for the expected number of returns would this time be (CSE):And the main formula in G1 (again, CSE):

Copy down as required.

Hope that helps!

Cheers

@Prasad DNJust seen your posted formula. Excellent construction! Well done! I’m very impressed.

I would mention two small points. Firstly, it’s probably best to avoid the volatile INDIRECT set-up you use, as it’s not actually necessary here. See my construction, for example, for generating the

bins_arrayfor FREQUENCY.Secondly, you don’t actually need to repeat the conditional IF statement for MATCH’s

lookup_arrayhere (though you usually do, so I don’t blame you for including it again!).Great stuff! Thanks.

Hi Again,

I figured out one more method if data is single column:

Note: H10 will have first data entered manually.

and if data is single row, same formula with index(), skipping row_num:

Note: H10 will have first data entered manually.

Regards,

Prasad DN

@XOR LX Thank you for your inputs, will note them.

Regards,

Prasad DN

@Prasad DNI don’t understand your INDEX construction. How is that supposed to return a

unique,alphabeticallist?Can you give some examples of data in

Range1for which that works?Regards

Hi,

I was referring to the post in chandoo site:

http://chandoo.org/forum/threads/listing-smallest-pay-periods-in-order-multiple-duplicates-chandoo.23052/

I am sorry to have bypassing your main topic here with my query of listing unique items from single column/row data.

The formula will not sort, as I mentioned caveat in the post above, the data should be sorted.

Regards,

Prasad DN

@Prasad DNSo that solution requires

boththat the original list be sortedandthat you manually enter the first value?Forgive me, but that does not sound like the most dynamic or flexible of solutions!

Regards

Hi XOR LX,

I agree that it was not good solution. I am happy to share another solution which I learned from you post in below thread:

https://excelxor.com/2015/03/30/counting-rows-where-at-least-one-condition-is-met/#more-4557

Regards,

PrasadDN

@Prasad DNNot sure I understand. If I use FREQENCY on a range of text values (as they are in this post), then of course I don’t get the desired results.

Can you clarify? How does this new formula return the list starting in G1?

Regards

Pingback: Using the Excel TEXTJOIN Function To Return Unique Items In A One-Cell Delimited String From A 2D and 3D Range By David Hager | Excel For You