Unique, Ordered List of Most Frequent Numbers in a Two-Dimensional Range 9

I recently received a request from James, who was interested in a formula-based solution to the following problem: given a two-dimensional range containing a mixture of numbers and empty cells (which I am defining as being either “genuinely” empty or as containing the null string “” as a result of formulas in those cells), generate a unique list of those numbers in order of their frequency within that range, with the most frequent first. What’s more, if two or more numbers occur the same number of times within that range, then they should be listed in order of their size from smallest to largest.

For example, for the dataset in A1:F6 below, we would return the list as given beginning in I1.

Unique, Ordered List of Most Frequent Numbers in a Two-Dimensional Range

(You can download the workbook here.)

We can do this as follows:

First go to Name Manager and define:

Range1 as:


(Or whatever the range in question happens to be.)

Arry1 as:


Arry2 as:


The formula in H1, used to determine the number of expected returns, is:


and the array formula** in I1 is:


How does it work?

I should first confess something to my readers, and that is that the above will almost certainly not present the most straightforward solution to this problem. In fact, that solution will be one in which the calculations used to determine the return for a given row will use the result of the formula in the preceding row. Readers may well be familiar with such constructions for returning unique values, which usually consist of some set-up along the lines of MATCH(0,COUNTIF(I$1:I1…, etc.

So what are my reasons for neglecting such a simple solution? Why go to the length of constructing an evidently more complex set-up when such an alternative exists?

Well, the reasons are two-fold: first of all, every time that I see that construction I feel a slight pang of distaste at the idea that the list of returns cannot begin in row 1. Pedantic? Perhaps, though it’s somehow always irked my rather idealistic streak. Secondly, and more importantly, I firmly believe that we should be able to generate a finalised array of values in-formula, the contents of which do not depend on the results of previous formulas. There is, for me, an almost “quasi-circular” idea to a set-up in which the formulas in a series are co-dependent: not quite “cheating”, as such, but nevertheless somehow non-ideal.

I’m aware that many of my readers may well disagree with me philosophically on this point, not to mention pragmatically. (Apologies especially to you, James, for whom this post was intended as a practical solution to your problem, but which you may now be thinking is something a touch more convoluted than it might have been.) However, it’s a subject on which I am quite principled (stubborn?), and so will take some convincing otherwise. Besides, there is, I hope readers will appreciate, value – both theoretical and applicable – in being able to derive arrays without the need for such a co-dependent set-up.

Anyway, to the solution, then. The formula in H1 is straightforward enough: a standard construction used to determine the number of distinct entries within the range. However, it is nevertheless worth deconstructing, especially as some readers may not be aware of the reason for the concatenation of the criteria parameter in COUNTIF with a null string. In this case, its resolution would be, dealing first with the numerator:


which is:


The denominator simply counts the number of times that each of the entries in Range1 occurs within that range. Readers should note the importance of first catenating the entries in the criteria parameter with the null string “”. The reason for this is that, when Excel evaluates a statement such as:


it first “resolves” the entries in the criteria parameter, and, since empty cells in worksheet ranges are, by default, “resolved” to zeroes, the above becomes:


Perhaps oddly, the range parameter is not similarly first “resolved” as such, and so the blanks within that array remain just that, the above then being equivalent to:


and so, since there are clearly no zeroes in the first array, the result here would be:


which would mean that, when we came to performing our division, the resulting array would contain several #DIV/0! entries, which of course would cause SUMPRODUCT itself to error.

By first catenating a null string to the end of the entries in our range, we coerce Excel into resolving those blanks as null strings, not as zeroes. Technically, when evaluating the formula (e.g. using the Evaluate Formula tool), it can be seen that the concatenation in:


is resolved as:


The paradoxical feature here is that, this time, Excel has not first resolved any empty cells within the range as zeroes, prior to concatenation with a null string. In fact, if this had been the case, then we would have:


and this time, since 0&”” resolves to the text string “0”, not “”, the above would become:


and so we would here be no better off than when we formed this construction without any prior concatenation.

The technical justification for such behaviour is perhaps not so easy to explain. Although following the steps with Evaluate Formula does indeed display an intermediate resolution of the range, in which we clearly see any blanks being “resolved” to zeroes, further concatenation of these zeroes does not produce a series of text strings, viz “0”, but rather the desired null strings, “”. Suffice to say that this presents another example of the dangers of “evaluating” formulas by repeated pressing of F9 within the formula bar, which would, in this case, lead to the erroneous results as described.

Returning to our “correct” version, we can now see that:




which is:


i.e. 6, as desired.

And so, by referencing this value in our main formulas, we can design it so that, for the sixth such formula and beyond, a blank will be returned. Hence the initial clause:


Let’s look at the main part of that construction, then, which is:


This part:


will generate an array of counts for each of the entries within our range, which makes sense since the crux of this problem is to return values based upon their frequencies within that range. The reason for the additional clause is to give us a means by which we can differentiate cases where two or more values occur equally frequently within our range. What’s more, this clause is designed so that, in such cases, the smaller of the values will be returned first, as stipulated.

The above resolves to:


Note that, were there any null strings (“”) within our range, then these would here resolve to #VALUE! errors upon multiplication. However, the clause preceding this part of the formula – IF(Range1<>”” – will in any case mean that these errors are not considered, and so we need not worry unduly at this stage.

The above then becomes:




Again, any errors (either #VALUE! from null strings or #DIV/0! from “genuinely” blank entries) will be resolved in the next step, such that:


resolves to the following, error-less array:


In this array, for example, the values of 3.000001, 3.00000016666667 and 3.00000025 represent, respectively, the three occurrences within Range1 of the values 1, 6 and 4, their decimal portions having been generated, as described, to allow for differentiation.

We now need a method which we can use to identify the unique values from this array and place them in descending order, viz:


and then match these to our original values, knowing that the above represent the six occurrences of 5, five of 2, three of 1, three of 4, three of 6 and one of 3 respectively.

In order to restrict our array to being one in which we consider unique values only, we use this part:


Before I explain the reasons for the slightly unconventional additions (+0, &0) to an otherwise standard construction, let’s first look at what this actually resolves to, which is the following array:


I have highlighted the non-zero values in the above and, if I just present this alongside our range, and highlight the entries “corresponding” to those non-zero values (strictly-speaking there is as yet no such correspondence: the above is a 37-row-by-1-column array, the below a 6-row-by-6-column array):


we can see that each of those non-zero values within the first array corresponds (again, ignoring the dimensionality of the second array for the time being) to the first occurrence within the second array of each of our distinct values. This applies equally to the value in the 2nd position of our first array, 15, which corresponds to the first occurrence of a blank within our second array.

And so we can use FREQUENCY in this way to give us a means by which we can reduce an array to one containing single occurrences only of each value within that array.

And, if we wish to use it in this way, it is important that we make the small amendments as I have here. The reason being that, if, for example, we had instead used simply:


then this would resolve to:


which would not be very useful when used in conjunction with an IF clause for generating an array of unique entries, since not only does it consist of a mere 22 entries, far less than the 36 in our range, but the non-zero elements in the above array are also, unlike as in the above case, not in correspondence with the first occurrences of each of our numbers within the range.

The reason for this reduced array is due to the functioning of FREQUENCY. Any blanks in the bins_array are ignored: as such, effectively we are evaluating this expression just as if that parameter were in fact:


And of course using a bins_array parameter comprising 21 elements will generate an array comprising one more than that number, i.e. 22, when passed to FREQUENCY.

By first coercing the values in our range to numerics (here via the addition of zero), so that, as we saw previously, any blanks are now considered numerical zeroes, we force FREQUENCY to consider equally those zeroes as legitimate values within the bins_array. Hence:


would here resolve to:


which is precisely the desired array as generated previously.

And the reason for the concatenation (&0) is to account for the possibility of any null strings ("") within our range, since, if we did not first perform this concatenation, then an expression such as ""+0 would produce an error, whereas (""&0)+0 would give the valid “0”+0, i.e. 0.

Note that this concatenation of our values with zero will technically enlarge them all by a factor of 10 (e.g. 6&0=60, etc.), though this is not something which should concern us since, in any case, the operation is being performed consistently over all elements within that array.

And so we can now pass this array generated from FREQUENCY to some IF statement such that the resulting array consists of distinct elements only. However, we are not quite there yet, the reason being that, if we were to attempt a construction such as:


which is what we are aiming for, this would resolve to:


which is:


then we might be disappointed on seeing that this resolves to:


Not at all the 36-element array we were hoping for!

The reason being of course that the two arrays being passed to IF here – that for the logical_test and that for the value_if_true parameter – are not of the same dimensions, the former being a 37-row-by-1-column array:


the latter a 6-row-by-6-column array:


And so, if we want to perform such a comparison, we must first re-dimension one of our ranges so that it is of a dimension equal to that of the other.

And, usually, the way in which we would approach such a scenario would be to take the two-dimensional matrix and transform it into one of a single dimension. Indeed, that is the approach I have generally taken, as can be seen from many of my other posts here in which such a re-dimensioning was necessary.

Here, however, I took the decision to perform a re-dimensioning in the opposite direction, i.e. to take the one-dimensional array and re-work it into one of two dimensions. Or, to be specific, to take our 37-row-by-1-column array and re-fashion it into a 6-row-by-6-column array, the end result of which will be that our IF statement will now be syntactically legitimate.

The reason for this decision was based on a mixture of practicality and curiosity: practical, since, in our required construction, the number of two-dimensional arrays is significantly greater than that of one-dimensional arrays, and so it seems like the more logical choice to transform the fewer cases of the latter into the former than vice versa. And curious, since the challenge of performing such a re-dimensioning was not something that I had yet attempted.

So – how exactly do we re-dimension a single-column array into a 6×6 matrix?

Well, in essence we simply INDEX that array with an appropriately-dimensioned and -valued array to pass as the row_num parameter. The resultant array will be one consisting of six rows and six columns, in which the entries are precisely those 36 as generated by our FREQUENCY construction (we can also here do away with the extraneous, 37th element generated by that function): the first element in that single-column array will now occupy the position in the first row, first column of our new matrix, the seventh element will now occupy row 2, column 1, the 34th row 6, column 4, etc., etc.

In fact, all in all our 37-row-by-1-column array, derived from FREQUENCY, i.e.:


will now become the following 6-row-by-6-column array:


and this will be achieved, as described, via passing an array of values to the row_num parameter of an INDEX function, appropriately coerced to return an array of values, that parameter array being:


(Note the dimensionality (6-by-6) of this array.)

How do we generate such an array, short of manually constructing it by hand? Generating a single-row or single-column vector consisting of the integers from 1 to 36 is straightforward enough (most of us are by now past masters with ROW(INDIRECT(“1:”&… constructions), but what about a 6-row-by-6-column matrix of these numbers, arranged as given above?

Well, quite possibly in many ways, and the one I chose is just one of those many, achieved via the following construction:


Firstly, recall that we defined Arry1 and Arry2 as:




respectively, which resolve to:




(The two naturally being co-orthogonal.)

We then insert these into our construction, which is now:


The number of columns in our range is simply 6, so this becomes:




and performing the transposition:


which is:


and, importantly, since the second array in this subtraction is orthogonal to the first, we guarantee that we will be performing, not just 6, but 36 different subtractions, each of the six elements in the second array being subtracted, in turn, from each of the six elements in the first. The final result being:


as desired.

We now pass this array as the row_num parameter to INDEX, as described, making sure that we employ the necessary technique to coerce INDEX to operate over an array of values (see here for more), such that:


is now:




which of course consists of precisely the same values as the non-INDEXED construction:




though appropriately re-dimensioned and minus that extraneous, 37th element

And we are now in a position to begin putting some of these pieces back into the original construction, such that:


is now:


which is, resolving the IF clause within the LARGE construction:


and resolving that LARGE function, here with ROWS($1:1) as the k parameter (i.e. 1):


which is:


and putting in the values from Range1:




i.e. 5, as required, the MIN function being used here simply to generate a single return from this array.

Thanks to James for being the inspiration for this post. More to follow shortly!


  1. Nice work! I struggled with trying to create a single array formula for this but didn’t get very far. I wonder how long it took to come up with this method? After working through the solution i think there may be a couple of other options as well:

    a) the named variables Arry1 and Arry2 could possibly be removed by making the following replacement in the formula:




    which returns an array of consecutive numbers with any type of data in Range1 (http://forum.chandoo.org/threads/create-an-array-of-numbers.13433/)

    b) the choice of 1/(10^6*Range1) is neat but won’t allow for 0 or small numbers less than 1E-6 reliably. A little longer but more robust might be

    1/RANK(Range1,Range1,1) or even -T.DIST(Range1,1,2).

    Great method overall anyway. (PS. could scroll bars be added for viewing longer formulas, i could only view them by copying and pasting. Thanks!)

  2. Wow! And I was honestly of the belief that we’d be stuck with ROW constructions for such tasks for eternity! Fascinating and novel approach.

    I confess that I need to brush up on my statistical functions – I have no idea why you’re using a value of 73 there!

    Still, this approach will open many new vistas I feel – it must have taken you some time (and creativity) to develop such a technique, no?

    And yes, I also confess to having been quite “lazy” with my 1/10^6 approach. Again, statistical alternatives prove far more appropriate and rigorous, and I especially like your use of T.DIST.

    Apologies re the new layout – I’m looking into the issues I’m experiencing now and will hopefully have it remedied soon. Can you just let me know in the meantime if what you see is affected by different zoom levels?

    Many thanks again for an especially insightful contribution. I look forward to hopefully implementing this TREND set-up (and perhaps others derived from it) at some point soon.


  3. Glad this was helpful. It’s not too difficult conceptually to come up with the idea. If we set intercept=0 in TREND, then to generate the required array x={1;2;…;n} we need to find a constant array of values y={k;k;…;k} which has slope=1. To find the value k requires some high school algebra…

    From the definition of least squares fit, the slope is:

    =(k*1 + k*2 + ... + k*n)/(1*1 + 2*2 + ... + n*n)
    =3k/(2n + 1)

    using the two standard formulas:

    1 + 2 + ... + n = n(n + 1)/2
    1² + 2² + ... + n = n(n + 1)(2n + 1)/6

    Therefore setting k=(2n+1)/3 gives a slope of 1. And in this case when n=36, k=73/3.

    fwiw, here’s a single array formula variation on the theme (after further struggling!):

    =LARGE(Range1,MODE.MULT(IFERROR(MOD(LARGE(IF(HLOOKUP(0,-FREQUENCY(Range1,N(IF({1},Range1))),intRange),COUNTIF(Range1,Range1)*10^6+RANK(Range1,Range1 )),intRange),10^6),""),intRange))



    Lastly, re:layout, zooming didn’t seem to make a difference but on android chrome i can see formulas better.

  4. Excellent explanation – thanks.

    And a nice, alternative solution, much more elegant than mine.

    I like the use of HLOOKUP here, and I’m also now starting to appreciate your use of MODE.MULT in such constructions, which allows us to return a vector of values from a two-dimensional range, whilst preserving the “order” of values within that range. Very clever. And of course we can simply pass the array generated from MODE.MULT to INDEX if we want the equivalent single-cell array set-up.

    Great stuff! Many thanks.

  5. Just for the fun of being post-creative.

    =SUBSTITUTE(LARGE(FREQUENCY(Range1;$G$1:$G$8)*100+$G$1:$G$8;G1);0;"no of ")&"s"

    –> 6no of 5s, 5no of 2s, …

    Neither 100% according to the instructions, or foolproof + added 1to7 in col. G.

  6. Hi.

    Not sure I understand. Your formula contains a reference to the range G1:G8, though this is not used in the original example I gave. Can you clarify?


  7. Sorry for the short comment; “…+added 1to7 in col.G”.

    I added 1 in G1, 2 in G2 … 7 in G7 and empty in G8.

    It was a very quick and dirty formula, just for fun (nothing more) – fully aware of all its ‘flaws’. It’s perfectly ok to remove it.

  8. Thanks, but still not sure I understand. Your formula returns “6no of 1s” in row 1, but there are only 3 1s within the range (?).

    It also returns, in row 3, “3no of 3s”, but there is only 1 occurrence of the number 3 within the range.

    Am I missing something here? Can you clarify?


  9. Pingback: #Excel Super Links #28 – shared by David Hager | Excel For You

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s