# Shortest Formula Challenge #3: World Populations 47

The challenge this week is as follows: based on the below data extract of populations for a selection of countries for the four years from 2010-2013 inclusive, a single formula in A1 to identify the Region which had the largest average yearly percentage increase in population over that period.

The answer here would be Sub-Saharan Africa. For readers wishing to corroborate their results, the average yearly percentage increase in population for that region was 2.9840070521% to 10 d.p.

Note that this is a shortest formula challenge, which means that readers should attempt to find not only a correct solution to the problem but also one which has the least number of characters as possible.

Any ranges must include both a row and column reference: A:F or 3:12, for example, are not acceptable. Named Ranges are also not permitted.

Solution next week. Best of luck!

1. ChrisBM says:

Feeling distinctly sluggish today but here is a start. (Terribly clunky I know)

`=INDEX(\$B\$4:\$B\$12,MATCH(MAX(SUMIF(\$B\$4:\$B\$12,B4:B12,\$F\$4:\$F\$12)/SUMIF(\$B\$4:\$B\$12,B4:B12,\$C\$4:\$C\$12)),SUMIF(\$B\$4:\$B\$12,B4:B12,\$F\$4:\$F\$12)/SUMIF(\$B\$4:\$B\$12,B4:B12,\$C\$4:\$C\$12),0))`
2. ChrisBM says:

Sorry, didn’t tidy up removing all \$ and 0… (still bad but at least not as long) 🙂

`=INDEX(B4:B12,MATCH(MAX(SUMIF(B4:B12,B4:B12,F4:F12)/SUMIF(B4:B12,B4:B12,C4:C12)),SUMIF(B4:B12,B4:B12,F4:F12)/SUMIF(B4:B12,B4:B12,C4:C12),))`
3. James says:

Hi again,

It seems my initial post has disappeared …

`=INDEX(B3:B12,MATCH(MAX((F3:F12/C3:C12)^(1/4)-1),(F3:F12/C3:C12)^(1/4)-1,0))`
4. Thanks, Chris. A few points:

1) The range is B3:B12, not B4:B12.

2) I need the average yearly change. What you have based your calculations on is the 2010 and 2013 figures only. Whilst I agree that, in general, the Region with the largest proportional increase in its population figure from 2010 to 2013 is almost certain to be the same Region which has the highest average yearly change, can you assert that this will always be the case?

For example, as unlikely a scenario as it is (or perhaps not?), let’s assume that the total population for a given Region was:

2010: 8,500,000
2011: 10,000,000
2012: 10,000,000
2013: 8,500,000

Based on your calculations, the ratio assigned to this region would be 1, since 8,500,000/8,500,000=1. Or, in other words, a 0% increase.

However, if we look at the year-on-year % changes, we get:

2010-11: 17.65%
2011-12: 0.00%
2012-13: -15.00%

And so the average change would be 0.88%.

Note that this figure is NOT zero, since, as a proportion of the starting value, going from 8,500,000 to 10,000,000 is a larger jump than going from 10,000,000 to 8,500,000; hence the fact that we see 17.65% and -15.00%, and not figures which “cancel each other out”, i.e. 17.65% and -17.65%, or 15.00% and -15.00%,

3) Do you really want to be using absolute references in a Shortest Formula Challenge? 🙂 EDIT: just seen your updated post, so ignore!

Regards

5. GreasySpot says:

For readers wishing to corroborate their results, the average yearly percentage increase in population for that region was 2.9840070521% to 10 d.p.

What does d.p. mean?

6. @GreasySpot

Good point – thanks. It’s an abbreviation for “decimal points”.

Cheers

7. GreasySpot says:

Ok call me an idiot if you must, but can I get the mathematical equation for the % growth from year to year. I am using (b-a)/a*100 which comes out with the correct answers using your explanation to ChrisBM. However I don’t come up with 2.9840070521% as you state above in the original problem set. I come up with 3.1761202070% for Sub-Saharan Africa.

8. ChrisBM says:

XOR LX – Thanks about the range, sorry, it seems like I’ve been very sloppy recently.

I agree conceptually that if you are talking about arithmetic average then the correct answer in your example is non-zero, but in reality it has not grown at all. I was considering the geometric average growth rate.

In essence, my formula should give the area with the highest average growth rate where the growth rate is the rate at which it would have to have grown from time 0 to time t+3. I feel it certainly is the most fair reflection of average growth rate.

Wikipedia discussion on the subject
http://en.wikipedia.org/wiki/Inequality_of_arithmetic_and_geometric_means

Chris

9. John Jairo V says:

Hi again!

This is my solution (not require CSE):

`=INDEX(B3:B12,MATCH(MAX(MMULT(SUMIF(B3:B12,B3:B12,OFFSET(D3,,{0,1,2},10))/SUMIF(B3:B12,B3:B12,OFFSET(D3,,{0,1,2},10))-1,{1;1;1})/3),MMULT(SUMIF(B3:B12,B3:B12,OFFSET(D3,,{0,1,2},10))/SUMIF(B3:B12,B3:B12,OFFSET(D3,,{0,1,2},10))-1,{1;1;1})/3,))`

Many characters… because repeat MMULT twice.

Blessings!

10. GreasySpot says:

WOW! Talk about a brain spasm. Reading through that material will definitely give you one.

11. @GreasySpot

For Sub-Saharan Africa:

2010 Total: 21,558,045
2011 Total: 22,210,006
2012 Total: 22,872,070
2013 Total: 23,546,083

2010-2011: = (22,210,006-21,558,045)/21,558,045 = 3.02421207488898%
2011-2012: = (22,872,070-22,210,006)/22,210,006 = 2.98092670483745%
2012-2013: = (23,546,083-22,872,070)/22,872,070 = 2.94688237662791%

Average change = (3.02421207488898%+2.98092670483745%+2.94688237662791%)/3 = 2.98400705211811%

What am I missing?

Regards

12. @ChrisBM

I completely agree with you that this may well not be the best statistical measure here. I confess that I did not give too much consideration to this matter, though I do feel (hope) that my preamble made it clear what type of average I was seeking (whether, practically-speaking, that may be an inferior measure or not to the one you outline).

In future I will be certain to give more credence to my challenges. For the time being, however, and whether or not you disagree with the statistical merits of such a result, the challenge will remain as it is. The idea was constructed more to challenge various Excel skills (as always here), rather than to begin a debate on the various forms of statistical representation. Having said that, I take your point, and will certainly, as I said, make future challenges more “realistic”.

Regards

13. GreasySpot says:

Well for one your original population numbers are different.

14. GreasySpot says:

Using the numbers in the last post you made I concur with your results but that was not the numbers you posted in the picture or the file.

15. John Jairo V says:

I Think that even if the following formula is not the percentage calculated as such, it could shorten a little to find the region as follows:

`=INDEX(B3:B12,MATCH(MAX(MMULT(SUMIF(B3:B12,B3:B12,OFFSET(D3,,{0,1,2},10))/SUMIF(B3:B12,B3:B12,OFFSET(D3,,{0,1,2},10)),{1;1;1})),MMULT(SUMIF(B3:B12,B3:B12,OFFSET(D3,,{0,1,2},10))/SUMIF(B3:B12,B3:B12,OFFSET(D3,,{0,1,2},10)),{1;1;1}),))`

Blessings!

16. GreasySpot says:

I will move forward knowing I have the math correct and you will sort out the data set. Cheers.

17. @John Jairo V

I get “East Asia & Pacific” using that formula (?)

Also, this part:

`MMULT(SUMIF(B3:B12,B3:B12,OFFSET(D3,,{0,1,2},10))/SUMIF(B3:B12,B3:B12,OFFSET(D3,,{0,1,2},10))-1,{1;1;1})`

resolves to the array:

`{0;0;0;0;0;0;0;0;0;0}`

Is this what you intended?

Regards

18. @GreasySpot Different to?

19. GreasySpot says:

See above posts.

20. ChrisBM says:

You are of course right, I should not be trying to change the challenge. I apologise if it seemed rude.

Modifying my already ugly solution to an even longer more clunky one:

`=INDEX(B3:B12,MATCH(MAX((SUMIF(B3:B12,B3:B12,D3:D12)/SUMIF(B3:B12,B3:B12,C3:C12))+(SUMIF(B3:B12,B3:B12,E3:E12)/SUMIF(B3:B12,B3:B12,D3:D12))+(SUMIF(B3:B12,B3:B12,F3:F12)/SUMIF(B3:B12,B3:B12,E3:E12))),(SUMIF(B3:B12,B3:B12,D3:D12)/SUMIF(B3:B12,B3:B12,C3:C12))+(SUMIF(B3:B12,B3:B12,E3:E12)/SUMIF(B3:B12,B3:B12,D3:D12))+(SUMIF(B3:B12,B3:B12,F3:F12)/SUMIF(B3:B12,B3:B12,E3:E12)),))`
21. @ALL

Sincere apologies to everyone who has been posting today. I’ve had a few issues with comment posting and so not all may have appeared in the correct chronological order, which has obviously been extremely confusing for everyone. Sorted now, so hopefully won’t happen again. Sorry!

22. @GreasySpot

Sincere apologies. I did have a slightly different dataset when I first created this post: I retracted the post almost immediately in order to edit the data, and I (wrongly) presumed that no-one could have read it in that short time. However, evidently you did.

Very sorry for that.

Regards

23. @ChrisBM

No! Please – keep those sorts of comments coming! Despite the fact that I don’t want to now change the Excel challenge, that doesn’t mean that the points you make aren’t perfectly relevant, even instructive.

Besides, you’re evidently of a more statistical persuasion than I am (I knew I shouldn’t have used such a scenario for a formula challenge – I was never very strong in that field!). So thanks for the education in this respect.

Regards

24. @John Jairo V

Bit confused – the formula you posted is quite different to the one in that file.

It appears to be the OFFSET construction which differs: in your post here, this function is using a single cell as the reference parameter, e.g. D3, and there is also a height parameter of 10; in the version in your file, however, the OFFSETs are using a range as the reference parameter, e.g. D3:D12, and this time with no height parameter.

Can you clarify? And perhaps, when you’ve decided which version is the correct one, post it in a comment here?

Thanks a lot

25. @James

Yes – very sorry. Your post appears to have been caught in a backlog, even though you made it some time ago. Apologies.

Am I wrong, or – like ChrisBM – are you too making your own judgment on how this average should be measured here?

That’s fair enough, in real terms at least, and perhaps you could explain your reasoning for this construction with ^(1/4)? I for one would certainly be interested as I’ve clearly not grasped any of the statistical concepts/methods involved in such a question.

However, as I said to ChrisBM, despite my lack of understanding in this field (and so most likely a terrible choice of means with which to determine such an average), I would like it if people had a go at obtaining the results using the logic that I have, if possible.

Many thanks.

26. John Jairo V says:

@XOR LX

I post 3 version’s, all works for me. I don’t know if I make a mistake in the translation of the formula. Then I post the formula in spanish version:

Option 1:

`=INDICE(B3:B12;COINCIDIR(MAX(MMULT(SUMAR.SI(B3:B12;B3:B12;DESREF(D3:D12;;{0\1\2}))/SUMAR.SI(B3:B12;B3:B12;DESREF(C3:C12;;{0\1\2}))-1;{1;1;1})/3);MMULT(SUMAR.SI(B3:B12;B3:B12;DESREF(D3:D12;;{0\1\2}))/SUMAR.SI(B3:B12;B3:B12;DESREF(C3:C12;;{0\1\2}))-1;{1;1;1})/3;))`

Option 2:

`=INDICE(B3:B12;COINCIDIR(MAX(MMULT(SUMAR.SI(B3:B12;B3:B12;DESREF(D3;;{0\1\2};10))/SUMAR.SI(B3:B12;B3:B12;DESREF(C3;;{0\1\2};10))-1;{1;1;1})/3);MMULT(SUMAR.SI(B3:B12;B3:B12;DESREF(D3;;{0\1\2};10))/SUMAR.SI(B3:B12;B3:B12;DESREF(C3;;{0\1\2};10))-1;{1;1;1})/3;))`

Option 3:

`=INDICE(B3:B12;COINCIDIR(MAX(MMULT(SUMAR.SI(B3:B12;B3:B12;DESREF(D3;;{0\1\2};10))/SUMAR.SI(B3:B12;B3:B12;DESREF(C3;;{0\1\2};10));{1;1;1}));MMULT(SUMAR.SI(B3:B12;B3:B12;DESREF(D3;;{0\1\2};10))/SUMAR.SI(B3:B12;B3:B12;DESREF(C3;;{0\1\2};10));{1;1;1});))`

And I explain:

Option 1 and Option 2 has the same characters. Of those two , I prefer the second (looks shorter).

The third option has 8 characters less… even is not the percentage calculated as such… it calculates the region with the max average.

The final choice is Option 3. Please comment! Blessings!

27. @John Jairo V

Thanks. Think I understand now. Let me have a look at each of your solutions and get back to you.

Regards

28. John Jairo V says:

Jajajajaja! Sorry! I made a mistake in the references in the second SUMIF, for the translation.

Here the formulas in English:

Option 1 (length: 244)

`=INDEX(B3:B12,MATCH(MAX(MMULT(SUMIF(B3:B12,B3:B12,OFFSET(D3:D12,,{0,1,2}))/SUMIF(B3:B12,B3:B12,OFFSET(C3:C12,,{0,1,2}))-1,{1;1;1})/3),MMULT(SUMIF(B3:B12,B3:B12,OFFSET(D3:D12,,{0,1,2}))/SUMIF(B3:B12,B3:B12,OFFSET(C3:C12,,{0,1,2}))-1,{1;1;1})/3,)) `

Option 2: (length: 240)

`=INDEX(B3:B12,MATCH(MAX(MMULT(SUMIF(B3:B12,B3:B12,OFFSET(D3,,{0,1,2},10))/SUMIF(B3:B12,B3:B12,OFFSET(C3,,{0,1,2},10))-1,{1;1;1})/3),MMULT(SUMIF(B3:B12,B3:B12,OFFSET(D3,,{0,1,2},10))/SUMIF(B3:B12,B3:B12,OFFSET(C3,,{0,1,2},10))-1,{1;1;1})/3,))`

Option 3: (length: 232)

`=INDEX(B3:B12,MATCH(MAX(MMULT(SUMIF(B3:B12,B3:B12,OFFSET(D3,,{0,1,2},10))/SUMIF(B3:B12,B3:B12,OFFSET(C3,,{0,1,2},10)),{1;1;1})),MMULT(SUMIF(B3:B12,B3:B12,OFFSET(D3,,{0,1,2},10))/SUMIF(B3:B12,B3:B12,OFFSET(C3,,{0,1,2},10)),{1;1;1}),))`

my apologies! Blessings!

29. Hi

My humble contribution.

`=INDEX(B3:B12,MATCH(MAX(MMULT(N(INDIRECT({"D","E","F"}&ROW(3:12)))/N(INDIRECT({"C","D","E"}&ROW(3:12))),ROW(\$1:\$3)^0)/3),MMULT(N(INDIRECT({"D","E","F"}&ROW(3:12)))/N(INDIRECT({"C","D","E"}&ROW(3:12))),ROW(\$1:\$3)^0)/3,0))`
30. ChrisBM says:

@XLarium

While this gets the right result with this data set I think the calculation is slightly wrong. I think you are looking for the maximum single row growth as opposed to region.

The maximum number your formula finds is 3.1761% (which is the growth rate for Angola – row 6) as opposed to the 2.984% that it should find. It then returns the region for this row – Sub-Saharan Africa

31. James says:

@XOR LX

Mainly in Finance, there is the need to work out the annual average growth rate over a given number of years … ( technically it is called the Compound Annual Growth Rate : CAGR)

The generic formula is: (F3/C3:)^(1/4)-1 or (FinalAmount/InitialAmount)^(1/nbyears)-1

Is this explanation clear enough ?

32. Hello Chris

33. I can confirm that, so far, two readers have offered correct solutions: John Jairo V and Chris BM.

The leading contender for shortest solution is currently John, at 232 characters.

Regards

34. @James

Many thanks for the explanation. Again, I’m sure that’s a much better statistical representation of what I was wanting to show than that which I used, and I apologize for my ignorance in this field, in which you clearly know what you’re talking about!

Regards

35. @XLarium

Despite it not being quite right for this challenge, I have to say that I like this alternative to the standard OFFSET construction that you employ using INDIRECT and ROW. Thanks for sharing it – one to remember.

Regards

36. GreasySpot says:

@ XOR LX
@XLarium

I was away from the office yesterday afternoon so therefore not able to participate in the further discussions, but I believe XLarium’s code doesn’t meet the criteria for ranges.

Any ranges must include both a row and column reference: A:F or 3:12, for example, are not acceptable.

`=INDEX(B3:B12,MATCH(MAX(MMULT(N(INDIRECT({"D","E","F"}&ROW(3:12)))/N(INDIRECT({"C","D","E"}&ROW(3:12))),ROW(\$1:\$3)^0)/3),MMULT(N(INDIRECT({"D","E","F"}&ROW(3:12)))/N(INDIRECT({"C","D","E"}&ROW(3:12))),ROW(\$1:\$3)^0)/3,0))`
37. @GreasySpot Good spot! Forgetting my own rules!

38. @GreasySpot

Well, technically they are not ranges but individual cell references.

🙂

Here is my formula for Google Sheets:

Formula

`=index(sort(query(B3:F12,"select B,((sum(D)-sum(C))/sum(C)+(sum(E)-sum(D))/sum(D)+(sum(F)-sum(E))/sum(E))/3group by B"),2,0),2,1)`

Length 128

Here I use the QUERY function to basically type out the formula for the average, grouped by region (column B). I SORT it by the second column so that the highest number is at the top, then I use INDEX to select the region name that is at the top after the sorting. I got greedy and did not put a space between the 3 and group so it looks like 3group, but the formula still works.

Well I was able to shave off 6 characters by replacing “sum(” with the number 1 and then using the substitute function to replace it back

`=index(sort(query(B3:F12,substitute("select B,((1D)-1C))/1C)+(1E)-1D))/1D)+(1F)-1E))/1E))/3group by B",1,"sum(")),2,0),2,1)`

Length 122

It makes it much harder to understand at first glance though XD

41. Lori says:

I like John Jairo V’s (non-CSE) option, which perhaps could be reduced further? (131 chars)

`=LOOKUP(1,1/FREQUENCY(0,1/MMULT(SUMIF(B3:B12,B3:B12,OFFSET(D3,,{0,1,2}))/SUMIF(B3:B12,B3:B12,OFFSET(C3,,{0,1,2})),{1;1;1})),B3:B12)`

Or maybe for more flexibility only referring to complete ranges (with CSE)? (186 chars)

`=LOOKUP(1,1/FREQUENCY(0,1/MMULT(SUMIF(B3:B12,B3:B12,OFFSET(C3:F12,,MATCH(C2:F2+1,C2:F2+0)-1,,1))/SUMIF(B3:B12,B3:B12,OFFSET(C3:F12,,MATCH(C2:F2,C2:F2)-1,,1)),TRANSPOSE(C2:F2)^0)),B3:B12)`

Note this is just a tweak to his original idea – credit goes to him.

42. Lori says:

i accidentally left off the OFFSET height argument (10) in the first formula which would make it 137 chars.

43. @Lori

Absolutely wonderful.

Until now I had not known that there existed such an alternative (and elegant) construction to the standard (and lengthy) MATCH(MAX([long_formula]),[long_formula],0), etc.

1/FREQUENCY(0,… with a reciprocation of the bins_array is sublime.

44. P.S. We may be able to shave off an extra character (at a slight loss of readability) by making the lookup_value 0 (i.e. omitted) and reciprocating the FREQUENCY construction with 0 rather than 1, i.e.:

`=LOOKUP(,0/FREQUENCY(0,1/MMULT(SUMIF(B3:B12,B3:B12,OFFSET(D3,,{0,1,2},10))/SUMIF(B3:B12,B3:B12,OFFSET(C3,,{0,1,2},10)),{1;1;1})),B3:B12)`
45. Lori says:

Yes, =FREQUENCY(-LargeNum,-Array) can be used to find the maximum in an Array, or when data is positive =FREQUENCY(0,1/Array).

Also, my original formula does actually seem to work without the height argument since =SUMIF(B3:B12,B3:B12,C3) automatically expands the last argument In a similar way you can also replace B3:B12 by B3:B4 since LOOKUP expands the range like SUMIF (but note that if only B3 is used in the last argument LOOKUP expands the range horizontally not vertically.)

So, as you say, using 0/FREQUENCY allows one to remove the leading 1 and also changing B12 to B4 would save 2 further characters i.e. 129. Still not quite up to Isai’s Google Sheets solution but only 7 characters less.

46. John Jairo V says:

@Lori

Your “tweak” is awesome! Thanks for show us this marvelous option. I didn’t remember that the last argument for SUMIF expands with the dimension of the second argument. Elegant, short, and great! Congratulations.

This site uses Akismet to reduce spam. Learn how your comment data is processed.