R1C1: Excel Musings – Array Formulae

Formulae lie at the heart of what makes Excel such a powerful calculation tool and although standard Excel formulae can solve most problems, situations still arise when we need extra intermediate calculations or ‘helper cells’. Excel has at least two powerful tools for situations like this, one of which is array formulae (the other is user defined functions). Array formulae are a huge topic and I’ll just be scratching the surface here.

WHAT IS ARRAY?

Before we can talk about array formulae, we should first understand what we mean by the term ‘array’. In the context of Excel, and this discussion, an array is simply a rectangular range of data. They are somewhat analogous to the idea of a matrix in mathematics. An array could simply just be a block of data three rows high by five columns wide. Arrays can also be several rows high by only one column wide, or vice versa, equivalent to what mathematicians would call a vector (which is really just a special type of matrix).

OK, SO WHAT IS AN ARRAY FORMULA?

An array formula is a special type of Excel formula that allows functions to operate on a range (or array) of data in a way that could not happen with a standard formula. You can designate to Excel that the formula in your formula bar should be an array formula via the key combination of Ctrl+Shift+Enter, instead of just pressing Enter. For this reason they are sometimes referred to as ‘CSE formulae’. Once entered, {braces} will appear on the outside of the formula in the formula bar, informing the user that an array formula has been used.

There are two broad types of array formulae – Multi cell array formulae and Single cell array formulae. In my view, the single cell array formulae are more interesting to discuss, and a more useful tool to have in our Excel toolbox, so I will be very brief on the multi cell variety.

A multi cell array formula is used when the result of the formula is a range that takes up more than one cell. Some Excel functions are required to be entered this way, the most common of which is the TRANSPOSE function. Multi cell array formulae must be entered over a range of cells simultaneously, for example by selecting a range of cells equal in size to the desired output, typing the formula in the formula bar, and finally pressing Ctrl+Shift+Enter. It can also be useful to return entire lists of data this way using a single fast formula.

SINGLE-CELL ARRAY FORMULAE

Essentially, a single cell array formula allows a function that normally acts on a single cell or case, such as the IF or ABS functions, to act on an entire range of cells or cases. Let me give you two examples, both based on real situations from real models.

(You can download the examples at www.actuaries.asn.au/Library/AAArticles/2014/R1C1ArrayExamples.xlsx).

Let’s say we have a Balance Sheet in our financial model and every time period has a ‘check’ cell which calculates Net Assets less Total Equity. This check should be zero, for every one of the, say, 200 time periods (i.e. columns) in the model.

We want to build in a single check cell that can report on all 200 individual period checks and alert us if any are non-zero.
We don’t want to just sum all 200 period checks together and see if that sum is zero, because there could be 2 or more non-zero checks that cancel each other out. But if we could sum the absolute value of each period check, that would be sufficient. It turns out we can do this very elegantly using an array formula. If our individual checks are in cells M12:HD12, then this formula (using Ctrl+Shift+Enter to apply it) does the job =SUM(ABS(M12:HD12))

Using an array formula tells Excel that we want the ABS function to act 200 individual times, and return a 1 x 200 vector array equal to the absolute value of each of the 200 cells, M12:HD12. We then have a SUM function wrapped outside of the 1 x 200 resulting array, which just lives in Excel’s memory and not on the actual worksheet. The SUM function is able to act on these 200 values and resolve them down to a single cell answer.

In this case, we hope the answer will be zero. If not, at least one of the individual checks is non-zero. This approach also avoids the potential problem of two non- zero values cancelling each other out.

The next example, which I will mention only briefly, is a way to write a MIN IF function. Here, we want the minimum value from a range of data, subject to meeting a certain condition. This is not a native Excel function, but we can do it with an array formula, by wrapping a MIN function around an IF function, and using range references rather than single cell references inside the IF function.

Let’s say cells A4:A123 contain months, cells B4:B123 contain sales figures, and we want to find the smallest sales figure that occurred in January. Then we can use: =MIN(IF(A4:A123=”January”,B4:B123,FALSE))

Note that by having the IF function return ‘FALSE’ instead of the sales data for all non-January rows, we are allowing the MIN function to ignore these values. With this basic construct, we can write MIN IF and MAX IF formulae based on almost any evaluation criteria we desire.

ArrayFormulae

SHOULD WE AVOID ARRAY FORMULAE IN OUR MODELS?

This is an interesting and often debated question. I work as a financial modelling consultant, and I have discussed this question at length with colleagues in the industry. A common school of thought is yes, when writing models that other people may use, we should avoid using array formulae because “clients don’t understand them”.

This argument can also apply for in- house model developers who may require a colleague to use or one day inherit their model. It’s an argument I understand, but not one that I blindly agree with.

Drawing the line at array formulae is arbitrary. I’ve met many, many Excel users who make extensive use of VLOOKUP but are unfamiliar with the INDEX and MATCH functions. I have never had anyone suggest to me that when preparing a model I should avoid INDEX and instead use VLOOKUP because people are more comfortable with it.

I should use the best tool for the job, and in the case of looking up values from a table that tool is almost exclusively INDEX and MATCH. Typically, the client is also delighted to have learned a new feature for their Excel arsenal. So why should array formulae be any different?

I also think part of the job of a consultant is to not only produce the best output they can, but also demonstrate some improved ways of doing things. People also tell me that array formulae are slow to calculate, but this is not true of all array formulae. With skilled design, they can be extremely fast and extremely useful. Array formulae are not scary, and they’re not mystical. They are not commonly needed, but they are a valuable feature of Excel to exploit (with care) when the time calls for it. If you have an opinion on this topic, please send me an email.

AN EXCEL FORMULA WRITING CHALLENGE!

If you are interested in learning more about array formulae, there is a superb book on the subject that came out last year, easily purchased online. The book is called Ctrl+Shift+Enter – Mastering Excel Array Formulas by Mike Girvin. Thanks to the generosity of the publisher, Holy Macro! Books, I have a copy to give away to whichever reader submits the most elegant single-cell array formula solution to the posted challenge. Two key criteria for elegance will be brevity of formula length, and flexibility of the formula to also work on a different sized data set. A formula that can easily be extended to a 100*100 block of data without the need for extra terms will be more elegant than one that can’t be so easily extended. Be warned, this is not an easy challenge! Send your responses to me via email by 5 September.

You can download the challenge file at www.actuaries.asn.au/Library/AAArticles/2014/R1C1ArrayFormulaChallenge.xlsx.

CPD: Actuaries Institute Members can claim two CPD points for every hour of reading articles on Actuaries Digital.