Ten Essential Excel Functions for Data Analysts

You don’t need Python for every data task

Curtis
Towards Data Science


Although Python is the dominant tool in the field of data science, Excel is a handy and accessible way to perform analysis or display information to stakeholders. Since Microsoft Excel is widespread in the business world, and the interface is familiar to many without a data background, it is very useful for analysis and reports that need to be shared with other departments in your business.

In this article, we will explore ten Excel functions to get even more power out of the software and perform common data analyst tasks quickly in Excel.

COUNTBLANK

Raw data can be messy and often contains missing values, especially when it is collected in the real world. In Python, we can use a method such as pandas' isna() to detect missing values. The same check is possible in Excel using the COUNTBLANK function.

COUNTBLANK counts the number of blank cells in a given range. Cells containing a formula that returns empty text ("") are counted as blank, but cells containing zero are not.

Using the COUNTBLANK function to check for missing values. Image by author.

The formula for this Excel function is:

=COUNTBLANK(range)
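For readers coming from Python, here is a minimal sketch of the same idea using only the standard library. The `ages` list and the `count_blank` helper are hypothetical names for illustration, and `None` and `""` are assumed to stand in for blank cells.

```python
# Hypothetical column of raw values; None and "" stand in for blank cells.
ages = [34, None, 27, "", 41, None]


def count_blank(cells):
    """Count entries that are None or an empty string, similar to COUNTBLANK on a range."""
    return sum(1 for cell in cells if cell is None or cell == "")


print(count_blank(ages))  # 3
```

As in Excel, a zero is a value and not a blank, so it is not counted.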

SUMIFS

One of the best-known and most basic Excel functions is the SUM function. Sometimes we may want to use SUM, but also exclude cells from our summation using a criterion. This is where SUMIFS comes in.

SUMIFS sums the values in a given range, but only for the rows that meet every one of the given criteria. Each criterion is supplied as a criteria range followed by the condition to test, and Excel accepts up to 127 such pairs.

Using the SUMIFS function to total employees in different locations. Image by author.

The formula for this Excel function is:

=SUMIFS(sum_range, criteria_range1, criteria1, …)
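The logic can be sketched in plain Python as a filtered sum. The `rows` data and the `sum_ifs` helper below are made up for illustration, and the sketch only handles equality criteria, whereas Excel criteria can also use comparison operators and wildcards.

```python
# Hypothetical staffing data: one record per team.
rows = [
    {"location": "London", "department": "Sales", "employees": 12},
    {"location": "London", "department": "Engineering", "employees": 30},
    {"location": "Leeds", "department": "Sales", "employees": 8},
    {"location": "Leeds", "department": "Engineering", "employees": 15},
    {"location": "London", "department": "Sales", "employees": 5},
]


def sum_ifs(rows, sum_field, **criteria):
    """Sum sum_field over the rows where every criterion matches (criteria are ANDed)."""
    return sum(
        row[sum_field]
        for row in rows
        if all(row[field] == value for field, value in criteria.items())
    )


print(sum_ifs(rows, "employees", location="London"))                      # 47
print(sum_ifs(rows, "employees", location="London", department="Sales"))  # 17
```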

RANK

The RANK function returns the rank of a numerical value within a list of other numerical values. By default the largest value receives rank 1; an optional third argument (order) can be set to a non-zero value to rank in ascending order instead. Ranking is important in data analysis when you need to know where a particular value falls in an ordered array. A similar function, PERCENTRANK, returns the rank of a value as a percentage of the dataset.

It is important to note that if the value cannot be found in the given list or array, RANK will return a #N/A error. PERCENTRANK behaves differently: for a value that lies between two data points, it interpolates a result instead.

Using the RANK function to rank locations by the number of employees. Image by author.

The formula for this Excel function is:

=RANK(number, ref, [order])
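A rough Python equivalent helps show how ties and missing values behave. The `rank` helper and the `employee_counts` list are hypothetical; the sketch assumes Excel's convention that tied values share a rank and the following rank is skipped, and it raises an exception where Excel would show #N/A.

```python
# Hypothetical employee counts per location.
employee_counts = [47, 23, 47, 10]


def rank(number, ref, ascending=False):
    """Return the 1-based rank of number within ref; ties share the same rank."""
    if number not in ref:
        # Excel would display #N/A here.
        raise ValueError(f"{number} is not in the list")
    if ascending:
        return 1 + sum(1 for value in ref if value < number)
    return 1 + sum(1 for value in ref if value > number)


print(rank(47, employee_counts))                  # 1 (tied for first)
print(rank(23, employee_counts))                  # 3 (rank 2 is skipped)
print(rank(10, employee_counts, ascending=True))  # 1
```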

VLOOKUP

VLOOKUP is one of the most important functions for any data analyst to know. It retrieves, or looks up, data in a table that is organised vertically. This is incredibly useful because it can automatically find data in another spreadsheet, as long as each row has a unique ID.

The ID column for the table must be the first column. The third parameter in the VLOOKUP function can then be used to refer to the column containing the data that you want to retrieve. It is important to note that this parameter is 1-based in Excel. This means that a value of 2 will get the 2nd column, and a value of 3 will get the third column, and so on. This is different to list indexing in Python, which is 0-based.

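The other two required parameters in the VLOOKUP function are the lookup value (the ID of the row you are retrieving) and the range of the table itself. An optional fourth parameter, range_lookup, controls the match type: it defaults to TRUE (an approximate match, which expects the ID column to be sorted in ascending order), so pass FALSE when you need an exact match on the ID.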

Using the VLOOKUP function to find data on different locations. Image by author.

The formula for this Excel function is:

=VLOOKUP(lookup_value, table_array, col_index_num, [range_lookup])
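To make the 1-based column index concrete, here is a small Python sketch of an exact-match lookup. The `locations` table and the `vlookup_exact` helper are invented for illustration, and the sketch only models exact matching, which is what Excel does when the optional fourth argument, range_lookup, is FALSE.

```python
# Hypothetical lookup table: the ID is in the first column, as VLOOKUP requires.
locations = [
    ("L001", "London", 47),
    ("L002", "Leeds", 23),
    ("L003", "Cardiff", 10),
]


def vlookup_exact(lookup_value, table, col_index_num):
    """Return the value in the 1-based column col_index_num of the first row whose ID matches."""
    for row in table:
        if row[0] == lookup_value:
            # Excel counts columns from 1, Python from 0, hence the "- 1".
            return row[col_index_num - 1]
    return None  # Excel would display #N/A here.


print(vlookup_exact("L002", locations, 2))  # Leeds
print(vlookup_exact("L002", locations, 3))  # 23
```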

IFERROR

If you are presenting your Excel spreadsheets to other people in your business, it can be useful to show default values in cells when errors occur in your formulas. This is particularly helpful when other users do not understand what the errors mean, or when a sensible fallback value is known in advance (such as defaulting to 0).

The IFERROR function does exactly that. It takes two parameters: the first is the value or formula to evaluate, and the second is the default value to return if that formula results in an error. If there is no error, the result of the formula is returned as normal.

Using the IFERROR function to handle Excel errors. Image by author.

The formula for this Excel function is:

=IFERROR(value, value_if_error)
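In Python terms, IFERROR is loosely comparable to wrapping a calculation in try/except and returning a fallback. The `if_error` helper below is a hypothetical illustration of that analogy, not an Excel API; the calculation is passed as a function so that it is only evaluated inside the try block.

```python
def if_error(compute, value_if_error):
    """Run compute() and return its result, or value_if_error if it raises an exception."""
    try:
        return compute()
    except Exception:
        return value_if_error


# Dividing by zero would normally fail (Excel shows #DIV/0!); here we default to 0.
print(if_error(lambda: 10 / 0, 0))  # 0
print(if_error(lambda: 10 / 4, 0))  # 2.5
```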

DAYS

If you have ever worked with data involving time, you will know that there are several key calculations that keep recurring in your work. One of these is calculating the number of days between two dates. In Excel, we can do this with the DAYS function.

The DAYS function takes two dates as parameters, the end date first and the start date second, and returns the number of days between them as an integer.

One scenario where this function would be useful is calculating the time between when a product is ordered, and when that product is delivered. This is an important metric for businesses that sell products, and so it is great that Excel provides an easy way to calculate it.

Using the DAYS function to calculate days to deliver a product. Image by author.

The formula for this Excel function is:

=DAYS(end_date, start_date)
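The same calculation in Python is a date subtraction. The sketch below uses the standard datetime module; the `days` helper and the order dates are hypothetical, and the argument order mirrors Excel's (end date first).

```python
from datetime import date


def days(end_date, start_date):
    """Return the number of days from start_date to end_date, like DAYS(end_date, start_date)."""
    return (end_date - start_date).days


ordered = date(2021, 3, 1)
delivered = date(2021, 3, 12)

print(days(delivered, ordered))  # 11
```

Swapping the two arguments gives a negative number, so it is worth double-checking the order.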

MAXIFS

Finding the maximum value that a variable takes is extremely important in business. From the busiest day of the year to the largest profit a business made in a single day, there are many reasons to calculate the maximum.

Sometimes, you might only want to calculate the maximum on a subset of your data. In that case, you can use the MAXIFS function to put constraints on the data that you want to take the maximum of.

In the example below, we use MAXIFS to find the maximum days to deliver each product. The constraint is placed on the product in each row of our dataset.

Using the MAXIFS function to get maximum delivery times for each product. Image by author.

The formula for this Excel function is:

=MAXIFS(max_range, criteria_range1, criteria1, …)
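A per-product maximum like the one described above could be sketched in Python as follows. The `deliveries` data and the `max_ifs` helper are hypothetical, only a single equality criterion is modelled, and returning None when nothing matches is a choice made for this sketch.

```python
# Hypothetical delivery records: (product, days_to_deliver).
deliveries = [
    ("Keyboard", 4),
    ("Mouse", 11),
    ("Keyboard", 9),
    ("Monitor", 6),
    ("Mouse", 3),
]


def max_ifs(records, product):
    """Return the largest days_to_deliver among records for the given product."""
    matching = [days for name, days in records if name == product]
    return max(matching, default=None)


print(max_ifs(deliveries, "Keyboard"))  # 9
print(max_ifs(deliveries, "Mouse"))     # 11
```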

AVERAGEIFS

Earlier we explored the use of SUMIFS to get filtered sums of our data. We have also looked at MAXIFS to do the same but with the maximum values. By now you may have worked out that there are many different IF functions that can be used in Excel.

Some other useful IF functions include COUNTIFS, MINIFS, and the plain-old IF function. One function that does appear often is AVERAGEIFS, which calculates the average of only the values that meet the given criteria.

In the example below, we use the AVERAGEIFS function to calculate the average delivery time for each product.

Using the AVERAGEIFS function to get average delivery times for each product. Image by author.

The formula for this Excel function is:

=AVERAGEIFS(average_range, criteria_range1, criteria1, …)
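The per-product average can be sketched in Python in the same way as a filtered mean. The `deliveries` data and the `average_ifs` helper are hypothetical and handle one equality criterion only; the sketch returns None when nothing matches, where Excel would show #DIV/0!.

```python
# Hypothetical delivery records: (product, days_to_deliver).
deliveries = [
    ("Keyboard", 4),
    ("Mouse", 11),
    ("Keyboard", 9),
    ("Monitor", 6),
    ("Mouse", 3),
]


def average_ifs(records, product):
    """Return the mean days_to_deliver among records for the given product."""
    matching = [days for name, days in records if name == product]
    if not matching:
        return None  # Excel would display #DIV/0! here.
    return sum(matching) / len(matching)


print(average_ifs(deliveries, "Keyboard"))  # 6.5
print(average_ifs(deliveries, "Mouse"))     # 7.0
```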

MATCH

Sometimes it can be useful to know the specific column or row that a data value occurs in. We will look at one use-case later in the article, but first, we will introduce the function that allows us to do it.

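The MATCH function returns the position of a value in a given array. It is important to remember that this result is relative to the start of the array, so a result of 4 does not necessarily mean the value you were looking for is in the 4th column or row of the spreadsheet; it is in the 4th position of the given range. MATCH also accepts an optional third parameter, match_type, which defaults to 1 (an approximate match that expects the array to be sorted in ascending order); pass 0 when you need an exact match.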

Using the MATCH function to get a specific row in the dataset. Image by author.

The formula for this Excel function is:

=MATCH(lookup_value, lookup_array, [match_type])
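In Python, the closest analogue is list.index, shifted by one because Excel positions start at 1. The `match_exact` helper and the `days_to_deliver` list are hypothetical, and the sketch only covers exact matching (match_type 0 in Excel).

```python
# Hypothetical column of delivery times, one entry per row of the range.
days_to_deliver = [4, 11, 9, 6, 3]


def match_exact(lookup_value, lookup_array):
    """Return the 1-based position of lookup_value in lookup_array, or None if absent."""
    try:
        return lookup_array.index(lookup_value) + 1
    except ValueError:
        return None  # Excel would display #N/A here.


print(match_exact(11, days_to_deliver))  # 2
```

The result is a position within the list passed in, not a spreadsheet row number.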

INDEX

INDEX is a function that is commonly used with the previous MATCH function. Together, they can be used to get a value from a lookup table based upon given criteria, similar to the VLOOKUP function. However, the pair also works in some cases where a single lookup function such as VLOOKUP fails, for example when the lookup value is not in the first row/column of the range.

On its own, the INDEX function returns the value of the cell at a given row number and column number within a range.

In the example below, we can see that the MATCH function is used to get the row where the product is delivered in 11 days, and then the INDEX function is used to get the product description of that row.

Combining the MATCH and INDEX functions for a lookup. Image by author.

The formula for this Excel function is:

=INDEX(array, row_num, [column_num])
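The MATCH and INDEX pairing described above can be sketched in Python as two small steps: find the position in one column, then read the same position from another. All names and data here are hypothetical, and only single-column ranges and exact matching are modelled.

```python
# Hypothetical parallel columns from the same table.
products = ["Keyboard", "Mouse", "Headset", "Monitor", "Webcam"]
days_to_deliver = [4, 11, 9, 6, 3]


def match_exact(lookup_value, lookup_array):
    """Return the 1-based position of lookup_value in lookup_array."""
    return lookup_array.index(lookup_value) + 1


def index_at(array, row_num):
    """Return the item at the 1-based position row_num, like INDEX on a single column."""
    return array[row_num - 1]


# Step 1: MATCH finds which row was delivered in 11 days.
row = match_exact(11, days_to_deliver)
# Step 2: INDEX reads the product description from that row.
print(index_at(products, row))  # Mouse
```

Because the lookup column and the result column are passed separately, the lookup column does not need to be the first column of the table.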

Conclusion

And there you have it! Ten Excel functions that are essential to know for anyone performing data analysis with the software. These ten functions will appear very frequently if you do a lot of work in Excel, and there is no reason not to use the software now that you know how to perform common tasks.

Although you will get good use out of all the functions mentioned in this article, I would recommend paying close attention to the pairing of MATCH and INDEX. This is an extremely common function pairing in the Excel world and one that becomes easy to use with a little bit of practice.

If the MATCH/INDEX pairing is proving difficult for you to use, then start with VLOOKUP and get to know the different use cases of that function.

Are there any more functions that haven’t been mentioned yet that you believe are essential to know for Microsoft Excel? Mention them below in the comments.
