The Excel team introduced support for regular expressions in May 2024 with three new functions

  1. REGEXREPLACE, which ‘allows you to replace text from a string with another string, based on a supplied regular expression (“regex”)’.
  2. REGEXEXTRACT, which ‘allows you to extract text from a string based on a supplied regular expression. You can extract the first match, all matches or capturing groups from the first match’.
  3. REGEXTEST, which ‘allows you to check whether any part of supplied text matches a regular expression (“regex”). It will return TRUE if there is a match and FALSE if there is not’.

The functions were announced in this post on the Microsoft 365 Insiders blog.

As well as these new functions, we learned that we will be able to use regular expressions with XLOOKUP and XMATCH in the near future.

Regex coming soon to XLOOKUP and XMATCH

We will also be introducing a way to use regex within XLOOKUP and XMATCH, via a new option for their ‘match mode’ arguments. The regex pattern will be supplied as the ‘lookup value’.

This will be available for you to try in Beta soon, at which point we’ll update this blog post with more details.

While we wait for that, we can mimic that functionality with lambda functions!

XMATCH.NEW

XMATCH.NEW = LAMBDA(lookup_value,lookup_array,[match_mode],[search_mode],
    IF(
        match_mode=3, // use regex
        FILTER(SEQUENCE(ROWS(TOCOL(lookup_array))),REGEXTEST(lookup_array,lookup_value)),
        XMATCH(lookup_value, lookup_array, match_mode, search_mode)
    )
);

XMATCH.NEW works very similarly to XMATCH. In fact, unless we pass match_mode=3, it works identically to XMATCH.

If we pass match_mode=3, XMATCH.NEW interprets the lookup_value as a regex pattern. The pattern is tested against the lookup_array using REGEXTEST. This returns an array of TRUE/FALSE values. This array, or mask if you prefer, is passed as the include argument in the FILTER function, where the array being filtered is the positional indices of the lookup_array – SEQUENCE(ROWS(TOCOL(lookup_array))).

As a simple example, let’s use this pattern ^(?![^@\s]+@[^@\s]+.[^@\s]{2,}\s*$).+$ to find the positional indices of the invalid email addresses in this made-up dataset.

The pattern is passed as the lookup_value, the array is the range of email addresses in column C, and the match_mode is set to 3. So, the function returns the indices (shown in column G) of those rows where REGEXTEST returns TRUE (shown in column F). Whereas XMATCH only returns the first index of a valid match, XMATCH.NEW with match_mode 3 will return all matching indices.

XLOOKUP.NEW

XLOOKUP.NEW = LAMBDA(lookup_value,lookup_array,return_array,
                    [if_not_found],[match_mode],[search_mode],
    IF(
        match_mode=3, // use regex
        IFERROR(
            FILTER(return_array,REGEXTEST(lookup_array,lookup_value)),
            IF(ISOMITTED(if_not_found),#N/A,if_not_found)
        ),
        XLOOKUP(lookup_value,lookup_array,return_array,if_not_found,match_mode,search_mode)
    )
);

XLOOKUP.NEW will behave identically to XLOOKUP unless we pass match_mode=3, in which case the pattern in lookup_value will be used in REGEXTEST to build an array of TRUE/FALSE values which are passed as the include parameter in FILTER. This time, the array being filtered will be return_array, returning the matching rows. If this filter operation returns an error, the function returns to the spreadsheet the argument supplied to the if_not_found parameter, or #N/A if no such argument was provided.

Let’s use the same invalid email address pattern to filter the data for rows with invalid email addresses:

The pattern is passed as the lookup_value, the lookup_array is the range of email addresses in column C, the return_array is the range of data in columns A to E, and the match_mode is set to 3. So, the function returns those rows where REGEXTEST returns TRUE (shown in column F).

Simple!

CONCLUSION

While we wait for the official update that will allow us to use regular expressions with XLOOKUP and XMATCH, we can make some simple LAMBDA functions that will mimic this behavior, and start to practice our regular expressions now!

Click here for another post about regular expressions, and why “just get it from AI” is not going to be good enough.

Let me get right to the point.

Both SWITCH and unused complex names in LET can slow down your formulas

Introduction

This formula returns a large array:
=MAKEARRAY(10000,1000,LAMBDA(x, y, PRODUCT))

In the Beta version of Excel I’m using, the above formula can be written like this:

=MAKEARRAY(10000,1000, PRODUCT)

I’ll use the shorter version for the remainder of this post. This MAKEARRAY formula creates an array of 10000 rows and 1000 columns. For each cell in the array, it uses the PRODUCT function to multiply the row number by the column number. It takes my computer about 3 seconds to return this formula. By most measures, it’s a slow calculation.

As you develop your own solutions and complex functions, you may sometimes create similarly long running calculations and there may be times where these are either used or not used based on some condition in your formula.

This post explores the options available to us with conditional execution of such long-running calculations.

IF

First let’s look at what happens if the formula is in an IF function, but is not accessed.

=IF(TRUE,"some other return value",MAKEARRAY(10000,1000,PRODUCT))

By forcing the condition (the first argument) to TRUE, this returns “some other return value”. The formula evaluates almost immediately, meaning it doesn’t evaluate the call to MAKEARRAY. This is also true of the following formula, where the formula above is given a name in the LET function:

=LET(if_with_long_running_else, 
     IF(TRUE,"some other return value",MAKEARRAY(10000,1000,PRODUCT)), 
     if_with_long_running_else)

LET

If we name the long-running calculation in LET, but don’t access it in the return value, you might think the long-running calculation is not executed.

=LET(long_running_expr,
     MAKEARRAY(10000,1000,PRODUCT),
     "some other return value")

That’s not the case. The formula above returns “some other return value” but still takes 3 seconds to return, meaning MAKEARRAY is still calculated even though it’s not used.

One way we can speed this up is to “thunk” the long-running calculation if it’s not being used.

=LET(long_running_expr, 
     LAMBDA(MAKEARRAY(10000,1000,PRODUCT)), 
     "some other return value")

When we wrap an expression with LAMBDA like this, we’re creating a LAMBDA function with zero arguments. As with any function, if the function isn’t called, it’s not evaluated. So, the formula expression above returns “some other return value” with the speed we expect. To illustrate what I mean when I say “function is not called”, consider this formula:

=LET(long_running_expr, 
     LAMBDA(MAKEARRAY(10000,1000,PRODUCT)),
     long_running_expr)

This formula returns very quickly the value #CALC!, which if you hover over it, gives the hint ‘Cell contains a lambda’.

Here, the ‘long_running_expr’ name returns a LAMBDA function with zero arguments – a thunk – but it is not evaluated because we haven’t called the function. As with every other function in Excel, we call a function by providing its arguments wrapped in parentheses. Even if there are no arguments, we must provide empty parentheses.

=LET(long_running_expr, 
     LAMBDA(MAKEARRAY(10000,1000,PRODUCT)), 
     long_running_expr())

This formula calculates the big array and returns it to the spreadsheet. So, bear this in mind when using LET with lots of long-running named expressions:

LET names are evaluated whether they are used or not

SWITCH

Things start to get a bit trickier with SWITCH. Recall from the documentation:

We provide an expression which evaluates to some value. The result that’s returned by SWITCH depends on the value to which the expression evaluates.

=SWITCH(1, 1,"some other return value",2,MAKEARRAY(10000,1000,PRODUCT))

In the formula above, the expression is just ‘1’. This evaluates to the 2nd argument, so the return value is “some other return value”. However, the formula itself takes 3 seconds to run, meaning despite the fact that result2 is not being used, it’s still being calculated.

If we wrap result2 in a thunk, things improve:

=SWITCH(1, 1,"some other return value", 2,LAMBDA(MAKEARRAY(10000,1000,PRODUCT)))

This quickly returns “some other return value”, but of course is a problem if the expression evaluates to 2:

=SWITCH(2, 1,"some other return value", 2,LAMBDA(MAKEARRAY(10000,1000,PRODUCT)))

Since this is now returning the LAMBDA in result2, but we haven’t provided the parentheses, the formula above returns #CALC! (i.e. it is returning the LAMBDA function itself, not the LAMBDA function’s result). So, we can provide those parentheses at the end of the formula to retrieve the array from the LAMBDA result:

=SWITCH(2, 1,"some other return value", 2,LAMBDA(MAKEARRAY(10000,1000,PRODUCT)))()

This correctly returns the big array. But if we now want to pass 1 as the expression, we’re in trouble:

=SWITCH(1, 1,"some other return value", 2,LAMBDA(MAKEARRAY(10000,1000,PRODUCT)))()

Since we have those empty parentheses at the end of the formula, SWITCH needs to return a function. But “some other return value” isn’t a function, so the formula above returns #REF!

This can be resolved by thunking “some other return value” as well:

=SWITCH(1, 1,LAMBDA("some other return value"), 2,LAMBDA(MAKEARRAY(10000,1000,PRODUCT)))()

This returns “some other return value” quickly if the expression evaluates to 1, and returns the array if the expression evaluates to 2. This latter circumstance is slow, but that’s to be expected.

At this point I would recommend taking care when using the SWITCH function with long-running arguments, since:

SWITCH results are evaluated whether they are used or not

As a last note, if you have simple expressions which are ordered integers like the above example, you’re better off using CHOOSE. This formula returns “third option” immediately, meaning the MAKEARRAY call is not evaluated:

=CHOOSE(3,"some other return value",MAKEARRAY(10000,1000,PRODUCT),"third option")

CONCLUSION

There’s a lot to consider when it comes to managing the performance of your formulas or custom LAMBDA functions.

  1. We saw that IF short-circuits the else part if the condition evaluates to TRUE.
  2. We saw that LET will calculate every name, regardless of whether that name is used in the output of the LET function.
  3. Similarly, SWITCH will evaluate every result regardless of which result is used.
  4. Lastly, for simple switching behavior, I recommend you try to use CHOOSE before SWITCH if possible.
  5. For complex scenarios where SWITCH is the best option, and if you have long-running calculations within, consider wrapping each return value in a thunk (argument-less LAMBDA) and adding empty parentheses to the end of the SWITCH function.