REGEX
Learn how to use regex in Looker Studio to clean, extract, and transform your data. Discover key formulas, syntax, and practical examples.
Regular expressions (REGEX) are one of the most powerful yet underrated tools in Looker Studio. They allow you to clean, transform, and analyze your data directly in your reports, without having to use multiple complex formulas or manipulate your data sources.
What is a Regex?
A REGEX (short for Regular Expression) is a specialized language that allows you to search for, recognize, or transform specific patterns in a text.
In other words, instead of searching for an exact phrase or a fixed word, a REGEX allows you to define a matching rule (a "pattern"). This rule can cover a wide variety of cases:
- Identify all email addresses in a text.
- Find URLs that contain a certain campaign parameter.
- Replace a date format to standardize it.
- Extract a specific keyword or code from a field.
In Looker Studio, REGEX offers precision and flexibility that traditional formulas do not always provide. They allow you to unify heterogeneous data, such as different campaign names depending on the advertising network, and easily extract useful information from complex fields (e.g., tracking URLs) without modifying the source. They can also be used to segment data according to detailed text rules, capturing all relevant variations where a traditional search is limited to an exact word. Finally, they automate and improve the reliability of your reports, making dashboards more dynamic and easier to maintain.
The 4 essential REGEX functions in Looker Studio
Looker Studio offers four native REGEX functions for structuring text data in reports.
They allow you to validate, filter, extract, or replace text according to specific patterns. These tools are essential for cleaning raw data, categorizing campaigns, or creating custom dimensions.
REGEXP_MATCH: validate an exact match
The REGEXP_MATCH function returns true if the entire string matches the pattern.
Syntax: REGEXP_MATCH(field, "pattern")
Example: Check a PROD-1234 format with REGEXP_MATCH(Product Code, "PROD-[0-9]{4}")
. Returns false for PROD-ABCD.
Marketing use case: useful for validating formats such as emails REGEXP_MATCH(Email, "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$")
.
REGEXP_CONTAINS: find a partial match
REGEXP_CONTAINS detects whether part of the string matches the pattern.
Syntax: REGEXP_CONTAINS(field, "pattern")
.
Example: Filter campaigns with Brand via REGEXP_CONTAINS(Campaign, ".Brand.")
, which includes Brand_Spring and Brand2023_Sales.
Marketing use case: can be used in a CASE statement to categorize keywords: CASE WHEN REGEXP_CONTAINS(Campaign, "promo|discount|sale") THEN "Promotion" ELSE "Other" END
.
REGEXP_EXTRACT: extract specific information
REGEXP_EXTRACT returns the first substring captured by the pattern.
Syntax: REGEXP_EXTRACT(field, "(pattern)")
.
Example: Isolate the country code from a URL with REGEXP_EXTRACT(URL, "/(en|fr|es)/")
. For exemple.com/en/product, the output is en.
Another case: extract a date from a log with REGEXP_EXTRACT(Log, "(\\d{4}-\\d{2}-\\d{2})")
, which extracts 2023-10-05 from a text.
REGEXP_REPLACE: replace part of the text
REGEXP_REPLACE modifies the text by substituting the corresponding parts.
Syntax: REGEXP_REPLACE(field, "pattern", "replacement")
.
Example: Anonymize user IDs using REGEXP_REPLACE(URL, "user_id=[0-9]+", "user_id=ANONYMIZED")
, transforming mysite.com?user_id=12345 into mysite.com?user_id=ANONYMIZED.
Special character handling: use the R prefix to avoid multiple escapes, such as REGEXP_REPLACE(URL, R"/category/\d+/product", "/category/XXX/product")
.
REGEXP comparison
REGEX syntax explained: essential symbols
In Looker Studio, regular expression meta characters act as precision tools for structuring your queries. These symbols encode logical rules for extracting or transforming complex text data. Mastering their syntax allows you to use REGEX functions effectively.
Wildcards and quantifiers
Quantifiers define the repetition of characters in your patterns.
- The period: The period
"."
acts as a universal wildcard, capturing any single character except line breaks. A pattern likegr.y
will detect "gray" or "grey." - The asterisk: the asterisk "
*
" multiplies the effect of the preceding character:ab*c
matches "ac," "abc," or "abbc." - The plus sign: The "
+
" sign imposes a minimum occurrence (e.g.,ab+c
fails on "ac"). - The question mark: The question mark "
?
" creates optional matches. - The vertical bar: The vertical bar "
|
" introduces alternatives such as(.com|.net|.org)
to validate domain names, for example.
Anchors and groups
In Regex, an anchor is a symbol that does not correspond to a character, but to a position in the text. The anchors "^
" and "$
" set position constraints. For example, ^http://
filters out unsecured URLs, while .pdf$
identifies PDF documents.
Parentheses "()
" form capture groups, which are essential for the REGEXP_EXTRACT function. The syntax ([0-9]{5})
systematically extracts a postal code from an address.
Square brackets "[ ]
" create character sets: [a-zA-Z]
detects any letter, while [0-9a-fA-F]
identifies a hexadecimal character. A pattern such as [^0-9]
reverses the logic to find a non-digit.
Character classes and escaping
1. Predefined classes
These are shortcuts that avoid having to write long lists between square brackets [ ].
- \d → digit (equivalent to [0-9])
- Example:
\d{3}
detects three digits in a row → finds 123 in Code123XYZ.
- Example:
- \s → space (or tab, line break, etc.)
- Example:
Hello\sWorld
recognizes Hello World (with a space in between).
- Example:
- \w → alphanumeric character (letters A-Z, a-z, numbers 0-9, and _)
- Example: \w+ finds a word → in User_2025, it detects the entire User_2025 block.
2. The special case of Looker Studio (RE2)
Looker Studio uses the RE2 library to interpret Regex.
Problem: certain special characters such as \d, \s, \w must be "escaped twice" to be understood.
Specifically, instead of writing: \d+
you must write: \\d+
3. Example in Looker Studio
If you want to retrieve all the numbers in a tracking URL:
REGEXP_EXTRACT(URL, "\\d+")
returns 2025 if the URL contains campaign2025.
If you want to check whether a field contains a space: REGEXP_CONTAINS(Name, "\\s")
returns TRUE if the name is John Smith (because there is a space).