REGEX

Learn how to use regex in Looker Studio to clean, extract, and transform your data. Discover key formulas, syntax, and practical examples.

Data modeling

Regular expressions (REGEX) are one of the most powerful yet underrated tools in Looker Studio. They allow you to clean, transform, and analyze your data directly in your reports, without having to use multiple complex formulas or manipulate your data sources.

What is a Regex?

A REGEX (short for Regular Expression) is a specialized language that allows you to search for, recognize, or transform specific patterns in a text.

In other words, instead of searching for an exact phrase or a fixed word, a REGEX allows you to define a matching rule (a "pattern"). This rule can cover a wide variety of cases:

  • Identify all email addresses in a text.
  • Find URLs that contain a certain campaign parameter.
  • Replace a date format to standardize it.
  • Extract a specific keyword or code from a field.

In Looker Studio, REGEX offers precision and flexibility that traditional formulas do not always provide. They allow you to unify heterogeneous data, such as different campaign names depending on the advertising network, and easily extract useful information from complex fields (e.g., tracking URLs) without modifying the source. They can also be used to segment data according to detailed text rules, capturing all relevant variations where a traditional search is limited to an exact word. Finally, they automate and improve the reliability of your reports, making dashboards more dynamic and easier to maintain.

The 4 essential REGEX functions in Looker Studio

Looker Studio offers four native REGEX functions for structuring text data in reports.

They allow you to validate, filter, extract, or replace text according to specific patterns. These tools are essential for cleaning raw data, categorizing campaigns, or creating custom dimensions.

REGEXP_MATCH: validate an exact match

The REGEXP_MATCH function returns true if the entire string matches the pattern.

Syntax: REGEXP_MATCH(field, "pattern")

Example: Check a PROD-1234 format with REGEXP_MATCH(Product Code, "PROD-[0-9]{4}"). Returns false for PROD-ABCD.

Marketing use case: useful for validating formats such as emails REGEXP_MATCH(Email, "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$").

REGEXP_CONTAINS: find a partial match

REGEXP_CONTAINS detects whether part of the string matches the pattern.

Syntax: REGEXP_CONTAINS(field, "pattern").

Example: Filter campaigns with Brand via REGEXP_CONTAINS(Campaign, ".Brand."), which includes Brand_Spring and Brand2023_Sales.

Marketing use case: can be used in a CASE statement to categorize keywords: CASE WHEN REGEXP_CONTAINS(Campaign, "promo|discount|sale") THEN "Promotion" ELSE "Other" END.

REGEXP_EXTRACT: extract specific information

REGEXP_EXTRACT returns the first substring captured by the pattern.

Syntax: REGEXP_EXTRACT(field, "(pattern)").

Example: Isolate the country code from a URL with REGEXP_EXTRACT(URL, "/(en|fr|es)/"). For exemple.com/en/product, the output is en.

Another case: extract a date from a log with REGEXP_EXTRACT(Log, "(\\d{4}-\\d{2}-\\d{2})"), which extracts 2023-10-05 from a text.

REGEXP_REPLACE: replace part of the text

REGEXP_REPLACE modifies the text by substituting the corresponding parts.

Syntax: REGEXP_REPLACE(field, "pattern", "replacement").

Example: Anonymize user IDs using REGEXP_REPLACE(URL, "user_id=[0-9]+", "user_id=ANONYMIZED"), transforming mysite.com?user_id=12345 into mysite.com?user_id=ANONYMIZED.

Special character handling: use the R prefix to avoid multiple escapes, such as REGEXP_REPLACE(URL, R"/category/\d+/product", "/category/XXX/product").

REGEXP comparison

Function Main use Basic syntax Result
REGEXP_MATCH Validates if the entire string matches the pattern. REGEXP_MATCH(X, "pattern") True / False
REGEXP_CONTAINS Checks whether the string contains the pattern. REGEXP_CONTAINS(X, "pattern") True / False
REGEXP_EXTRACT Extracts the first capturing group of the pattern. REGEXP_EXTRACT(X, "(pattern)") Text (the extracted part)
REGEXP_REPLACE Replaces the text matching the pattern with another string. REGEXP_REPLACE(X, "pattern", "replacement") Text (the modified string)

REGEX syntax explained: essential symbols

In Looker Studio, regular expression meta characters act as precision tools for structuring your queries. These symbols encode logical rules for extracting or transforming complex text data. Mastering their syntax allows you to use REGEX functions effectively.

Wildcards and quantifiers

Quantifiers define the repetition of characters in your patterns.

  • The period: The period "." acts as a universal wildcard, capturing any single character except line breaks. A pattern like gr.y will detect "gray" or "grey."
  • The asterisk: the asterisk "*" multiplies the effect of the preceding character: ab*c matches "ac," "abc," or "abbc."
  • The plus sign: The "+" sign imposes a minimum occurrence (e.g., ab+c fails on "ac").
  • The question mark: The question mark "?" creates optional matches.
  • The vertical bar: The vertical bar "|" introduces alternatives such as (.com|.net|.org) to validate domain names, for example.

Anchors and groups

In Regex, an anchor is a symbol that does not correspond to a character, but to a position in the text. The anchors "^" and "$" set position constraints. For example, ^http:// filters out unsecured URLs, while .pdf$ identifies PDF documents.

Parentheses "()" form capture groups, which are essential for the REGEXP_EXTRACT function. The syntax ([0-9]{5}) systematically extracts a postal code from an address.

Square brackets "[ ]" create character sets: [a-zA-Z] detects any letter, while [0-9a-fA-F] identifies a hexadecimal character. A pattern such as [^0-9] reverses the logic to find a non-digit.

Character classes and escaping

1. Predefined classes

These are shortcuts that avoid having to write long lists between square brackets [ ].

  • \d → digit (equivalent to [0-9])
    • Example: \d{3} detects three digits in a row → finds 123 in Code123XYZ.
  • \s → space (or tab, line break, etc.)
    • Example: Hello\sWorld recognizes Hello World (with a space in between).
  • \w → alphanumeric character (letters A-Z, a-z, numbers 0-9, and _)
    • Example: \w+ finds a word → in User_2025, it detects the entire User_2025 block.

2. The special case of Looker Studio (RE2)

Looker Studio uses the RE2 library to interpret Regex.

Problem: certain special characters such as \d, \s, \w must be "escaped twice" to be understood.

Specifically, instead of writing: \d+

you must write: \\d+

3. Example in Looker Studio

If you want to retrieve all the numbers in a tracking URL:

REGEXP_EXTRACT(URL, "\\d+") returns 2025 if the URL contains campaign2025.

If you want to check whether a field contains a space: REGEXP_CONTAINS(Name, "\\s") returns TRUE if the name is John Smith (because there is a space).

Remember:
\d = digit, \s = space, \w = word.
In Looker Studio, always remember to double the backslash → \\d, \\s, \\w.
Symbol Syntax Marketing example Result
Period (wildcard) . gr.y Corresponds to gray or grey
Asterisk * ab*c Finds ac, abc, abbc
Plus + ab+c Finds abc, abbc, but not ac
Question mark ? colou?r Recognizes color and colour
Vertical bar (OR) | (.com|.net|.org) catchr.com, recognizes .com
Start anchor ^ ^http:// Detects only URLs that start with http://
Anchor end $ .pdf$ Finds links that end with .pdf
Capture group () ([0-9]{5}) Extracts a postal code (e.g., 75001)
Character set [] [A-Z]{3} Detects a code such as ABC
Negation in a set [^] [^0-9] Matches any character that is not a digit
Class of digits \d (or \\d in Looker Studio) \\d{3} Finds 123 in Code123XYZ
Space class \s (or \\s) REGEXP_CONTAINS(Name, "\\s") Returns TRUE if John Smith contains a space
Alphanumeric class \w (or \\w) \\w+ Detects User_2025 as a complete word

Practical examples: how REGEX transforms your marketing reports

Case 1: Categorizing "Brand" vs. "Non-Brand" campaigns

Advertising campaign names often contain variations of your brand name, making centralized analysis difficult. Use REGEXP_CONTAINS to automate this segmentation.

Formula

CASE
 WHEN REGEXP_CONTAINS(Campaign, "(?i).brand.") 
  THEN "Brand"
 ELSE "Non-Brand"
END

This code identifies campaigns containing "brand" (or any variation) and classifies them as "Brand." The (?i) makes the search case-insensitive, capturing "Brand," "BRAND," or "brand." For brands with variations ("brand," "product," "company"), add | operators: "(?i).brand.|product|company" for multiple matches.

Benefit: Instantly compare the performance of brand campaigns versus others, without manual cleanup. This method avoids segmentation errors and ensures consistent analysis, saving teams time.

Case 2: Clean up URLs to group pages

UTM or Google Ads parameters generate unique URLs for the same page, skewing metrics. Use REGEXP_EXTRACT to isolate the base URL.

Formula:

REGEXP_EXTRACT(page URL, "^([^?]+)")

This expression extracts everything before the ? in a URL, removing dynamic parameters. For example, exemple.com/page?utm_source=google becomes exemple.com/page . For URLs with fragments (#), use "^([^?#]+)" to ignore them as well.

Benefit: Aggregate sessions, bounce rate, and time spent per page reliably, without duplicates. This allows you to accurately measure engagement on each page, regardless of traffic sources or parameters.

💡 When a question mark or period is placed between square brackets, they lose their special role. So here, [?] no longer means "optional" but rather "?".

Case 3: Extract the source of a custom channel

A structured nomenclature such as FR_Social_Facebook_Promo_2024 can be analyzed to isolate the social network. Use REGEXP_EXTRACT with capture groups.

Advanced formula:

REGEXP_EXTRACT(Campaign, "^[^_]+_[^_]+_([^_]+)_.")

This pattern skips the first two segments separated by _, captures the third (the social network), and then ignores the rest. For FR_Social_Instagram_Promo, it returns Instagram. If your campaign names vary (e.g., FR_TikTok_Promo_2024), adapt the regex to skip one less segment: "^[^_]+_([^_]+)_.".

Benefit: Generate a "Social Network" dimension to compare performance between Facebook, LinkedIn, etc., without manual editing. This allows you to view trends in real time by automating extraction.

Best practices and limitations of REGEX in Looker Studio

Simplify your life with raw strings

Backslashes in regexes often generate errors. For a Windows path C:\Users\Name\Documents, the classic syntax requires REGEXP_CONTAINS(Path, "C:\\\\Users\\\\Name\\\\Documents"). Replace with REGEXP_CONTAINS(Path, R"C:\Users\Name\Documents"). The R prefix simplifies the handling of special characters, which is useful for extracting subdomains from complex URLs, such as REGEXP_EXTRACT(URL, R"^(\w+)\.shop\.example\.com$").

RE2 syntax: no lookahead or lookbehind

The RE2 engine, optimized for speed, excludes (?=...) and (?<=...). To validate that URLs include "product" without "archive," use AND(REGEXP_CONTAINS(URL, "product"), NOT(REGEXP_CONTAINS(URL, "archive"))) instead of an advanced regex. To isolate category in /shop/category/product123, use REGEXP_EXTRACT(URL, "/shop/([^/]+)/") rather than a lookbehind. These methods leverage the strengths of RE2 without overloading performance.

When should you avoid REGEX?

For simple tasks, text functions are more efficient. Check a promo code with STARTS_WITH(Code, "PROMO2023") instead of a regex. Clean up static terms using REPLACE(Campaign, "old", "new"). REGEX is designed for dynamic patterns, such as removing numbers with REGEXP_REPLACE(Name, R"\d+", "") to transform report2023.xlsx into report.xlsx.

Test your expressions on a sample of data before applying them to your entire report (Use https://regex101.com)