Regular Expressions

Regular expressions (regex) are powerful tools for searching, matching, and cleaning text data. They are especially useful for psychology students (undergraduates, graduate students, and lab members alike) who work with survey data, open-ended responses, or behavioral experiments.

✅ When Regex Is Useful

1. Cleaning Survey or Experimental Data

Remove unwanted characters (e.g., extra spaces, punctuation, line breaks)
Standardize response formats (e.g., convert "yes", "Yes", and "YES" to the same value)
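A minimal Python sketch of both cleaning steps, using the standard `re` module. The responses and the `clean_response` helper are hypothetical and made up for illustration. A real survey export will have its own quirks, so check the output before trusting it.

```python
import re

# Hypothetical raw responses, as they might come out of a survey export
raw = ["  Yes ", "YES!", "yes.", "No  ", "no"]

def clean_response(text):
    # Collapse any run of whitespace into a single space and trim the ends
    text = re.sub(r"\s+", " ", text).strip()
    # Drop everything that is not a word character or a space (e.g., punctuation)
    text = re.sub(r"[^\w ]", "", text)
    # Lowercase so "Yes", "YES", and "yes" become the same value
    return text.lower()

cleaned = [clean_response(r) for r in raw]
print(cleaned)  # ['yes', 'yes', 'yes', 'no', 'no']
```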

2. Analyzing Open-Ended Responses

Automatically find key phrases in text (e.g., responses mentioning “anxiety” or “stress”)

3. Recoding or Tagging Variables

Identify and label entries based on the words or patterns they contain (e.g., tag responses as “positive” or “negative” by keyword)
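A rough sketch of keyword-based tagging in Python. The keyword lists and the `tag` function are hypothetical examples, not a validated coding scheme. Keep in mind that regex only sees the words, not their meaning: “not good” is still tagged “positive” here, so hand-check a sample of the labels.

```python
import re

# Hypothetical keyword lists; a real coding scheme would need to be validated
POSITIVE = re.compile(r"\b(happy|good|great|calm)\b", re.IGNORECASE)
NEGATIVE = re.compile(r"\b(sad|bad|awful|anxious)\b", re.IGNORECASE)

def tag(response):
    """Label a response by the first keyword list it matches (positive is checked first)."""
    if POSITIVE.search(response):
        return "positive"
    if NEGATIVE.search(response):
        return "negative"
    return "uncoded"

responses = ["I feel GREAT today", "Pretty anxious about exams", "Not sure yet"]
labels = [tag(r) for r in responses]
print(labels)  # ['positive', 'negative', 'uncoded']
```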

4. Filtering Log Files or Experimental Output

Extract events from EEG, fMRI, or behavioral logs (e.g., match all trials with a specific stimulus)

5. Writing Scripts for Stimulus Presentation

Select or randomize files based on filename patterns (e.g., match all .wav files that start with stim_)
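A minimal sketch of selecting and shuffling stimulus files in plain Python. The file names are invented; in a real script they might come from `os.listdir()` on your stimulus folder. Matching is case-sensitive, so `STIM_04.wav` is left out (pass `re.IGNORECASE` if your file names are inconsistent).

```python
import random
import re

# Hypothetical file names; a real script might read these from a stimulus folder
files = ["stim_01.wav", "stim_02.wav", "practice_01.wav", "stim_03.png", "STIM_04.wav"]

# Must start with "stim_" and end with ".wav"; .* allows anything in between.
# The dot in ".wav" is escaped so it only matches a literal period.
pattern = re.compile(r"^stim_.*\.wav$")

stim_files = [f for f in files if pattern.match(f)]
random.shuffle(stim_files)  # randomize presentation order in place
print(stim_files)
```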

6. Literature Review and Meta-Analysis

Search large bodies of text (e.g., abstracts or full articles) for phrases like “working memory” or “statistical learning”

🧪 Example

You have 500 open-text responses to the question:

“How are you feeling today?”

You can use regex to find responses that mention:

tired
exhausted
fatigued

This works even if the words vary in casing (e.g., “Tired”, “exHAUSTED”) or are embedded in longer sentences.
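Here is one way this could look in Python, with a few invented responses standing in for the 500 real ones. `re.IGNORECASE` handles the casing, and `\b` restricts matches to whole words. One design choice to be aware of: because of the closing `\b`, “tiredness” is not counted; drop that `\b` if you want word forms like that included.

```python
import re

# Invented responses standing in for the real open-text answers
responses = [
    "Honestly pretty Tired after the study session",
    "exHAUSTED, slept 4 hours",
    "Feeling great!",
    "a bit fatigued but okay",
]

# \b keeps whole words only; re.IGNORECASE accepts any mix of upper/lower case
fatigue = re.compile(r"\b(tired|exhausted|fatigued)\b", re.IGNORECASE)

matches = [r for r in responses if fatigue.search(r)]
print(len(matches))  # 3
```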

💡 Why It Matters

Learning regular expressions makes you a more efficient and independent researcher, especially when working with:

Qualitative data
Large datasets
Automated text coding
Experiment scripting (e.g., PsychoPy, OpenSesame, E-Prime)

✅ Regular Expression (Regex) Cheat Sheet

A quick reference for building regular expressions, useful for coding and text analysis in tools such as Canvas, Python, and R. Exact syntax support varies slightly from tool to tool.

🔤 Character Classes

Pattern	Meaning
`.`	Any character (except newline)
`\w`	Word character: `[a-zA-Z0-9_]`
`\W`	Non-word character: `[^a-zA-Z0-9_]`
`\d`	Digit: `[0-9]`
`\D`	Non-digit: `[^0-9]`
`\s`	Whitespace: space, tab, newline, etc.
`\S`	Non-whitespace
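A quick way to see what each character class grabs is to run it on one line of text with Python's `re.findall`. The sample line is invented. One caveat: in Python 3, `\w` and `\d` are Unicode-aware by default, so they also match letters such as “é” and non-Latin digits unless you pass `re.ASCII`; the bracketed ranges in the table describe the ASCII case.

```python
import re

# Invented example line: a participant ID, an item name, a rating, and a tab
text = "P07 rated item_3 as 4.5\tout of 7"

digits = re.findall(r"\d", text)   # \d: one digit at a time
words = re.findall(r"\w+", text)   # \w+: runs of letters, digits, or underscores
tokens = re.findall(r"\S+", text)  # \S+: runs of anything except whitespace

print(digits)  # ['0', '7', '3', '4', '5', '7']
print(words)   # ['P07', 'rated', 'item_3', 'as', '4', '5', 'out', 'of', '7']
print(tokens)  # ['P07', 'rated', 'item_3', 'as', '4.5', 'out', 'of', '7']
```

Notice that `\w+` splits “4.5” at the period, because `.` is not a word character, while `\S+` keeps it whole.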

🔢 Quantifiers

Pattern	Meaning
`*`	0 or more times
`+`	1 or more times
`?`	0 or 1 time (optional)
`{n}`	Exactly n times
`{n,}`	n or more times
`{n,m}`	Between n and m times
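A small Python sketch of each quantifier, checked against some hypothetical participant IDs. It uses `re.fullmatch`, so the whole ID has to fit the pattern; with `re.search`, `P\d{2,3}` would also find “P123” inside “P1234”.

```python
import re

# Hypothetical participant IDs, some malformed
ids = ["P1", "P12", "P123", "P1234", "P"]

# fullmatch() requires the entire string to fit the pattern
one_or_more = [i for i in ids if re.fullmatch(r"P\d+", i)]       # +    : 1 or more digits
zero_or_more = [i for i in ids if re.fullmatch(r"P\d*", i)]      # *    : 0 or more digits
two_or_three = [i for i in ids if re.fullmatch(r"P\d{2,3}", i)]  # {2,3}: 2 to 3 digits
exactly_three = [i for i in ids if re.fullmatch(r"P\d{3}", i)]   # {3}  : exactly 3 digits

# ? : the "u" is optional, so both spellings match
spellings = [w for w in ["color", "colour", "colouur"] if re.fullmatch(r"colou?r", w)]

print(one_or_more)    # ['P1', 'P12', 'P123', 'P1234']
print(zero_or_more)   # ['P1', 'P12', 'P123', 'P1234', 'P']
print(two_or_three)   # ['P12', 'P123']
print(exactly_three)  # ['P123']
print(spellings)      # ['color', 'colour']
```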

📦 Anchors

Pattern	Meaning
`^`	Start of string
`$`	End of string
`\b`	Word boundary
`\B`	Not a word boundary
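An illustrative Python check of the anchors on one invented line. It records match positions so you can see *which* “cat” each pattern found. Note the `r"..."` raw-string prefix: in a normal Python string, `"\b"` is a backspace character, not a word boundary.

```python
import re

line = "cat concatenate cats"

# Starting index of every match
whole_word = [m.start() for m in re.finditer(r"\bcat\b", line)]   # the standalone word
inside_word = [m.start() for m in re.finditer(r"\Bcat\B", line)]  # buried inside a word
starts_with = bool(re.search(r"^cat", line))  # does the string start with "cat"?
ends_with = bool(re.search(r"cat$", line))    # does it end with "cat"? (it ends in "cats")

print(whole_word, inside_word)  # [0] [7]
print(starts_with, ends_with)   # True False
```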

🔀 Groups & Alternation

Pattern	Meaning
`(abc)`	Group (and capture) the exact sequence abc
`a|b`	Either a or b
`[abc]`	Match any one character: a, b, or c
`[^abc]`	Match any character except a, b, or c
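A short Python sketch of groups, alternation, and character sets. The log line is invented for illustration. Capturing groups are handy for pulling specific values (here a condition letter and a rating) out of structured text.

```python
import re

# Invented output line from a hypothetical experiment
response = "Condition: B, rating 5"

# (...) captures: group 1 is the condition letter, group 2 is the rating digit
m = re.search(r"Condition: ([ABC]), rating (\d)", response)
condition, rating = m.group(1), m.group(2)
print(condition, rating)  # B 5

# a|b alternation: either word counts as a match
answers = re.findall(r"yes|no", "yes, no, maybe")
print(answers)  # ['yes', 'no']

# [^...] negated set: every character that is not a vowel or whitespace
consonants = re.findall(r"[^aeiou\s]", "yes no")
print(consonants)  # ['y', 's', 'n']
```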

📐 Escaping Special Characters

Character	Escaped Form
`.`	`\.`
`+`	`\+`
`*`	`\*`
`?`	`\?`
`|`	`\|`
`(` `)`	`\(` `\)`
`{` `}`	`\{` `\}`
`[` `]`	`\[` `\]`
`\`	`\\`
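Why escaping matters, sketched in Python with invented strings. It also uses `re.escape`, a standard-library function that adds the backslashes for you when you want to search for a piece of text literally (for example, a reported statistic).

```python
import re

# Unescaped, "." means "any character", so 3.5 also matches "305"
loose = bool(re.search(r"3.5", "RT was 305 ms"))   # True
strict = bool(re.search(r"3\.5", "RT was 305 ms"))  # False

# re.escape() backslashes every special character in the text
literal = "(p < .05)"
pattern = re.escape(literal)
found = bool(re.search(pattern, "a significant effect (p < .05) was found"))

print(loose, strict, found)  # True False True
```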

🎛 Flags (If Supported)

Flag	Meaning
`(?i)`	Case-insensitive matching
`(?m)`	Multiline mode (`^`/`$` match at the start/end of each line)
`(?s)`	Dot matches newline (`.` includes `\n`)
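A sketch of the three inline flags in Python, on an invented three-line log. Python also accepts the same flags as arguments (`re.IGNORECASE`, `re.MULTILINE`, `re.DOTALL`); other tools may spell them differently or not support them at all.

```python
import re

log = "Trial 1: hit\ntrial 2: miss\nTRIAL 3: hit"

# (?i) ignores case; (?m) lets ^ match at the start of every line
every_line = re.findall(r"(?im)^trial \d", log)
# Without (?m), ^ only matches at the very start of the whole string
first_only = re.findall(r"(?i)^trial \d", log)
# By default . stops at newlines; (?s) lets it run across them
no_dotall = re.search(r"hit.*hit", log)
dotall = re.search(r"(?s)hit.*hit", log)

print(every_line)        # ['Trial 1', 'trial 2', 'TRIAL 3']
print(first_only)        # ['Trial 1']
print(no_dotall)         # None
print(repr(dotall.group()))  # 'hit\ntrial 2: miss\nTRIAL 3: hit'
```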

🧪 Common Examples

Pattern	Matches
`(?i)\btype\s+(i|1)\b`	“type 1” or “type I” in any casing (e.g., “Type I”, “TYPE 1”)
`\b\d{4}\b`	Any 4-digit number (e.g., “2024”)
`\s+`	One or more whitespace characters
`\b(yes|no)\b`	“yes” or “no” as whole words
`^Hello`	“Hello” only at the start of the string (or of any line, with `(?m)`)
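A minimal Python sketch that runs example patterns like these against invented sample texts and prints what each one finds. Note that `\b(yes|no)\b` is case-sensitive, so it skips “Yes” and finds only “no”; prefix it with `(?i)` to catch both.

```python
import re

# Invented sample texts, one per example pattern
examples = [
    (r"(?i)\btype\s+(i|1)\b", "Participants with Type I diabetes"),
    (r"\b\d{4}\b", "Collected in 2024 from 120 students"),
    (r"\b(yes|no)\b", "Answer: Yes or no?"),
    (r"^Hello", "Hello there"),
    (r"^Hello", "Well, Hello there"),
]

results = []
for pattern, text in examples:
    m = re.search(pattern, text)
    results.append(m.group() if m else None)  # None means no match
    print(f"{pattern:24} {text!r:40} -> {results[-1]!r}")

# \s+ is most useful with re.sub, to squeeze repeated whitespace into one space
squeezed = re.sub(r"\s+", " ", "too   many \t spaces")
print(squeezed)  # too many spaces
```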

✅ Pro Tip: Use `.*` to allow any number of characters between key words (on the same line, unless `(?s)` is on), and `\s*` to allow optional whitespace (zero or more spaces, tabs, or line breaks).
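A small Python illustration of the tip, using invented responses. It also shows one pitfall: `.*` is greedy and grabs as much text as it can. The lazy form `.*?` is not in the tables above but is supported by Python's `re` and many other engines; it stops at the first possible match instead.

```python
import re

responses = [
    "I had trouble with working memory tasks",
    "my working   memory felt fine",
    "workingmemory is hard",
    "memory for working conditions",
]

# \s* allows zero or more whitespace characters between the two words
spaced = [r for r in responses if re.search(r"working\s*memory", r)]
# .* allows any characters (or none) between the two words, in this order only
gap = [r for r in responses if re.search(r"trouble.*memory", r)]

# Greedy .* runs to the LAST ")", lazy .*? stops at the first one
greedy = re.search(r"\(.*\)", "(a) and (b)").group()
lazy = re.search(r"\(.*?\)", "(a) and (b)").group()

print(len(spaced), len(gap))  # 3 1
print(greedy, "|", lazy)      # (a) and (b) | (a)
```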

Morris Lab