Regex Survival Guide

Matt Choi
4 min read · Jan 12, 2021

I’m sure many of us have come across regular expressions (RegEx) before and been confused by what we were looking at. At first, I just nodded my head and went along with whatever I had copied and pasted, because my code worked. However, I found myself needing RegEx more often than I would like, and googling for answers each time was wasting a lot of my time. After finally caving in and learning a bit of RegEx, I decided to create a survival guide for anyone who wants to save time googling for answers or just needs a quick way to get by with RegEx.

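A Brief History of RegEx

The ideas behind regex date back to the 1940s and a completely different field, neuroscience, where Warren McCulloch and Walter Pitts described a mathematical model of neurons. In the 1950s, mathematician Stephen Kleene formalized that work as regular expressions. Ken Thompson then brought regex into the computer science world in 1968, when he published a regular expression search algorithm and built it into the QED text editor; it later carried over to the Unix text editor ed.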

Uses

Regular expressions (regex) are powerful and useful in almost any programming language. I will briefly go over some of their uses.

Validate Text

  • Can check if an email is valid
  • Can check if a phone number is valid

Parsing Text

  • Replace occurrences of certain text with different text
  • Count the number of occurrences of a character or word
  • Search for a specific word or words
  • Finding duplicates
  • Convert newlines into spaces or vice versa
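
As a rough sketch of a few of the uses listed above in JavaScript: the sample strings are made up for illustration, and the email pattern is a deliberately simplified assumption, not a production-grade validator.

```javascript
// Sample input, made up for illustration.
const text = "the cat sat on the mat\nnext to the other cat";

// Validate: a deliberately simple email check (real-world rules are stricter).
const simpleEmail = /^\S+@\S+\.\S+$/;
const looksLikeEmail = simpleEmail.test("matt@example.com"); // true

// Replace every occurrence of a word with a different one.
const replaced = text.replace(/cat/g, "dog");

// Count occurrences of a word (match returns null when nothing is found).
const catCount = (text.match(/cat/g) || []).length; // 2

// Search for a specific word: returns the index of the first match, or -1.
const matIndex = text.search(/mat/);

// Convert newlines into spaces.
const oneLine = text.replace(/\n/g, " ");

console.log(looksLikeEmail, catCount, matIndex);
console.log(replaced);
console.log(oneLine);
```

The symbols used in these patterns are explained in the sections that follow.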

Syntax for RegEx

/expression/flags

Expression

The expression is the pattern you are looking for, written with the character classes, brackets, and quantifiers described below.
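
To make the /expression/flags shape concrete, here is a minimal JavaScript sketch. The word "hello" is just a placeholder pattern, and the RegExp constructor is shown only as an alternative way to write the same thing; it isn't covered elsewhere in this guide.

```javascript
// Literal syntax: the pattern sits between slashes, flags follow the closing slash.
const literal = /hello/i;

// Constructor syntax: the same pattern and flags passed as strings.
const built = new RegExp("hello", "i");

console.log(literal.source, literal.flags); // hello i
console.log(built.test("Hello world"));     // true
```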

Flags

Flags aren’t required, but they are there to help you with your search if needed. JavaScript has several flags (six at the time of writing, and newer versions of the language have added more), but I will just go over the two basic ones that you will probably use the most.

/expression/i
  • adding the ‘i’ makes the search case insensitive, meaning it doesn’t matter if a letter is upper or lowercase.
/expression/g
  • adding the ‘g’ makes the search find all of the matches instead of just the first match.
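
A small sketch of how the two flags change the result of String.prototype.match; the sample sentence is made up for illustration.

```javascript
const sentence = "Regex is fun. regex is useful. REGEX is everywhere.";

// No flags: case sensitive, first match only.
const plain = sentence.match(/regex/);        // finds only the lowercase "regex"

// 'i': case insensitive, but still only the first match.
const insensitive = sentence.match(/regex/i); // finds "Regex" at index 0

// 'g': all matches, but still case sensitive.
const globalOnly = sentence.match(/regex/g);  // ["regex"]

// 'g' and 'i' combined: every match regardless of case.
const both = sentence.match(/regex/gi);       // ["Regex", "regex", "REGEX"]

console.log(plain[0], insensitive[0], globalOnly, both);
```

Note that without ‘g’, match returns details about the first match (including its index), while with ‘g’ it returns a plain array of every matched string.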

Character Classes

Character classes are basically shorthand for sets of characters that we can try to find and match. Below is a breakdown of what each one means.

/./
/\d/
/\w/
/\s/
/\b/
/\B/
/\S/
/\W/
/\D/
  • using ‘.’ searches for any character except newlines.
  • using ‘\d’ searches for any digit character (0-9).
  • using ‘\w’ searches for any word character (a-z, A-Z, 0-9, and the underscore).
  • using ‘\s’ searches for any whitespace character (spaces, newlines, tabs).
  • using ‘\b’ searches for a word boundary, the position between a word character and a non-word character. For example, /\bhello\b/ matches ‘hello’ only as a whole word.
  • using ‘\B’ searches for a non-boundary, a position that is not a word boundary.
  • using ‘\S’ searches for non-whitespace characters.
  • using ‘\W’ searches for non-word characters.
  • using ‘\D’ searches for non-digits.
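
Here is a short sketch of the character classes in action. The sample strings are made up, and the ‘g’ flag is used so that match returns every hit.

```javascript
const sample = "Room 42, floor_3!";

const digits = sample.match(/\d/g);             // ["4", "2", "3"]
const wordChars = sample.match(/\w/g).join(""); // "Room42floor_3" (digits and _ count)
const spaceCount = sample.match(/\s/g).length;  // 2
const nonWord = sample.match(/\W/g);            // [" ", ",", " ", "!"]

// '.' matches any character except a newline.
const dotMatchesLetter = /a.c/.test("abc");     // true
const dotMatchesNewline = /a.c/.test("a\nc");   // false

// '\b' needs a word edge; '\B' needs the opposite.
const wholeWord = /\bcat\b/.test("concatenate");  // false: "cat" is inside a word
const insideWord = /\Bcat\B/.test("concatenate"); // true

console.log(digits, wordChars, spaceCount, nonWord);
```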

Brackets

/[]/
/[^]/
/$/
  • using ‘[]’ searches for any one of the characters within the brackets, for example /[a-d]/ will match a single character from a through d.
  • using ‘[^]’ searches for any character that is not in the brackets, for example /[^a-d]/ will match a single character that isn’t a through d.
  • using ‘$’ outside of brackets matches the end of a string, for example /d$/ matches a ‘d’ only when it is the last character. Inside brackets, as in /[$]/, it is just a literal dollar sign.
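
A quick sketch of bracket behavior, using a made-up sample string. One thing worth seeing in practice: inside brackets ‘$’ is an ordinary character, while a bare ‘$’ outside of brackets anchors the match to the end of the string.

```javascript
const word = "badge $5";

const inRange = word.match(/[a-d]/g);     // ["b", "a", "d"]
const notInRange = word.match(/[^a-d]/g); // ["g", "e", " ", "$", "5"]

// Inside brackets, $ is just a literal dollar sign.
const dollarSigns = word.match(/[$]/g);   // ["$"]

// Outside brackets, $ means "end of the string".
const endsWithFive = /5$/.test(word);     // true
const endsWithB = /b$/.test(word);        // false

console.log(inRange, notInRange, dollarSigns, endsWithFive, endsWithB);
```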

Quantifications

|
*
?
+
{int}
{minInt,maxInt}
()
  • using ‘|’ means the logical “or” (match the pattern on either side).
  • using ‘*’ means zero or more occurrences.
  • using ‘?’ means zero or one occurrence.
  • using ‘+’ means one or more occurrences.
  • using ‘{int}’ means that exact number of occurrences.
  • using ‘{minInt,maxInt}’ means you are setting a min and max number of occurrences. Don’t put a space after the comma, or JavaScript will treat the braces as literal text.
  • using ‘()’ means you are grouping things together, just like parentheses in math.
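
The sketch below tries each of these symbols on made-up strings. As an assumption for the sake of the demo, most patterns are wrapped in ‘^’ and ‘$’ (start and end of the string) so that a partial match doesn’t count as a pass.

```javascript
// '|' : either side may match.
const eitherPet = /cat|dog/.test("hot dog");                               // true

// '*' : zero or more of the preceding character.
const zeroOrMore = /^ab*$/.test("a") && /^ab*$/.test("abbb");              // true

// '?' : zero or one of the preceding character.
const optionalU = /^colou?r$/.test("color") && /^colou?r$/.test("colour"); // true

// '+' : one or more, so at least one "o" is required here.
const oneOrMore = /^go+al$/.test("gal");                                   // false

// '{int}' : exactly that many.
const exactlyThree = /^\d{3}$/.test("123");                                // true

// '{min,max}' : between min and max; five digits is too many.
const twoToFour = /^\d{2,4}$/.test("12345");                               // false

// With a space after the comma the braces are read as literal text.
const spacedBraces = /^\d{2, 4}$/.test("123");                             // false

// '()' : the group repeats as a unit.
const grouped = /^(ha)+$/.test("hahaha");                                  // true

console.log(eitherPet, zeroOrMore, optionalU, oneOrMore);
console.log(exactlyThree, twoToFour, spacedBraces, grouped);
```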

Escape Special Characters

. [] {} () ^ $ * + - ? \ |
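  • if you are searching for any of the special characters above, you will need to put a backslash ‘\’ in front of it to “escape” it so that it is searched for literally. The hyphen ‘-’ only needs escaping inside brackets, where it would otherwise define a range, and in JavaScript a forward slash ‘/’ also needs escaping inside a /.../ literal.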
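
A brief sketch of why escaping matters, using a made-up price string: an unescaped ‘.’ matches any character, while ‘\.’ matches only a real period.

```javascript
const price = "Total: $3.50 (approx.)";

// Unescaped '.' matches any character, so "3x50" slips through.
const looseDot = /3.50/.test("3x50");             // true

// Escaped '\.' only matches a literal period.
const strictDot = /3\.50/.test("3x50");           // false
const strictDotOnPrice = /3\.50/.test(price);     // true

// '$', '(' and ')' are special too, so they need a backslash to be matched literally.
const dollarAmount = /\$3\.50/.test(price);       // true
const parenthetical = /\(approx\.\)/.test(price); // true

console.log(looseDot, strictDot, strictDotOnPrice, dollarAmount, parenthetical);
```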

RegEx Examples

/[1-7a-h]/ig

The RegEx above broken down means the following:

  • ‘1-7’ are the numbers that are being searched for.
  • ‘a-h’ are the letters that are being searched for.
  • ‘i’ means that it will search for ‘a-h’ as well as ‘A-H’.
  • ‘g’ means that it will search for all matches instead of just the first match.
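
To see this pattern at work, here is a small sketch run against a made-up input string, with and without each flag.

```javascript
const input = "A1 b9 Z8 h7";

// Both flags: every digit 1-7 and every letter a-h in either case.
const allMatches = input.match(/[1-7a-h]/ig);    // ["A", "1", "b", "h", "7"]

// Without 'i', the uppercase "A" is no longer matched.
const caseSensitive = input.match(/[1-7a-h]/g);  // ["1", "b", "h", "7"]

// Without 'g', only the first match is reported.
const firstOnly = input.match(/[1-7a-h]/i)[0];   // "A"

console.log(allMatches, caseSensitive, firstOnly);
```

The ‘9’, ‘Z’, and ‘8’ are skipped because they fall outside both ranges.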

For the next example, here is a list of names to test against, followed by the RegEx:

Mr Choi
Mr. Jung
Ms Song
Miss S
Mrs J

/M(r|s|rs)\.?\s[A-Z]\w*/

The RegEx above broken down means the following:

  • ‘M’ is the required first character that needs to be found.
  • ‘(r|s|rs)’ is saying that directly after the ‘M’ there must be an ‘r’, ‘s’, or ‘rs’, meaning the title can be Mr, Ms, or Mrs.
  • ‘\.’ has an escape character before the period, so this means we are searching for a literal period after Mr, Ms, or Mrs.
  • ‘?’ comes right after ‘\.’, so the period isn’t required to be there in order for the name to be selected.
  • ‘\s’ is saying that a whitespace character is required next.
  • ‘[A-Z]’ is saying that the character after the whitespace needs to be an uppercase letter.
  • ‘\w*’ is saying that the capitalized letter can be followed by zero or more word characters, so nothing else is required to exist after it.

After reading this, try to figure out which names in the list this RegEx would match. The answers are provided below if you are curious to know!

Answers:

Mr Choi

Mr. Jung

Ms Song

Mrs J
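
If you want to check the answers yourself, this sketch runs the pattern over the list of names, using the ‘\w*’ form described in the breakdown. The variable names are my own.

```javascript
const names = ["Mr Choi", "Mr. Jung", "Ms Song", "Miss S", "Mrs J"];

// Title (Mr, Ms, or Mrs), optional period, whitespace, then a capitalized name.
const titlePattern = /M(r|s|rs)\.?\s[A-Z]\w*/;

const matched = names.filter((name) => titlePattern.test(name));

console.log(matched); // ["Mr Choi", "Mr. Jung", "Ms Song", "Mrs J"]
```

“Miss S” is left out because the ‘M’ is followed by an ‘i’, which is not one of the options in ‘(r|s|rs)’.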
