Regular Expressions

Why Use Regular Expressions

A regular expression (regex) is a sequence of characters that defines a search pattern in text. You may have used something similar called wildcards. The search pattern “ook” would match all of the following: book, look, rookie, and cookie. Regular expressions are similar to wildcards but much more expressive. In most cases, these patterns are used by string-searching algorithms for “find” or “find and replace” operations on strings, as well as for input validation.
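To make this concrete, here is a small sketch using Python's built-in re module; the word list is made up for illustration. Because re.search() looks for the pattern anywhere in the string, the plain pattern "ook" behaves much like the wildcard *ook*:

```python
import re

# Hypothetical word list for illustration
words = ["book", "look", "rookie", "cookie", "cake"]

# re.search() finds "ook" anywhere in each word, like the wildcard *ook*
matches = [word for word in words if re.search("ook", word)]
print(matches)  # ['book', 'look', 'rookie', 'cookie']
```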

The Importance of Regular Expressions

First rule of coding: validate your input. Second rule of coding: see the first rule.

Regular expressions allow you to validate data supplied to your application. If you aren’t careful, bad things can happen. I don’t want to get into the weeds, so if you want to read more, read this.

We can use RegEx to check for patterns in text strings, such as an IP address, a domain name, or a valid email address. One of the beautiful aspects of RegEx is that it lets you create your own search criteria to match your needs. It might be intimidating at first, because it’s like speaking a new language, but once you learn the basics, it will level up your coding skills.
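As an illustration of checking a pattern like an IP address, here is a minimal sketch that checks only the shape of an IPv4 address (four groups of one to three digits separated by periods). The helper name looks_like_ipv4 is hypothetical, and the pattern deliberately does not check that each number is between 0 and 255, which is why 999.999.999.999 still passes:

```python
import re

# Four groups of 1-3 digits separated by periods, e.g. 192.168.0.1
ipv4_shape = re.compile(r"\d{1,3}(?:\.\d{1,3}){3}")

def looks_like_ipv4(text):
    # fullmatch() requires the entire string to fit the pattern
    return ipv4_shape.fullmatch(text) is not None

print(looks_like_ipv4("192.168.0.1"))      # True
print(looks_like_ipv4("192.168.0"))        # False: only three groups
print(looks_like_ipv4("999.999.999.999"))  # True: shape only, no range check
```

A real validator would need an extra range check on each number; the point here is only how a regex describes the shape of the text.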

Python Regular Expression Special Characters

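^ – Matches at the beginning of the string (or of each line with re.MULTILINE)
$ – Matches at the end of the string (or of each line with re.MULTILINE)
* – Matches the preceding item zero or more times
+ – Matches the preceding item one or more times
? – The preceding item is optional and matched at most once
\w – Matches any word character (a-z, A-Z, 0-9, and the underscore _; in Python 3 string patterns, other Unicode letters and digits too)
\W – Matches any nonword character. This is the opposite of \w.
\s – Matches any whitespace character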
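A quick sketch of several of these special characters in action, using Python's built-in re module; the sample strings are made up for illustration:

```python
import re

text = "color colour"

# ? makes the preceding "u" optional, so both spellings match
print(re.findall(r"colou?r", text))  # ['color', 'colour']

# ^ and $ anchor the match to the start and end of the string
print(re.search(r"^color", text) is not None)   # True
print(re.search(r"colour$", text) is not None)  # True

# * repeats the preceding item zero or more times
print(re.fullmatch(r"ab*c", "ac") is not None)     # True (zero b's)
print(re.fullmatch(r"ab*c", "abbbc") is not None)  # True (three b's)

# \w includes the underscore; \W matches everything \w does not
print(re.findall(r"\w", "a_1!"))      # ['a', '_', '1']
print(re.findall(r"\W", "hi, you!"))  # [',', ' ', '!']

# \s matches whitespace, here the single space
print(re.findall(r"\s", text))  # [' ']
```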

There are too many to cover here, so check out this Python Regex Cheat Sheet.

Using Regular Expressions

Using the Regular Expression Function search()

The search() function searches a string for a match and returns a Match object if one is found.

If there are multiple matches, only the first occurrence is returned:

import re
ourString = "Abcde Acbde Bcdef bcdef"

# ^ anchors the pattern to the start of the string
regex1 = "^A"
a = re.search(regex1, ourString)
a[0]  # the text of the match

'A'

regex2 = "^Abcde"
b = re.search(regex2, ourString)
b[0]

'Abcde'
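Both examples above anchor with ^, so they can only ever match at the very start. To see the “first occurrence only” behavior, here is a short sketch that searches for a substring appearing twice in ourString. It also handles the case where search() finds nothing: it returns None, so indexing the result directly would raise a TypeError.

```python
import re

ourString = "Abcde Acbde Bcdef bcdef"

# "bcd" occurs twice, but search() stops at the first occurrence
first = re.search("bcd", ourString)
print(first[0], first.span())  # bcd (1, 4)

# When nothing matches, search() returns None, so check before indexing
missing = re.search("xyz", ourString)
if missing is None:
    print("No match")
```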

Using the Regular Expression Function findall()

The findall() function returns a list containing all of the matches.

regex3 = "bcd"
c = re.findall(regex3, ourString)
c

['bcd', 'bcd']
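Note that findall() is case-sensitive by default, which is why the “Bcd” in “Bcdef” was skipped. A small sketch showing the re.IGNORECASE flag, plus finditer(), which yields Match objects instead of strings so you can also see where each match starts:

```python
import re

ourString = "Abcde Acbde Bcdef bcdef"

# re.IGNORECASE also matches the capitalized "Bcd" in "Bcdef"
print(re.findall("bcd", ourString, re.IGNORECASE))  # ['bcd', 'Bcd', 'bcd']

# finditer() yields Match objects, which carry each match's position
positions = [m.start() for m in re.finditer("bcd", ourString)]
print(positions)  # [1, 18]
```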

Using the Regular Expression Function sub()

The sub() function replaces every match with the text of your choosing:

regex4 = "bcd"
replaceStr = "foo"
d = re.sub(regex4, replaceStr, ourString)
d

'Afooe Acbde Bcdef fooef'
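By default, sub() replaces every match. A short sketch of two related options from the re module: the count argument limits how many replacements are made, and subn() returns the new string together with the number of replacements:

```python
import re

ourString = "Abcde Acbde Bcdef bcdef"

# count limits how many matches are replaced (here, only the first)
print(re.sub("bcd", "foo", ourString, count=1))  # Afooe Acbde Bcdef bcdef

# subn() returns the new string plus the number of replacements made
new_string, replaced = re.subn("bcd", "foo", ourString)
print(new_string, replaced)  # Afooe Acbde Bcdef fooef 2
```

Passing count as a keyword argument keeps the call readable and avoids confusing it with flags.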

Real-World Regex Example

Let’s imagine that you have an application that accepts and registers email addresses.

You will want to ensure that the email addresses you receive are valid.

The typical format for an email address is: (uniqueEmailString)@(subdomain).(rootdomain)

NOTE: We are only doing a basic email validation.

Validation Requirements

On uniqueEmailString
- 64 character limit
- Only Alphanumeric
On subdomain
- 253 character limit
- Only Alphanumeric, hyphen, and periods
On rootdomain
- 2 to 3 character limit
- Alphanumeric

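import re

emails = ["hello@journeyintopython.com", "invalid%email@domain.com", "badroot@domain.comd", "foo@bad&domain.com"]

# Raw strings keep backslashes such as \. intact
uniqEmailStrRegex = r'[A-Za-z0-9]{1,64}'    # 1 to 64 alphanumeric characters
subdomainRegex = r'[A-Za-z0-9.-]{1,253}'    # alphanumerics, hyphens, and periods
rootdomainRegex = r'[A-Za-z0-9]{2,3}'       # 2 to 3 alphanumeric characters
regex = re.compile(rf'{uniqEmailStrRegex}@{subdomainRegex}\.{rootdomainRegex}')

for email in emails:
    # fullmatch() requires the whole string to match, not just a prefix
    if regex.fullmatch(email):
        print("Valid email")
    else:
        print("Invalid email")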

Valid email
Invalid email
Invalid email
Invalid email
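If you also want the individual parts of a valid address, one option is to give each part a named group. This is a sketch that follows the validation requirements listed above; the function name parse_email and the group names are hypothetical, chosen for illustration. Since fullmatch() returns None for input that does not match, the function returns None for invalid addresses:

```python
import re

# Named groups for each part, following the validation requirements
EMAIL_PATTERN = re.compile(
    r"(?P<unique>[A-Za-z0-9]{1,64})"
    r"@(?P<subdomain>[A-Za-z0-9.-]{1,253})"
    r"\.(?P<root>[A-Za-z0-9]{2,3})"
)

def parse_email(email):
    """Return the address parts as a dict, or None if the address is invalid."""
    match = EMAIL_PATTERN.fullmatch(email)
    return match.groupdict() if match else None

print(parse_email("hello@journeyintopython.com"))
# {'unique': 'hello', 'subdomain': 'journeyintopython', 'root': 'com'}
print(parse_email("me@my-site.co"))   # hyphens are allowed in the subdomain
print(parse_email("@domain.com"))     # None: the unique part needs at least 1 character
print(parse_email("a_b@domain.com"))  # None: the underscore is not alphanumeric
```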

Kevin Flint