Helpful links
- Full Advanced Python Course Link
- Gitlab Code Page
- Additional Help at Python.org
- Google Colab: The Easiest Way to Code
Why Use Regular Expressions
A regular expression (regex) is a sequence of characters that defines a search pattern for text. You may have used something similar called wildcards. The search pattern “ook” would match all of the following: book, look, rookie, and cookie. Regular expressions are similar to wildcards but much more powerful, and more complex. In most cases, these patterns are employed by string-searching algorithms for “find” or “find and replace” functions on strings, as well as for input validation.
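As a quick illustration of the “ook” example, here is a minimal sketch using Python's built-in re module. The word list and variable names are made up for this example, and "table" is added only to show a word that does not match.

```python
import re

# Hypothetical word list for illustration; "table" is the one non-match
words = ["book", "look", "rookie", "cookie", "table"]

# re.search() finds the pattern anywhere inside each word
matches = [word for word in words if re.search("ook", word)]
print(matches)
```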
The Importance of Regular Expressions
First rule of coding: validate your input. Second rule of coding: see the first rule.
Regular expressions let you validate data supplied to your application. If you aren’t careful, bad things can happen. I don’t want to get into the weeds, so if you want to read more, read this
We can use RegEx to check for patterns in text strings, such as an IP address, a domain name, or a valid email address. One of the beautiful aspects of RegEx is that it lets you create your own search criteria to match your needs. It might be intimidating at first because it’s like learning a new language, but once you learn the basics, it will level up your coding skills.
Python Regular Expression Special Characters
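- ^ – Matches at the beginning of the string (and at the beginning of each line when re.MULTILINE is set)
- $ – Matches at the end of the string (and at the end of each line when re.MULTILINE is set)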
- * and + – Match the preceding item zero or more times, or one or more times, respectively
- ? – The preceding item is optional and matched at most once
- \w – Matches any word character (a-z, A-Z, 0-9, and the underscore)
- \W – Matches nonword characters. This is the opposite of \w.
- \s – Matches any whitespace character
There are too many to cover here, so check out this Python Regex Cheat Sheet.
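To see the special characters listed above in action, here is a small sketch. The sample text and variable names are invented for illustration. The patterns are written as raw strings (the r prefix) so that Python passes the backslashes through to the regex engine untouched.

```python
import re

# Made-up sample text for illustration
text = "color colour 42 cats"

starts = re.search(r"^color", text) is not None     # ^ anchors at the start
ends = re.search(r"cats$", text) is not None        # $ anchors at the end
optional_u = re.findall(r"colou?r", text)           # ? makes the "u" optional
zero_or_more = re.fullmatch(r"ab*", "a") is not None   # * allows zero "b"s
one_or_more = re.fullmatch(r"ab+", "a") is not None    # + needs at least one "b"
words = re.findall(r"\w+", text)                    # runs of word characters
non_words = re.findall(r"\W", "a-b c!")             # anything that is not \w
parts = re.split(r"\s", text)                       # split on whitespace

print(starts, ends, optional_u)
print(zero_or_more, one_or_more)
print(words, non_words, parts)
```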
Using Regular Expressions
Using the Regular Expression Function search()
The search() function searches a string for a match and returns a Match object if one is found. If nothing matches, it returns None.
If there are multiple matches, only the first one is returned:
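import re

ourString = "Abcde Acbde Bcdef bcdef"

# "^A" matches an "A" only at the very start of the string
regex1 = "^A"
a = re.search(regex1, ourString)
a[0]
'A'

# The same anchor works with a longer literal pattern
regex2 = "^Abcde"
b = re.search(regex2, ourString)
b[0]
'Abcde'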
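One thing to watch for: when nothing matches, search() returns None, so indexing the result directly would raise a TypeError. The sketch below wraps the call in a small helper. The name first_match is hypothetical and is not part of the re module.

```python
import re

ourString = "Abcde Acbde Bcdef bcdef"

def first_match(pattern, text):
    """Return the text of the first match, or None when nothing matches."""
    match = re.search(pattern, text)
    return match[0] if match else None

found = first_match("bcd", ourString)      # "bcd" occurs twice; the first is returned
missing = first_match("^xyz", ourString)   # no match, so the helper returns None

# A Match object also reports where the match was found (start, end)
span = re.search("bcd", ourString).span()

print(found, missing, span)
```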
Using the Regular Expression Function findall()
The findall() function returns a list containing all of the matches.
regex3 = "bcd"
c = re.findall(regex3, ourString)
c
['bcd', 'bcd']
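A few variations on findall() may help, sketched here with the same sample string. The re.IGNORECASE flag and the \w* pattern are additions for illustration and were not part of the example above.

```python
import re

ourString = "Abcde Acbde Bcdef bcdef"

# With re.IGNORECASE, "Bcd" in "Bcdef" is also picked up
any_case = re.findall("bcd", ourString, flags=re.IGNORECASE)

# Surrounding the literal with \w* returns the whole word around each match
whole_words = re.findall(r"\w*bcd\w*", ourString)

# When nothing matches, findall() returns an empty list rather than None
nothing = re.findall("xyz", ourString)

print(any_case, whole_words, nothing)
```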
Using the Regular Expression Function sub()
The sub() function replaces each match with the text of your choosing:
regex4 = "bcd"
replaceStr = "foo"
d = re.sub(regex4, replaceStr, ourString)
d
'Afooe Acbde Bcdef fooef'
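If you only want to replace some of the matches, or want to know how many replacements were made, the re module offers a count argument and the related subn() function. A short sketch, reusing the same sample string:

```python
import re

ourString = "Abcde Acbde Bcdef bcdef"

# count=1 limits the replacement to the first match only
first_only = re.sub("bcd", "foo", ourString, count=1)

# subn() returns a tuple: (new string, number of replacements made)
replaced, how_many = re.subn("bcd", "foo", ourString)

print(first_only)
print(replaced, how_many)
```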
Real World Regex Example
Let's imagine that you have an application that accepts and registers email addresses.
You will want to ensure that the email addresses you receive are valid.
Typical format for email addresses: (uniqueEmailString)@(subdomain).(rootdomain)
NOTE: We are only doing a basic email validation.
Validation Requirements
- On uniqueEmailString
  - 64 character limit
  - Only alphanumeric
- On subdomain
  - 253 character limit
  - Only alphanumeric, hyphens, and periods
- On rootdomain
  - 2 to 3 character limit
  - Letters only
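import re

emails = ["hello@journeyintopython.com", "invalid%email@domain.com", "badroot@domain.comd", "foo@bad&domain.com"]

# Raw strings keep the backslashes intact; {1,N} requires at least one character
uniqEmailStrRegex = r'[A-Za-z0-9]{1,64}'    # alphanumeric only
subdomainRegex = r'[A-Za-z0-9.-]{1,253}'    # alphanumeric, hyphen, and period
rootdomainRegex = r'[A-Za-z]{2,3}'          # 2 to 3 letters
regex = re.compile(rf'{uniqEmailStrRegex}@{subdomainRegex}\.{rootdomainRegex}')

for email in emails:
    # fullmatch() succeeds only when the entire string fits the pattern
    if regex.fullmatch(email):
        print("Valid email")
    else:
        print("Invalid email")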
Valid email
Invalid email
Invalid email
Invalid email
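For reuse in an application, the same basic check could be wrapped in a function. This is only a sketch of the simplified rules in this example, not full email validation. The names EMAIL_REGEX and is_valid_email, and the two extra test addresses, are made up for illustration.

```python
import re

# Basic rules from this example only; real email addresses allow more than this
EMAIL_REGEX = re.compile(
    r"[A-Za-z0-9]{1,64}"       # uniqueEmailString: alphanumeric, 1 to 64 characters
    r"@"
    r"[A-Za-z0-9.-]{1,253}"    # subdomain: alphanumeric, hyphen, and period
    r"\."
    r"[A-Za-z]{2,3}"           # rootdomain: 2 to 3 letters
)

def is_valid_email(email):
    """Return True when the whole string fits the basic pattern."""
    return EMAIL_REGEX.fullmatch(email) is not None

# Hypothetical inputs: the four from the example plus two edge cases
emails = [
    "hello@journeyintopython.com",
    "invalid%email@domain.com",
    "badroot@domain.comd",
    "foo@bad&domain.com",
    "user@my-site.org",   # hyphen in the subdomain is allowed
    "@domain.com",        # empty uniqueEmailString is rejected
]
results = {email: is_valid_email(email) for email in emails}

for email, ok in results.items():
    print(email, "Valid email" if ok else "Invalid email")
```

Returning a boolean instead of printing keeps the check easy to call from a form handler or a test.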
Python Regular Expression Helpful Links
- Online Python Regular Expression Validator
- Python Regex Cheat Sheet
- Python.org Regular Expression How To