IPython/Jupyter Lab Notebooks in WordPress

I’ve been a hobbyist blogger for a long time. I took a web development bootcamp and then transitioned into teaching data science.

When I discovered what was possible with IPython Notebooks in Jupyter Lab, I was psyched. As I read more about data science online, I became really interested in the people who were using Jupyter Lab notebooks for blogging.

For now at least, I’ve set this blog up on WordPress. I’ve been tempted to check out a Python-based option instead, but at the end of the day, WordPress actually works pretty well for me for small, personal blogs.

nbconvert plugin

So I decided to figure out if I could incorporate Jupyter Lab notebooks in my blog posts. I could not find a plugin for this on the WordPress store, but there was one on GitHub. The author has a blog post with more information on it.

It makes adding an IPython Notebook as easy as putting a URL in a WordPress shortcode.

Installation

The installation instructions suggested using a plugin called wppusher that allows one to install plugins and themes directly from GitHub, GitLab, etc. This sounds like a really cool idea, but it wasn’t going to work for me because I have my site set up so that plugins can only be installed on the command line using wp-cli.

Downloading it as a zip file and installing it via wp-cli worked just fine, though. Presumably one could also download the zip file and upload it through the WordPress interface to install it without using wppusher or wp-cli.

Sample notebook

Here’s an example of how this renders a sample notebook.

Regular Expressions

What are regular expressions?

  • Regular expressions are designed for matching patterns in text.
  • Examples are picking out three digits in a row or characters separated by white space or punctuation (i.e. words in a sentence).
  • They can be extremely useful for parsing & cleaning data - check that entries follow a certain format or extract parts of a string based on certain criteria
  • The downside is that they aren't very human-readable.
  • Sometimes called REs, regexes or regex patterns

Regular Expressions in Python

In [2]:
# re is the regular expression module in python
import re

Most characters match themselves

For example, test will match 'test'.

In [12]:
# Compiling the regular expression stores it as a regular expression object that you can use
retest = re.compile('test')
mystring = "testing. 1. 2. 3. ..."
# The match method looks for matches at the beginning of a string
# If there is a match, it will return a match object
# The match object has information about what matched,
# including which part of the string matched
m = retest.match(mystring)
print(m)
<re.Match object; span=(0, 4), match='test'>
In [9]:
# If there is no match, it will return None
mystring2 = "You can learn to be a good test taker"
print(retest.match(mystring2))
None

Metacharacters - special characters

These characters have special meanings and do not match themselves: . ^ $ * + ? { } [ ] \ | ( )

Square Brackets - specify a set of characters to match

  • Called a character class
  • Characters can be listed individually or with a - to indicate a range
  • [aeiouAEIOU] would match any vowel
  • [a-z] would match any lowercase letter
  • Most metacharacters are not active inside character classes: [ab$] would match a, b or $
  • You can match characters not listed in a set by complementing the set with ^ at the beginning
  • [^aeiouAEIOU] would match any character that is NOT a vowel
In [15]:
rebrackets = re.compile('[aeiouAEIOU]')
mystring3 = "I like Python, cookies and tea!"
m = rebrackets.match(mystring3)
print(m)
<re.Match object; span=(0, 1), match='I'>
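
The range and complement bullets above can be sketched the same way. This is only an illustration: the variable names are my own and are not part of the original notebook, and it uses search and findall so the match does not have to start at the beginning of the string.

```python
import re

sample = "I like Python, cookies and tea!"

# [a-z] matches a single lowercase letter; search returns the first one it finds
first_lower = re.search('[a-z]', sample)
print(first_lower.group(0))  # l

# [^aeiouAEIOU ] matches any character that is NOT a vowel or a space
non_vowels = re.findall('[^aeiouAEIOU ]', sample)
print(''.join(non_vowels))
```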

Asterisk - match ZERO or more times

In [17]:
# the * matches zero or more 'a's
recat = re.compile('ca*t')
# all of these will match
m1 = recat.match("ct")
m2 = recat.match("cat")
m3 = recat.match("caaaaaaaaaat")
print(m1, m2, m3)
<re.Match object; span=(0, 2), match='ct'> <re.Match object; span=(0, 3), match='cat'> <re.Match object; span=(0, 12), match='caaaaaaaaaat'>

Plus sign - match ONE or more times

In [ ]:
# the + matches one or more 'a's
recat = re.compile('ca+t')
# "ct" will NOT match (m1 is None) because + needs at least one 'a'; the other two match
m1 = recat.match("ct")
m2 = recat.match("cat")
m3 = recat.match("caaaaaaaaaat")
print(m1, m2, m3)

Question mark - match zero or one times

Think of this as matching an optional character.

In [18]:
retwo = re.compile('two year-?old')
m1 = retwo.match('two year-old')
m2 = retwo.match('two yearold')
print(m1, m2)
<re.Match object; span=(0, 12), match='two year-old'> <re.Match object; span=(0, 11), match='two yearold'>

Backslashes - escape out special characters

If you want to match one of the special characters literally, put a \ in front of it.

Backslashes are also escape characters for strings in Python which makes things tricky and weird - read more at this link: The Backslash Plague

In [20]:
# this will match '2+2', whereas '2+2' without the \ would match a run of two or more 2s (like '22' or '2222')
# the r prefix makes this a raw string, so Python leaves the backslash alone
replus = re.compile(r'2\+2')
m1 = replus.match('2+2')
print(m1)
<re.Match object; span=(0, 3), match='2+2'>
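
To make the backslash issue a bit more concrete, here is a small sketch (the variable names are mine, not from the notebook). It compares a raw string with a doubled backslash, shows what the unescaped pattern does, and uses re.escape, a standard library helper that adds the backslashes for you.

```python
import re

# A raw string (r'...') keeps the backslash as-is, so the regex engine sees \+
raw_pattern = r'2\+2'
# Without the r prefix, the backslash has to be doubled to survive Python's string escaping
escaped_pattern = '2\\+2'
print(raw_pattern == escaped_pattern)  # True

# Without the backslash, + means "one or more of the preceding 2"
unescaped = re.compile('2+2')
print(unescaped.match('2+2'))    # None
print(unescaped.match('22222'))  # matches the whole run of 2s

# re.escape() escapes any special characters in a literal string
print(re.escape('2+2'))
```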

Pipe (bar, vertical line or whatever you want to call it) - match this OR that

In [3]:
redrink = re.compile('coffee|tea')
# using search looks for the regex anywhere in the string
# using match as above just looks at the beginning
m = redrink.search('I like coffee')
print(m)
<re.Match object; span=(7, 13), match='coffee'>
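
Here is a quick side-by-side sketch of that difference, using the same pattern and string. The fullmatch call is an extra that the notebook does not cover: it only succeeds when the entire string fits the pattern.

```python
import re

redrink = re.compile('coffee|tea')
sentence = 'I like coffee'

# match only tries at position 0, where the string starts with 'I', so it fails
at_start = redrink.match(sentence)
print(at_start)  # None

# search scans forward and finds 'coffee' starting at index 7
anywhere = redrink.search(sentence)
print(anywhere.span())  # (7, 13)

# fullmatch requires the whole string to match the pattern
whole = redrink.fullmatch('tea')
print(whole is not None)  # True
```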

Matching numbers, whitespace, characters, etc.

There are more of these that you can look up, but here are a few to get started with

  • \d will match any digit (inverse \D will match anything EXCEPT a digit)
  • \w will match a 'word character' - letter, digit or underscore (inverse \W will match anything EXCEPT a word character)
  • \s will match any whitespace character (inverse \S will match anything EXCEPT whitespace)
  • \t matches Tab
In [5]:
# This will match one or more digits in a row
renum = re.compile(r'\d+')
m1 = renum.search('I am 98 years old!')
print(m1)
<re.Match object; span=(5, 7), match='98'>
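
As a rough sketch of the other shorthand classes in the list above, here they are run against a made-up string (the string and variable names are my own assumptions for illustration):

```python
import re

text = "Room 42\tBuilding_7"

digits = re.findall(r'\d+', text)       # runs of digits
words = re.findall(r'\w+', text)        # runs of letters, digits or underscores
non_space = re.findall(r'\S+', text)    # runs of anything that is not whitespace
non_digits = re.findall(r'\D+', text)   # runs of anything that is not a digit

print(digits)      # ['42', '7']
print(words)       # ['Room', '42', 'Building_7']
print(non_space)   # ['Room', '42', 'Building_7']
print(non_digits)  # ['Room ', '\tBuilding_']
```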

Groups - ()

Groups let you capture parts of a string to use separately

In [9]:
reperson = re.compile(r'(\w+):(\d+)')
# let's say you have data in the form 'name:age'
m1 = reperson.match('Cliff:25')
# group(0) is always the entire result
name = m1.group(1)
age = m1.group(2)
print(m1.group(0), name, age)
Cliff:25 Cliff 25
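
Groups can also be given names, which the notebook does not cover. A minimal sketch, assuming the same 'name:age' data (the pattern name reperson_named is my own):

```python
import re

# (?P<name>...) gives a group a name in addition to its number
reperson_named = re.compile(r'(?P<name>\w+):(?P<age>\d+)')
m = reperson_named.match('Cliff:25')

print(m.group('name'))      # Cliff
print(int(m.group('age')))  # 25
print(m.groupdict())        # {'name': 'Cliff', 'age': '25'}
```

Named groups can make longer patterns easier to read, since you no longer have to count parentheses to know which group is which.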

Types of matches

  • match() - looks to match the beginning of a string
  • search() - will match anywhere in the string
  • findall() - Find all substrings that match and return as a list
  • finditer() - Find all substrings that match and return as an iterator
In [10]:
renum = re.compile(r'\d+')
matches = renum.findall("There were 14 mice that lived at 8756 2nd Avenue in a 6 bedroom house.")
print(matches)
['14', '8756', '2', '6']
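
For comparison, here is a sketch of finditer on the same sentence. Unlike findall, it yields match objects, so you also get the position of each match (the list name found is my own):

```python
import re

renum = re.compile(r'\d+')
sentence = "There were 14 mice that lived at 8756 2nd Avenue in a 6 bedroom house."

# Each item from finditer is a match object, not just the matched text
found = [(m.group(0), m.start()) for m in renum.finditer(sentence)]
for text, position in found:
    print(text, "starts at index", position)
```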

More options

There are many options for things you can do with regular expressions. Some of them include:

  • Testing what is ahead or behind your expression without capturing it (lookahead and lookbehind)
  • Capturing and non-capturing groups
  • Testing if something is before or after a new line
  • Split a string into substrings by where a regex is found
  • Replace parts of a string
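
Here is a brief sketch of each of these options, just to give a feel for the syntax. All of the example strings are made up for illustration and are not from the original notebook.

```python
import re

# Lookahead: match digits only when followed by ' dollars', without including ' dollars'
prices = re.findall(r'\d+(?= dollars)', 'I paid 30 dollars for 2 books')
print(prices)  # ['30']

# Non-capturing group (?:...) groups alternatives without creating a numbered group
m = re.match(r'(?:Mr|Ms)\. (\w+)', 'Ms. Lovelace')
print(m.group(1))  # Lovelace

# With re.MULTILINE, ^ and $ also match at the start and end of each line
line_starts = re.findall(r'^\w+', 'first line\nsecond line', flags=re.MULTILINE)
print(line_starts)  # ['first', 'second']

# split breaks a string wherever the regex matches
drinks = re.split(r'[,;]\s*', 'tea, coffee;cocoa')
print(drinks)  # ['tea', 'coffee', 'cocoa']

# sub replaces every match with the replacement text
masked = re.sub(r'\d', '#', 'PIN 1234')
print(masked)  # PIN ####
```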

Examples

Match Parcel Numbers

In Oklahoma, most counties follow a convention for creating a unique ID for each parcel of land owned in the county.

For rural parcels, this convention is: 0000-SS-TTT-RRR-Q-PPP-00

  • SS is a two digit section number from 01 to 36
  • TTT is a township expressed as two digits and an N or an S (north or south)
  • RRR is a range expressed as two digits and an E or a W (east or west)
  • Q is for the quarter section - a number from 1 to 4 (NE, NW, SW, SE)
  • PPP is a number assigned consecutively for parcels in a quarter section

Create a regular expression to test if a string meets this format.

In [11]:
testNumbers = ['0000-03-24N-19E-3-008-00',
    '0000-22-24N-20E-1-002-00',
    '0000-35-24N-20E-1-008-00',
    '0000-35-24N-20E-1-005-00',
    '000-24-24S-21E-1-005-00',
    '0000-24-05N-21E-1-002-00',
    '0000-24-24N-21E-4-001-00',
    '0000-16-25N-19T-2-002-00',
    '0000-34-25S-21E-2-002-00',
    '0000-37-25B-21W-2-007-00',
    '0000-24-26N-18E-1-004-00']
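
One possible answer is sketched below; try it yourself first. It rests on a few assumptions of mine that the description does not spell out: the leading 0000 and trailing 00 are treated as fixed text, PPP is taken to be any three digits, and the township and range digits are not range-checked. It also uses {2} and {3} ("exactly this many repeats"), which is not covered above, and fullmatch so that the whole string has to fit.

```python
import re

# Hypothetical pattern built from the convention described above
reparcel = re.compile(
    r'0000-'                      # literal prefix (assumed fixed)
    r'(?:0[1-9]|[12]\d|3[0-6])-'  # SS: section 01 to 36
    r'\d{2}[NS]-'                 # TTT: two digits plus N or S
    r'\d{2}[EW]-'                 # RRR: two digits plus E or W
    r'[1-4]-'                     # Q: quarter section 1 to 4
    r'\d{3}-'                     # PPP: three digits (assumed)
    r'00'                         # literal suffix (assumed fixed)
)

testNumbers = ['0000-03-24N-19E-3-008-00',
    '0000-22-24N-20E-1-002-00',
    '0000-35-24N-20E-1-008-00',
    '0000-35-24N-20E-1-005-00',
    '000-24-24S-21E-1-005-00',
    '0000-24-05N-21E-1-002-00',
    '0000-24-24N-21E-4-001-00',
    '0000-16-25N-19T-2-002-00',
    '0000-34-25S-21E-2-002-00',
    '0000-37-25B-21W-2-007-00',
    '0000-24-26N-18E-1-004-00']

# fullmatch only succeeds when the entire string fits the pattern
valid = [n for n in testNumbers if reparcel.fullmatch(n)]
invalid = [n for n in testNumbers if not reparcel.fullmatch(n)]
print("valid:", len(valid))
print("invalid:", invalid)
```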

Homework: Match Phone Numbers

Use a regular expression to pick just the phone numbers out from this list AND store them in a new list in the format XXX-XXX-XXXX.

In [ ]:
contacts = ['Mary: 505-343-7644', 'Santana: (834)-434-5879', 'Brandon - 436.753.4956']
