[Study Notes] Implementing Regular Expressions in Python


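Regular expressions (regex) are a handy tool for working with strings, and they work in much the same way across programming languages. This post uses Python to record the regex techniques I learned in class.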

Importing the module

import re
patterns = ['term1', 'term2']
text = 'This is a string with term1, not not the other term'
print(re.search('h', 'hihhhhi')) # search particular pattern, return match object

>>> <re.Match object; span=(0, 1), match='h'>

match = re.search(patterns[0], text)
print(match.start())
print(match.end())

>>> 22
>>> 27
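
One detail worth keeping in mind: re.search returns None when nothing matches, so calling .start() on the result directly would raise an AttributeError. Below is a minimal sketch of a guard; the helper name find_span is made up for illustration and is not part of the re module.

```python
import re

text = 'This is a string with term1, not not the other term'

def find_span(pattern, text):
    """Hypothetical helper: return (start, end) of the first match, or None."""
    match = re.search(pattern, text)
    if match is None:          # no match: avoid calling .span() on None
        return None
    return match.span()

print(find_span('term1', text))  # (22, 27)
print(find_span('term2', text))  # None
```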

# split method
split_term = "@"
string = "aaa@gmail.com"
re.split(split_term, string)

>>> ['aaa', 'gmail.com']

# findall method
re.findall("on", "how do you turn this on and on")

>>> ['on', 'on']
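
As a small extension of the split and findall calls above, the sketch below shows two variations that are often useful: splitting on a character set instead of a fixed string, and using re.finditer when the positions of the matches matter as well as their text. The variable names are only illustrative.

```python
import re

# Split on either '@' or '.', using a character set as the separator
parts = re.split(r"[@.]", "aaa@gmail.com")
print(parts)  # ['aaa', 'gmail', 'com']

# finditer yields match objects, so each match also carries its position
sentence = "how do you turn this on and on"
positions = [m.span() for m in re.finditer("on", sentence)]
print(positions)  # [(21, 23), (28, 30)]
```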

Pattern syntax

def multi_re_find(patterns,phrase):
    '''
    Takes in a list of regex patterns
    Prints a list of all matches
    '''
    for pattern in patterns:
        print('Searching the phrase using the re check: %r' %(pattern))
        # String substitution (details: https://www.geeksforgeeks.org/str-vs-repr-in-python/)
        # %r formats the object with repr(), so the output is wrapped in quotes
        # %s formats the object with str(), so the output has no quotes
        print(re.findall(pattern,phrase))
        print('\n')
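
The difference between %r and %s used in the function above is easiest to see side by side. This is a small sketch, assuming a plain string as the value being formatted; the f-string form with !r is shown only as an equivalent spelling.

```python
pattern = 'sd*'

with_repr = '%r' % pattern   # uses repr(): quotes are kept
with_str = '%s' % pattern    # uses str(): no quotes

print(with_repr)  # 'sd*'
print(with_str)   # sd*

# The same repr-style output written as an f-string
print(f'{pattern!r}')  # 'sd*'
```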

test_phrase = 'sdsd..sssddd...sdddsddd...dsds...dsssss...sdddd...dds...dddds'

test_patterns = [ 'sd*',       # s followed by zero or more d's
                  'sd+',       # s followed by one or more d's
                  'sd?',       # s followed by zero or one d
                  'sd{3}',     # s followed by exactly three d's
                  'sd{2,3}',   # s followed by two to three d's
                  'd{2}s*',    # two d's followed by zero or more s's
                ]

multi_re_find(test_patterns,test_phrase)
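
To make the quantifiers easier to check by hand, here is a minimal sketch on a much shorter phrase. The phrase is made up for this illustration and is not the test phrase above; the expected results are listed in the comments.

```python
import re

# Hypothetical short phrase, chosen so each result is easy to verify by eye
phrase = 'sd s sdd sddd sdddd'

patterns = ['sd*', 'sd+', 'sd?', 'sd{3}', 'sd{2,3}']
results = {p: re.findall(p, phrase) for p in patterns}

for p, found in results.items():
    print('%-10r %s' % (p, found))

# 'sd*'     -> ['sd', 's', 'sdd', 'sddd', 'sdddd']
# 'sd+'     -> ['sd', 'sdd', 'sddd', 'sdddd']
# 'sd?'     -> ['sd', 's', 'sd', 'sd', 'sd']
# 'sd{3}'   -> ['sddd', 'sddd']
# 'sd{2,3}' -> ['sdd', 'sddd', 'sddd']
```

Quantifiers are greedy by default, which is why 'sd{2,3}' takes three d's from 'sdddd' rather than stopping at two.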


Character sets

test_phrase = 'sdsd..sssddd...sdddsddd...dsds...dsssss...sdddd'

test_patterns = ['[sd]',    # either s or d
                's[sd]+']   # s followed by one or more s or d

multi_re_find(test_patterns,test_phrase)
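
A shorter run of the same two character-set patterns, as a sketch on a made-up fragment so the results can be verified by hand:

```python
import re

# Hypothetical fragment, shortened for readability
phrase = 'sdsd..sssddd'

singles = re.findall('[sd]', phrase)   # every single s or d
runs = re.findall('s[sd]+', phrase)    # an s, then one or more s/d characters

print(singles)  # ['s', 'd', 's', 'd', 's', 's', 's', 'd', 'd', 'd']
print(runs)     # ['sdsd', 'sssddd']
```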


Exclusion

A caret (^) placed as the first character inside a character set negates the set. For example, [^5] matches any single character except '5'.

test_phrase = 'This is a string! But it has punctuation. How can we remove it?'
re.findall('[^!.? ]+',test_phrase)
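
For reference, this sketch runs the exclusion pattern above and shows what it returns: every run of characters that are not '!', '.', '?' or a space, which in effect strips the punctuation and splits the phrase into words. The second call uses a made-up string to illustrate [^5].

```python
import re

test_phrase = 'This is a string! But it has punctuation. How can we remove it?'

words = re.findall('[^!.? ]+', test_phrase)
print(words)
# ['This', 'is', 'a', 'string', 'But', 'it', 'has', 'punctuation',
#  'How', 'can', 'we', 'remove', 'it']

# [^5] matches any single character that is not '5'
not_five = re.findall('[^5]', '15a5')
print(not_five)  # ['1', 'a']
```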


Escape code

Escapes are indicated by prefixing the character with a backslash (\). Unfortunately, a backslash must itself be escaped in normal Python strings, which makes expressions hard to read. Raw strings, created by prefixing the string literal with r (for example r'\d+'), avoid this problem and keep patterns readable.
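
A small sketch of what the raw-string prefix changes. The Windows-style path is a made-up value used only to have a string that contains literal backslashes.

```python
import re

# The same regex written as a normal string and as a raw string
same_pattern = ('\\d+' == r'\d+')
print(same_pattern)  # True

# In a normal string '\n' is one newline character; in a raw string it is two characters
lengths = (len('\n'), len(r'\n'))
print(lengths)  # (1, 2)

# Matching a literal backslash: the regex needs \\, which a raw string spells as r'\\'
path = r'C:\temp\notes.txt'   # hypothetical path, for illustration only
pieces = re.split(r'\\', path)
print(pieces)  # ['C:', 'temp', 'notes.txt']
```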

test_phrase = 'This is a string with some numbers 1233 and a symbol #hashtag'

test_patterns=[ r'\d+', # sequence of digits
                r'\D+', # sequence of non-digits
                r'\s+', # sequence of whitespace
                r'\S+', # sequence of non-whitespace
                r'\w+', # sequence of word characters (letters, digits, underscore)
                r'\W+', # sequence of non-word characters
                ]

multi_re_find(test_patterns,test_phrase)
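
The sketch below applies the same six escape codes to a shorter, made-up phrase so the results fit in comments. The underscore in the phrase is there on purpose: it shows that \w treats '_' as a word character.

```python
import re

# Hypothetical short phrase; the underscore is included deliberately
phrase = 'numbers 1233 and #hashtag_1'

codes = [r'\d+', r'\D+', r'\s+', r'\S+', r'\w+', r'\W+']
found = {code: re.findall(code, phrase) for code in codes}

for code, matches in found.items():
    print('%-8r %s' % (code, matches))

# '\\d+' -> ['1233', '1']
# '\\D+' -> ['numbers ', ' and #hashtag_']
# '\\s+' -> [' ', ' ', ' ']
# '\\S+' -> ['numbers', '1233', 'and', '#hashtag_1']
# '\\w+' -> ['numbers', '1233', 'and', 'hashtag_1']
# '\\W+' -> [' ', ' ', ' #']
```

The patterns print with doubled backslashes because %r shows the repr() of each string.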


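The main takeaway: get familiar with the official documentation, and with enough practice writing regex it should become second nature.
Official documentation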

About the Author

swchen
