[Study Notes] Implementing Regular Expressions in Python


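Regular expressions (regex) are a handy tool for working with strings, and they work in much the same way across most programming languages. This post uses Python to record the regex techniques I learned in class.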

Importing the module

import re
patterns = ['term1', 'term2']
text = 'This is a string with term1, but not the other term'
print(re.search('h', 'hihhhhi'))  # search for a pattern; returns a match object for the first hit, or None

>>> <re.Match object; span=(0, 1), match='h'>

match = re.search(patterns[0], text)
print(match.start())
print(match.end())

>>> 22
>>> 27
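One thing the snippet above glosses over is that re.search returns None when nothing matches, so calling match.start() on a pattern such as patterns[1] would fail. A minimal sketch of guarding against that follows; the helper name find_span is my own and not part of the re module.

```python
import re

text = 'This is a string with term1, but not the other term'

def find_span(pattern, text):
    """Return (start, end) of the first match, or None if there is none."""
    match = re.search(pattern, text)
    if match is None:
        # re.search returns None when the pattern is absent, so calling
        # match.start() directly would raise AttributeError here.
        return None
    return match.start(), match.end()

print(find_span('term1', text))  # (22, 27)
print(find_span('term2', text))  # None
```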

# split method
split_term = "@"
string = "aaa@gmail.com"
re.split(split_term, string)

>>> ['aaa', 'gmail.com']
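The separator passed to re.split is itself a regex, so it can describe several delimiters at once. The sketch below uses made-up input strings and a bracketed character set (the syntax covered under Character sets later in this post); maxsplit is a real parameter of re.split.

```python
import re

line = 'alpha, beta;gamma   delta'  # made-up input for illustration

# Split on any run of commas, semicolons, or whitespace.
parts = re.split(r'[,;\s]+', line)
print(parts)  # ['alpha', 'beta', 'gamma', 'delta']

# maxsplit limits the number of splits; the remainder stays in one piece.
head_tail = re.split('@', 'a@b@c', maxsplit=1)
print(head_tail)  # ['a', 'b@c']
```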

# findall method
re.findall("on", "how do you turn this on and on")

>>> ['on', 'on']
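findall only returns the matched text. When the positions matter as well, re.finditer yields one match object per occurrence. A small sketch using the same phrase:

```python
import re

phrase = 'how do you turn this on and on'

# finditer yields one match object per occurrence, left to right.
spans = [(m.start(), m.end()) for m in re.finditer('on', phrase)]
print(spans)  # [(21, 23), (28, 30)]
```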

Pattern syntax

def multi_re_find(patterns,phrase):
    '''
    Takes in a list of regex patterns
    Prints a list of all matches
    '''
    for pattern in patterns:
        # %r formats the value with repr(), so the pattern is printed with quotes;
        # %s would format it with str() and print it without quotes.
        # Details: https://www.geeksforgeeks.org/str-vs-repr-in-python/
        print('Searching the phrase using the re check: %r' % pattern)
        print(re.findall(pattern,phrase))
        print('\n')

test_phrase = 'sdsd..sssddd...sdddsddd...dsds...dsssss...sdddd...dds...dddds'

test_patterns = ['sd*',      # s followed by zero or more d's
                 'sd+',      # s followed by one or more d's
                 'sd?',      # s followed by zero or one d
                 'sd{3}',    # s followed by exactly three d's
                 'sd{2,3}',  # s followed by two or three d's
                 'd{2}s*',   # two d's followed by zero or more s's
                 ]

multi_re_find(test_patterns,test_phrase)
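The long test phrase makes the results hard to check by eye, so here is the same set of quantifiers run against a much shorter phrase. The phrase is my own, chosen only for illustration, and the results are collected in a dict instead of being printed by multi_re_find.

```python
import re

phrase = 'sd sdd sddd s dds'  # short made-up phrase for illustration

results = {p: re.findall(p, phrase)
           for p in ['sd*', 'sd+', 'sd?', 'sd{3}', 'sd{2,3}', 'd{2}s*']}

for pattern, matches in results.items():
    print('%-10r %s' % (pattern, matches))

# 'sd*'      ['sd', 'sdd', 'sddd', 's', 's']
# 'sd+'      ['sd', 'sdd', 'sddd']
# 'sd?'      ['sd', 'sd', 'sd', 's', 's']
# 'sd{3}'    ['sddd']
# 'sd{2,3}'  ['sdd', 'sddd']
# 'd{2}s*'   ['dd', 'dd', 'dds']
```

Note that 'sd?' never returns more than one d, because ? stops after at most one repetition.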


Character sets

test_phrase = 'sdsd..sssddd...sdddsddd...dsds...dsssss...sdddd'

test_patterns = ['[sd]',    # either s or d
                's[sd]+']   # s followed by one or more s or d

multi_re_find(test_patterns,test_phrase)
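Character sets can also hold ranges such as a-z or 0-9, which the example above does not use. A minimal sketch on a made-up string:

```python
import re

text = 'Hello World 42'  # made-up string for illustration

lower_runs = re.findall('[a-z]+', text)        # runs of lowercase letters
capitalized = re.findall('[A-Z][a-z]+', text)  # one capital, then lowercase
numbers = re.findall('[0-9]+', text)           # runs of digits

print(lower_runs)   # ['ello', 'orld']
print(capitalized)  # ['Hello', 'World']
print(numbers)      # ['42']
```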


Exclusion

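Placing ^ as the first character inside the brackets negates the set, so it matches any character that is not listed. For example, [^5] will match any character except '5'.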

test_phrase = 'This is a string! But it has punctuation. How can we remove it?'
re.findall('[^!.? ]+',test_phrase)
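To make the result concrete, the sketch below runs the same findall call and then compares it with deleting the punctuation directly via re.sub. The re.sub comparison is my own addition and not part of the original notes.

```python
import re

test_phrase = 'This is a string! But it has punctuation. How can we remove it?'

# Keep every run of characters that is not '!', '.', '?' or a space.
words = re.findall('[^!.? ]+', test_phrase)
print(words)
# ['This', 'is', 'a', 'string', 'But', 'it', 'has', 'punctuation',
#  'How', 'can', 'we', 'remove', 'it']

# Rejoin the words, or delete the punctuation directly with re.sub.
joined = ' '.join(words)
stripped = re.sub('[!.?]', '', test_phrase)
print(joined == stripped)  # True
```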


Escape code

Escapes are indicated by prefixing the character with a backslash (\). Unfortunately, a backslash must itself be escaped in normal Python strings, which makes expressions hard to read. Raw strings, created by prefixing the string literal with r (for example r'\d+'), avoid this problem and keep patterns readable.
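A quick way to see the difference between a normal string and a raw string is to compare them directly. The path below is a hypothetical example, used only to show how a literal backslash is matched.

```python
import re

# A raw string keeps the backslash as a literal character.
print(len('\n'), len(r'\n'))   # 1 2
same = ('\\d+' == r'\d+')
print(same)                    # True

# Matching one literal backslash: the regex itself needs an escaped backslash.
path = r'C:\temp\new'          # hypothetical example path
backslashes = re.findall(r'\\', path)
print(len(backslashes))        # 2
```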

test_phrase = 'This is a string with some numbers 1233 and a symbol #hashtag'

test_patterns=[ r'\d+', # sequence of digits
                r'\D+', # sequence of non-digits
                r'\s+', # sequence of whitespace
                r'\S+', # sequence of non-whitespace
                r'\w+', # sequence of word characters (letters, digits, underscore)
                r'\W+', # sequence of non-word characters
                ]

multi_re_find(test_patterns,test_phrase)
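As a compact reference, here are the same six escape codes applied to a shorter string of my own, with the results stored in a dict so that each one can be inspected. Note that the underscore counts as a word character for \w.

```python
import re

text = 'Room 42, floor_3 #ok'  # made-up string for illustration

codes = [r'\d+', r'\D+', r'\s+', r'\S+', r'\w+', r'\W+']
found = {code: re.findall(code, text) for code in codes}

for code, matches in found.items():
    print('%-6s %s' % (code, matches))

# \d+    ['42', '3']
# \D+    ['Room ', ', floor_', ' #ok']
# \s+    [' ', ' ', ' ']
# \S+    ['Room', '42,', 'floor_3', '#ok']
# \w+    ['Room', '42', 'floor_3', 'ok']
# \W+    [' ', ', ', ' #']
```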



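The bottom line: get familiar with the official documentation, and with enough practice it should come naturally.
Official documentation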
