pandas extractall matching
How can I match the text below with a pandas extractall regex?

    stringwithinmycolumn stuff, Duration: 15h:22m:33s, notstuff, stuff, Duration: 18h:22m:33s, notstuff,

Currently, I am using this:

    df.message.str.extractall(r',([^,]*?): ([^,:]*?,').reset_index()

Expected output:

                  0            1
    match
    0      Duration  15h:22m:33s
    1      Duration  18h:22m:33s

So far I have not been able to get a match.
You may use

    ,\s*([^,:]+):\s*([^,]+),

It matches:

    ,         - a comma
    \s*       - 0+ whitespaces
    ([^,:]+)  - Group 1: one or more chars other than , and :
    :         - a colon
    \s*       - 0+ whitespaces
    ([^,]+)   - Group 2: one or more chars other than ,
    ,         - a comma (this can actually be removed, but may stay to ensure safer matching)

Note that you may want to make your regex more precise when you need to extract structured information from long strings. For example, use a letter-matching pattern to match Duration, and allow only digits, colons, h, m or s in the time value. The pattern becomes a bit more verbose but much safer:

    ,\s*([A-Za-z]+):\s*([\d:hms]+)
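To see both patterns side by side, here is a minimal sketch. The DataFrame and its single `message` value are assumptions modelled on the string in the question, and the variable names `loose` and `strict` are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: one row holding the string from the question.
df = pd.DataFrame({
    "message": [
        "stuff, Duration: 15h:22m:33s, notstuff, stuff, Duration: 18h:22m:33s, notstuff,",
    ]
})

# The general pattern: any key and any value between commas.
loose = r",\s*([^,:]+):\s*([^,]+),"
# The stricter pattern: letters for the key, digits/colons/h/m/s for the value.
strict = r",\s*([A-Za-z]+):\s*([\d:hms]+)"

# extractall returns one row per match, indexed by (original row, match number).
loose_result = df["message"].str.extractall(loose)
strict_result = df["message"].str.extractall(strict)

# Drop the original row label so only the "match" level remains.
print(strict_result.reset_index(level=0, drop=True))
```

On this sample both patterns find the same two key/value pairs. They would differ on messages that contain other `key: value` fragments whose values are not time strings.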
In : x.message.str.extractall(r',\s*(\w+):\s*([^,]*)').reset_index(level=0, drop=True)
Out:
              0            1
match
0      Duration  15h:22m:33s
0      Duration  18h:22m:33s
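The repeated 0 in the match column is consistent with each string sitting in its own row, although the thread does not show how `x` was built. The sketch below assumes that layout (the two-row frame is hypothetical) to show what `extractall` returns before and after `reset_index(level=0, drop=True)`.

```python
import pandas as pd

# Assumed layout: one message per row (hypothetical data, not from the thread).
x = pd.DataFrame({
    "message": [
        "stuff, Duration: 15h:22m:33s, notstuff,",
        "stuff, Duration: 18h:22m:33s, notstuff,",
    ]
})

matches = x["message"].str.extractall(r",\s*(\w+):\s*([^,]*)")

# extractall yields a MultiIndex: (original row label, match number within that row).
print(matches.index.tolist())

# Dropping level 0 removes the row label and keeps only the "match" level,
# so every row's first match shows up as 0.
flat = matches.reset_index(level=0, drop=True)
print(flat)
```

If both durations were in a single cell instead, the match level would run 0, 1 as in the question's expected output.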