pandas extractall matching


How can I match the strings below with a pandas extractall regex?
stringwithinmycolumn
stuff, Duration: 15h:22m:33s, notstuff,
stuff, Duration: 18h:22m:33s, notstuff,
Currently, I am using the below:
df.message.str.extractall(r',([^,]*?): ([^,:]*?,').reset_index()
Expected output:
              0            1
match
0      Duration  15h:22m:33s
1      Duration  18h:22m:33s
I have not been able to get a match so far.
You may use the following pattern:
,\s*([^,:]+):\s*([^,]+),
It matches:
, - a comma
\s* - zero or more whitespace characters
([^,:]+) - Group 1: one or more chars other than , and :
: - a colon
\s* - zero or more whitespace characters
([^,]+) - Group 2: one or more chars other than ,
, - a comma (this can actually be removed, but may stay to ensure safer matching)
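As a minimal sketch of how the pattern above can be applied, the snippet below rebuilds the two sample rows from the question in a small DataFrame. The column name message comes from the question; the variable names df, pattern, result, and flat are only illustrative assumptions.

```python
import pandas as pd

# Sample data copied from the question; "message" is the column used there.
df = pd.DataFrame({"message": [
    "stuff, Duration: 15h:22m:33s, notstuff,",
    "stuff, Duration: 18h:22m:33s, notstuff,",
]})

# Group 1 captures the key, Group 2 captures the value up to the next comma.
pattern = r",\s*([^,:]+):\s*([^,]+),"

# extractall returns one row per match, indexed by (original row, match number).
result = df["message"].str.extractall(pattern)

# Drop the "match" level so the result is indexed by the original rows only.
flat = result.reset_index(level="match", drop=True)
print(flat)
```

Because the groups are unnamed, the resulting columns are labelled 0 and 1; named groups such as (?P<key>...) would give named columns instead.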
Note that it is worth making the regex more precise when you need to extract structured information from long strings. For example, you may match the key (Duration) with a letters-only pattern, and match the time value with only digits, colons, and the letters h, m and s. The pattern becomes a bit more verbose, but much safer:
,\s*([A-Za-z]+):\s*([\d:hms]+)
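To illustrate the difference, here is a small sketch using only the standard re module. The input string is hypothetical: it adds an Owner: Bob field, which is not in the question, in front of the Duration field.

```python
import re

# Hypothetical input: the "Owner: Bob" field is an assumption added for illustration.
text = "stuff, Owner: Bob, Duration: 15h:22m:33s, notstuff,"

loose = r",\s*([^,:]+):\s*([^,]+),"
strict = r",\s*([A-Za-z]+):\s*([\d:hms]+)"

# The loose pattern accepts any key/value pair, and its trailing comma is
# consumed by the first match, so the adjacent Duration field is skipped.
loose_matches = re.findall(loose, text)

# The strict pattern rejects "Bob" as a value and only matches the time field.
strict_matches = re.findall(strict, text)

print(loose_matches)   # [('Owner', 'Bob')]
print(strict_matches)  # [('Duration', '15h:22m:33s')]
```

The character class [\d:hms] is still fairly permissive (it would accept a value such as "s"), so a stricter alternative like \d+h:\d+m:\d+s may be preferable if the format is fixed.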
In [246]: x.message.str.extractall(r',\s*(\w+):\s*([^,]*)').reset_index(level=0, drop=True)
Out[246]:
              0            1
match
0      Duration  15h:22m:33s
0      Duration  18h:22m:33s
