
pandas extractall matching


How can I match the strings below with a pandas extractall regex?
stringwithinmycolumn
stuff, Duration: 15h:22m:33s, notstuff,
stuff, Duration: 18h:22m:33s, notstuff,
Currently, I am using the below:
df.message.str.extractall(r',([^,]*?): ([^,:]*?,').reset_index()
Expected output:
              0            1
match
0      Duration  15h:22m:33s
1      Duration  18h:22m:33s
I am not able to match so far.
You may use
,\s*([^,:]+):\s*([^,]+),
It matches:
, - a comma
\s* - zero or more whitespace characters
([^,:]+) - Group 1: one or more chars other than , and :
: - a colon
\s* - zero or more whitespace characters
([^,]+) - Group 2: one or more chars other than ,
, - a comma (this can be removed, but keeping it makes the match safer)
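As a minimal sketch of how this pattern behaves with extractall, the snippet below rebuilds the two sample rows from the question. The DataFrame name df and the column name message follow the question's own code; the rest of the setup is an assumption made for illustration.

```python
import pandas as pd

# Sample data reconstructed from the question (assumed column name: message)
df = pd.DataFrame({
    "message": [
        "stuff, Duration: 15h:22m:33s, notstuff,",
        "stuff, Duration: 18h:22m:33s, notstuff,",
    ]
})

# Group 1 captures the key, Group 2 captures the value up to the next comma
pattern = r",\s*([^,:]+):\s*([^,]+),"

# extractall returns a (row, match) MultiIndex; drop the match level
# so that the result is indexed by the original row labels
result = df["message"].str.extractall(pattern).reset_index(level="match", drop=True)
print(result)
```

Unnamed groups produce integer column labels, so the key lands in column 0 and the value in column 1.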
Note that when you extract structured information from long strings, it is worth making the regex more precise. Here, you may use a letter-matching pattern for the Duration key and allow only digits, colons, h, m and s in the time value. The pattern becomes a bit more verbose:
,\s*([A-Za-z]+):\s*([\d:hms]+)
but much safer.
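To illustrate why the stricter pattern may be preferable, here is a small sketch in which the first row carries an extra key/value pair. The "Level: warn" field and the variable names are hypothetical and not part of the original data.

```python
import pandas as pd

# First row contains a hypothetical extra field ("Level: warn") before Duration
messages = pd.Series([
    "stuff, Level: warn, Duration: 15h:22m:33s, notstuff,",
    "stuff, Duration: 18h:22m:33s, notstuff,",
])

# Key: letters only; value: only digits, colons, h, m or s
strict_pattern = r",\s*([A-Za-z]+):\s*([\d:hms]+)"

# "warn" starts with a character outside [\d:hms], so that pair is skipped
strict = messages.str.extractall(strict_pattern).reset_index(level="match", drop=True)
print(strict)
```

The character class only restricts which characters may appear, so a value that merely starts with h, m or s would still match partially. If that matters, a fully spelled-out pattern such as \d+h:\d+m:\d+s is one stricter option.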
In [246]: x.message.str.extractall(r',\s*(\w+):\s*([^,]*)').reset_index(level=0, drop=True)
Out[246]:
              0            1
match
0      Duration  15h:22m:33s
0      Duration  18h:22m:33s
