python


How to replace a pattern in a string?


Hi, I am trying to replace every expression containing 'www...' or 'http://...' with just 'URL'. I tried the code below, but I am getting this error:
TypeError: expected string or buffer
My code is:
df['text_1'] = re.sub('((www\.[^\s]+)|(https?://[^\s]+))','URL',df['text'])
df['text'] contains tweets, so I want to keep only the text in them.
I am using Python 2.
Thanks.

Assuming df is a pandas DataFrame, don't use re.sub: it expects a single string, not a whole column. df['text'] is a pandas Series, so use pandas.Series.replace instead:
df['text_1'] = df['text'].replace(r'((www\.[^\s]+)|(https?://[^\s]+))',
                                  'URL',
                                  regex=True)
This will generate a new column text_1 in which every match of your regular expression in text is replaced by 'URL'. The original text column is left unchanged.
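To see the effect on a small example, here is a minimal sketch. The sample tweets and the shortened pattern (\S instead of [^\s]) are my own assumptions for illustration, not data from the question:

```python
import pandas as pd

# Hypothetical sample data standing in for the tweets in df['text'].
df = pd.DataFrame({'text': [
    'check www.example.com now',
    'see http://example.org/page for details',
    'no links here',
]})

# Raw string so the backslashes reach the regex engine untouched.
URL_PATTERN = r'(www\.\S+)|(https?://\S+)'

# regex=True makes Series.replace treat the first argument as a pattern
# and substitute every match inside each string.
df['text_1'] = df['text'].replace(URL_PATTERN, 'URL', regex=True)

print(df['text_1'].tolist())
# ['check URL now', 'see URL for details', 'no links here']
```

Without regex=True, replace only swaps values that equal the first argument exactly, so the tweets would come back unchanged.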

It sounds like you're getting that error because you're not supplying a string or buffer as the third argument to re.sub.
>>> re.sub('\W', 'REPLACED', 'this is my text')
'thisREPLACEDisREPLACEDmyREPLACEDtext'
>>> re.sub('\W', 'REPLACED', None)
Traceback (most recent call last):
...
TypeError: expected string or buffer
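Make sure the value you pass is a proper string before using it with re.sub. Here, df['text'] is a whole column rather than a single string, so re.sub has to be applied to each value individually.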
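If you want to stay with re.sub, one way is to wrap it in a small function that handles one value at a time and skips anything that is not a string. This is only a sketch: replace_urls and the sample list are names I made up, and it is written for Python 3 (on Python 2 you would check against basestring instead of str):

```python
import re

# Compile once; raw string keeps the backslashes intact.
URL_PATTERN = re.compile(r'(www\.\S+)|(https?://\S+)')

def replace_urls(value):
    """Replace URLs in one string; return non-strings (None, NaN) unchanged."""
    if not isinstance(value, str):
        return value
    return URL_PATTERN.sub('URL', value)

# Hypothetical values, including a missing entry like the ones
# that trigger the TypeError.
tweets = ['visit www.example.com today', None, 'docs at https://example.org/a?b=1']
cleaned = [replace_urls(t) for t in tweets]

print(cleaned)
# ['visit URL today', None, 'docs at URL']
```

With a DataFrame, the same function can be passed to the column, as in df['text'].apply(replace_urls).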

