Remove emoji flags from text in Python


I'm trying to remove all emojis, including the flag of Macau 🇲🇴, from a Python string. I've tried several standard regular expressions, as well as the regex from the emoji library, but none of them removes it.
My code:
```python
import re

def remove_emoji(text):
    emoji_pattern = re.compile(
        u"(\ud83d[\ude00-\ude4f])|"  # emoticons
        u"(\ud83c[\udf00-\uffff])|"  # symbols & pictographs (1 of 2)
        u"(\ud83d[\u0000-\uddff])|"  # symbols & pictographs (2 of 2)
        u"(\ud83d[\ude80-\udeff])|"  # transport & map symbols
        u"(\ud83c[\udde0-\uddff])"   # flags (iOS)
        "+", flags=re.UNICODE)
    return emoji_pattern.sub('', text)
```
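Your patterns don't target the flag emoji (and possibly other glyphs) you are after. They are written as UTF-16 surrogate pairs (for example \ud83c followed by a low surrogate), but a Python 3 str stores whole code points, so those surrogates never occur in the text and the patterns cannot match.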
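A quick check in Python 3 makes the mismatch visible: the Macau flag is two characters, each above U+FFFF; the question's surrogate-based "flags" range finds nothing in it; and the same range written with 32-bit \U escapes removes both characters. The variable names here are illustrative only.

```python
import re

flag = "\U0001F1F2\U0001F1F4"  # Macau flag, as Python 3 stores it

# A Python 3 str is a sequence of code points, not UTF-16 code units,
# so the flag is two characters, both above U+FFFF.
print(len(flag))                      # 2
print([hex(ord(ch)) for ch in flag])  # ['0x1f1f2', '0x1f1f4']

# The question's "flags (iOS)" alternative, written as UTF-16 surrogates.
surrogate_pattern = re.compile("(\ud83c[\udde0-\uddff])")
# The same range of code points written with 32-bit \U escapes.
codepoint_pattern = re.compile("[\U0001F1E0-\U0001F1FF]")

print(surrogate_pattern.search(flag))         # None: no surrogates in the str
print(repr(codepoint_pattern.sub("", flag)))  # '': both indicators removed
```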
For example, a pattern that does remove the flag:
```python
import re

def remove_emoji(text):
    emoji_pattern = re.compile(
        u'(\U0001F1F2\U0001F1F4)|'        # Macau flag
        u'([\U0001F1E6-\U0001F1FF]{2})|'  # flags
        u'([\U0001F600-\U0001F64F])'      # emoticons
        "+", flags=re.UNICODE)
    return emoji_pattern.sub('', text)
```
Note the capital-U (\U) escape, which takes a 32-bit (eight hex digit) value. Flags and most emoji lie above U+FFFF, outside the Basic Multilingual Plane, so the four-digit \u escape cannot express them directly. Flags are especially tricky because each one is a combination of two code points: one Regional Indicator Symbol per letter of the region code (🇲 + 🇴 for MO, Macau). Once you've targeted the right characters (as demonstrated with the Macau flag), you can generalize with a character set, here demonstrated with an expression that matches any pair of Regional Indicator Symbols.
You can then add patterns for other emoji and symbol blocks; the example above does this for the basic Emoticons block (U+1F600 to U+1F64F).
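Because every flag is just two Regional Indicator Symbols (U+1F1E6 stands for A, through U+1F1FF for Z), the pair for a given region can be computed from its two-letter code. A minimal sketch, assuming ASCII ISO 3166-1 alpha-2 input; `flag_for_region` and `FLAG_PAIR` are hypothetical names, not part of any library:

```python
import re
import string

# Regional Indicator Symbol Letter A is U+1F1E6; B..Z follow consecutively.
REGIONAL_INDICATOR_A = 0x1F1E6

def flag_for_region(code):
    """Hypothetical helper: build the flag emoji for a two-letter region code."""
    if len(code) != 2 or any(ch not in string.ascii_letters for ch in code):
        raise ValueError("expected two ASCII letters, got %r" % (code,))
    # Map each letter to its Regional Indicator Symbol and join the pair.
    return "".join(chr(REGIONAL_INDICATOR_A + ord(ch) - ord("A"))
                   for ch in code.upper())

# Match indicators two at a time so each flag is removed as one unit.
FLAG_PAIR = re.compile("[\U0001F1E6-\U0001F1FF]{2}")

macau = flag_for_region("MO")
print(macau == "\U0001F1F2\U0001F1F4")            # True
print(repr(FLAG_PAIR.sub("", "Macau " + macau)))  # 'Macau '
```

Matching `{2}` removes each flag as a unit but leaves a lone, unpaired indicator in place; drop the quantifier if stray halves should be removed as well.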
With the remove_emoji() definition above:
```python
flag = '\U0001F1F2\U0001F1F4'
emote = '\U0001F620'

print("flag: {!r} gone: {!r}".format(flag, remove_emoji(flag)))
print("emote: {!r} gone: {!r}".format(emote, remove_emoji(emote)))
```
Yields:
```
flag: '🇲🇴' gone: ''
emote: '😠' gone: ''
```
You can extend this further with any other blocks you want to target. I recommend looking each one up individually and noting its 32-bit code points. Charts usually write them as U+1Fxyz; in Python these must be restated as \U0001Fxyz. If you want to remove all the symbols ("all emoji"), a broad character set will do. But if you want to be precise and remove only a limited set of symbols, take care: one of the sets you're targeting, Transport and Map Symbols, comprises five independent ranges, with overlaps on the full emoji set.
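A broad "remove all emoji" character set might look like the sketch below. The ranges are an assumption chosen to cover the common emoji blocks (each written in U+ chart notation in its comment and restated with \U escapes in the pattern); the list is not exhaustive. `from_u_notation` and `remove_emoji_broad` are hypothetical helper names.

```python
import re

def from_u_notation(text):
    # Chart notation 'U+1F620' names the same character as the escape \U0001F620.
    return chr(int(text[2:], 16))

# Broad character class; these ranges are an illustrative assumption, not exhaustive.
EMOJI_PATTERN = re.compile(
    "["
    "\U0001F1E6-\U0001F1FF"  # U+1F1E6..U+1F1FF Regional Indicator Symbols (flags)
    "\U0001F300-\U0001F5FF"  # U+1F300..U+1F5FF Misc Symbols and Pictographs
    "\U0001F600-\U0001F64F"  # U+1F600..U+1F64F Emoticons
    "\U0001F680-\U0001F6FF"  # U+1F680..U+1F6FF Transport and Map Symbols
    "\U0001F900-\U0001F9FF"  # U+1F900..U+1F9FF Supplemental Symbols and Pictographs
    "\U0001FA70-\U0001FAFF"  # U+1FA70..U+1FAFF Symbols and Pictographs Extended-A
    "\u2600-\u27BF"          # U+2600..U+27BF Misc Symbols + Dingbats (includes non-emoji)
    "\u200D\uFE0F"           # Zero Width Joiner and Variation Selector-16
    "]+"
)

def remove_emoji_broad(text):
    """Hypothetical helper: strip every character in the ranges above."""
    return EMOJI_PATTERN.sub("", text)

print(from_u_notation("U+1F620") == "\U0001F620")                         # True
print(repr(remove_emoji_broad("Macau \U0001F1F2\U0001F1F4 \U0001F620!")))  # 'Macau  !'
```

The U+2600 to U+27BF range also catches non-emoji symbols such as ✓ (U+2713), which is exactly the kind of overlap to watch for when you want a precise set; remove that line if it is too aggressive for your text.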