

Remove emoji flags from text in Python


I'm trying to remove all emoji, including the flag of Macau 🇲🇴, from a Python string. I've tried several standard regular expressions, as well as the regex from the emoji library, but none of them removes it.
My code:
import re

def remove_emoji(text):
    emoji_pattern = re.compile(
        u"(\ud83d[\ude00-\ude4f])|"  # emoticons
        u"(\ud83c[\udf00-\uffff])|"  # symbols & pictographs (1 of 2)
        u"(\ud83d[\u0000-\uddff])|"  # symbols & pictographs (2 of 2)
        u"(\ud83d[\ude80-\udeff])|"  # transport & map symbols
        u"(\ud83c[\udde0-\uddff])"   # flags (iOS)
        "+", flags=re.UNICODE)
    return emoji_pattern.sub('', text)
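Your patterns don't seem to target the flag emoji (and possibly other glyphs) you are after. They are written as UTF-16 surrogate pairs (\ud83c...), which, on Python 3 at least, never match, because a str holds whole code points rather than surrogate code units.
For example, to remove the flag: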
import re

def remove_emoji(text):
    emoji_pattern = re.compile(
        u'(?:'                           # group the alternatives so "+" covers all of them
        u'\U0001F1F2\U0001F1F4|'         # Macau flag
        u'[\U0001F1E6-\U0001F1FF]{2}|'   # flags (pairs of Regional Indicator Symbols)
        u'[\U0001F600-\U0001F64F]'       # emoticons
        u')+', flags=re.UNICODE)
    return emoji_pattern.sub('', text)
Note the capital-U (\U) escape, which takes eight hex digits (a 32-bit value). Flags and many emoji sit above U+FFFF, so the four-digit \u escape cannot express them. Flags are especially complicated because each one is a combination of two code points. Once you have the right characters targeted (as demonstrated with the Macau flag), you can generalize with a character set (demonstrated here with an expression matching all pairs of Regional Indicator Symbols).
You can then start adding back patterns for other emoji and symbols, as done here for the basic emoticon block.
With the above definition:
flag = '\U0001F1F2\U0001F1F4'
emote = '\U0001F620'

print("flag: {!r} gone: {!r}".format(flag, remove_emoji(flag)))
print("emote: {!r} gone: {!r}".format(emote, remove_emoji(emote)))
Yields:
flag: '🇲🇴' gone: ''
emote: '😠' gone: ''
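To see why a flag needs a two-code-point pattern, it can help to inspect the string directly. The following is a minimal sketch, assuming Python 3; the variable names (code_points, names, surrogate_pattern) are made up for illustration. It lists the code points behind the Macau flag and checks that a surrogate-pair pattern in the style of the question's "flags" line finds nothing.

```python
import re
import unicodedata

flag = '\U0001F1F2\U0001F1F4'  # Macau flag, written with \U escapes

# A Python 3 str is a sequence of code points, so the flag has length 2.
code_points = ['U+{:04X}'.format(ord(ch)) for ch in flag]
names = [unicodedata.name(ch) for ch in flag]
print(len(flag), code_points)
print(names)

# A surrogate-pair pattern (as in the question's "flags" line) finds nothing,
# because the string contains no surrogate code units.
surrogate_pattern = re.compile('\ud83c[\udde0-\uddff]')
print(surrogate_pattern.search(flag))
```

Each flag is a pair of Regional Indicator Symbol letters (here M and O), which is why the character set for flags is repeated exactly twice.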
You can further extend this with the other blocks you'd like to target. I recommend looking them up individually and using the eight-digit notation. You will often see code points written as U+1Fxyz; these need to be restated as \U0001Fxyz for Python.
If you want to remove all the symbols ("all emoji"), you can do so with a broad character set. But if you want to be precise and remove only a limited set of symbols, you will need to take care. One of the sets you're targeting, Transport and Map Symbols, for example, comprises five independent ranges that overlap with the full emoji set.
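As a rough illustration of the broad approach, here is a sketch, assuming Python 3, that builds one character set from a few block boundaries given in U+ notation. The names BLOCKS, from_u_notation, build_pattern and remove_symbols are hypothetical, and the block list is an illustrative selection rather than a complete inventory of emoji.

```python
import re

# Block boundaries in U+ notation. This is an illustrative selection,
# not a complete list of every emoji code point.
BLOCKS = [
    ('U+1F1E6', 'U+1F1FF'),  # Regional Indicator Symbols (flags)
    ('U+1F300', 'U+1F5FF'),  # Miscellaneous Symbols and Pictographs
    ('U+1F600', 'U+1F64F'),  # Emoticons
    ('U+1F680', 'U+1F6FF'),  # Transport and Map Symbols
]

def from_u_notation(code):
    """Turn a string such as 'U+1F680' into the matching one-character str."""
    return chr(int(code[2:], 16))

def build_pattern(blocks):
    """Compile a single character set covering every (low, high) block."""
    ranges = ''.join(
        '{}-{}'.format(from_u_notation(low), from_u_notation(high))
        for low, high in blocks
    )
    return re.compile('[' + ranges + ']+')

broad_pattern = build_pattern(BLOCKS)

def remove_symbols(text):
    """Delete every run of characters that falls inside the listed blocks."""
    return broad_pattern.sub('', text)

sample = 'Macau \U0001F1F2\U0001F1F4 by ferry \U0001F6A2 \U0001F620!'
print(repr(remove_symbols(sample)))
```

Because this removes whole blocks, it also strips characters in those ranges that you might not think of as emoji; narrowing BLOCKS is the way to be more selective.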
