Pandas data conversion


I have the following data in a Pandas dataframe:
AIRPORT
EWR|JAX
EWR|BHX
EWR|BHX
EWR|BHX
EWR|BHX
...
Is there a dynamic way to convert this to:
AIRPORT  EWR  JAX   BHX
EWR|JAX  Y    Y     NULL
EWR|BHX  Y    NULL  Y
and so on. I know how to do this by counting hard-coded values, one airport at a time:
df.assign(EWR = lambda x: x.AIRPORT.apply(lambda y: y.split('|').count('EWR')))
but I'm hoping not to have to write this code for each airport.
You can use the .str accessor with get_dummies, which splits each string on '|' by default and returns one 0/1 indicator column per airport. Then use assign with dictionary unpacking to add those columns to your dataframe. Lastly, use replace to change the 0s and 1s to the str, bool, or NaN of your choice.
import numpy as np
import pandas as pd
df_out = df.assign(**df.AIRPORT.str.get_dummies().replace({1:'Y',0:np.nan}))
print(df_out)
Output:
   AIRPORT  BHX EWR  JAX
0  EWR|JAX  NaN   Y    Y
1  EWR|BHX    Y   Y  NaN
2  EWR|BHX    Y   Y  NaN
3  EWR|BHX    Y   Y  NaN
4  EWR|BHX    Y   Y  NaN
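For anyone who wants to run the approach above end to end, here is a minimal self-contained sketch. The sample frame is rebuilt from the five rows shown in the question, and `sep="|"` is passed explicitly only to make the default visible; everything else follows the answer.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the rows shown in the question
df = pd.DataFrame(
    {"AIRPORT": ["EWR|JAX", "EWR|BHX", "EWR|BHX", "EWR|BHX", "EWR|BHX"]}
)

# str.get_dummies splits each string on sep (default "|") and returns
# one 0/1 indicator column per distinct token, sorted alphabetically
dummies = df["AIRPORT"].str.get_dummies(sep="|")

# Map 1 -> "Y" and 0 -> NaN, then attach the new columns to the frame
df_out = df.assign(**dummies.replace({1: "Y", 0: np.nan}))
print(df_out)
```

Because the columns come from the data itself, a new airport code in AIRPORT simply produces another column, with no per-airport code to write.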
pandas only with str.get_dummies
dummies = df.AIRPORT.str.get_dummies()
df.join(
    dummies * pd.Series('Y', dummies.columns)
).replace('', np.nan)
   AIRPORT  BHX EWR  JAX
0  EWR|JAX  nan   Y    Y
1  EWR|BHX    Y   Y  nan
2  EWR|BHX    Y   Y  nan
3  EWR|BHX    Y   Y  nan
4  EWR|BHX    Y   Y  nan
pandas & numpy with np.where
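dummies = df.AIRPORT.str.get_dummies()
# Caveat: mixing 'Y' with np.nan makes np.where return a string array,
# so the missing cells hold the text 'nan' rather than a real NaN.
# Pass None instead of np.nan if you need real nulls.
d1 = pd.DataFrame(
    np.where(dummies.values == 1, 'Y', np.nan),
    dummies.index, dummies.columns
)
d2 = df.join(d1)
print(d2)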
   AIRPORT  BHX EWR  JAX
0  EWR|JAX  nan   Y    Y
1  EWR|BHX    Y   Y  nan
2  EWR|BHX    Y   Y  nan
3  EWR|BHX    Y   Y  nan
4  EWR|BHX    Y   Y  nan
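One thing worth checking with the np.where version is what the "missing" cells actually contain. The sketch below is not part of the original answer; it compares np.nan against None as the fill value on a two-row sample, and counts how many nulls pandas detects in each result. The variable names are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"AIRPORT": ["EWR|JAX", "EWR|BHX"]})
dummies = df["AIRPORT"].str.get_dummies()

# A str mixed with np.nan makes NumPy choose a string dtype,
# so the fill value is stored as the text 'nan'
as_text = np.where(dummies.values == 1, "Y", np.nan)

# With None the result is an object array that keeps real nulls
as_null = np.where(dummies.values == 1, "Y", None)

d_text = pd.DataFrame(as_text, dummies.index, dummies.columns)
d_null = pd.DataFrame(as_null, dummies.index, dummies.columns)

# Number of cells pandas treats as missing in each frame
print(d_text.isna().sum().sum())
print(d_null.isna().sum().sum())
```

If the string-dtype behaviour holds on your NumPy version, the first count is zero, which means isna, dropna and fillna will not see those cells. Using None (or calling replace('nan', np.nan) afterwards) avoids that.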
Timing
small data
%%timeit
df.join(
    df.AIRPORT.str.get_dummies() * pd.Series('Y', dummies.columns)
).replace('', np.nan)
100 loops, best of 3: 2.31 ms per loop

%timeit df.assign(**df.AIRPORT.str.get_dummies().replace({1:'Y',0:np.nan}))
100 loops, best of 3: 2.78 ms per loop

%%timeit
dummies = df.AIRPORT.str.get_dummies()
d1 = pd.DataFrame(
    np.where(dummies.values == 1, 'Y', np.nan),
    dummies.index, dummies.columns
)
df.join(d1)
1000 loops, best of 3: 1.65 ms per loop
large data
from string import ascii_uppercase
np.random.seed([3,1415])
source = pd.DataFrame(
    np.random.choice(list(ascii_uppercase), [100, 3])
).sum(1).unique()
df = pd.DataFrame(
    np.random.choice(source, [10000, 2]), columns=['A', 'B']
).query('A != B').apply('|'.join, 1).to_frame('AIRPORT')

%%timeit
dummies = df.AIRPORT.str.get_dummies()
df.join(
    dummies * pd.Series('Y', dummies.columns)
).replace('', np.nan)
1 loop, best of 3: 594 ms per loop

%timeit df.assign(**df.AIRPORT.str.get_dummies().replace({1:'Y',0:np.nan}))
1 loop, best of 3: 629 ms per loop

%%timeit
dummies = df.AIRPORT.str.get_dummies()
d1 = pd.DataFrame(
    np.where(dummies.values == 1, 'Y', np.nan),
    dummies.index, dummies.columns
)
df.join(d1)
1 loop, best of 3: 592 ms per loop
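The %timeit lines above only work in IPython or Jupyter. As a rough sketch of how to repeat the comparison in a plain script, the standard-library timeit module can be used instead. The function names with_replace and with_where are made up here for illustration, and the numbers you get will depend on your machine and library versions.

```python
import timeit

import numpy as np
import pandas as pd

# Small sample rebuilt from the rows shown in the question
df = pd.DataFrame(
    {"AIRPORT": ["EWR|JAX", "EWR|BHX", "EWR|BHX", "EWR|BHX", "EWR|BHX"]}
)

def with_replace(frame):
    # assign + replace, as in the first answer
    return frame.assign(
        **frame.AIRPORT.str.get_dummies().replace({1: "Y", 0: np.nan})
    )

def with_where(frame):
    # np.where, same expression as timed above
    # (missing cells end up as the text 'nan')
    dummies = frame.AIRPORT.str.get_dummies()
    flags = pd.DataFrame(
        np.where(dummies.values == 1, "Y", np.nan),
        dummies.index, dummies.columns
    )
    return frame.join(flags)

for func in (with_replace, with_where):
    # Best of 3 repeats of 100 calls each, reported per call
    best = min(timeit.repeat(lambda: func(df), number=100, repeat=3)) / 100
    print(f"{func.__name__}: {best * 1e3:.2f} ms per loop")
```

Swapping in the large frame built above for df gives the large-data comparison.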
