

Pass multiple params to a groupby function in pandas


I want to apply a custom function over a groupby. For example, suppose my data has the following format:

    personid  jobid  start_date  end_date
    1         1      2015-01-01  2016-01-30
    1         2      2016-01-01  2017-01-01

I want to compute the overlap, in days, between the date ranges of the two different jobs for the same person. Would it be wise to use

    df.groupby('personid').agg(x)

But then how would I reference both the start date and the end date of different records inside the function x?
The output would be something like:

    personid  overlap
    1         30

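I think you need groupby with a custom function passed to apply, so that the function receives every column of the group. It takes the start and end dates of the first and last rows of each group, builds a daily date_range for each, and returns the length of their intersection found with numpy.intersect1d. Since only the first and last rows are compared, this assumes two jobs per person: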

    import numpy as np
    import pandas as pd

    def f(x):
        # Daily ranges for the first and last job in the group
        a = pd.date_range(x['start_date'].iat[0], x['end_date'].iat[0], freq='D')
        b = pd.date_range(x['start_date'].iat[-1], x['end_date'].iat[-1], freq='D')
        # Count the days that appear in both ranges
        return pd.Series(len(np.intersect1d(a, b)), index=['overlap'])

    df = df.groupby('personid').apply(f).reset_index()
    print(df)

       personid  overlap
    0         1      366
    1         2        6

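As an alternative to materializing every day with date_range, the same inclusive day count can be sketched with plain date arithmetic: the overlap runs from the later start to the earlier end. This is a swapped-in technique, not the method from the answer above, and it assumes exactly two jobs per person. The names `sample`, `grouped`, and `result` are illustrative only.

```python
import pandas as pd

# Same values as the answer's sample data (two jobs per person).
sample = pd.DataFrame({
    'personid': [1, 1, 2, 2],
    'jobid': [1, 2, 1, 2],
    'start_date': pd.to_datetime(['2015-01-01', '2015-01-01', '2015-01-01', '2015-01-05']),
    'end_date': pd.to_datetime(['2016-01-30', '2016-01-01', '2015-01-25', '2015-01-10']),
})

grouped = sample.groupby('personid')

# Overlap spans from the latest start to the earliest end, counted inclusively.
overlap = (grouped['end_date'].min() - grouped['start_date'].max()).dt.days + 1

# Ranges that do not overlap would give a non-positive count, so clip at zero.
result = overlap.clip(lower=0).rename('overlap').reset_index()
print(result)
```

For two ranges this gives the same counts as the intersect1d approach (366 and 6 here) without building the intermediate date arrays, which may matter when ranges span many years.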
Sample data used for the output above:

    df = pd.DataFrame({
        'end_date': pd.to_datetime(['2016-01-30', '2016-01-01', '2015-01-25', '2015-01-10']),
        'jobid': [1, 2, 1, 2],
        'personid': [1, 1, 2, 2],
        'start_date': pd.to_datetime(['2015-01-01', '2015-01-01', '2015-01-01', '2015-01-05']),
    })
    print(df)

        end_date  jobid  personid start_date
    0 2016-01-30      1         1 2015-01-01
    1 2016-01-01      2         1 2015-01-01
    2 2015-01-25      1         2 2015-01-01
    3 2015-01-10      2         2 2015-01-05
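
To tie this back to the question, here is a minimal self-contained sketch that runs the same date_range and intersect1d approach on the two rows from the question. The frame name `question` is illustrative, and selecting the two date columns before apply is an assumption made so that the function only receives the columns it uses.

```python
import numpy as np
import pandas as pd

def f(x):
    # x is the sub-DataFrame for one person, so both date columns are available.
    a = pd.date_range(x['start_date'].iat[0], x['end_date'].iat[0], freq='D')
    b = pd.date_range(x['start_date'].iat[-1], x['end_date'].iat[-1], freq='D')
    # Count the days that appear in both ranges.
    return pd.Series(len(np.intersect1d(a, b)), index=['overlap'])

# The two rows shown in the question.
question = pd.DataFrame({
    'personid': [1, 1],
    'jobid': [1, 2],
    'start_date': pd.to_datetime(['2015-01-01', '2016-01-01']),
    'end_date': pd.to_datetime(['2016-01-30', '2017-01-01']),
})

out = (
    question.groupby('personid')[['start_date', 'end_date']]
    .apply(f)
    .reset_index()
)
print(out)  # one row: personid 1, overlap 30
```

The jobs share 2016-01-01 through 2016-01-30, which is 30 days counted inclusively and matches the output the question asked for.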
