python


What is the most efficient method for accessing and manipulating a pandas DataFrame?


I am working on an agent-based modelling project and have an 800x800 grid that represents a landscape. Each cell in this grid is assigned certain variables. One of these variables is 'vegetation' (i.e. which functional_types this cell possesses). I have a data frame that looks as follows:
Each cell is assigned a landscape_type before I access this data frame. I then loop through each cell in the 800x800 grid and assign more variables. For example, if cell 1 is landscape_type 4, I need to access the above data frame, generate a random number between min_species_percent and max_species_percent for each functional_type, and then assign all the variables for that landscape_type (i.e. pollen_loading, succession_time, etc.) to that cell. However, if the cumsum of the random numbers is < 100, I grab functional_types from the next landscape_type (so in this example, I would move down to landscape_type 3). This continues until I reach a cumsum closer to 100.
I have this process working as desired, however it is incredibly slow - as you can imagine, there are hundreds of thousands of assignments! So far I do this (self.model.veg_data is the above df):
def create_vegetation(self, landscape_type):
    if landscape_type == 4:
        veg_this_patch = self.model.veg_data[self.model.veg_data['landscape_type'] <= landscape_type].copy()
    else:
        veg_this_patch = self.model.veg_data[self.model.veg_data['landscape_type'] >= landscape_type].copy()
    veg_this_patch['veg_total'] = veg_this_patch.apply(lambda x: randint(x["min_species_percent"],
                                                                         x["max_species_percent"]), axis=1)
    veg_this_patch['cum_sum_veg'] = veg_this_patch.veg_total.cumsum()
    veg_this_patch = veg_this_patch[veg_this_patch['cum_sum_veg'] <= 100]
    self.vegetation = veg_this_patch
I am certain there is a more efficient way to do this. The process will be repeated constantly, and as the model progresses, landscape_types will change, i.e. a 3 becomes a 4. So it's essential that this becomes as fast as possible! Thank you.
EDIT (as per the comment):
The loop that creates the landscape objects is given below:
for agent, x, y in self.grid.coord_iter():
    # check that patch is land
    if self.landscape.elevation[x, y] != -9999.0:
        elevation_xy = int(self.landscape.elevation[x, y])
        # calculate burn probabilities based on soil and temp
        burn_s_m_p = round(2 - (1 / (1 + (math.exp(-(self.landscape.soil_moisture[x, y] * 3)))) * 2), 4)
        burn_s_t_p = round(1 / (1 + (math.exp(-(self.landscape.soil_temp[x, y] * 1))) * 3), 4)
        # calculate succession probabilities based on soil and temp
        succ_s_m_p = round(2 - (1 / (1 + (math.exp(-(self.landscape.soil_moisture[x, y] * 0.5)))) * 2), 4)
        succ_s_t_p = round(1 / (1 + (math.exp(-(self.landscape.soil_temp[x, y] * 1))) * 0.5), 4)
        vegetation_typ_xy = self.landscape.vegetation[x, y]
        time_colonised_xy = self.landscape.time_colonised[x, y]
        is_patch_colonised_xy = self.landscape.colonised[x, y]
        # populate landscape patch with values
        patch = Landscape((x, y), self, elevation_xy, burn_s_m_p, burn_s_t_p, vegetation_typ_xy,
                          False, time_colonised_xy, is_patch_colonised_xy, succ_s_m_p, succ_s_t_p)
        self.grid.place_agent(patch, (x, y))
        self.schedule.add(patch)
Then, in the object itself, I call the create_vegetation function to add the functional_types from the above df. Everything else in this loop comes from a different dataset, so it isn't relevant.
You need to extract as many calculations as you can into a vectorized preprocessing step. For example in your 800x800 loop you have:
burn_s_m_p = round(2-(1/(1 + (math.exp(- (self.landscape.soil_moisture[x, y] * 3)))) * 2),4)
Instead of executing this line 800x800 times, just do it once, during initialization:
burn_array = np.round(2-(1/(1 + (np.exp(- (self.landscape.soil_moisture * 3)))) * 2),4)
Now in your loop it is simply:
burn_s_m_p = burn_array[x, y]
Apply the same technique to the other similar lines (burn_s_t_p, succ_s_m_p and succ_s_t_p).
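As a rough sketch of what that preprocessing step could look like for all four probabilities: the formulas are copied from the loop in the question, but the function name precompute_probabilities and the small random test grids are made up for illustration, and I am assuming soil_moisture and soil_temp are 2-D NumPy arrays.

```python
import math

import numpy as np


def precompute_probabilities(soil_moisture, soil_temp):
    """Hypothetical helper: build the four probability grids in one pass.

    Assumes both inputs are 2-D arrays of the same shape as the grid.
    """
    soil_moisture = np.asarray(soil_moisture, dtype=float)
    soil_temp = np.asarray(soil_temp, dtype=float)
    return {
        # burn probabilities based on soil moisture and soil temperature
        "burn_s_m_p": np.round(2 - (1 / (1 + np.exp(-(soil_moisture * 3)))) * 2, 4),
        "burn_s_t_p": np.round(1 / (1 + np.exp(-(soil_temp * 1)) * 3), 4),
        # succession probabilities based on soil moisture and soil temperature
        "succ_s_m_p": np.round(2 - (1 / (1 + np.exp(-(soil_moisture * 0.5)))) * 2, 4),
        "succ_s_t_p": np.round(1 / (1 + np.exp(-(soil_temp * 1)) * 0.5), 4),
    }


# Small made-up grids standing in for self.landscape.soil_moisture / soil_temp.
rng = np.random.default_rng(0)
soil_moisture = rng.random((4, 4))
soil_temp = rng.random((4, 4)) * 20

probs = precompute_probabilities(soil_moisture, soil_temp)

# Inside the grid loop, each value becomes a plain array lookup.
x, y = 2, 3
burn_s_m_p = probs["burn_s_m_p"][x, y]

# Spot check against the original scalar formula (loose tolerance, because
# round() and np.round() can differ in the last rounded digit).
scalar = round(2 - (1 / (1 + math.exp(-(soil_moisture[x, y] * 3)))) * 2, 4)
assert math.isclose(burn_s_m_p, scalar, abs_tol=1e-3)
```

The arrays are computed for every cell, including the ones where elevation is -9999.0. That should be harmless as long as the loop still skips those cells before reading the values.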
