What is the most efficient method for accessing and manipulating a pandas DataFrame?
I am working on an agent-based modelling project and have an 800x800 grid that represents a landscape. Each cell in this grid is assigned certain variables. One of these variables is 'vegetation' (i.e. which functional_types the cell possesses). I have a data frame that looks as follows:

Each cell is assigned a landscape_type before I access this data frame. I then loop through each cell in the 800x800 grid and assign more variables. For example, if cell 1 is landscape_type 4, I need to access the above data frame, generate a random number for each functional_type between min_species_percent and max_species_percent, and then assign all the variables (i.e. pollen_loading, succession_time, etc.) for that landscape_type to that cell. However, if the cumsum of the random numbers is < 100, I grab functional_types from the next landscape_type (so in this example, I would move down to landscape_type 3). This continues until I reach a cumsum close to 100.

I have this process working as desired, but it is incredibly slow - as you can imagine, there are hundreds of thousands of assignments! So far I do this (self.model.veg_data is the above df):

    def create_vegetation(self, landscape_type):
        if landscape_type == 4:
            veg_this_patch = self.model.veg_data[self.model.veg_data['landscape_type'] <= landscape_type].copy()
        else:
            veg_this_patch = self.model.veg_data[self.model.veg_data['landscape_type'] >= landscape_type].copy()
        veg_this_patch['veg_total'] = veg_this_patch.apply(lambda x: randint(x["min_species_percent"], x["max_species_percent"]), axis=1)
        veg_this_patch['cum_sum_veg'] = veg_this_patch.veg_total.cumsum()
        veg_this_patch = veg_this_patch[veg_this_patch['cum_sum_veg'] <= 100]
        self.vegetation = veg_this_patch

I am certain there is a more efficient way to do this. The process will be repeated constantly, and as the model progresses landscape_types will change, i.e. 3 becomes 4. So it's essential that this becomes as fast as possible! Thank you.

EDIT, as per the comment:
The loop that creates the landscape objects is given below:

    for agent, x, y in self.grid.coord_iter():
        # check that patch is land
        if self.landscape.elevation[x, y] != -9999.0:
            elevation_xy = int(self.landscape.elevation[x, y])

            # calculate burn probabilities based on soil and temp
            burn_s_m_p = round(2 - (1 / (1 + (math.exp(-(self.landscape.soil_moisture[x, y] * 3)))) * 2), 4)
            burn_s_t_p = round(1 / (1 + (math.exp(-(self.landscape.soil_temp[x, y] * 1))) * 3), 4)

            # calculate succession probabilities based on soil and temp
            succ_s_m_p = round(2 - (1 / (1 + (math.exp(-(self.landscape.soil_moisture[x, y] * 0.5)))) * 2), 4)
            succ_s_t_p = round(1 / (1 + (math.exp(-(self.landscape.soil_temp[x, y] * 1))) * 0.5), 4)

            vegetation_typ_xy = self.landscape.vegetation[x, y]
            time_colonised_xy = self.landscape.time_colonised[x, y]
            is_patch_colonised_xy = self.landscape.colonised[x, y]

            # populate landscape patch with values
            patch = Landscape((x, y), self, elevation_xy, burn_s_m_p, burn_s_t_p, vegetation_typ_xy,
                              False, time_colonised_xy, is_patch_colonised_xy, succ_s_m_p, succ_s_t_p)
            self.grid.place_agent(patch, (x, y))
            self.schedule.add(patch)

Then, in the object itself, I call the create_vegetation function to add the functional_types from the above df. Everything else in this loop comes from a different dataset, so it isn't relevant.
You need to extract as many calculations as you can into a vectorized preprocessing step. For example, in your 800x800 loop you have:

    burn_s_m_p = round(2 - (1 / (1 + (math.exp(-(self.landscape.soil_moisture[x, y] * 3)))) * 2), 4)

Instead of executing this line 800x800 times, just do it once, during initialization:

    burn_array = np.round(2 - (1 / (1 + (np.exp(-(self.landscape.soil_moisture * 3)))) * 2), 4)

Now in your loop it is simply:

    burn_s_m_p = burn_array[x, y]

Apply this technique to the rest of the similar lines.
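As a rough sketch of what "the rest of the similar lines" could look like, the block below precomputes all four probability arrays in one pass and also shows one possible way to replace the per-row `apply` in `create_vegetation` with a single vectorized draw. The function names `precompute_probabilities` and `draw_vegetation_rows` are hypothetical, and the second function assumes the `landscape_type`, `min_species_percent` and `max_species_percent` columns have been pulled out of the DataFrame once (for example with `.to_numpy()`) as integer arrays. It is not a drop-in replacement for the original method, only an illustration of the idea.

```python
import numpy as np


def precompute_probabilities(soil_moisture, soil_temp):
    """Compute the four per-cell probabilities for the whole grid at once.

    Both inputs are 2-D arrays of the same shape (e.g. 800x800).
    The formulas mirror the ones inside the loop in the question.
    """
    burn_s_m_p = np.round(2 - (1 / (1 + np.exp(-(soil_moisture * 3)))) * 2, 4)
    burn_s_t_p = np.round(1 / (1 + np.exp(-(soil_temp * 1)) * 3), 4)
    succ_s_m_p = np.round(2 - (1 / (1 + np.exp(-(soil_moisture * 0.5)))) * 2, 4)
    succ_s_t_p = np.round(1 / (1 + np.exp(-(soil_temp * 1)) * 0.5), 4)
    return burn_s_m_p, burn_s_t_p, succ_s_m_p, succ_s_t_p


def draw_vegetation_rows(landscape_types, min_pct, max_pct, landscape_type, rng):
    """Pick vegetation rows for one cell without DataFrame.apply.

    landscape_types, min_pct, max_pct: 1-D integer arrays, one entry per
    row of the vegetation table (assumed to be extracted once, up front).
    Returns the kept row positions and the percentages drawn for them.
    """
    # Same row selection as create_vegetation in the question.
    if landscape_type == 4:
        rows = np.flatnonzero(landscape_types <= landscape_type)
    else:
        rows = np.flatnonzero(landscape_types >= landscape_type)

    # One inclusive integer draw per selected row (endpoint=True matches
    # the inclusive upper bound of random.randint).
    draws = rng.integers(min_pct[rows], max_pct[rows], endpoint=True)

    # Keep rows while the running total stays at or below 100.
    keep = np.cumsum(draws) <= 100
    return rows[keep], draws[keep]
```

Inside the loop, each probability then becomes a plain lookup such as `burn_s_m_p[x, y]`, and the returned row positions can be used to index the original DataFrame (for example with `.iloc`) only when the other columns are actually needed.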