What is the most efficient method for accessing and manipulating a pandas DataFrame?
I am working on an agent-based modelling project and have an 800x800 grid that represents a landscape. Each cell in this grid is assigned certain variables. One of these variables is 'vegetation' (i.e. which functional_types the cell possesses). I have a data frame that looks as follows:

Each cell is assigned a landscape_type before I access this data frame. I then loop through each cell in the 800x800 grid and assign more variables. For example, if cell 1 is landscape_type 4, I need to access the above data frame, generate a random number for each functional_type between min_species_percent and max_species_percent, and then assign all the variables (pollen_loading, succession_time, etc.) for that landscape_type to that cell. However, if the cumsum of the random numbers is < 100, I grab functional_types from the next landscape_type (so in this example, I would move down to landscape_type 3). This continues until I reach a cumsum closer to 100.

I have this process working as desired, but it is incredibly slow. As you can imagine, there are hundreds of thousands of assignments! So far I do this (self.model.veg_data is the above df):

    def create_vegetation(self, landscape_type):
        if landscape_type == 4:
            veg_this_patch = self.model.veg_data[self.model.veg_data['landscape_type'] <= landscape_type].copy()
        else:
            veg_this_patch = self.model.veg_data[self.model.veg_data['landscape_type'] >= landscape_type].copy()
        veg_this_patch['veg_total'] = veg_this_patch.apply(lambda x: randint(x["min_species_percent"], x["max_species_percent"]), axis=1)
        veg_this_patch['cum_sum_veg'] = veg_this_patch.veg_total.cumsum()
        veg_this_patch = veg_this_patch[veg_this_patch['cum_sum_veg'] <= 100]
        self.vegetation = veg_this_patch

I am certain there is a more efficient way to do this. The process will be repeated constantly, and as the model progresses, landscape_types will change (e.g. a 3 becomes a 4). So it's essential that this becomes as fast as possible! Thank you.

EDIT, as per the comment:
The loop that creates the landscape objects is given below:

    for agent, x, y in self.grid.coord_iter():
        # check that patch is land
        if self.landscape.elevation[x, y] != -9999.0:
            elevation_xy = int(self.landscape.elevation[x, y])

            # calculate burn probabilities based on soil and temp
            burn_s_m_p = round(2 - (1 / (1 + (math.exp(-(self.landscape.soil_moisture[x, y] * 3)))) * 2), 4)
            burn_s_t_p = round(1 / (1 + (math.exp(-(self.landscape.soil_temp[x, y] * 1))) * 3), 4)

            # calculate succession probabilities based on soil and temp
            succ_s_m_p = round(2 - (1 / (1 + (math.exp(-(self.landscape.soil_moisture[x, y] * 0.5)))) * 2), 4)
            succ_s_t_p = round(1 / (1 + (math.exp(-(self.landscape.soil_temp[x, y] * 1))) * 0.5), 4)

            vegetation_typ_xy = self.landscape.vegetation[x, y]
            time_colonised_xy = self.landscape.time_colonised[x, y]
            is_patch_colonised_xy = self.landscape.colonised[x, y]

            # populate landscape patch with values
            patch = Landscape((x, y), self, elevation_xy, burn_s_m_p, burn_s_t_p, vegetation_typ_xy,
                              False, time_colonised_xy, is_patch_colonised_xy, succ_s_m_p, succ_s_t_p)
            self.grid.place_agent(patch, (x, y))
            self.schedule.add(patch)

Then, in the object itself, I call the create_vegetation function to add the functional_types from the above df. Everything else in this loop comes from a different dataset, so it isn't relevant.
You need to extract as many calculations as you can into a vectorized preprocessing step. For example, in your 800x800 loop you have:

    burn_s_m_p = round(2-(1/(1 + (math.exp(- (self.landscape.soil_moisture[x, y] * 3)))) * 2),4)

Instead of executing this line 800x800 times, just do it once, during initialization:

    burn_array = np.round(2-(1/(1 + (np.exp(- (self.landscape.soil_moisture * 3)))) * 2),4)

Now in your loop it is simply:

    burn_s_m_p = burn_array[x, y]

Apply this technique to the rest of the similar lines.
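As a rough sketch of what that could look like for all four probability lines: the arrays `soil_moisture` and `soil_temp` below are random stand-ins for `self.landscape.soil_moisture` and `self.landscape.soil_temp`, and the helper names `moisture_probability` and `temp_probability` are made up for illustration. A small 8x8 grid is used so the check runs quickly; the same code should work unchanged on an 800x800 array.

```python
import math
import numpy as np

# Hypothetical stand-ins for self.landscape.soil_moisture / soil_temp.
rng = np.random.default_rng(0)
soil_moisture = rng.uniform(0.0, 1.0, size=(8, 8))
soil_temp = rng.uniform(-5.0, 25.0, size=(8, 8))


def moisture_probability(moisture, rate):
    """Array form of round(2 - (1 / (1 + exp(-(m * rate)))) * 2, 4)."""
    return np.round(2 - (1 / (1 + np.exp(-(moisture * rate)))) * 2, 4)


def temp_probability(temp, weight):
    """Array form of round(1 / (1 + exp(-(t * 1)) * weight), 4)."""
    return np.round(1 / (1 + np.exp(-(temp * 1)) * weight), 4)


# Computed once during initialization, for the whole grid at a time.
burn_s_m = moisture_probability(soil_moisture, 3)
burn_s_t = temp_probability(soil_temp, 3)
succ_s_m = moisture_probability(soil_moisture, 0.5)
succ_s_t = temp_probability(soil_temp, 0.5)

# Spot check one cell against the original scalar expression.
x, y = 2, 5
scalar = round(2 - (1 / (1 + math.exp(-(soil_moisture[x, y] * 3)))) * 2, 4)
assert abs(burn_s_m[x, y] - scalar) <= 1e-4
```

Inside the loop, each of the four assignments then reduces to a lookup such as `burn_s_m_p = burn_s_m[x, y]`. The comparison uses a small tolerance because `np.round` and the built-in `round` may differ in the last digit for values very close to a rounding boundary.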