### python

#### What is the most efficient method for accessing and manipulating a pandas DataFrame?

I am working on an agent-based modelling project and have an 800x800 grid that represents a landscape. Each cell in this grid is assigned certain variables. One of these variables is 'vegetation' (i.e. which functional_types the cell possesses). I have a data frame that lists, for each landscape_type, its functional_types together with their min_species_percent, max_species_percent and further variables (pollen_loading, succession_time, etc.).

Each cell is assigned a landscape_type before I access this data frame. I then loop through each cell in the 800x800 grid and assign more variables. For example, if cell 1 is landscape_type 4, I need to access the above data frame, generate a random number for each functional_type between its min_species_percent and max_species_percent, and then assign all the variables (pollen_loading, succession_time, etc.) for that landscape_type to that cell. However, if the cumsum of the random numbers is < 100, I grab functional_types from the next landscape_type (in this example, I would move down to landscape_type 3). This continues until I reach a cumsum closer to 100.

I have this process working as desired, but it is incredibly slow - as you can imagine, there are hundreds of thousands of assignments! So far I do this (self.model.veg_data is the above df):

    def create_vegetation(self, landscape_type):
        if landscape_type == 4:
            veg_this_patch = self.model.veg_data[self.model.veg_data['landscape_type'] <= landscape_type].copy()
        else:
            veg_this_patch = self.model.veg_data[self.model.veg_data['landscape_type'] >= landscape_type].copy()
        veg_this_patch['veg_total'] = veg_this_patch.apply(lambda x: randint(x["min_species_percent"], x["max_species_percent"]), axis=1)
        veg_this_patch['cum_sum_veg'] = veg_this_patch.veg_total.cumsum()
        veg_this_patch = veg_this_patch[veg_this_patch['cum_sum_veg'] <= 100]
        self.vegetation = veg_this_patch

I am certain there is a more efficient way to do this. The process will be repeated constantly, and as the model progresses, landscape_types will change (e.g. a 3 becomes a 4). So it's essential that this becomes as fast as possible! Thank you.

EDIT (as per the comment): The loop that creates the landscape objects is given below:

    for agent, x, y in self.grid.coord_iter():
        # check that patch is land
        if self.landscape.elevation[x, y] != -9999.0:
            elevation_xy = int(self.landscape.elevation[x, y])

            # calculate burn probabilities based on soil and temp
            burn_s_m_p = round(2 - (1 / (1 + (math.exp(-(self.landscape.soil_moisture[x, y] * 3)))) * 2), 4)
            burn_s_t_p = round(1 / (1 + (math.exp(-(self.landscape.soil_temp[x, y] * 1))) * 3), 4)

            # calculate succession probabilities based on soil and temp
            succ_s_m_p = round(2 - (1 / (1 + (math.exp(-(self.landscape.soil_moisture[x, y] * 0.5)))) * 2), 4)
            succ_s_t_p = round(1 / (1 + (math.exp(-(self.landscape.soil_temp[x, y] * 1))) * 0.5), 4)

            vegetation_typ_xy = self.landscape.vegetation[x, y]
            time_colonised_xy = self.landscape.time_colonised[x, y]
            is_patch_colonised_xy = self.landscape.colonised[x, y]

            # populate landscape patch with values
            patch = Landscape((x, y), self, elevation_xy, burn_s_m_p, burn_s_t_p,
                              vegetation_typ_xy, False, time_colonised_xy,
                              is_patch_colonised_xy, succ_s_m_p, succ_s_t_p)
            self.grid.place_agent(patch, (x, y))
            self.schedule.add(patch)

Then, in the object itself, I call the create_vegetation function to add the functional_types from the above df. Everything else in this loop comes from a different dataset, so it isn't relevant.

You need to extract as many calculations as you can into a vectorized preprocessing step. For example, in your 800x800 loop you have:

    burn_s_m_p = round(2-(1/(1 + (math.exp(- (self.landscape.soil_moisture[x, y] * 3)))) * 2),4)

Instead of executing this line 800x800 times, just do it once, during initialization:

    burn_array = np.round(2-(1/(1 + (np.exp(- (self.landscape.soil_moisture * 3)))) * 2),4)

Now in your loop it is simply:

    burn_s_m_p = burn_array[x, y]

Apply this technique to the rest of the similar lines.
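To make "the rest of the similar lines" concrete, here is a minimal sketch that applies the same idea to all four probability formulas from the question. It assumes `soil_moisture`, `soil_temp` and `elevation` are NumPy arrays of the same shape; the helper name `precompute_probabilities` and the small sample arrays are made up for illustration and are not part of the original model.

```python
import numpy as np


def precompute_probabilities(soil_moisture, soil_temp):
    """Evaluate the four per-cell formulas from the question on whole arrays.

    Hypothetical helper: assumes both inputs are NumPy arrays of equal shape.
    """
    return {
        "burn_s_m_p": np.round(2 - (1 / (1 + np.exp(-(soil_moisture * 3)))) * 2, 4),
        "burn_s_t_p": np.round(1 / (1 + np.exp(-(soil_temp * 1)) * 3), 4),
        "succ_s_m_p": np.round(2 - (1 / (1 + np.exp(-(soil_moisture * 0.5)))) * 2, 4),
        "succ_s_t_p": np.round(1 / (1 + np.exp(-(soil_temp * 1)) * 0.5), 4),
    }


# Made-up 2x3 inputs standing in for the 800x800 landscape arrays.
soil_moisture = np.array([[0.1, 0.5, 0.9], [0.3, 0.7, 0.2]])
soil_temp = np.array([[5.0, 10.0, 15.0], [0.0, -2.0, 8.0]])
elevation = np.array([[12.0, -9999.0, 30.0], [7.0, 4.0, -9999.0]])

probs = precompute_probabilities(soil_moisture, soil_temp)

# Visit only land cells; inside the loop each probability is a plain lookup.
for x, y in np.argwhere(elevation != -9999.0):
    burn_s_m_p = probs["burn_s_m_p"][x, y]
    succ_s_t_p = probs["succ_s_t_p"][x, y]
```

The exponentials are then evaluated once per array rather than once per cell, and the loop body only does indexing before constructing each patch.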

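The row-wise `apply` in `create_vegetation` is a candidate for the same treatment. The sketch below draws all random percentages in one call instead of one `randint` call per row. It assumes the `randint` in the question is `random.randint` (inclusive upper bound); the function name `draw_vegetation`, the `functional_type` column and the sample values in `veg_data` are hypothetical stand-ins, since the real data frame is not shown.

```python
import numpy as np
import pandas as pd


def draw_vegetation(veg_data, landscape_type, rng):
    """Vectorized variant of the question's create_vegetation (a sketch)."""
    # Same row selection as in the question.
    if landscape_type == 4:
        mask = veg_data["landscape_type"] <= landscape_type
    else:
        mask = veg_data["landscape_type"] >= landscape_type
    veg_this_patch = veg_data[mask].copy()

    # One draw for all rows; endpoint=True keeps the upper bound inclusive,
    # matching random.randint (an assumption about the original import).
    veg_this_patch["veg_total"] = rng.integers(
        veg_this_patch["min_species_percent"].to_numpy(),
        veg_this_patch["max_species_percent"].to_numpy(),
        endpoint=True,
    )
    veg_this_patch["cum_sum_veg"] = veg_this_patch["veg_total"].cumsum()
    return veg_this_patch[veg_this_patch["cum_sum_veg"] <= 100]


# Made-up table; column values are illustrative only.
veg_data = pd.DataFrame({
    "landscape_type": [1, 2, 3, 3, 4, 4],
    "functional_type": ["a", "b", "c", "d", "e", "f"],
    "min_species_percent": [5, 10, 20, 10, 30, 20],
    "max_species_percent": [15, 30, 40, 25, 60, 50],
})

vegetation = draw_vegetation(veg_data, 3, np.random.default_rng(0))
```

This removes the Python-level lambda call per row, but the filter and copy still run once per cell. If that remains the bottleneck, caching the filtered rows per landscape_type would be a further step to consider.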
