Pandas: Efficient way to get first row with element that is smaller than a given value


I'm wondering if there's an efficient way to do this in pandas: Given a dataframe, what is the first row that is smaller than a given value? For example, given:
addr
0 4196656
1 4197034
2 4197075
3 4197082
4 4197134
What is the first value below 4197080, that is, the largest value smaller than it? I want it to return just the row with 4197075.
A solution would be to first filter by 4197080 and then take the last row, but that looks like an extremely slow O(N) operation (first building a dataframe and then taking its last row), while a binary search would take O(log N).
df.addr[ df.addr < 4197080].tail(1)
I timed it, and creating df.addr[ df.addr < 4197080] more or less takes the same as df.addr[ df.addr < 4197080].tail(1), strongly hinting that internally it's building an entire df first.
import numpy as np
import pandas as pd
from bisect import bisect

num = np.random.randint(0, 10**8, 10**6)
num.sort()
df = pd.DataFrame({'addr':num})
df = df.set_index('addr', drop=False)
df = df.sort_index()
Getting the first smaller value is very slow:
%timeit df.addr[ df.addr < 57830391].tail(1)
100 loops, best of 3: 7.9 ms per loop
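Using lt improves things a bit (although this expression only builds the boolean mask and slices its last row, so it does not actually select the matching row):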
%timeit df.lt(57830391)[-1:]
1000 loops, best of 3: 853 µs per loop
But still nowhere near as fast as a binary search:
%timeit bisect(num, 57830391, 0, len(num))
100000 loops, best of 3: 6.53 µs per loop
Is there any better way?
This requires pandas 0.14.0 (for nlargest).
Note that the frame IS NOT SORTED.
In [16]: s = df['addr']
Find the biggest value lower than the required one:
In [18]: %timeit s[s<5783091]
100 loops, best of 3: 9.01 ms per loop
In [19]: %timeit s[s<5783091].nlargest(1)
100 loops, best of 3: 11 ms per loop
So this is faster than actually performing a full sort and then indexing.
The .copy() is there so that the in-place sort does not bias the timing.
In [32]: x = np.random.randint(0, 10**8, 10**6)
In [33]: def f(x):
....: x.copy().sort()
....:
In [35]: %timeit f(x)
10 loops, best of 3: 67.2 ms per loop
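For reference, the filter-then-nlargest approach timed above can be written out as a small standalone script. This is only a sketch: the seed, the array size, and the variable names `threshold`, `below` and `largest` are illustrative assumptions, not part of the original session.

```python
import numpy as np
import pandas as pd

# Illustrative data: an UNSORTED series of random addresses (seed is arbitrary).
rng = np.random.RandomState(0)
s = pd.Series(rng.randint(0, 10**8, 10**5), name="addr")

threshold = 5783091  # hypothetical cut-off value

# Boolean filter: O(N) scan, keeps every value strictly below the threshold.
below = s[s < threshold]

# nlargest(1) returns a one-element Series that keeps the original index label,
# so both the value and its position in the frame are available.
largest = below.nlargest(1)

print(largest)
```

Because no sort is needed, this works on data in any order, at the cost of scanning every element.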
If you are simply searching an ALREADY SORTED series, then use searchsorted. Note that you must use the numpy version (i.e. operate on .values); the Series version will be defined in 0.14.1.
In [41]: %timeit s.values.searchsorted(5783091)
100000 loops, best of 3: 2.5 µs per loop
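Note that searchsorted returns an insertion position, not the row itself. A minimal sketch of turning that position into the row asked for in the question follows; the helper name `last_row_below` is hypothetical, and it assumes the column is sorted in ascending order.

```python
import numpy as np
import pandas as pd

def last_row_below(df, column, threshold):
    """Return the row holding the largest value strictly below `threshold`.

    Assumes df[column] is sorted ascending. Returns None if no value qualifies.
    """
    values = df[column].values
    # side="left" yields the position of the first element >= threshold,
    # so the element just before it is the largest one strictly below.
    pos = np.searchsorted(values, threshold, side="left")
    if pos == 0:
        return None
    return df.iloc[pos - 1]

# Sample data taken from the question.
df = pd.DataFrame({"addr": [4196656, 4197034, 4197075, 4197082, 4197134]})
row = last_row_below(df, "addr", 4197080)
print(row["addr"])  # 4197075
```

The lookup itself is a binary search, so it stays O(log N) regardless of how many rows fall below the threshold.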
