python


Efficient sorting and ranking of long list of 2d lists in Python


I have a dataset with several hundred 2d value plots stored in list such as this simplified version (actual plots are up to 100 x 100):
[['p01', ['p0x',
[[30, 40, 50] [[42, 52, 72]
[33, 43, 53] [44, 63, 83]
[36, 46, 56]]],... [76, 96, 99]]
I have so far iterated through the plots to simply check if the current plot has the lowest value for a cell and then stored the lowest costs and lowest plots name into the respective result array, giving me two array like so:
lowest values
[[20, 40, 45]
[26, 42, 50]
[36, 44, 51]]]
lowest plot for each point
[[p02, p01, p03]
[p02, p02, p02]
[p01, p01, p04]]]
What I ideally like is say the lowest 10 of each, i.e. is there an efficient way to go through the plots without having to iterate through loop by loop
i.e.: generate lowest, iterate again and see if 2nd lowest (is higher then lowest but lower than currently stored in 2nd lowest array) etc
If you have NumPy (you tagged it with arrays so I assume you already work with NumPy) you could use np.min and np.argmin to get the lowest and index of the lowest element:
>>> import numpy as np
>>> p1 = [[1, 10],
... [1, 20]]
>>> p2 = [[2, 8],
... [4, 6]]
>>> p3 = [[0, 4],
... [17, 8]]
>>> np.min([p1, p2, p3], axis=0) # lowest value
array([[0, 4],
[1, 6]])
>>> np.argmin([p1, p2, p3], axis=0) # lowest plot number
array([[2, 2],
[0, 1]], dtype=int64)
In case you want to sort them you can use np.sort and np.argsort:
>>> np.sort([p1, p2, p3], axis=0)
array([[[ 0, 4], # lowest
[ 1, 6]],
[[ 1, 8], # middle
[ 4, 8]],
[[ 2, 10], # highest
[17, 20]]])
>>> np.argsort([p1, p2, p3], axis=0)
array([[[2, 2],
[0, 1]],
[[0, 1],
[1, 2]],
[[1, 0],
[2, 0]]], dtype=int64)
Well, here's a plain, brute-force Python solution. I don't think it's the very efficient (I believe it's O(n^3log(n)), but it works:
Suppose, we have some data like the following:
>>> from pprint import pprint
>>> pprint(data)
[['p1',
[[48, 71, 36, 40, 80, 59],
[44, 56, 87, 43, 78, 47],
[45, 71, 86, 61, 45, 27],
[40, 82, 72, 39, 47, 77],
[46, 82, 66, 48, 78, 57],
[49, 38, 65, 56, 75, 79]]],
['p2',
[[82, 49, 72, 76, 48, 67],
[78, 57, 62, 20, 43, 28],
[71, 40, 23, 35, 88, 32],
[51, 66, 73, 84, 68, 35],
[44, 42, 44, 67, 20, 59],
[62, 20, 39, 33, 63, 46]]],
['p3',
[[70, 59, 86, 80, 70, 87],
[88, 47, 38, 63, 56, 63],
[84, 26, 46, 31, 52, 22],
[51, 63, 63, 34, 58, 87],
[75, 69, 39, 37, 88, 35],
[42, 25, 76, 86, 59, 47]]],
['p4',
[[44, 21, 39, 57, 61, 88],
[31, 64, 36, 42, 79, 62],
[41, 38, 21, 82, 71, 60],
[37, 23, 46, 40, 77, 69],
[27, 47, 64, 59, 51, 32],
[23, 68, 76, 67, 39, 60]]],
['p5',
[[33, 41, 41, 54, 25, 86],
[64, 34, 76, 66, 78, 51],
[85, 47, 85, 22, 40, 28],
[20, 33, 30, 59, 86, 47],
[36, 39, 32, 60, 41, 78],
[57, 33, 35, 37, 86, 64]]],
['p6',
[[58, 72, 82, 80, 80, 21],
[41, 45, 57, 67, 74, 39],
[70, 78, 51, 81, 85, 86],
[81, 53, 49, 73, 60, 60],
[26, 66, 60, 38, 87, 54],
[31, 55, 44, 38, 28, 68]]],
['p7',
[[43, 22, 57, 66, 53, 68],
[65, 61, 52, 78, 59, 27],
[66, 42, 58, 79, 75, 60],
[83, 81, 67, 43, 34, 76],
[53, 41, 36, 34, 32, 76],
[68, 43, 53, 46, 54, 41]]],
['p8',
[[74, 65, 37, 50, 51, 87],
[72, 79, 65, 44, 46, 73],
[42, 31, 80, 46, 63, 24],
[83, 40, 28, 39, 86, 29],
[29, 45, 86, 20, 26, 25],
[52, 52, 34, 24, 44, 65]]],
['p9',
[[63, 76, 54, 71, 64, 56],
[24, 30, 67, 65, 49, 50],
[38, 40, 55, 72, 78, 56],
[74, 41, 34, 62, 53, 76],
[30, 30, 36, 86, 69, 74],
[40, 87, 29, 75, 50, 51]]]]
First, we sort each i, j value along the first axis:
>>> def matrix_idx(x,y, matrix):
... return matrix[x][y]
...
>>> from operator import itemgetter
>>> sorts = [[sorted([(tag, matrix_idx(i, j, matrix)) for tag, matrix in data], key=itemgetter(1)) for j in range(6)] for i in range(6)]
So, now, each element of sorts:
>>> pprint(sorts[0], width=600)
[[('p5', 33), ('p7', 43), ('p4', 44), ('p1', 48), ('p6', 58), ('p9', 63), ('p3', 70), ('p8', 74), ('p2', 82)],
[('p4', 21), ('p7', 22), ('p5', 41), ('p2', 49), ('p3', 59), ('p8', 65), ('p1', 71), ('p6', 72), ('p9', 76)],
[('p1', 36), ('p8', 37), ('p4', 39), ('p5', 41), ('p9', 54), ('p7', 57), ('p2', 72), ('p6', 82), ('p3', 86)],
[('p1', 40), ('p8', 50), ('p5', 54), ('p4', 57), ('p7', 66), ('p9', 71), ('p2', 76), ('p3', 80), ('p6', 80)],
[('p5', 25), ('p2', 48), ('p8', 51), ('p7', 53), ('p4', 61), ('p9', 64), ('p3', 70), ('p1', 80), ('p6', 80)],
[('p6', 21), ('p9', 56), ('p1', 59), ('p2', 67), ('p7', 68), ('p5', 86), ('p3', 87), ('p8', 87), ('p4', 88)]]
>>> len(sorts)
6
>>>
>>> len(sorts[0])
6
What does this correspond to? Consider again:
[['p1',
[[48, 71, **36**, 40, 80, 59],
[44, 56, 87, 43, 78, 47],
[45, 71, 86, 61, 45, 27],
[40, 82, 72, 39, 47, 77],
[46, 82, 66, 48, 78, 57],
[49, 38, 65, 56, 75, 79]]],
...
['p4',
[[44, **21**, 39, 57, 61, 88],
[31, 64, 36, 42, 79, 62],
[41, 38, 21, 82, 71, 60],
[37, 23, 46, 40, 77, 69],
[27, 47, 64, 59, 51, 32],
[23, 68, 76, 67, 39, 60]]],
['p5',
[[**33**, 41, 41, 54, 25, 86],
[64, 34, 76, 66, 78, 51],
[85, 47, 85, 22, 40, 28],
[20, 33, 30, 59, 86, 47],
[36, 39, 32, 60, 41, 78],
[57, 33, 35, 37, 86, 64]]],
So each ith element of sorts contains a list, with the sorted values for the ith row, going down the first axis of the data. So, let's do one final transformation to get this into a more useful format, first, let's define a handy namedtuple:
>>> from collections import namedtuple
>>> Data = namedtuple('Data', 'origin value')
Now, finally:
>>> for r in range(len(data)):
... val = []
... orig = []
... for i in range(6):
... orig.append([sorts[i][j][r][0] for j in range(6)])
... val.append([sorts[i][j][r][1] for j in range(6)])
... ranked.append(Data(orig, val))
...
And now, check this:
>>> pprint(ranked[0].value)
[[33, 21, 36, 40, 25, 21],
[24, 30, 36, 20, 43, 27],
[38, 26, 21, 22, 40, 22],
[20, 23, 28, 34, 34, 29],
[26, 30, 32, 20, 20, 25],
[23, 20, 29, 24, 28, 41]]
>>> pprint(ranked[0].origin)
[['p5', 'p4', 'p1', 'p1', 'p5', 'p6'],
['p9', 'p9', 'p4', 'p2', 'p2', 'p7'],
['p9', 'p3', 'p4', 'p5', 'p5', 'p3'],
['p5', 'p4', 'p8', 'p3', 'p7', 'p8'],
['p6', 'p9', 'p5', 'p8', 'p2', 'p8'],
['p4', 'p2', 'p9', 'p8', 'p6', 'p7']]
>>> pprint(ranked[-1].value)
[[82, 76, 86, 80, 80, 88],
[88, 79, 87, 78, 79, 73],
[85, 78, 86, 82, 88, 86],
[83, 82, 73, 84, 86, 87],
[75, 82, 86, 86, 88, 78],
[68, 87, 76, 86, 86, 79]]
>>> pprint(ranked[-1].origin)
[['p2', 'p9', 'p3', 'p6', 'p6', 'p4'],
['p3', 'p8', 'p1', 'p7', 'p4', 'p8'],
['p5', 'p6', 'p1', 'p4', 'p2', 'p6'],
['p8', 'p1', 'p2', 'p2', 'p8', 'p3'],
['p3', 'p1', 'p8', 'p9', 'p3', 'p5'],
['p7', 'p9', 'p4', 'p3', 'p5', 'p1']]
Whether it is performant enough for your use-case I don't know, but it sure was fun!

Related Links

'UnboundLocalError: local variable 'lower' referenced before assignment'
How to save the result of a button's command in tkinter?
I'm lost and confused! regarding functions and scope of returned objects. Why is the object returned from the first function changed?
How to remove quotes and brackets from a dict that has two keys to one value in a jinja template?
I am looking at each individual date in a Pandas dataframe and adjusting one column (weight), based on condition on another column for each date
For loop in Python working, but not doing it's job
Python error 'int' object has no attribute 'penup' (python turtle)
custome user serializer giving error on is_valid() - django
Dividing a graph into 2 disconnected subgraphs
use boto3 on GAE for Python
boto command for describing an Auto Scaling Group?
get sets of index values, grouped by column year
How to use a tensorflow model extracted from a trained keras model
How to query with many tables
python-shell on linux system indentation error
How come 1 is printed instead of 0?

Categories

HOME
cluster-computing
app-inventor
semantic-ui
voip
performancecounter
heap-memory
snap.svg
dataframe
devstack
i2c
oclint
data-analysis
mongodb-query
cloudflare
ida
mod-pagespeed
arraylist
google-openid
prestodb
attask
jconsole
prediction
myob
opencart2.3
geopandas
rhandsontable
classpath
nesc
google-api-dotnet-client
pep8-assembly
gpib
referenceerror
stocks
pdflatex
nsurlconnection
superpowered
precedence
oscommerce
dql
intellilock
bayesian-networks
z3py
archer
ajp
taskmanager
nstouchbar
pinvoke
persistent
workflow-foundation-4.5
greenhills
crystal-reports-8.5
crash-reports
nuget-server
addin-express
bettercms
iron.io
objloader
verbose
inject
spring-ioc
glkit
vst
easing
tomcat5
emokit
faraday
mpeg-4
financial
proxygen
geodjango
wif
rx-groovy
coypu
adobe-indesign
chaining
gyroscope-framework
microblaze
livechat
libssh2
funq
adaptive-compression
mbox
xenocode
gfs
confusion-matrix
nsmatrix
sslexception
execvp
argb
cgimageref
adomd.net
prng
socketstream
enumerators
maven-ear-plugin
vim-powerline
unions
concurrent-programming
flexicious
shared-objects
surf
opengl-es-lighting
dashcode
external-assemblies
aio

Resources

Mobile Apps Dev
Database Users
javascript
java
csharp
php
android
MS Developer
developer works
python
ios
c
html
jquery
RDBMS discuss
Cloud Virtualization
Database Dev&Adm
javascript
java
csharp
php
python
android
jquery
ruby
ios
html
Mobile App
Mobile App
Mobile App