python


Efficient sorting and ranking of long list of 2d lists in Python


I have a dataset with several hundred 2d value plots stored in list such as this simplified version (actual plots are up to 100 x 100):
[['p01', ['p0x',
[[30, 40, 50] [[42, 52, 72]
[33, 43, 53] [44, 63, 83]
[36, 46, 56]]],... [76, 96, 99]]
I have so far iterated through the plots to simply check if the current plot has the lowest value for a cell and then stored the lowest costs and lowest plots name into the respective result array, giving me two array like so:
lowest values
[[20, 40, 45]
[26, 42, 50]
[36, 44, 51]]]
lowest plot for each point
[[p02, p01, p03]
[p02, p02, p02]
[p01, p01, p04]]]
What I ideally like is say the lowest 10 of each, i.e. is there an efficient way to go through the plots without having to iterate through loop by loop
i.e.: generate lowest, iterate again and see if 2nd lowest (is higher then lowest but lower than currently stored in 2nd lowest array) etc
If you have NumPy (you tagged it with arrays so I assume you already work with NumPy) you could use np.min and np.argmin to get the lowest and index of the lowest element:
>>> import numpy as np
>>> p1 = [[1, 10],
... [1, 20]]
>>> p2 = [[2, 8],
... [4, 6]]
>>> p3 = [[0, 4],
... [17, 8]]
>>> np.min([p1, p2, p3], axis=0) # lowest value
array([[0, 4],
[1, 6]])
>>> np.argmin([p1, p2, p3], axis=0) # lowest plot number
array([[2, 2],
[0, 1]], dtype=int64)
In case you want to sort them you can use np.sort and np.argsort:
>>> np.sort([p1, p2, p3], axis=0)
array([[[ 0, 4], # lowest
[ 1, 6]],
[[ 1, 8], # middle
[ 4, 8]],
[[ 2, 10], # highest
[17, 20]]])
>>> np.argsort([p1, p2, p3], axis=0)
array([[[2, 2],
[0, 1]],
[[0, 1],
[1, 2]],
[[1, 0],
[2, 0]]], dtype=int64)
Well, here's a plain, brute-force Python solution. I don't think it's the very efficient (I believe it's O(n^3log(n)), but it works:
Suppose, we have some data like the following:
>>> from pprint import pprint
>>> pprint(data)
[['p1',
[[48, 71, 36, 40, 80, 59],
[44, 56, 87, 43, 78, 47],
[45, 71, 86, 61, 45, 27],
[40, 82, 72, 39, 47, 77],
[46, 82, 66, 48, 78, 57],
[49, 38, 65, 56, 75, 79]]],
['p2',
[[82, 49, 72, 76, 48, 67],
[78, 57, 62, 20, 43, 28],
[71, 40, 23, 35, 88, 32],
[51, 66, 73, 84, 68, 35],
[44, 42, 44, 67, 20, 59],
[62, 20, 39, 33, 63, 46]]],
['p3',
[[70, 59, 86, 80, 70, 87],
[88, 47, 38, 63, 56, 63],
[84, 26, 46, 31, 52, 22],
[51, 63, 63, 34, 58, 87],
[75, 69, 39, 37, 88, 35],
[42, 25, 76, 86, 59, 47]]],
['p4',
[[44, 21, 39, 57, 61, 88],
[31, 64, 36, 42, 79, 62],
[41, 38, 21, 82, 71, 60],
[37, 23, 46, 40, 77, 69],
[27, 47, 64, 59, 51, 32],
[23, 68, 76, 67, 39, 60]]],
['p5',
[[33, 41, 41, 54, 25, 86],
[64, 34, 76, 66, 78, 51],
[85, 47, 85, 22, 40, 28],
[20, 33, 30, 59, 86, 47],
[36, 39, 32, 60, 41, 78],
[57, 33, 35, 37, 86, 64]]],
['p6',
[[58, 72, 82, 80, 80, 21],
[41, 45, 57, 67, 74, 39],
[70, 78, 51, 81, 85, 86],
[81, 53, 49, 73, 60, 60],
[26, 66, 60, 38, 87, 54],
[31, 55, 44, 38, 28, 68]]],
['p7',
[[43, 22, 57, 66, 53, 68],
[65, 61, 52, 78, 59, 27],
[66, 42, 58, 79, 75, 60],
[83, 81, 67, 43, 34, 76],
[53, 41, 36, 34, 32, 76],
[68, 43, 53, 46, 54, 41]]],
['p8',
[[74, 65, 37, 50, 51, 87],
[72, 79, 65, 44, 46, 73],
[42, 31, 80, 46, 63, 24],
[83, 40, 28, 39, 86, 29],
[29, 45, 86, 20, 26, 25],
[52, 52, 34, 24, 44, 65]]],
['p9',
[[63, 76, 54, 71, 64, 56],
[24, 30, 67, 65, 49, 50],
[38, 40, 55, 72, 78, 56],
[74, 41, 34, 62, 53, 76],
[30, 30, 36, 86, 69, 74],
[40, 87, 29, 75, 50, 51]]]]
First, we sort each i, j value along the first axis:
>>> def matrix_idx(x,y, matrix):
... return matrix[x][y]
...
>>> from operator import itemgetter
>>> sorts = [[sorted([(tag, matrix_idx(i, j, matrix)) for tag, matrix in data], key=itemgetter(1)) for j in range(6)] for i in range(6)]
So, now, each element of sorts:
>>> pprint(sorts[0], width=600)
[[('p5', 33), ('p7', 43), ('p4', 44), ('p1', 48), ('p6', 58), ('p9', 63), ('p3', 70), ('p8', 74), ('p2', 82)],
[('p4', 21), ('p7', 22), ('p5', 41), ('p2', 49), ('p3', 59), ('p8', 65), ('p1', 71), ('p6', 72), ('p9', 76)],
[('p1', 36), ('p8', 37), ('p4', 39), ('p5', 41), ('p9', 54), ('p7', 57), ('p2', 72), ('p6', 82), ('p3', 86)],
[('p1', 40), ('p8', 50), ('p5', 54), ('p4', 57), ('p7', 66), ('p9', 71), ('p2', 76), ('p3', 80), ('p6', 80)],
[('p5', 25), ('p2', 48), ('p8', 51), ('p7', 53), ('p4', 61), ('p9', 64), ('p3', 70), ('p1', 80), ('p6', 80)],
[('p6', 21), ('p9', 56), ('p1', 59), ('p2', 67), ('p7', 68), ('p5', 86), ('p3', 87), ('p8', 87), ('p4', 88)]]
>>> len(sorts)
6
>>>
>>> len(sorts[0])
6
What does this correspond to? Consider again:
[['p1',
[[48, 71, **36**, 40, 80, 59],
[44, 56, 87, 43, 78, 47],
[45, 71, 86, 61, 45, 27],
[40, 82, 72, 39, 47, 77],
[46, 82, 66, 48, 78, 57],
[49, 38, 65, 56, 75, 79]]],
...
['p4',
[[44, **21**, 39, 57, 61, 88],
[31, 64, 36, 42, 79, 62],
[41, 38, 21, 82, 71, 60],
[37, 23, 46, 40, 77, 69],
[27, 47, 64, 59, 51, 32],
[23, 68, 76, 67, 39, 60]]],
['p5',
[[**33**, 41, 41, 54, 25, 86],
[64, 34, 76, 66, 78, 51],
[85, 47, 85, 22, 40, 28],
[20, 33, 30, 59, 86, 47],
[36, 39, 32, 60, 41, 78],
[57, 33, 35, 37, 86, 64]]],
So each ith element of sorts contains a list, with the sorted values for the ith row, going down the first axis of the data. So, let's do one final transformation to get this into a more useful format, first, let's define a handy namedtuple:
>>> from collections import namedtuple
>>> Data = namedtuple('Data', 'origin value')
Now, finally:
>>> for r in range(len(data)):
... val = []
... orig = []
... for i in range(6):
... orig.append([sorts[i][j][r][0] for j in range(6)])
... val.append([sorts[i][j][r][1] for j in range(6)])
... ranked.append(Data(orig, val))
...
And now, check this:
>>> pprint(ranked[0].value)
[[33, 21, 36, 40, 25, 21],
[24, 30, 36, 20, 43, 27],
[38, 26, 21, 22, 40, 22],
[20, 23, 28, 34, 34, 29],
[26, 30, 32, 20, 20, 25],
[23, 20, 29, 24, 28, 41]]
>>> pprint(ranked[0].origin)
[['p5', 'p4', 'p1', 'p1', 'p5', 'p6'],
['p9', 'p9', 'p4', 'p2', 'p2', 'p7'],
['p9', 'p3', 'p4', 'p5', 'p5', 'p3'],
['p5', 'p4', 'p8', 'p3', 'p7', 'p8'],
['p6', 'p9', 'p5', 'p8', 'p2', 'p8'],
['p4', 'p2', 'p9', 'p8', 'p6', 'p7']]
>>> pprint(ranked[-1].value)
[[82, 76, 86, 80, 80, 88],
[88, 79, 87, 78, 79, 73],
[85, 78, 86, 82, 88, 86],
[83, 82, 73, 84, 86, 87],
[75, 82, 86, 86, 88, 78],
[68, 87, 76, 86, 86, 79]]
>>> pprint(ranked[-1].origin)
[['p2', 'p9', 'p3', 'p6', 'p6', 'p4'],
['p3', 'p8', 'p1', 'p7', 'p4', 'p8'],
['p5', 'p6', 'p1', 'p4', 'p2', 'p6'],
['p8', 'p1', 'p2', 'p2', 'p8', 'p3'],
['p3', 'p1', 'p8', 'p9', 'p3', 'p5'],
['p7', 'p9', 'p4', 'p3', 'p5', 'p1']]
Whether it is performant enough for your use-case I don't know, but it sure was fun!

Related Links

one or more most frequent letter in string python
pandas - select/mask the first n elements by value
python filter files by modified time
Python, Multiprocessing: what to do if process.join() waits forever?
How to sum values in an iterator in a PySpark groupByKey()
Allauth will not save additional fields
Read Flask Session Cookie
How do I make my script take only numeric inputs without screwing it up
Incorrect output while reading text file in Python
PhantomJS - Permission Denied
Combining image RGB channels
Python input validation and edge case handling
Struggling with making a Python module accessible via PyPi
Finding the minimum and maximum of a list of arrays
Accessing Google Drive Spreadsheets with Python Gspread
py.test & pytest on Raspberry Pi : Differences ?

Categories

HOME
spring
dynamics-crm
websphere
raspbian
visual-studio-2013
avro
mainframe
phonegap-cli
value
snap.svg
filter
structuremap
ndis
cs-cart
pdo
thumbnails
ng2-dragula
aws-cognito
marathon
apply
multiplayer
binutils
header-files
go-cd
extractor
fabric
rhmap
osmdroid
lync-2013
interrupt-handling
pcre
mangodb
configure
superagent
helix-3d-toolkit
postgresql-9.2
modelandview
pycparser
fabric-digits
tic-tac-toe
exiftool
react-native-router-flux
spring-bean
resuming-training
email-parsing
range-v3
defold
shutdown
dwarf
crystal-reports-8.5
pagefile
selenium-firefoxdriver
spring-lemon
console-redirect
asp.net-mvc-partialview
financial
aerogear
cmocka
self-hosting
datainputstream
optionbutton
mirrorlink
registrykey
twitter-rest-api
jcr-sql2
libsndfile
grunt-express
mfmailcomposeviewcontroll
crystal-reports-10
flash-cc
edit-in-place
facebook-chat
convex-polygon
sql-server-2012-web
sharpmap
reporting-tools
android-contextmenu
trailing-slash
shim
law-of-demeter
objectbrowser
curljs
qtembedded
posting
adsl
msn
idictionary
callgrind
odbc-sql-server-driver
inline-if
evb
configurable
self-reference

Resources

Mobile Apps Dev
Database Users
javascript
java
csharp
php
android
MS Developer
developer works
python
ios
c
html
jquery
RDBMS discuss
Cloud Virtualization
Database Dev&Adm
javascript
java
csharp
php
python
android
jquery
ruby
ios
html
Mobile App
Mobile App
Mobile App