

Is it OK to create very large tuples in Python?


I have a fairly large list (>1K elements) of objects of the same type in my Python program. The list is never modified: no elements are added, removed or changed. Is there any downside to putting the objects into a tuple instead of a list?
On the one hand, tuples are immutable, so that matches my requirements. On the other hand, using such a large tuple just feels wrong. In my mind, tuples have always been for small collections. It's a double, a triple, a quadruple... Not a two-thousand-and-fifty-seven-tuple.
Is my fear of large tuples somehow justified? Is it bad for performance, unpythonic, or otherwise bad practice?
In CPython, go ahead. Under the covers, the only real difference between the storage of lists and tuples is that the C-level array holding the tuple elements is allocated inline, as part of the tuple object, while a list object contains a pointer to a C-level array holding the list elements, which is allocated separately from the list object. The list implementation needs to do that because the list may grow, and so the memory containing the C-level array may need to change its base address. A tuple can't change size, so the memory for it is allocated directly in the tuple object.
I've created tuples with millions of elements, and yet I lived to type about it ;-)
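To make the storage point above a little more concrete, here is a minimal sketch, assuming CPython (`sys.getsizeof` is implementation-specific and may not be available on other interpreters). The helper names `compare_storage` and `rejects_item_assignment` are made up for this illustration. It reports the shallow size of a list and of a tuple holding the same elements, and checks that the tuple refuses item assignment.

```python
import sys


def compare_storage(n=2000):
    """Return (list_size, tuple_size) in bytes for n identical elements.

    These are shallow sizes: the container and its array of element
    pointers, not the elements themselves.
    """
    items = list(range(n))
    frozen = tuple(items)
    return sys.getsizeof(items), sys.getsizeof(frozen)


def rejects_item_assignment(seq):
    """Return True if seq[0] = ... raises TypeError (i.e. seq is immutable)."""
    try:
        seq[0] = None
    except TypeError:
        return True
    return False


if __name__ == "__main__":
    list_size, tuple_size = compare_storage()
    print(f"list:  {list_size} bytes")
    print(f"tuple: {tuple_size} bytes")
    print("tuple rejects assignment:", rejects_item_assignment(tuple(range(5))))
```

The exact byte counts vary between CPython versions and platforms, so treat the numbers as indicative only.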
Obscure
In CPython, there can even be "a reason" to prefer giant tuples: the cyclic garbage collection scheme exempts a tuple from periodic scanning if it only contains immutable objects. Then the tuple can never be part of a cycle, so cyclic gc can ignore it. The same optimization cannot be used for lists; just because a list contains only immutable objects during one run of cyclic gc says nothing about whether that will still be the case during the next run.
This is almost never highly significant, but it can save a percent or so in a long-running program, and the benefit of exempting giant tuples grows the bigger they are.
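If you want to observe this yourself, the standard `gc.is_tracked` function reports whether the cyclic collector is currently tracking an object. The following is a rough sketch, assuming CPython; the helper name `tracking_report` is made up for this illustration. Exactly when a tuple stops being tracked (at creation or only after a collection pass) is an implementation detail that has varied between CPython versions, so the sketch prints the tuple results rather than assuming them.

```python
import gc


def tracking_report(n=1000):
    """Report whether the cyclic GC tracks a list and an equivalent tuple.

    The elements are ints, which are immutable and never tracked.
    """
    big_list = list(range(n))
    big_tuple = tuple(big_list)

    tuple_before = gc.is_tracked(big_tuple)
    gc.collect()  # give the collector a chance to untrack the tuple
    tuple_after = gc.is_tracked(big_tuple)

    return {
        "list": gc.is_tracked(big_list),
        "tuple_before_collect": tuple_before,
        "tuple_after_collect": tuple_after,
    }


if __name__ == "__main__":
    for name, tracked in tracking_report().items():
        print(f"{name}: tracked={tracked}")
```

A non-empty list is always reported as tracked, which matches the point above that the optimization cannot apply to lists.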
Yes, it is OK.
However, depending on the operations you're doing, you might want to consider using Python's built-in set type. Calling set() on your input iterable (tuple, list, or other) builds a set from it. Sets are nice for a few reasons, but especially because you get a collection of unique items with constant-time membership tests on average. Keep in mind that a set is unordered and that its elements must be hashable.
There's nothing "un-pythonic" about holding large data sets in memory, though.
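As an illustration of the set suggestion, here is a small sketch that keeps the data in a tuple for ordered access and builds a `frozenset` next to it for membership tests. Using `frozenset` rather than `set` is a choice made here because the question asks for a collection that never changes; the helper name `build_lookup` and the sample data are made up for this example.

```python
def build_lookup(items):
    """Return an immutable set of the items for fast membership tests.

    The items must be hashable; duplicates collapse into one entry.
    """
    return frozenset(items)


# Hypothetical sample data: 2000 strings kept in an immutable tuple.
names = tuple(f"item-{i}" for i in range(2000))
lookup = build_lookup(names)

if __name__ == "__main__":
    # Membership in a tuple scans element by element; in a set it is a hash lookup.
    print("item-1999" in names, "item-1999" in lookup)
    print("missing" in names, "missing" in lookup)
```

This only pays off if you actually test membership often; for plain iteration or indexing, the tuple alone is enough.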
