How do I initialize a Counter from a list of key-value pairs?


If I have a sequence of (key, value) pairs, I can quickly initialize a dictionary like this:
>>> data = [ ('a', 1), ('b', 2) ]
>>> dict(data)
{'a': 1, 'b': 2}
I would like to do the same with a Counter, but how? Both the constructor and the update() method treat the ordered pairs as keys, not as key-value pairs:
>>> from collections import Counter
>>> Counter(data)
Counter({('a', 1): 1, ('b', 2): 1})
The best I could manage was to use a temporary dictionary, which is ugly and needlessly circuitous:
>>> Counter(dict(data))
Counter({'b': 2, 'a': 1})
Is there a proper way to directly initialize a Counter from a list of (key, count) pairs? My use case involves reading lots of saved counts from files (with unique keys).
I would just do a loop:
from collections import Counter
counter = Counter()
for obj, cnt in [('a', 1), ('b', 2)]:
    counter[obj] = cnt  # assign the saved count for each key
You could also just call the parent dict.update method:
>>> from collections import Counter
>>> data = [ ('a', 1), ('b', 2) ]
>>> c = Counter()
>>> dict.update(c, data)
>>> c
Counter({'b': 2, 'a': 1})
Lastly, there isn't anything wrong with your original solution:
Counter(dict(list_of_pairs))
The expensive part of creating dictionaries or counters is hashing all of the keys and doing periodic resizes. Once the dictionary is made, converting it to a Counter is very cheap, about as fast as a dict.copy(). The hash values are reused and the final Counter hash table is pre-sized (no need for resizing).
From the docs:
Elements are counted from an iterable or initialized from another mapping (or counter)
So the answer is no: the constructor has no direct way to take (key, count) pairs, and you need to convert them to a mapping first and then initialize the Counter. And yes, initializing it from a dict was the right move.
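To make the quoted sentence concrete, here is a minimal sketch of the two constructor behaviours. The variable names are only for illustration.

```python
from collections import Counter

pairs = [('a', 1), ('b', 2)]

# Iterable: each element (here, each whole tuple) is counted as a key.
counted = Counter(pairs)

# Mapping: keys and counts are taken from the mapping as they are.
initialized = Counter(dict(pairs))

print(counted[('a', 1)])   # 1
print(initialized['b'])    # 2
```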
UPDATE
I agree that @RaymondHettinger's code looks good, and it is actually faster:
from collections import Counter
from random import choice
from string import ascii_letters
a=[(choice(ascii_letters), i) for i in range(100)]
Tested with Python 3.6.1 and IPython 6
Initialization with dict:
%%timeit
c1=Counter(dict(a))
Output:
12.1 µs ± 342 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Update with dict.update():
%%timeit
c2=Counter()
dict.update(c2, a)
Output:
7.21 µs ± 236 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
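For anyone without IPython, a rough equivalent of this comparison can be sketched with the standard timeit module. The function names via_dict and via_dict_update are made up for this example, and the numbers will vary by machine and Python version, so treat it as a way to reproduce the comparison rather than as a benchmark result.

```python
import timeit
from collections import Counter
from random import choice
from string import ascii_letters

# Same kind of test data as above: 100 (letter, int) pairs.
pairs = [(choice(ascii_letters), i) for i in range(100)]

def via_dict():
    # Build a temporary dict, then convert it to a Counter.
    return Counter(dict(pairs))

def via_dict_update():
    # Fill an empty Counter using the parent dict.update method.
    c = Counter()
    dict.update(c, pairs)
    return c

# Both approaches should produce the same Counter.
assert via_dict() == via_dict_update()

runs = 1000
for fn in (via_dict, via_dict_update):
    seconds = timeit.timeit(fn, number=runs)
    print(f"{fn.__name__}: {seconds / runs * 1e6:.2f} us per call")
```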
If the keys in your list of (key, value) pairs are already unique (no duplicates), you can use Raymond Hettinger's great solution.
Beware, though: if there are duplicate keys, you only get the last value for any given key:
>>> data=[ ('a', 1), ('b', 2), ('a', 3), ('b', 4) ]
>>> c=Counter()
>>> dict.update(c, data)
>>> c
Counter({'b': 4, 'a': 3}) # note: 'a' and 'b' keep only their last values
Same with dict:
>>> Counter(dict(data))
Counter({'b': 4, 'a': 3})
But Counters are most often used to count totals, including duplicates. If you want the sum of the 'a' and 'b' entries, you need to loop over all the pairs:
>>> c=Counter()
>>> for k, v in data:
...     c[k] += v
...
>>> c
Counter({'b': 6, 'a': 4}) # the sum of the values 'v' for each key 'k'
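The two behaviours could be wrapped in one small helper. This is only a sketch: counter_from_pairs and its sum_duplicates flag are hypothetical names, not part of collections.

```python
from collections import Counter

def counter_from_pairs(pairs, sum_duplicates=True):
    """Build a Counter from (key, count) pairs.

    Hypothetical helper for illustration; not part of the collections module.
    """
    counter = Counter()
    if sum_duplicates:
        for key, count in pairs:
            counter[key] += count       # accumulate counts for repeated keys
    else:
        dict.update(counter, pairs)     # the last value wins for repeated keys
    return counter

data = [('a', 1), ('b', 2), ('a', 3), ('b', 4)]
print(counter_from_pairs(data))                         # Counter({'b': 6, 'a': 4})
print(counter_from_pairs(data, sum_duplicates=False))   # Counter({'b': 4, 'a': 3})
```

If the keys are known to be unique, as in the question, both branches give the same result and the dict.update branch avoids the Python-level loop.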
