Celery chunks large data set


I'm trying to use Celery's chunks functionality to divide my iterable dataset into pieces, which are then sent to a Celery task for further processing.
I have a query_set that I got from the following SQLAlchemy call:
query_set = MyModel.query.join(OtherModel).all()
Currently, query_set is a list of tuples. Its length is 40,000 and growing.
I have another function (celery task) that crunches the data in query_set, whose definition is
@celery_app.task
def crunch_qs(query_set):
. . .
. . .
Since query_set is a list of tuples, I figured I could pass it directly to crunch_qs like this
crunched_qs = crunch_qs.chunks(query_set, 5000)()
results = crunched_qs.get()
That did not work. Instead of passing a list of 5000 tuples, chunks unpacked the items of each tuple in query_set and sent them to crunch_qs as separate positional arguments.
So on the first iteration crunch_qs was effectively called as crunch_qs(*query_set[0]), which raised the following error:
TypeError: crunch_qs() takes exactly 1 argument (10 given)
This matches the row size, since len(query_set[0]) is 10.
I also tried:
crunched_qs = crunch_qs.chunks(((row,) for row in query_set), 5000)()
results = crunched_qs.get()
That worked a little better. The TypeError went away. However, my crunch_qs function now gets a single row (one tuple) as its parameter on each call, instead of a list of 5000 tuples.
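To illustrate, both behaviours can be reproduced without Celery, assuming chunks applies the task to each item of the iterable roughly the way itertools.starmap does (that is my assumption about what happens, not something I have confirmed). The rows and the body of crunch_qs below are stand-ins, not my real data:

```python
from itertools import starmap

def crunch_qs(query_set):
    # Stand-in for the real task: just report how many items it received.
    return len(query_set)

# Stand-in data: three "rows", each a 10-item tuple like query_set[0].
rows = [tuple(range(10)) for _ in range(3)]

# Case 1: each row is treated as the argument tuple, so it is unpacked
# into 10 positional arguments and the call fails with a TypeError.
try:
    list(starmap(crunch_qs, rows))
    unpack_error = None
except TypeError as exc:
    unpack_error = exc

# Case 2: each row is wrapped in a 1-tuple, so every call receives
# exactly one row rather than a list of rows.
per_row = list(starmap(crunch_qs, ((row,) for row in rows)))
```

In the second case crunch_qs runs once per row, which is the same thing I see with the real task.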
Any help/ideas on how to pass a list of tuples to celery chunks would be highly appreciated.
Thanks in advance
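One possible direction, as a minimal sketch rather than a confirmed solution: slice the list into batches first and wrap each batch in a 1-tuple, so that each argument tuple carries one list of rows. The helper name batch_rows and the sample data are hypothetical, and the Celery calls are left as comments because they need a running worker:

```python
def batch_rows(rows, size):
    """Split rows into lists of at most `size` items.

    Each list is wrapped in a 1-tuple so that it can serve as the
    argument tuple for a task taking a single parameter.
    """
    return [(rows[i:i + size],) for i in range(0, len(rows), size)]

# Stand-in data in place of the real query_set.
query_set = [(i, "x") for i in range(12)]
batched_args = batch_rows(query_set, 5)

# Untested with Celery here: with a chunk size of 1, each call to the
# task would get one batch (a list of rows) as its only argument.
#   crunched_qs = crunch_qs.chunks(batched_args, 1)()
#   results = crunched_qs.get()
#
# An alternative would be a group with one signature per batch:
#   from celery import group
#   results = group(crunch_qs.s(batch) for (batch,) in batched_args)().get()
```

With the real data, size would be 5000. Whether the rows can be serialized by the configured task serializer is a separate question this sketch does not address.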

