python


Celery chunks large data set


I'm trying to use Celery's chunks functionality to divide my iterable dataset into pieces, which are then sent to a Celery task for further processing.
I have a query_set that I got from the following SQLAlchemy call:
query_set = MyModel.query.join(OtherModel).all()
Currently, query_set is a list of tuples. The length of query_set is 40,000 and growing.
I have another function (a Celery task) that crunches the data in query_set. Its definition is:
@celery_app.task
def crunch_qs(query_set):
. . .
. . .
Since query_set is a list of tuples, I figured I could pass it directly to crunch_qs like this
crunched_qs = crunch_qs.chunks(query_set, 5000)()
results = crunched_qs.get()
That did not work and gave me an unexpected result: the items of each tuple in query_set were unpacked and sent to crunch_qs as separate positional arguments.
So crunch_qs effectively received *query_set[0] on the first iteration, which raised the following error:
TypeError: crunch_qs() takes exactly 1 argument (10 given)
len(query_set[0]) = 10
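The unpacking described above can be reproduced without Celery. The following is a minimal sketch, assuming that chunks treats every item of the iterable as an argument tuple and calls the task as task(*item); apply_star is a hypothetical stand-in for that behaviour, not a Celery function, and the rows are made-up data.

```python
def crunch_qs(query_set):
    # Placeholder body: the real task is elided in the question.
    return len(query_set)


def apply_star(func, items):
    # Hypothetical stand-in for star-unpacking each item into the call.
    return [func(*item) for item in items]


# Three fake rows with 10 columns each, like len(query_set[0]) == 10.
rows = [tuple(range(10)) for _ in range(3)]

try:
    apply_star(crunch_qs, rows)  # each row becomes 10 positional arguments
except TypeError as exc:
    error_message = str(exc)

# Wrapping every row in a one-element tuple makes the row the single argument.
wrapped_results = apply_star(crunch_qs, [(row,) for row in rows])
```

Under that assumption, the first call fails with a TypeError like the one above, while the wrapped version passes exactly one row per call.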
I also tried:
crunched_qs = crunch_qs.chunks(((row,) for row in query_set), 5000)()
results = crunched_qs.get()
That worked a little better: the TypeError went away. However, crunch_qs now receives each row (a single tuple) as its parameter instead of a list of 5000 tuples.
Any help/ideas on how to pass a list of tuples to celery chunks would be highly appreciated.
Thanks in advance
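One possible direction, sketched under assumptions rather than verified against a running worker: slice the list into batches up front and wrap each batch in a one-element tuple, so that each batch is the single argument of one call. The helper name slice_into_batches and the sample rows are hypothetical; the commented-out Celery call at the end is untested and assumes chunks accepts any iterable of argument tuples.

```python
def slice_into_batches(rows, batch_size):
    """Yield one-element argument tuples, each holding up to batch_size rows."""
    for start in range(0, len(rows), batch_size):
        yield (rows[start:start + batch_size],)


# Small made-up data set standing in for query_set.
rows = [(i, i * 2) for i in range(12)]

batches = list(slice_into_batches(rows, 5))
batch_sizes = [len(args[0]) for args in batches]

# Hypothetical usage with the task from the question (untested here).
# A chunk size of 1 means one batch of 5000 rows per task call:
# job = crunch_qs.chunks(slice_into_batches(query_set, 5000), 1)()
# results = job.get()
```

Each row still has to be serializable by whatever serializer the Celery app is configured with, so the SQLAlchemy rows may need converting to plain tuples or lists first.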
