Writing and reading to and from BigQuery in one Dataflow job
I have a seemingly simple problem when constructing my pipeline for Dataflow. I have multiple pipelines that fetch data from external sources, transform the data and write it to several BigQuery tables. When this process is done I would like to run a query that queries the just generated tables. Ideally I would like this to happen in the same job. Is this the way Dataflow is meant to be used, or should the loading to BigQuery and the querying of the tables be split up between jobs? If this is possible in the same job how would one solve this, as the BigQuerySink does not generate a PCollection? If this is not possible in the same job, is there some way to trigger a job on the completion of another job (i.e. the writing job and the querying job)?
You alluded to what would need to happen to do this in a single job -- the BigQuerySink would need to produce a PCollection. Even if it is empty, you could then use it as the input to the step that reads from BigQuery in a way that made that step wait until the first sink was done. You would need to create your own version of the BigQuerySink to to do this. If possible, an easier option might be to have the second step read from the collection that you wrote to BigQuery rather than reading the table you just put into BigQuery. For example: PCollection<TableRow> rows = ...; rows.apply(BigQuery.Write.to(...)); rows.apply(/* rest of the pipeline */); You could even do this earlier if you wanted to continue processing the elements written to BigQuery rather than the table rows.
Python Struct Library appending unwanted Char 'L'
Python 3.4 Rock Paper Scissors
Timed Actions - Randomized
Split pandas column into two
Case insensitive field in Django
How to rewrite the code which appends to lists with numpy arrays
Figure in a figure in a figure
Cron job not working all the way. and Python is only half working on my script
Module vtkCommonCorePython not found in windows
Trying to use exec() to define a variable by setting a string equal to a numpy ndarray, but I get a syntax error. Python 2.7.10
Django Staticfiles not being served on Azure
Create virtualenv with most recent version of python
Python date function bugs
Maya/Python: How do I scale multiple selected animation curves each from their own specific pivot point?
django model instance in form without model
How can I generically call super().method(), without explicitly passing in the current class? [duplicate]