Python Multiprocessing does not wait


I am currently using multiprocessing to analyze roughly 10 files.
However, I only want to run 5 processes at a time.
When I try to implement this, it doesn't work: more processes are created than the number I specified. Is there an easy way to limit the number of processes to 5? (Windows 7 / Python 2.7)
EDIT:
I'm afraid your solutions still don't work. Here are some more details.
Main Python file:
import python1
import python2
import multiprocessing

# parallel = [fname1, fname2, fname3, fname4, fname5, fname6, fname7, fname8, fname9, fname10]

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=min(len(parallel), 5))
    print(pool.map(python1.worker, parallel))
python1.py:
import os
import time
import subprocess

def worker(sample):
    command = 'perl '+sample[1].split('data_')[0]+'methods_FastQC\\fastqc '+sample[1]+'\\'+sample[0]+'\\'+sample[0]+' --outdir='+sample[1]+'\\_IlluminaResults\\_fastqcAnalysis'
    subprocess.call(command)
    return sample
The return values for all 12 files come back before the opened perl processes have closed. Also, 12 perl shells are opened instead of a maximum of 5. (Screenshot: http://oi57.tinypic.com/126a8ht.jpg. You can clearly see that the return values come back before the perl commands even finish, and that there are more than 5 processes.)
I tried the following code on Linux with Python 2.7, and the assertion never fails: only 5 processes are created at a time.
import os
import multiprocessing
import psutil
from functools import partial

def worker(pid, filename):
    # The parent process (pid) should have exactly 5 descendants: the pool workers.
    assert len(psutil.Process(pid).children(recursive=True)) == 5  # psutil >= 2.0
    # assert len(psutil.Process(pid).get_children(recursive=True)) == 5  # psutil 1.x
    print(filename)

parallel = range(0, 15)

if __name__ == '__main__':
    # with multiprocessing.Pool(processes=5) as pool:  # if you use Python 3.3+
    pool = multiprocessing.Pool(processes=min(len(parallel), 5))
    pool.map(partial(worker, os.getpid()), parallel)
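If you would rather not install the third-party psutil package, a rough alternative is to let the tasks count themselves: each task increments a shared counter on entry, records the highest value seen, and decrements the counter on exit. This is a minimal sketch; init, task and max_concurrency are names made up for this illustration, and the 0.05 s sleep stands in for real work. Note that it counts concurrently running pool tasks only, not any child processes (such as perl) that those tasks start.

```python
import multiprocessing
import time

def init(counter, peak):
    # Runs once in each worker process: keep the shared values as globals.
    global _counter, _peak
    _counter, _peak = counter, peak

def task(item):
    with _counter.get_lock():
        _counter.value += 1
        # Remember the largest number of tasks seen running at the same time.
        if _counter.value > _peak.value:
            _peak.value = _counter.value
    time.sleep(0.05)  # stand-in for real work
    with _counter.get_lock():
        _counter.value -= 1
    return item

def max_concurrency(n_tasks=15, n_procs=5):
    counter = multiprocessing.Value('i', 0)
    peak = multiprocessing.Value('i', 0)
    pool = multiprocessing.Pool(processes=n_procs, initializer=init,
                                initargs=(counter, peak))
    try:
        results = pool.map(task, range(n_tasks))
    finally:
        pool.close()
        pool.join()
    return results, peak.value

if __name__ == '__main__':
    results, peak = max_concurrency()
    print(peak)  # at most n_procs tasks ever run at once
```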
Of course, if you use os.system() inside the worker function, each call creates extra processes (a shell plus the command it runs), and the process tree looks like this (here with os.system('sleep 1')):
\_ python2.7 ./test02.py
\_ python2.7 ./test02.py
| \_ sh -c sleep 1
| \_ sleep 1
\_ python2.7 ./test02.py
| \_ sh -c sleep 1
| \_ sleep 1
\_ python2.7 ./test02.py
| \_ sh -c sleep 1
| \_ sleep 1
\_ python2.7 ./test02.py
| \_ sh -c sleep 1
| \_ sleep 1
\_ python2.7 ./test02.py
\_ sh -c sleep 1
\_ sleep 1
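The extra entries in that tree are children of the 5 pool workers, not extra workers: os.system() and subprocess.call() both block until the command exits, so each worker stays busy for the whole run. A minimal sketch to check the blocking behaviour of subprocess.call(), assuming a sleeping Python interpreter (sys.executable) as a stand-in for sleep 1 so it also runs on Windows; run_and_time is a hypothetical helper name.

```python
import subprocess
import sys
import time

def run_and_time(seconds):
    # Child process: a second Python interpreter that only sleeps.
    cmd = [sys.executable, '-c', 'import time; time.sleep(%r)' % seconds]
    start = time.time()
    returncode = subprocess.call(cmd)  # blocks until the child has exited
    return returncode, time.time() - start

if __name__ == '__main__':
    code, elapsed = run_and_time(0.5)
    print('exit code %d after %.1f s' % (code, elapsed))
```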
You don't say what exactly fails to happen and what happens instead.
And providing an SSCCE means providing a program that actually runs. (Look at the worker() function, for example: it takes a file parameter that is never used and uses a command variable that is defined nowhere.)
But I suspect the problem is that your fileX entries are just file names, and the code tries to execute them.
Change your function to:
import os

def worker(filename):
    command = "echo X " + filename + " Y"
    os.system(command)
and it should work fine. (Note that I renamed file to filename to avoid shadowing the built-in name file.)
BTW, instead of os.system() you should use the subprocess module.
In this case, you can do:
import subprocess

def worker(filename):
    command = ["echo", "X", filename, "Y"]
    subprocess.call(command)
which should do the same.
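Applied to the perl command from the question, the same list form might look like the sketch below. It assumes, as the question's worker() does, that sample is a (name, directory) pair; build_fastqc_command is a hypothetical helper, and ntpath is used so the Windows path rules apply even when the sketch is run elsewhere. The resulting arguments are the same as in the question's string.

```python
import ntpath  # Windows path rules, regardless of the OS running this

def build_fastqc_command(sample):
    """Return the question's perl/FastQC command as an argument list.

    Assumption: sample is a (name, directory) pair, as in worker().
    """
    name, directory = sample[0], sample[1]
    base = directory.split('data_')[0]
    script = base + ntpath.join('methods_FastQC', 'fastqc')
    input_file = ntpath.join(directory, name, name)
    outdir = ntpath.join(directory, '_IlluminaResults', '_fastqcAnalysis')
    return ['perl', script, input_file, '--outdir=' + outdir]
```

In worker() this would become subprocess.call(build_fastqc_command(sample)). On Windows, subprocess turns the list into a command line with list2cmdline, which quotes arguments containing spaces; the string concatenation in the question does not.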
Just as a stylistic remark:
pool = multiprocessing.Pool(processes=min(len(parallel), 5))
is simpler and does the same.
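min() is what caps the pool: max(len(parallel), 5) would request at least 5 workers and one per file beyond that. One edge case is an empty list, because Pool(processes=0) is rejected with ValueError in current Python versions. A small sketch, with pool_size as a made-up helper name:

```python
def pool_size(n_tasks, limit=5):
    # min() caps the pool at `limit`; max(1, ...) avoids Pool(processes=0),
    # which multiprocessing.Pool rejects with ValueError.
    return max(1, min(n_tasks, limit))
```

Usage: pool = multiprocessing.Pool(processes=pool_size(len(parallel))).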
Your edit makes the problem much clearer now.
It seems that, for some reason, your perl programs exit before they have actually finished their work. I don't know why that happens; maybe they start another process themselves and exit immediately, or it is some Windows-specific quirk.
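The first hypothesis can be illustrated without perl. In this sketch a short Python "launcher" stands in for a program that hands its work to a separate process and exits straight away; LAUNCHER and time_launcher are hypothetical names, and whether perl/FastQC actually behaves like this is not established. subprocess.call() only waits for the process it started, so it returns long before the 2-second "work" in the grandchild is done.

```python
import subprocess
import sys
import time

# Hypothetical launcher: starts the real work in a new process and exits
# immediately without waiting for it.
LAUNCHER = (
    "import subprocess, sys; "
    "subprocess.Popen([sys.executable, '-c', 'import time; time.sleep(2)'])"
)

def time_launcher():
    start = time.time()
    subprocess.call([sys.executable, '-c', LAUNCHER])  # waits for the launcher only
    return time.time() - start

if __name__ == '__main__':
    print('call returned after %.2f s' % time_launcher())
```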
As soon as the pool sees that a task has finished (here: as soon as subprocess.call() returns), that worker is free to start the next one.
So the right approach is to find out why the perl programs return before their work is done.
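One way to start is to record, for each task, when subprocess.call() started and returned and which exit code it got. This is a minimal sketch; timed_worker is a hypothetical helper, and you would pass it the same command the question's worker() builds.

```python
import subprocess
import sys
import time

def timed_worker(command):
    """Run one command; report when it started, when it ended, and its exit code."""
    start = time.time()
    returncode = subprocess.call(command)
    end = time.time()
    return {'command': command, 'start': start, 'end': end,
            'returncode': returncode}

if __name__ == '__main__':
    # Harmless stand-in command that exits with status 3.
    info = timed_worker([sys.executable, '-c', 'import sys; sys.exit(3)'])
    print('exit code %d after %.2f s' % (info['returncode'], info['end'] - info['start']))
```

Running pool.map(timed_worker, commands) and comparing end - start with how long one FastQC run takes when started by hand should show whether perl returns early. A non-zero returncode would suggest that perl is failing (for example, on a wrong path) rather than finishing.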