Python Multiprocessing does not wait
I am currently using multiprocessing to analyze roughly 10 files. However, I only want to run 5 processes at a time. When I try to implement this, it doesn't work: more processes are created than the number I specified. Is there a way to easily limit the number of processes to 5? (Windows 7 / Python 2.7)

EDIT: I'm afraid your solutions still don't work. I will try to post some more details here.

Main Python file:

```python
import python1
import python2
import multiprocessing

parallel = [fname1, fname2, fname3, fname4, fname5, fname6, fname7, fname8, fname9, fname10]

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=max(len(parallel), 5))
    print pool.map(python1.worker, parallel)
```

Python1 file:

```python
import os
import time
import subprocess

def worker(sample):
    command = 'perl ' + sample.split('data_') + 'methods_FastQC\\fastqc ' + sample + '\\' + sample + '\\' + sample + ' --outdir=' + sample + '\\_IlluminaResults\\_fastqcAnalysis'
    subprocess.call(command)
    return sample
```

The return statements for all 12 files come back *before* the opened Perl processes have closed. Also, 12 Perl shells are opened instead of the maximum of 5. (Image: you can clearly see that the return statements come back before the perl commands even finish, and that there are more than 5 processes: http://oi57.tinypic.com/126a8ht.jpg)
I tried the following code under Linux with Python 2.7 and it doesn't assert. Only 5 processes are created at a time.

```python
import os
import multiprocessing
import psutil
from functools import partial

def worker(pid, filename):
    # assert len(psutil.Process(pid).children(recursive=True)) == 5  # for psutil-2.x
    assert len(psutil.Process(pid).get_children(recursive=True)) == 5
    print(filename)

parallel = range(0, 15)

if __name__ == '__main__':
    # with multiprocessing.Pool(processes=5) as pool:  # if you use python-3
    pool = multiprocessing.Pool(processes=min(len(parallel), 5))
    pool.map(partial(worker, os.getpid()), parallel)
```

Of course, if you use os.system() inside the worker function, it will create extra processes and the process tree will look like this (using os.system('sleep 1') here):

```
\_ python2.7 ./test02.py
    \_ python2.7 ./test02.py
    |   \_ sh -c sleep 1
    |       \_ sleep 1
    \_ python2.7 ./test02.py
    |   \_ sh -c sleep 1
    |       \_ sleep 1
    \_ python2.7 ./test02.py
    |   \_ sh -c sleep 1
    |       \_ sleep 1
    \_ python2.7 ./test02.py
    |   \_ sh -c sleep 1
    |       \_ sleep 1
    \_ python2.7 ./test02.py
        \_ sh -c sleep 1
            \_ sleep 1
```
I don't know why it is a secret what exactly doesn't happen and what happens instead. And providing an SSCCE means a program that actually runs. (Have a look at the worker() function, for example: it gets a file parameter which is never used, and uses a command variable which is nowhere defined.) But I think the point is that your fileX are just file names, and the code tries to execute them. Change your function to

```python
def worker(filename):
    command = "echo X " + filename + " Y"
    os.system(command)
```

and it should work fine. (Note that I changed `file` to `filename` in order not to hide a built-in name.)

BTW, instead of os.system() you should use the subprocess module. In this case, you can do

```python
import subprocess

def worker(filename):
    command = ["echo", "X", filename, "Y"]
    subprocess.call(command)
```

which should do the same.

Just as a stylistic remark:

```python
pool = multiprocessing.Pool(processes=min(len(parallel), 5))
```

is simpler and does the same.

Your edit makes the problem much clearer now. It seems that, for unknown reasons, your Perl programs exit earlier than they are really finished. I don't know why that happens; maybe they fork another process themselves and exit immediately. Or it is due to Windows and its weirdnesses. As soon as the multiprocessing pool notices that a subprocess claims to be finished, it is ready to start another one. So the right way would be to find out why the Perl programs don't work as expected.