Python Multiprocessing does not wait
I am currently using multiprocessing functions to analyze roughly 10 files. However, I only want to run 5 processes at a time. When I try to implement this, it doesn't work: more processes are created than the number I specified. Is there an easy way to limit the number of processes to 5? (Windows 7 / Python 2.7)

EDIT: I'm afraid your solutions still don't work. Here are some more details.

Main Python file:

    import python1
    import python2
    import multiprocessing

    # parallel = [fname1, fname2, fname3, fname4, fname5, fname6, fname7, fname8, fname9, fname10]

    if __name__ == '__main__':
        pool = multiprocessing.Pool(processes=max(len(parallel), 5))
        print pool.map(python1.worker, parallel)

python1 file:

    import os
    import time
    import subprocess

    def worker(sample):
        command = 'perl '+sample.split('data_')+'methods_FastQC\\fastqc '+sample+'\\'+sample+'\\'+sample+' --outdir='+sample+'\\_IlluminaResults\\_fastqcAnalysis'
        subprocess.call(command)
        return sample

The return values for 12 files come back before all the opened perl processes have closed. Also, 12 perl shells are opened instead of the maximum of 5.

(Image: you can clearly see that the return statements come back before the perl commands even finish, and that there are more than 5 processes: http://oi57.tinypic.com/126a8ht.jpg)
I tried the following code under Linux with python-2.7, and the assertion never fails. Only 5 processes are created at a time.

    import os
    import multiprocessing
    import psutil
    from functools import partial

    def worker(pid, filename):
        # assert len(psutil.Process(pid).children(recursive=True)) == 5  # for psutil-2.x
        assert len(psutil.Process(pid).get_children(recursive=True)) == 5
        print(filename)

    parallel = range(0, 15)

    if __name__ == '__main__':
        # with multiprocessing.Pool(processes=5) as pool:  # if you use python-3
        pool = multiprocessing.Pool(processes=min(len(parallel), 5))
        pool.map(partial(worker, os.getpid()), parallel)

Of course, if you use os.system() inside the worker function, it will create extra processes, and the process tree will look like this (using os.system('sleep 1') here):

    \_ python2.7 ./test02.py
        \_ python2.7 ./test02.py
        |   \_ sh -c sleep 1
        |       \_ sleep 1
        \_ python2.7 ./test02.py
        |   \_ sh -c sleep 1
        |       \_ sleep 1
        \_ python2.7 ./test02.py
        |   \_ sh -c sleep 1
        |       \_ sleep 1
        \_ python2.7 ./test02.py
        |   \_ sh -c sleep 1
        |       \_ sleep 1
        \_ python2.7 ./test02.py
            \_ sh -c sleep 1
                \_ sleep 1
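If psutil is not available, a rough way to check the same thing is to have each task report the PID of the process it ran in and count the distinct values. This is only a sketch: the names `report_pid` and `distinct_worker_pids` are made up for the illustration and are not part of the code above.

```python
import multiprocessing
import os


def report_pid(_task):
    # Each task returns the PID of the process that executed it.
    return os.getpid()


def distinct_worker_pids(n_tasks, n_workers):
    # Hypothetical helper: run n_tasks trivial tasks on a pool of n_workers
    # and collect the PIDs of the workers that handled them.
    pool = multiprocessing.Pool(processes=n_workers)
    try:
        return set(pool.map(report_pid, range(n_tasks)))
    finally:
        pool.close()
        pool.join()


if __name__ == '__main__':
    pids = distinct_worker_pids(n_tasks=15, n_workers=5)
    # The pool never has more than 5 workers, so there are at most 5 PIDs.
    print(len(pids) <= 5)
```

Note that this only counts the pool's own workers. Anything a worker launches through os.system() or subprocess shows up as additional processes below it, as in the tree above.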
I don't know why it is a secret what exactly doesn't happen and what happens instead. Also, providing an SSCCE means providing a program that actually runs. (Have a look at the worker() function, for example. It gets a file parameter which is never used, and uses a command variable which is defined nowhere.)

But I think the point is that your fileX are just file names, and you are trying to execute them. Change your function to

    import os

    def worker(filename):
        command = "echo X " + filename + " Y"
        os.system(command)

and it should work fine. (Note that I changed file to filename in order not to hide a built-in name.)

BTW, instead of os.system() you should use the subprocess module. In this case, you can do

    import subprocess

    def worker(filename):
        command = ["echo", "X", filename, "Y"]
        subprocess.call(command)

which should do the same.

Just as a stylistic remark:

    pool = multiprocessing.Pool(processes=min(len(parallel), 5))

is simpler and does the same.

Your edit makes the problem much clearer now. It seems that, for unknown reasons, your perl programs exit before they are really finished. I don't know why that happens. Maybe they fork another process themselves and exit immediately, or it is due to Windows and its quirks. As soon as the multiprocessing pool notices that a subprocess claims to be finished, it is ready to start another one. So the right way forward would be to find out why the perl programs don't behave as expected.
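Two details in this thread are easy to check in isolation. The sketch below is only an illustration: `pool_size` and `run_and_wait` are hypothetical helper names, and a Python one-liner stands in for the perl command. It shows that min() caps the worker count while max(), as used in the question's edit, does not, and that subprocess.call() returns only once its direct child has exited.

```python
import subprocess
import sys


def pool_size(n_tasks, limit=5):
    # min() caps the pool at `limit`; max() would instead grow it to n_tasks
    # whenever there are more tasks than the limit.
    return min(n_tasks, limit)


def run_and_wait(command):
    # subprocess.call() blocks until the direct child exits and returns its
    # exit code. It cannot wait for processes that child starts and abandons.
    return subprocess.call(command)


if __name__ == '__main__':
    print(pool_size(12))   # 5 workers for 12 files
    print(max(12, 5))      # 12, i.e. one worker per file
    # Stand-in child (assumption): a Python one-liner that exits with status 3.
    print(run_and_wait([sys.executable, '-c', 'import sys; sys.exit(3)']))
```

With 12 input files, max(len(parallel), 5) asks for 12 workers, which matches the 12 perl shells reported in the question. Whether the early return values have the same cause or come from the perl wrapper detaching is something this sketch cannot tell.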