Find all occurrences of a string except if found within another pattern
I have blocks of text like the one below, and I want to find all occurrences of

    data ...;
    ...
    run;

where ... can be any string. I only want matches that are not inside C-style comments and not wrapped in another pattern. For example, I want to find every occurrence of

    data foo;
    set bar;
    run;

but not

    %macro x();
    data foo;
    set bar;
    run;
    %mend;

or

    /* data foo;*/
    /* set bar;*/
    /* run;*/

The function below excludes the pattern when it is wrapped in a comment or in %macro ... %mend, but it returns only the last match rather than each occurrence. How can I adjust it to return every match as a list of lists, with one list per block?

Thanks in advance.

    import re

    s = """
    /**
     * #file
     * #brief Description of the program
     */

    /**
     * #macro xyz
     * #brief Description of the Macro
     */
    %macro xyz();
    data foo_nomatch;
    set bar;
    run;
    %mend;

    /**
     * #data foo_matchme
     * #brief Description of the DataStep
     */
    data foo_matchme;
    set bar;
    run;

    # Should Not Match
    /**
     * data foo_nomatch2;
     * set bar;
     * run;
     */

    /**
     * #datastep: foo2
     * #brief: This is a description.
     */
    # Should match as a 2nd match
    data foo_matchme2;
    set bar;
    run;
    """

    def datastep(s):
        t1 = 'data'
        t2 = 'run;'
        t3 = ';'
        e1 = re.escape('/**')
        e2 = re.escape('*/')
        e3 = re.escape('%macro')
        e4 = re.escape('%mend')
        return re.findall('%s.*%s|%s.*%s|(%s.*?%s)' % (e1, e2, e3, e4, t1, t2),
                          s, re.DOTALL | re.IGNORECASE)

    print(datastep(s))
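Make the .* part of the skip sub-regexes non-greedy, i.e., change

    '%s.*%s|%s.*%s|(%s.*?%s)'

to

    '%s.*?%s|%s.*?%s|(%s.*?%s)'

With a greedy .*, the /** ... */ alternative runs from the first /** to the last */ in the text and swallows every data step in between, so only what follows the final comment can still be captured. The non-greedy form stops at the nearest closing delimiter. Note that re.findall returns an empty string for each skipped region, which is why the demo filters on the match being non-empty.

Demo:

    for match in datastep(s):
        if match:
            print(match)

Output:

    data foo_matchme;
    set bar;
    run;
    data foo_matchme2;
    set bar;
    run;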
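For the "list of lists, one list per block" part of the question, here is a minimal self-contained sketch of the same skip-or-capture idea. The names SAMPLE, SKIP_OR_CAPTURE and datastep_blocks are hypothetical, the sample text is a shortened stand-in for the one in the question, and the \b word boundaries around data are an added assumption (so that a word such as "metadata" does not start a match). It also treats any /* ... */ comment as a skip region, not only /** ... */.

```python
import re

# Hypothetical, shortened stand-in for the text in the question.
SAMPLE = """
/**
 * data foo_nomatch2;
 * set bar;
 * run;
 */
%macro xyz();
data foo_nomatch;
set bar;
run;
%mend;
data foo_matchme;
set bar;
run;
data foo_matchme2;
set bar;
run;
"""

# The first two alternatives consume regions to skip; only the third one
# has a capturing group, so group(1) is set only for real data steps.
SKIP_OR_CAPTURE = re.compile(
    r"/\*.*?\*/"              # skip: C-style comment (non-greedy)
    r"|%macro.*?%mend"        # skip: macro definition (non-greedy)
    r"|(\bdata\b.*?run;)",    # capture: data step (assumed word boundaries)
    re.DOTALL | re.IGNORECASE,
)


def datastep_blocks(text):
    """Return one list of lines per data step found outside skip regions."""
    blocks = []
    for m in SKIP_OR_CAPTURE.finditer(text):
        if m.group(1):  # None when a skip alternative matched
            blocks.append(m.group(1).splitlines())
    return blocks


for block in datastep_blocks(SAMPLE):
    print(block)
```

Splitting each captured block with splitlines() is one way to get a list per block; if the statements can share a line, splitting on ';' instead may fit better.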