WIP: Put all work in one big parallel region
This is supposed to be faster because it avoids repeatedly creating and destroying threads, once per parallel region. But I don't think it actually is faster, because OpenMP-capable compilers are smarter than they used to be.
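For illustration, here is a minimal sketch of the two styles being compared. It is not taken from the actual change: the function names, the arrays, and the loop bodies are all made up for the example. Variant A opens a separate parallel region for each loop, while variant B opens one region and puts both worksharing loops inside it. Both should compute the same result; the sketch does no timing, so it does not settle which one is faster.

```c
#include <stddef.h>

/* Hypothetical example: names and loop bodies are illustrative only. */

/* Variant A: each loop gets its own parallel region. */
double sum_separate_regions(double *a, double *b, int n)
{
    double total = 0.0;

    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        a[i] = 2.0 * i;

    #pragma omp parallel for reduction(+:total)
    for (int i = 0; i < n; i++) {
        b[i] = a[i] + 1.0;
        total += b[i];
    }
    return total;
}

/* Variant B: one big parallel region containing both loops. */
double sum_one_region(double *a, double *b, int n)
{
    double total = 0.0;

    #pragma omp parallel
    {
        #pragma omp for
        for (int i = 0; i < n; i++)
            a[i] = 2.0 * i;
        /* The implicit barrier at the end of the loop above ensures
           a[] is fully written before any thread reads it below. */

        #pragma omp for reduction(+:total)
        for (int i = 0; i < n; i++) {
            b[i] = a[i] + 1.0;
            total += b[i];
        }
    }
    return total;
}
```

With GCC or Clang this needs `-fopenmp` to actually run in parallel; without the flag the pragmas are ignored and both variants run serially with the same result. To check the claim above, one would time both variants on the target compiler and runtime rather than assume either outcome.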