WIP: Put all work in one big parallel region

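This is supposed to be faster because it avoids creating and
destroying threads repeatedly. But I don't think that it actually
is faster, because OpenMP-capable compilers and their runtimes are
smarter than they used to be: they typically keep a pool of worker
threads alive between parallel regions instead of recreating them.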
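The following is a minimal sketch of the two structures being compared, assuming C with OpenMP. The functions `separate_regions` and `one_big_region` and their scale-then-sum workload are hypothetical illustrations, not code from this branch. The first function opens a new parallel region for each loop. The second opens a single region and has the same team share both loops through `omp for`.

```c
/*
 * Hypothetical example (not this project's code): two consecutive loops
 * written with one parallel region per loop, and with one big region.
 * Build with -fopenmp; without it the pragmas are ignored and both
 * functions simply run serially.
 */
#include <assert.h>
#include <stddef.h>

/* Style 1: each "parallel for" forks a team and joins it at loop end. */
double separate_regions(double *x, long n)
{
    double sum = 0.0;
    assert(x != NULL && n >= 0);

    #pragma omp parallel for
    for (long i = 0; i < n; i++)
        x[i] = 2.0 * x[i] + 1.0;

    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < n; i++)
        sum += x[i];

    return sum;
}

/* Style 2: the team is forked once. Each "omp for" splits its loop among
 * the existing threads, and the implicit barrier at the end of the first
 * loop guarantees all of x is updated before the second loop reads it. */
double one_big_region(double *x, long n)
{
    double sum = 0.0; /* shared inside the region, so it can be reduced */
    assert(x != NULL && n >= 0);

    #pragma omp parallel
    {
        #pragma omp for
        for (long i = 0; i < n; i++)
            x[i] = 2.0 * x[i] + 1.0;

        #pragma omp for reduction(+:sum)
        for (long i = 0; i < n; i++)
            sum += x[i];
    }

    return sum;
}
```

To try it, fill an array and call either function. Built with `-fopenmp`, both should return the same sum, up to floating-point rounding in the reductions. One cost of the single-region style is that any statement placed between the `omp for` loops is executed by every thread in the team, so serial setup work there has to go inside `#pragma omp single`.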

CI pipeline: 12 jobs for one-big-parallel-section passed in 33 minutes and 53 seconds (queued for 11 seconds).

    Stage Job                          Job ID   Status  Duration  Coverage  Tag
    Dist  dependencies/python3.6       #212510  passed  00:00:24
    Dist  dependencies/python3.7       #212511  passed  00:00:22
    Dist  sdist                        #212505  passed  00:00:26
    Dist  wheel:cp36-cp36m-macosx      #212508  passed  00:00:28            macos_elcapitan
    Dist  wheel:cp36-cp36m-manylinux1  #212506  passed  00:00:36
    Dist  wheel:cp37-cp37m-macosx      #212509  passed  00:00:25            macos_elcapitan
    Dist  wheel:cp37-cp37m-manylinux1  #212507  passed  00:00:28
    Test  docs                         #212512  passed  00:09:53
    Test  lint                         #212516  passed  00:00:27
    Test  test/coverage                #212515  passed  00:29:08  83.7%
    Test  test/python3.6               #212513  passed  00:15:02
    Test  test/python3.7               #212514  passed  00:14:06