@@ -258,3 +258,35 @@ Here is the speed-up relative to the 1-core job
Notes
* This demonstrates the linear scaling
* This demonstrates that if the code is instructed to use more cores than are actually available, the speed-up reaches a plateau
## Update: 04/12/2020
After it was pointed out that the scaling was linear, but that the gradient was not close to one, I studied the behaviour in a little more detail. First, here is a re-run of the data above showing different gradients:
Here, it looks like the gradient is ~0.5. This is at odds with [the pbilby paper](https://arxiv.org/pdf/1909.11873.pdf), which demonstrated a speed-up close to the theoretically expected behaviour (Eq. 10):
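As a sanity check on what "close to linear with gradient ~1" should look like, here is a sketch of the expected behaviour. The functional form `n_live * ln(1 + n_cores / n_live)` is my reading of the theoretical speed-up for pool-parallelised nested sampling; the value `n_live=1000` is a hypothetical live-point count, not the setting used in these runs.

```python
import math

def expected_speedup(n_cores, n_live=1000):
    """Theoretical speed-up for nested sampling with a pool of n_cores
    workers. Assumes the form S(n) = n_live * ln(1 + n / n_live);
    n_live=1000 is a placeholder, not the value used in these runs."""
    return n_live * math.log(1 + n_cores / n_live)

# While n_cores << n_live, the speed-up is nearly linear with gradient ~1
for n in (1, 4, 8, 16):
    print(n, round(expected_speedup(n), 2))
```

For `n_cores` much smaller than `n_live` this is indistinguishable from a gradient-one line, which is why a measured gradient of ~0.5 stood out.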
After digging in, I realized that the single-core job used about half as many likelihood evaluations as the parallelized version. Here is a table of the number of evaluations:
| n cores | # likelihood evaluations [millions] |
| ------ | ------ |
| 1 | 0.59 |
| 4 | 1.5 |
| 8 | 1.5 |
| 12 | 1.5 |
| 16 | 1.6 |
So, this offers two ways to calculate the speed-up: the usual "total time" method, or a "per-likelihood" method. On a per-likelihood basis, things look much better!
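The two methods can be sketched as follows. The evaluation counts come from the table above; the wall times are hypothetical placeholders (the measured times are in the plot, not reproduced here), so only the structure of the calculation is meaningful.

```python
# Likelihood evaluations (millions), from the table above
evals = {1: 0.59, 4: 1.5, 8: 1.5, 12: 1.5, 16: 1.6}

# Hypothetical wall times in hours -- placeholders, not measured values
wall_time = {1: 10.0, 4: 5.0, 8: 2.6, 12: 1.8, 16: 1.4}

def speedup_total_time(n):
    """Usual speed-up: ratio of serial to parallel wall time."""
    return wall_time[1] / wall_time[n]

def speedup_per_likelihood(n):
    """Speed-up in likelihood evaluations per unit time. This corrects
    for the parallel algorithm needing ~2-3x more evaluations overall."""
    rate_serial = evals[1] / wall_time[1]      # evaluations per hour
    rate_parallel = evals[n] / wall_time[n]
    return rate_parallel / rate_serial

for n in (4, 8, 16):
    print(n, speedup_total_time(n), round(speedup_per_likelihood(n), 2))
```

Because the parallel runs do roughly 2.5x more evaluations, the per-likelihood speed-up sits well above the total-time speed-up by about that factor.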
Of course, what we really care about is "total time". So, some conclusions:
1. The parallel algorithm is different from the serial algorithm.
2. The parallel algorithm is about 2-3 times less efficient than the serial algorithm.
3. This explains the difference in speed-ups (the pbilby speed-ups were measured per-likelihood).
4. It is worth stating: while it is less efficient, the parallel algorithm does let you scale!
5. This suggests the parallel algorithm could be improved, yielding up to a factor of 3 in speed gains.
Note: for the first run of the update, the ratio of likelihood evaluations between the serial and parallel jobs was ~2.8, while for the second run it was ~2.7.