Notes

* This demonstrates the linear scaling.
* This demonstrates that if the code is instructed to use more cores than are available, the speed-up reaches a plateau.

## Update: 04/12/2020
After it was pointed out that the scaling was linear but that the gradient was not close to one, I studied the behaviour in a little more detail. First, here is a re-run of the data above showing different gradients:

![image](uploads/32ece9913c735eb981eac9ad52011a9e/image.png)

Here, it looks like the gradient is ~0.5. This is at odds with [the pbilby paper](https://arxiv.org/pdf/1909.11873.pdf), which demonstrated a speed-up close to the theoretically expected behaviour (Eq. 10):

![image](uploads/36b4515be72d9748e0d7fc114bf83f5d/image.png)
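One way to put a number on the gradient read off the plot above is a least-squares fit of speed-up against core count. This is a sketch under my own assumptions, not the method used to make these plots: it pins the line to the 1-core job (speed-up 1 at 1 core), and the helper name `fitted_gradient` and the example speed-ups are hypothetical. The example numbers were constructed to have a gradient of exactly 0.5; they are not the measured timings.

```python
def fitted_gradient(n_cores, speedup):
    """Least-squares gradient of speed-up vs. number of cores.

    Hypothetical helper: assumes the line passes through the 1-core job,
    i.e. it fits (speedup - 1) = g * (n_cores - 1).
    """
    # Shift so the 1-core job sits at the origin
    xs = [n - 1.0 for n in n_cores]
    ys = [s - 1.0 for s in speedup]
    # Closed-form least squares for a line through the origin: g = sum(x*y) / sum(x*x)
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


# Illustrative (not measured) speed-ups with a gradient of exactly 0.5
cores = [1, 4, 8, 12, 16]
example_speedup = [1.0, 2.5, 4.5, 6.5, 8.5]
print(fitted_gradient(cores, example_speedup))  # 0.5
```

With this convention, ideal scaling (speed-up equal to the number of cores) gives a gradient of 1.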
After digging in, I realized that the single-core job used fewer than half as many likelihood evaluations (roughly 40%) as the parallelized versions. Here is a table of the number of evaluations:

| n cores | # likelihood evaluations [millions] |
| ------ | ------ |
| 1 | 0.59 |
| 4 | 1.5 |
| 8 | 1.5 |
| 12 | 1.5 |
| 16 | 1.6 |
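
Dividing each row by the serial count gives the factor by which each parallel run overspends on likelihood evaluations. A minimal sketch using the rounded numbers from the table (the dictionary and the `evaluation_ratio` name are just for illustration):

```python
# Likelihood evaluations in millions, copied from the table above (rounded values)
evaluations = {1: 0.59, 4: 1.5, 8: 1.5, 12: 1.5, 16: 1.6}


def evaluation_ratio(n_cores, table=evaluations):
    """Likelihood evaluations of the n-core run relative to the 1-core run."""
    return table[n_cores] / table[1]


for n in sorted(evaluations):
    print(f"{n:>2} cores: {evaluation_ratio(n):.2f}x the serial evaluations")
```

With these rounded values the ratio comes out at about 2.5 to 2.7, consistent with the "2-3 times less efficient" figure in the conclusions.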
So, this offers two ways to calculate the speed-up: the usual "total time" method, or on a "per-likelihood" basis (comparing the wall time per likelihood evaluation). On a per-likelihood basis, things look much better!

![image](uploads/e98da47726dff85af7841a78a713555c/image.png)
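The two definitions are related by the ratio of likelihood evaluations. Writing `T_1`, `N_1` for the serial wall time and evaluation count and `T_n`, `N_n` for an n-core run, the total-time speed-up is `T_1 / T_n`, while the per-likelihood one is `(T_1 / N_1) / (T_n / N_n)`, i.e. the total-time speed-up multiplied by `N_n / N_1`. A small sketch; the function names are hypothetical, the wall times are placeholders, and only the evaluation counts come from the table:

```python
def total_time_speedup(t_serial, t_parallel):
    """Speed-up from wall-clock time alone: T_1 / T_n."""
    return t_serial / t_parallel


def per_likelihood_speedup(t_serial, n_eval_serial, t_parallel, n_eval_parallel):
    """Speed-up in wall time per likelihood evaluation: (T_1 / N_1) / (T_n / N_n)."""
    return (t_serial / n_eval_serial) / (t_parallel / n_eval_parallel)


# Placeholder wall times (hours), NOT measured values; evaluation counts (millions) from the table
t_1, t_4 = 10.0, 5.0
n_1, n_4 = 0.59, 1.5

total = total_time_speedup(t_1, t_4)
per_like = per_likelihood_speedup(t_1, n_1, t_4, n_4)
print(f"total-time speed-up:     {total:.2f}")     # 2.00
print(f"per-likelihood speed-up: {per_like:.2f}")  # 5.08 (= 2.00 * 1.5 / 0.59)
```

The gap between the two curves is therefore exactly the factor `N_n / N_1`, which is why the per-likelihood plot sits much closer to the ideal line.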
Of course, what we really care about is "total time". So, some conclusions:

1. The parallel algorithm is different from the serial algorithm.
2. The parallel algorithm is about 2-3 times less efficient than the serial algorithm (it needs 2-3 times as many likelihood evaluations).
3. This explains the difference in speed-ups (the pbilby speed-ups were measured per-likelihood).
4. It is worth stating: while it is less efficient, the parallel algorithm does let you scale!
5. This suggests the parallel algorithm could be improved, yielding up to a factor of 3 in speed.
Note: for the first run of the update, the ratio of parallel to serial likelihood evaluations was ~2.8, while for the second run it was 2.7.