In [5]: %timeit np.interp(2.22, x, y)
```

When `d_inner_h` is computed with linear interpolation rather than cubic, the average error magnitude is 0.1% and the maximum error is 3.7%.
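
This kind of comparison can be reproduced with a short script. The grid, test function, and error metric below are invented stand-ins for the real `d_inner_h` data (which is not shown here), so treat it as a sketch rather than the actual measurement:

```
import numpy as np
from scipy.interpolate import CubicSpline

# Hypothetical tabulated data standing in for the d_inner_h grid.
x = np.linspace(0.0, 10.0, 200)
y = np.sin(x) * np.exp(-0.1 * x)

cubic = CubicSpline(x, y)                  # cubic reference interpolant
x_test = np.linspace(x[0], x[-1], 10000)   # dense evaluation points

linear_vals = np.interp(x_test, x, y)      # linear interpolation
cubic_vals = cubic(x_test)

# Error of the linear result relative to the cubic reference.
err = np.abs(linear_vals - cubic_vals) / np.max(np.abs(cubic_vals))
print(f"average error: {err.mean():.2%}, max error: {err.max():.2%}")
```
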
## MPI task overloading
MPI barrier time (caused by workers finishing at different times) can be reduced by overloading the tasks, i.e. setting the number of live points to be greater than the number of workers. This is done using the `queue_size` argument of `NestedSampler`. For example,

```
queue_size=POOL_SIZE*4,
```

will sample 4 times as many points as there are workers. This works because it allows workers that have finished early to immediately start on a new task. After the pool of tasks has been exhausted, the barrier time problem remains, but this wasted time is now averaged over a large number of tasks, reducing the overall barrier fraction.
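
Put together, a minimal setup could look like the sketch below. It assumes a dynesty-style `NestedSampler` (the library is not named above, so this is an assumption) and uses a `multiprocessing` pool instead of an MPI pool so the example is self-contained; the `queue_size` argument is used the same way with MPI.

```
import multiprocessing

import numpy as np
from dynesty import NestedSampler

POOL_SIZE = 8  # number of workers (the MPI worker count in the real runs)


def loglike(theta):
    # Toy Gaussian likelihood standing in for the real model.
    return -0.5 * np.sum(theta**2)


def prior_transform(u):
    # Map the unit cube to a flat prior on [-10, 10].
    return 20.0 * u - 10.0


if __name__ == "__main__":
    with multiprocessing.Pool(POOL_SIZE) as pool:
        sampler = NestedSampler(
            loglike,
            prior_transform,
            ndim=3,
            pool=pool,
            queue_size=POOL_SIZE * 4,  # overload: 4 queued tasks per worker
        )
        sampler.run_nested()
```
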
At low core counts, overloading results in a faster run time, but scaling is worse because of the inefficiencies that come with a large task pool. At 128 cores and above, the run time is slower than without overloading.

Barrier time is initially high because the sampling points are spread across the entire domain, and some points take much longer than others to return a solution. Later in the run, the points are closely clustered and evolve from similar starting points, so they return in a similar time. Overloading reduces the initial barrier-time peak, but with overloading the barrier time no longer decays to the smaller values seen later in the run.
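
A toy scheduling model illustrates this (entirely invented for illustration: lognormal task durations, greedy assignment to the first free worker, and a barrier at the end of each batch; none of the numbers come from the real runs). Widely spread durations, as at the start of a run, give a large idle fraction at the barrier that overloading reduces; nearly uniform durations give a small idle fraction either way:

```
import numpy as np

rng = np.random.default_rng(0)


def barrier_fraction(n_workers, n_tasks, sigma, n_batches=500):
    """Fraction of total worker time spent idle at the end-of-batch barrier."""
    busy = idle = 0.0
    for _ in range(n_batches):
        durations = rng.lognormal(mean=0.0, sigma=sigma, size=n_tasks)
        finish = np.zeros(n_workers)
        for d in durations:
            finish[finish.argmin()] += d  # next task goes to the first free worker
        busy += durations.sum()
        idle += n_workers * finish.max() - durations.sum()
    return idle / (idle + busy)


W = 32
for sigma, phase in [(1.0, "early run, spread-out points"), (0.1, "late run, clustered points")]:
    plain = barrier_fraction(W, W, sigma)
    overloaded = barrier_fraction(W, 4 * W, sigma)
    print(f"{phase}: barrier fraction {plain:.1%} -> {overloaded:.1%} with 4x overloading")
```
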


The speedup from overloading can be obtained without the later slowdown by enabling overloading only at the beginning of each run.