Dolphin MX Adapter Testing
Test Roadmap
| Test | Complete? | Pass? | Issues Found |
|---|---|---|---|
| 4Km DTS1 RTT Benchmark | Yes | Yes | None |
| Bandwidth Test | Yes | | BW is limited by link length, ~same for IX/MX |
| Whole buffer use test | Yes | | Need to make sure we don't clflush() past buffer |
| CDSRFM Test | Yes | | Already see IPC errors with 2 FEs |
| Reboot Glitch Testing | No | | Reconfigure HW when done with LR testing |
| Removing Dolphin Drivers Panics Kernel | No | | Looks like it does; check out a bit more. |
Other Configurations/Setups to Try
| Change/Test | Complete? | Pass? | Issues Found |
|---|---|---|---|
| Dolphin DMA Testing | Yes | | GPL taint, bad max latencies (~25X mean) |
| Swap CDSRFM machine to faster FE | Yes | | Maxes get a bit worse when outside of RT-compatible constraints. |
| Remove fiber rate limit on 4Km PCIe bus extender | Yes | | Used with the new FE, no measurable difference. |
| Use the new Adnaco (16 lane?) fiber bus extender | No | | |
Hardware
- x2cdsrfm
  - Intel(R) Xeon(R) CPU E5-1650 v4 @ 3.60GHz
  - Switched to an Intel(R) Xeon(R) W-3323 CPU @ 3.50GHz for the 'new FE' tests
- x2lsc
  - Intel(R) Xeon(R) W-3323 CPU @ 3.50GHz
- x2iscex
  - Intel(R) Xeon(R) W-2245 CPU @ 3.90GHz
- x2cdsrfm <-> x2iscex is the 4Km link.
- x2cdsrfm <-> x2lsc is the short link.
4Km DTS1 RTT Benchmark ~72 Hour Test
Histogram of Results
```
[254992.404633] dolphin_client: INFO - Histogram Of All Latencies (ns)
[254992.403096] <36000 : 0
[254992.403096] [36000, 38000) : 0
[254992.403097] [38000, 40000) : 0
[254992.403097] [40000, 42000) : 0
[254992.403098] [42000, 44000) : 5800135909
[254992.403098] [44000, 46000) : 505
[254992.403098] [46000, 48000) : 0
[254992.403099] [48000, 50000) : 0
[254992.403099] [50000, 60000) : 0
[254992.403100] [60000, 70000) : 0
[254992.403100] [70000, 80000) : 0
[254992.403100] [80000, 100000) : 0
[254992.403101] >100000 : 0
[254992.403102] rts_cpu_isolator: LIGO code is done, calling regular shutdown code
[254992.403203] dolphin_client: INFO - Count was 5800136416, err_cnt: 0
[254992.404522] dolphin_client: INFO - min: 42712 ns, max: 45926 ns, avg: 43587 ns
```
✅ PASS: Hosts were x2iscex and x2cdsrfm. No long transfers.
Bandwidth Test
This test answers three questions: how many IPCs can the MX adapters support, how does that compare with the IX adapters, and how much headroom do we have in the production system?
Long Range vs Short range links
It appears as though the call to `clflush_cache_range()` can introduce significant latency on MX adapters. This is the line where old front-ends would lag on PX adapters. Currently it looks as though the link length might be causing the latency on the flush, as the delay corresponds with the expected propagation delay. Maybe the MX adapters have added something that waits for a response every so often, as the delay here occurs on only ~3.5% of flushes.
H1 Production Dolphin IPC Use
| IPC Rate | PCIE Total | Max PCIE in Model | RFM Total | Max RFM in Model |
|---|---|---|---|---|
| 65536 | 1 | 1 - h1psldbb | 0 | NA |
| 16384 | 188 | 52 - h1lsc | 66 | 8 - h1alsex, h1alsey |
| 4096 | 191 | 60 - h1seiproc | 18 | 8 - h1seiproc |
| 2048 | 139 | 85 - h1asc | 16 | 16 - h1asc |
Results Summary MX vs PX vs IX
The test here simulates `IPCs Per Flush` IPCs every `Cycle Time` and measures the time it takes the call to `clflush_cache_range()` to return. The maximum number of 65536 Hz Dolphin IPCs currently on a single DTS1 model is 8, which x2omcpi, x2susetmxpi, and x2susetmypi all have; they are all short links.
65536 Hz Short Link
65536 Hz 4Km Link
16384 Hz Short Link
16384 Hz 4Km Link
Raw Data for Above Tests
The raw data and plotting script can be found in this repo
Whole Buffer Use Test
I think the issue was caused by `clflush_cache_range()` always being called with 64 bytes. We need to make sure we don't flush past the end of the buffer on the last element.
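A minimal sketch of the fix idea (a hypothetical helper, not the model's actual code): clamp the flush length so the last element never flushes past the buffer, then pass that length to `clflush_cache_range()`.

```c
#include <stddef.h>

#define CACHE_LINE 64

/* Return a flush length clamped so that offset + length never runs past
 * the end of the buffer; the last element flushes only what remains. */
static size_t safe_flush_len(size_t buf_len, size_t offset)
{
    size_t len = CACHE_LINE;
    if (offset + len > buf_len)
        len = buf_len - offset;
    return len;
}
```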
CDSRFM Test
Configuration is to build/run an RFM and the iscex/lsc models for MX, just to make sure everything is backwards compatible. The build went fine, but as expected from the bandwidth benchmark there are some IPC errors, roughly one every 30 seconds for the worst IPC.
In the below screenshot we only expect RFM0 IPCs between x2lsc0 and x2iscex to work. The X2:OMC-ETMX_LOCK_L channel appears to be the worst offender, having an error about once every 30 seconds. The IPCs in the x2pemex and x2lsc models also have IPC errors, although they are more rare.
PX and More Testing
These errors were replicated with PX adapters on the same front ends. I added more instrumentation to the CDSRFM (which slows down the copy loops) and the errors stopped or became very rare. This is in line with the hypothesis that the slower IX adapters don't have these issues because they are slow. The plan is to throttle the CDSRFM so that we don't go into timing-glitch territory as described by the above bandwidth tests.
Dolphin DMA Testing
Because the DMA calls taint the kernel, using them in our real-time models may prove impossible. However, the test models I have written here are fine with being tainted as they are very simple.
Another complication with using the DMA calls from the LIGO isolated threads is that the `startDMA` functions call the Linux `schedule()` function. I implemented a worker-thread solution that allows the isolated thread to queue the DMA trigger to be done by Linux worker threads that can call `schedule()`.
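The handoff can be sketched in user space with POSIX threads (an illustrative model only, not the kernel implementation; `start_dma()` here is a placeholder for the sleep-capable driver call):

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* The isolated RT thread must not call anything that may sleep, so it only
 * flags a request; a normal thread performs the sleep-capable DMA start. */

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cv   = PTHREAD_COND_INITIALIZER;
static bool trigger_pending = false;
static bool shutting_down   = false;
static atomic_int dma_starts;

/* Placeholder for the driver call that may end up in schedule(). */
static void start_dma(void) { atomic_fetch_add(&dma_starts, 1); }

/* Called from the isolated thread: queue the trigger and return at once. */
static void queue_dma_trigger(void)
{
    pthread_mutex_lock(&lock);
    trigger_pending = true;
    pthread_cond_signal(&cv);
    pthread_mutex_unlock(&lock);
}

/* Worker context: free to block, so it may call schedule()-using code. */
static void *dma_worker(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    for (;;) {
        while (!trigger_pending && !shutting_down)
            pthread_cond_wait(&cv, &lock);
        if (trigger_pending) {
            trigger_pending = false;
            pthread_mutex_unlock(&lock);
            start_dma();                 /* safe to sleep here */
            pthread_mutex_lock(&lock);
        } else {
            break;                       /* shutdown with no pending work */
        }
    }
    pthread_mutex_unlock(&lock);
    return NULL;
}
```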
IX 4K Link, 4 IPCs per DMA Trigger (250 Hz)
```
[245454.338120] Histogram Of All DMA Latencies (ns)
[245454.338257] min: 42200, max: 259522, mean: 42732
[245454.338391] <43000 : 98659349
[245454.338519] [43000, 44000) : 2746627
[245454.338646] [44000, 45000) : 2533775
[245454.338772] [45000, 48000) : 462553
[245454.338898] [48000, 50000) : 28272
[245454.339023] [50000, 54000) : 10367
[245454.339148] [54000, 58000) : 1742
[245454.339273] [58000, 62000) : 527
[245454.339398] [62000, 68000) : 482
[245454.339523] [68000, 72000) : 326
[245454.339648] [72000, 78000) : 498
[245454.339772] [78000, 82000) : 425
[245454.339897] [82000, 88000) : 375
[245454.340033] >88000 : 1438
```
IX Short Link, 4 IPCs per DMA Trigger (250 Hz)
```
[3119557.115838] Histogram Of All DMA Latencies (ns)
[3119557.115976] min: 1347, max: 18470, mean: 3175
[3119557.116118] <1000 : 0
[3119557.116240] [1000, 2000) : 258583
[3119557.116365] [2000, 3000) : 11867
[3119557.116490] [3000, 3500) : 2154
[3119557.116615] [3500, 4000) : 306697
[3119557.116740] [4000, 8000) : 246277
[3119557.116865] [8000, 10000) : 360
[3119557.116990] [10000, 15000) : 255
[3119557.117115] [15000, 20000) : 3
[3119557.117239] >20000 : 0
```
IX Short Link, 4 IPCs per DMA Trigger (16K Hz)
```
[3121002.953487] Histogram Of All DMA Latencies (ns)
[3121002.953625] min: 1327, max: 50128, mean: 1743
[3121002.953759] <1000 : 0
[3121002.953884] [1000, 2000) : 18049272
[3121002.954023] [2000, 3000) : 56093
[3121002.954148] [3000, 3500) : 6634
[3121002.954272] [3500, 4000) : 3333
[3121002.954397] [4000, 8000) : 2518
[3121002.954522] [8000, 10000) : 36
[3121002.954646] [10000, 15000) : 29
[3121002.954771] [15000, 20000) : 0
[3121002.954895] >20000 : 3
```
Use new CDSRFM Machine
Intel(R) Xeon(R) W-3323 CPU @ 3.50GHz
Overview and Expected Changes
`dis_diag` reports a maximum payload size of 512 on both adapters with the new FE:
```
Max payload size (MPS) : 512
```
However `lspci -vvv` still lists the max payload as 128:
```
DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
MaxPayload 128 bytes, MaxReadReq 4096 bytes
```
Old vs New 65536 Hz 4Km Link
Old vs New 16384 Hz 4Km Link
Old vs New 65536 Hz Short Link
Old vs New 16384 Hz Short Link
Extended the data here a bit, just to see if anything stood out.
True Maximum Payload Size Changes
The above showed `dis_diag` reporting an MPS of 512 while `lspci` showed 128, so more changes were made to try to get `lspci` to show a larger MPS.
Setting the `ntb_set_mps` parameter in the `dis_px.conf` file and rebooting the systems allowed an MPS of 512 to be reported on the W-3323 FEs and 256 on the older W-2245 (iscex) machine.
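For reference, the change amounts to a single parameter assignment. The parameter name and the value 3 (which forces a 256-byte MPS) come from the testing described here, but the surrounding `dis_px.conf` syntax is not reproduced in these notes, so treat this fragment as a guess at its form:

```
ntb_set_mps=3
```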
If I set `ntb_set_mps=3` (force a 256-byte MPS) and reboot the FEs, both `dis_diag` and `lspci` show a max payload size of 256. However, after the drivers are configured and the node/speed set up, the new PCIe "functions" that show up have a MaxPayload of 128:
```
19:00.0 Bridge: PLX Technology, Inc. PEX 8732 32-lane, 8-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ca)
Subsystem: Dolphin Interconnect Solutions AS PEX 8732 32-lane, 8-Port PCI Express Gen 3 (8.0 GT/s) Switch
...
DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
MaxPayload 256 bytes, MaxReadReq 128 bytes
...
19:00.1 Intelligent controller [0e80]: Dolphin Interconnect Solutions AS Device 0810
Subsystem: Dolphin Interconnect Solutions AS Device 2810
...
DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
MaxPayload 128 bytes, MaxReadReq 4096 bytes
```
Results
The same packet size limits and maximum bandwidth were recorded with the MPS configured as either 256 or 512.