Dolphin Issues on IX/PX Deb10/11

Overview

As a general rule, Debian 11 works better with PX, and Debian 10 works better with IX.

Driver  | Debian 11 | Debian 10
--------|-----------|----------
PX 5.19 | Working on V4 and W22s. V1 high max latency. | Segment ID issue workaround found. Dolphin bug: cannot create more than one segment, so latency test does not include cdsrfm.
PX 5.20 | Working on V4 and W22s. V1 high max latency. | Segment ID issue workaround found. Dolphin bug: cannot create more than one segment, so latency test does not include cdsrfm.
IX 5.20 | V1 high max latency. | Untested.
IX 5.19 | One-way communication issues; expect 5.20 to fix. | Working in production.

Open Issues

  • Debian 10 with PX on the large test stand still has some transient errors on DTS0.
    • Not a blocker as Debian 10/PX is not a planned configuration.

Debian 10 PX Workaround

The workaround is to use the segment assignments PCIE->0, RFM_CS->1, RFM_EX->2, RFM_EY->3. This uses up most of our segment IDs (4 of 4 on the corner station and 2 of 4 on the end stations), but it allows us to run.
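The assignment above can be written down as a small lookup table. This is only an illustrative sketch: the limit of four segment IDs per host is inferred from the "4/4" and "2/4" figures, and the choice of which two segments an end station uses (PCIE plus its own RFM segment) is an assumption, not something stated on this page.

```python
# Segment name -> segment ID, as listed in the workaround above.
SEGMENT_IDS = {"PCIE": 0, "RFM_CS": 1, "RFM_EX": 2, "RFM_EY": 3}

# Assumption: four segment IDs are available per host
# (inferred from the "4/4" and "2/4" figures).
MAX_SEGMENT_IDS = 4

# Assumption: which segments each station type uses. These groupings are
# hypothetical and only serve to reproduce the 4/4 and 2/4 counts.
STATION_SEGMENTS = {
    "corner": ["PCIE", "RFM_CS", "RFM_EX", "RFM_EY"],
    "end_x": ["PCIE", "RFM_EX"],
    "end_y": ["PCIE", "RFM_EY"],
}


def segment_usage(station):
    """Return (used, available) segment ID counts for a station type."""
    ids = {SEGMENT_IDS[name] for name in STATION_SEGMENTS[station]}
    return len(ids), MAX_SEGMENT_IDS


for station in STATION_SEGMENTS:
    used, available = segment_usage(station)
    print(f"{station}: {used}/{available} segment IDs used")
```

Keeping the mapping in one place makes it easy to see that the corner station has no spare segment IDs left under this workaround.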

V1 Max Latency Issue

On Debian 11 machines, V1s experience a transient, very large max latency that was causing timing issues over Dolphin on DTS1. Debian 10 does not have this issue.

Configurations

PX 5.19

Dolphin Benchmark Debian 11

x2lsc0 <-> x2seih16

[68670.595965] Count was 891104862, err_cnt: 0
[68670.595965] min: 2219 ns, max: 14607 ns, avg: 2625 ns
[68670.595966] Histogram Of All Latencies (ns)
[68670.595966] <2000 : 0
[68670.595966] [2000, 4000) : 891086829
[68670.595967] [4000, 5000) : 8738
[68670.595967] [5000, 6000) : 1826
[68670.595967] [6000, 7000) : 1314
[68670.595968] [7000, 8000) : 1305
[68670.595968] [8000, 9000) : 1451
[68670.595969] [9000, 10000) : 1238
[68670.595969] [10000, 11000) : 686
[68670.595969] [11000, 12000) : 568
[68670.595970] [12000, 13000) : 486
[68670.595970] [13000, 14000) : 326
[68670.595970] [14000, 15000) : 93
[68670.595971] [15000, 16000) : 0
[68670.595971] [16000, 17000) : 0
[68670.595971] [17000, 18000) : 0
[68670.595972] [18000, 20000) : 0
[68670.595972] [20000, 22000) : 0
[68670.595972] [22000, 25000) : 0
[68670.595973] [25000, 30000) : 0
[68670.595973] >30000 : 0

Dolphin Deb 10 Bug

The PX drivers have a bug where you cannot create more than one segment, so the latency test does not include cdsrfm.

Dolphin Benchmark Debian 10

x2lsc0 <-> x2seih16

ligo-dolphin-px-srcdis/unstable,now 5.19.2-2 all

[ 8319.599174] Count was 1123301708, err_cnt: 0
[ 8319.599175] min: 2199 ns, max: 5420 ns, avg: 2436 ns
[ 8319.599175] Histogram Of All Latencies (ns)
[ 8319.599176] <2000 : 0
[ 8319.599176] [2000, 4000) : 1123272300
[ 8319.599177] [4000, 5000) : 29403
[ 8319.599177] [5000, 6000) : 3
[ 8319.599177] [6000, 7000) : 0
[ 8319.599177] [7000, 8000) : 0
[ 8319.599178] [8000, 9000) : 0
[ 8319.599178] [9000, 10000) : 0
[ 8319.599178] [10000, 11000) : 0
[ 8319.599179] [11000, 12000) : 0
[ 8319.599179] [12000, 13000) : 0
[ 8319.599179] [13000, 14000) : 0
[ 8319.599180] [14000, 15000) : 0
[ 8319.599180] [15000, 16000) : 0
[ 8319.599180] [16000, 17000) : 0
[ 8319.599180] [17000, 18000) : 0
[ 8319.599181] [18000, 20000) : 0
[ 8319.599181] [20000, 22000) : 0
[ 8319.599181] [22000, 25000) : 0
[ 8319.599181] [25000, 30000) : 0
[ 8319.599182] >30000 : 0

On Debian 11, the Dolphin benchmark records a max round-trip latency of ~15 us, suggesting a one-way max latency of ~7.5 us. The RCG code suggests an expected max latency of ~5 us with previous versions.
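The round-trip to one-way estimate can be reproduced directly from the benchmark's summary line. The sketch below assumes the path is symmetric, so that one-way latency is half the round trip; the helper names and the 5000 ns constant (taken from the ~5 us figure above) are illustrative and not part of the benchmark or the RCG.

```python
import re

# Matches the summary line printed by the benchmark, e.g.
# "[68670.595965] min: 2219 ns, max: 14607 ns, avg: 2625 ns"
SUMMARY_RE = re.compile(
    r"min:\s*(\d+)\s*ns,\s*max:\s*(\d+)\s*ns,\s*avg:\s*(\d+)\s*ns"
)

# Assumption: the ~5 us expected one-way max quoted above, in nanoseconds.
EXPECTED_ONE_WAY_MAX_NS = 5000


def parse_summary(line):
    """Extract min/max/avg round-trip latency (ns) from a summary line."""
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError(f"not a latency summary line: {line!r}")
    min_ns, max_ns, avg_ns = (int(g) for g in match.groups())
    return {"min_ns": min_ns, "max_ns": max_ns, "avg_ns": avg_ns}


def one_way_estimate_ns(round_trip_ns):
    """Halve the round trip; only valid if both directions are symmetric."""
    return round_trip_ns / 2


line = "[68670.595965] min: 2219 ns, max: 14607 ns, avg: 2625 ns"
stats = parse_summary(line)
one_way_max = one_way_estimate_ns(stats["max_ns"])
print(one_way_max)                            # 7303.5
print(one_way_max > EXPECTED_ONE_WAY_MAX_NS)  # True
```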

PX 5.20

Round trip times without cdsrfm running

[ 6038.500263] min: 2204 ns, max: 11409 ns, avg: 2417 ns
[ 6038.500263] Histogram Of All Latencies (ns)
[ 6038.500263] <2000 : 0
[ 6038.500264] [2000, 4000) : 39372536
[ 6038.500264] [4000, 5000) : 315
[ 6038.500265] [5000, 6000) : 83
[ 6038.500265] [6000, 7000) : 60
[ 6038.500265] [7000, 8000) : 51
[ 6038.500266] [8000, 9000) : 55
[ 6038.500266] [9000, 10000) : 40
[ 6038.500266] [10000, 11000) : 8
[ 6038.500267] [11000, 12000) : 1
[ 6038.500267] [12000, 13000) : 0
[ 6038.500267] [13000, 14000) : 0
[ 6038.500268] [14000, 15000) : 0
[ 6038.500268] [15000, 16000) : 0
[ 6038.500268] [16000, 17000) : 0
[ 6038.500269] [17000, 18000) : 0
[ 6038.500269] [18000, 20000) : 0
[ 6038.500269] [20000, 22000) : 0
[ 6038.500270] [22000, 25000) : 0
[ 6038.500270] [25000, 30000) : 0
[ 6038.500270] >30000 : 0

With cdsrfm running

[ 6522.464323] Count was 39371649, err_cnt: 0
[ 6522.464324] min: 2204 ns, max: 12399 ns, avg: 2417 ns
[ 6522.464324] Histogram Of All Latencies (ns)
[ 6522.464325] <2000 : 0
[ 6522.464325] [2000, 4000) : 39371069
[ 6522.464326] [4000, 5000) : 290
[ 6522.464326] [5000, 6000) : 74
[ 6522.464326] [6000, 7000) : 56
[ 6522.464327] [7000, 8000) : 46
[ 6522.464327] [8000, 9000) : 60
[ 6522.464327] [9000, 10000) : 41
[ 6522.464328] [10000, 11000) : 9
[ 6522.464328] [11000, 12000) : 1
[ 6522.464328] [12000, 13000) : 1
[ 6522.464329] [13000, 14000) : 0
[ 6522.464329] [14000, 15000) : 0
[ 6522.464329] [15000, 16000) : 0
[ 6522.464330] [16000, 17000) : 0
[ 6522.464330] [17000, 18000) : 0
[ 6522.464330] [18000, 20000) : 0
[ 6522.464330] [20000, 22000) : 0
[ 6522.464331] [22000, 25000) : 0
[ 6522.464331] [25000, 30000) : 0
[ 6522.464331] >30000 : 0
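To compare runs such as the two above, it can help to reduce each histogram to a single tail count. The following is a rough sketch that parses the bounded bins from the log text and sums the samples at or above a threshold; the function names are made up for illustration, and the open-ended `<2000` and `>30000` bins are deliberately ignored (they are zero in every run on this page).

```python
import re

# Matches bounded histogram bins such as "[4000, 5000) : 290".
# The kernel timestamp "[ 6522.464326]" does not match because it has
# no ", " separator and closes with "]" rather than ")".
BIN_RE = re.compile(r"\[(\d+),\s*(\d+)\)\s*:\s*(\d+)")


def parse_bins(log_text):
    """Return a list of (low_ns, high_ns, count) for each bounded bin."""
    return [tuple(int(g) for g in m.groups()) for m in BIN_RE.finditer(log_text)]


def tail_count(bins, threshold_ns):
    """Count samples in bins whose lower edge is at or above threshold_ns."""
    return sum(count for low, _high, count in bins if low >= threshold_ns)


# A short excerpt from the "With cdsrfm running" log above.
sample = """\
[ 6522.464325] [2000, 4000) : 39371069
[ 6522.464326] [4000, 5000) : 290
[ 6522.464326] [5000, 6000) : 74
"""

bins = parse_bins(sample)
print(tail_count(bins, 4000))  # 364
```

Feeding the full log text of each run through `parse_bins` and comparing `tail_count` at the same threshold gives a quick like-for-like check between configurations.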

IX

Debian 11 Small Test Stand

[80549.898008] Count was 86863792, err_cnt: 0
[80549.898009] min: 2239 ns, max: 14379 ns, avg: 3149 ns
[80549.898010] Histogram Of All Latencies (ns)
[80549.898010] <2000 : 0
[80549.898011] [2000, 4000) : 86861947
[80549.898011] [4000, 5000) : 985
[80549.898012] [5000, 6000) : 191
[80549.898012] [6000, 7000) : 93
[80549.898013] [7000, 8000) : 113
[80549.898013] [8000, 9000) : 119
[80549.898014] [9000, 10000) : 115
[80549.898014] [10000, 11000) : 105
[80549.898015] [11000, 12000) : 80
[80549.898015] [12000, 13000) : 35
[80549.898016] [13000, 14000) : 6
[80549.898016] [14000, 15000) : 1
[80549.898017] [15000, 16000) : 0
[80549.898017] [16000, 17000) : 0
[80549.898018] [17000, 18000) : 0
[80549.898018] [18000, 20000) : 0
[80549.898019] [20000, 22000) : 0
[80549.898019] [22000, 25000) : 0
[80549.898020] [25000, 30000) : 0
[80549.898020] >30000 : 0

Edited by Ezekiel Dohmen