# advLigoRTS issues
https://git.ligo.org/cds/software/advligorts/-/issues (updated 2024-03-21T21:17:50Z)

**Issue #620: Add a dolphin time xmit indicator to IOP GDS_TP screen.**
https://git.ligo.org/cds/software/advligorts/-/issues/620 | Updated 2024-03-21T21:17:50Z | Author/assignee: Ezekiel Dohmen | Milestone: advligorts 5.2.0

**Issue #619: daqd int64bit support**
https://git.ligo.org/cds/software/advligorts/-/issues/619 | Updated 2024-03-20T15:40:35Z | Author/assignee: Jonathan Hanks

The frame building code has this in it. You can specify 64int, but not actually use it in the daqd.

```
case _64bit_integer: {
    abort( );
}
```

**Issue #618: The RCG should emit more structured data about the model's IPCs and filter modules, so we don't need as many one-off scripts.**
https://git.ligo.org/cds/software/advligorts/-/issues/618 | Updated 2024-03-12T22:04:11Z | Author/assignee: Ezekiel Dohmen | Milestone: advligorts 5.2.0

`librts` scans a header file to find filter information, and this breaks every so often because we change that header's structure.
We also have to scan medm or build files to list receivers when we remove an IPC sender.
The `.ini` reflects the state of the slow and fast data channels for a model, and IPCs/filters should emit a similar file.

**Issue #617: The CFC stateword bit is a logical or of `.ini` and filter file changes.**
https://git.ligo.org/cds/software/advligorts/-/issues/617 | Updated 2024-03-12T18:32:42Z | Author: Ezekiel Dohmen

This is probably not specific enough, as `.ini` changes are rare (a changed model) whereas filter pole changes are more common and not as much of a concern. Suggest splitting the status into at least two separate bits, so we can raise a more visible warning for `.ini` changes.

**Issue #616: State-space Support Design Issue**
https://git.ligo.org/cds/software/advligorts/-/issues/616 | Updated 2024-03-12T17:15:45Z | Author/assignee: Ezekiel Dohmen

## Loading and Tracking the state of the part/matrices.
Do we have to use SDF? Would that display even work for matrices?
For current matrix parts we generate EPICS channels at build time of the model; however, we would like dynamic sizing for the state-space configuration. So it seems like we would have to "compress" data into a known set of channels, e.g. a string channel for the matrix: `[0, 1, 5; 5, 3, 2; 0, 0, 0]`?
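As a purely illustrative sketch of the "compressed" string format suggested above (the parsing function and size cap are hypothetical, not existing advligorts code):

```c
#include <stdio.h>
#include <stdlib.h>

#define MAX_ELEMS 64  /* arbitrary cap for the sketch */

/* Parse "[0, 1, 5; 5, 3, 2; 0, 0, 0]" into a row-major array.
 * Rows are separated by ';', elements by ','.  Returns the number of
 * elements parsed; row/column counts are derived from the separators. */
static int parse_matrix_string(const char *s, double *out, int *rows, int *cols)
{
    int n = 0, r = 1, first_row_cols = 0;
    const char *p = s;

    while (*p != '\0' && n < MAX_ELEMS)
    {
        if (*p == ';') { r++; p++; continue; }
        if (*p == '[' || *p == ']' || *p == ',' || *p == ' ') { p++; continue; }

        char *end;
        double v = strtod(p, &end);
        if (end == p)        /* not a number: stop parsing */
            break;
        out[n++] = v;
        p = end;
        if (r == 1) first_row_cols++;
    }

    *rows = r;
    *cols = first_row_cols;
    return n;
}

int main(void)
{
    double m[MAX_ELEMS];
    int rows, cols;
    int n = parse_matrix_string("[0, 1, 5; 5, 3, 2; 0, 0, 0]", m, &rows, &cols);

    printf("parsed %d elements as a %dx%d matrix\n", n, rows, cols); /* 9, 3x3 */
    return 0;
}
```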
### Open Questions
- Does EPICS have to be the interface by which we configure the states/configuration of the part?
- Does EPICS have to be the interface by which we read out the state of the state-space parts?
- Do we need to ramp up/down output for reloading of the state-space part?
- Do we need specific control over when a new/reloaded state-space part engages/takes effect?

**Issue #615: System name '#define' breaks librts filter lookup.**
https://git.ligo.org/cds/software/advligorts/-/issues/615 | Updated 2024-03-08T20:41:17Z | Author: Erik von Reis

From Chris Wipf:
> [This change](https://git.ligo.org/cds/software/advligorts/-/merge_requests/613) breaks the filter module detection code in `librts/scripts/buildMapFromHeader.py`

**Issue #614: LLO IPC RFM Errors, and generally Dolphin BW limits as measured by the BW benchmark.**
https://git.ligo.org/cds/software/advligorts/-/issues/614 | Updated 2024-03-12T17:48:53Z | Author/assignee: Ezekiel Dohmen

The bandwidth benchmark suggests that we can send 4 IPCs both ways every 16 us. That's 1 IPC every 4 us, or a total of 262144 one way, 524288 both ways.
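As a quick sanity check on those numbers, here is the arithmetic spelled out (reading "every 16 us" as one 65536 Hz IOP cycle is my assumption, not something stated by the benchmark):

```c
#include <stdio.h>

int main(void)
{
    const double cycle_hz = 65536.0;          /* IOP model rate              */
    const double one_way  = 4.0 * cycle_hz;   /* 4 IPC sends per cycle       */
    const double both     = 2.0 * one_way;    /* send + receive directions   */
    const double cs_to_ex = 253952.0;         /* IPCs/sec from table below   */

    printf("one-way budget : %.0f IPC sends/sec\n", one_way);   /* 262144 */
    printf("both directions: %.0f IPC sends/sec\n", both);      /* 524288 */
    printf("avg IPC spacing: %.3f us\n", 1e6 / cs_to_ex);       /* 3.938  */
    return 0;
}
```

The "Avg. time for IPC (us)" column in the tables below is just the inverse of the aggregate send rate for that link direction.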
## LLO IPCs
### After March 5 Rate Decrease
| Direction | Num IPCs | IPCs/sec | Avg. time for IPC (us)
|-----------|------------|----------|------------------------------------|
|CS -> EX | 30 | 253952 | 3.938 |
|CS -> EY | 30 | 253952 | 3.938 |
|EX -> CS | 29 | 303104 | 3.299 |
|EY -> CS | 27 | 356352 | 2.806 |
##### CS <-> EX total: 557056
##### CS <-> EY total: 610304
### Pre Rate Decrease
| Direction | Num IPCs | IPCs/sec | Avg. time for IPC (us)
|-----------|------------|----------|------------------------------------|
|CS -> EX | 30 | 253952 | 3.938 |
|CS -> EY | 30 | 253952 | 3.938 |
|EX -> CS | 29 | 389120 | 2.570 |
|EY -> CS | 27 | 356352 | 2.806 |
##### CS <-> EX total: 643072
##### CS <-> EY total: 610304
## LHO IPCs
| Direction | Num IPCs | IPCs/sec | Avg. time for IPC (us)
|-----------|------------|----------|------------------------------------|
|CS -> EX | 21 | 180224 | 5.549 |
|CS -> EY | 21 | 180224 | 5.549 |
|EX -> CS | 29 | 413696 | 2.417 |
|EY -> CS | 29 | 413696 | 2.417 |
##### CS <-> EX total: 593920
##### CS <-> EY total: 593920
## LLO IPC Errors
```
IPC L1:FEC-25_IPC_CAL_EX_PCAL_RXPD_OUT_RFM in l1oaf had last error at 2024-03-11 09:34:54 UTC
IPC L1:FEC-25_IPC_PEM_EX_2_OAF_VEA_MAG_X in l1oaf had last error at 2024-03-11 09:34:54 UTC
IPC L1:FEC-25_IPC_SUS_ETMX_CAL_CS_L2_LINE in l1oaf had last error at 2024-03-11 09:34:54 UTC
IPC L1:FEC-25_IPC_SUS_ETMX_CAL_CS_L3_LINE in l1oaf had last error at 2024-03-11 09:34:54 UTC
At IPC 400 out of 1122
At IPC 500 out of 1122
IPC L1:FEC-117_IPC_CAL_EX_PCAL_RXPD_OUT_RFM in l1calcs had last error at 2024-03-11 09:34:54 UTC
IPC L1:FEC-117_IPC_CAL_EX_PCAL_TXPD_OUT_RFM in l1calcs had last error at 2024-03-11 09:34:54 UTC
IPC L1:FEC-117_IPC_SUS_ETMX_CAL_CS_L1_LINE in l1calcs had last error at 2024-03-11 09:34:54 UTC
IPC L1:FEC-117_IPC_SUS_ETMX_CAL_CS_L2_LINE in l1calcs had last error at 2024-03-11 09:34:54 UTC
IPC L1:FEC-117_IPC_SUS_ETMX_CAL_CS_L3_LINE in l1calcs had last error at 2024-03-11 09:34:54 UTC
At IPC 600 out of 1122
IPC L1:FEC-19_IPC_SBR_ASC_CHARD_PIT_CTL in l1asc had last error at 2024-03-08 21:50:11 UTC
IPC L1:FEC-19_IPC_SBR_ASC_CHARD_YAW_CTL in l1asc had last error at 2024-03-08 21:50:11 UTC
IPC L1:FEC-19_IPC_SBR_ASC_DHARD_PIT_CTL in l1asc had last error at 2024-03-08 21:50:11 UTC
IPC L1:FEC-19_IPC_SBR_ASC_DHARD_YAW_CTL in l1asc had last error at 2024-03-08 21:50:11 UTC
At IPC 700 out of 1122
IPC L1:FEC-10_IPC_ISCEX_ASC_TR_B_SUM_RFM in l1lsc had last error at 2024-03-11 09:34:54 UTC
IPC L1:FEC-10_IPC_ISCEX_LSC_IR_TRAIR_RFM in l1lsc had last error at 2024-03-11 09:34:54 UTC
IPC L1:FEC-10_IPC_ISCEX_LSC_ALSPDH_REFL_CTRL in l1lsc had last error at 2024-03-11 09:34:54 UTC
IPC L1:FEC-8_IPC_PEM_ETMX_PI_MUX in l1omc had last error at 2024-03-11 09:34:54 UTC
At IPC 800 out of 1122
At IPC 900 out of 1122
At IPC 1000 out of 1122
IPC L1:FEC-100_IPC_IOP_SUS_EY_ETMY_WDIPC in l1iopseiey had last error at 2024-03-11 23:16:20 UTC
```

**Issue #613: when 1pps is not synchronized, report cycle of rising edge**
https://git.ligo.org/cds/software/advligorts/-/issues/613 | Updated 2024-02-29T21:08:24Z | Author: Erik von Reis

Currently, if the rising edge of the 1PPS is not in cycle 0 or 1, an error is raised, but we should also report in which cycle the rising edge was seen. It will be useful for debugging purposes. Also, if the error is small and stable, some users may choose to ignore it.

**Issue #612: Dolphin DMA testing and PCIE IPCs to FE local models.**
https://git.ligo.org/cds/software/advligorts/-/issues/612 | Updated 2024-02-07T18:59:46Z | Author/assignee: Ezekiel Dohmen | Milestone: advligorts 5.2.0

It appears as though writing to the (PIO) read pointer is safe (as Dolphin does this for the DMA use case). It may be possible to make it so PCIE (network local) IPCs can be written/read from models on the same FE if we write to both the read and write Dolphin pointers.

Need to decide if we still want this functionality, then add it to models and verify.

**Issue #611: `fltrConst` Does not have all bits defined.**
https://git.ligo.org/cds/software/advligorts/-/issues/611 | Updated 2024-02-05T19:35:40Z | Author/assignee: Ezekiel Dohmen

Need to add ramping and SDF diff bits to `fltrConst`, and clean up the use of magic numbers around setting the SWSTAT EPICS var.
### Commits Adding SWSTAT
https://git.ligo.org/cds/software/advligorts/-/commit/c50d76d47027940589ae51f16c6c4d6734b75383

**Issue #610: DAC TX Dropout**
https://git.ligo.org/cds/software/advligorts/-/issues/610 | Updated 2024-02-01T17:33:35Z | Author/assignee: Ezekiel Dohmen

There is an issue in the RTS where long cycles in (only) a usermodel driving a DAC will cause the IOP to drive the DAC to 0 for the remainder of the second.
I observed high CPU max times on the usermodel corresponding with the zeroed data.
## Test Setup
A LIGO DAC and a 20-bit DAC looped back into an ADC. I added a delay part, and trigger a delay during the diaggui capture.
![image](/uploads/5fffcafac654d517e70df331d16fbedb/image.png)
![image](/uploads/09cfcf0c666c751326e5640b785d521e/image.png)
In the above pictures we can see both DACs are affected by the introduced time glitch.
## Code Issues
#### `src/include/drv/iop_dac_functions.c:162`
```c
/// - -- Determine if memory block has been set with the correct
/// cycle count by control app.
if ( ioMemData->iodata[ mm ][ ioMemCntrDac ].cycle == ioClockDac )
{
dacEnable |= pBits[ card ];
}
else
{
dacEnable &= ~( pBits[ card ] );
dacChanErr[ card ] += 1;
}
... //In the above code, if we ever miss data from the usermodel, the IOP
... //increments the dacChanErr count. And in the code below, if dacChanErr
... //is positive, we zero out the DAC's output.
/// - ---- Read DAC output value from shared memory and reset
/// memory to zero
if ( ( !dacChanErr[ card ] ) && ( iopDacEnable ) )
{
dac_out = ioMemData->iodata[ mm ][ ioMemCntrDac ] .data[ chan ];
/// - --------- Zero out data in case user app dies by next
/// cycle when two or more apps share same DAC module.
ioMemData->iodata[ mm ][ ioMemCntrDac ].data[ chan ] = 0;
}
else
{
dac_out = 0;
status = 1;
}
```
So, when does `dacChanErr` get reset? Once a second...
`src/fe/controllerIop.c:1235`
```c
// *****************************************************************
/// \> Cycle 21, Update ADC/DAC status to EPICS.
// *****************************************************************
if ( hkp_cycle == HKP_ADC_DAC_STAT_UPDATES )
{
pLocalEpics->epicsOutput.ovAccum = overflowAcc;
feStatus |= adc_status_update( &adcinfo );
feStatus |= dac_status_update( &dacinfo );
// pLocalEpics->epicsOutput.fe_status = NORMAL_RUN;
}
```
`src/include/drv`
```c
LIGO_INLINE int
dac_status_update( dacInfo_t* dacinfo )
{
...
dacChanErr[ jj ] = 0;
...
```

**Issue #609: Add explicit "excitation" modifier to tpchn file and other testpoint metadata.**
https://git.ligo.org/cds/software/advligorts/-/issues/609 | Updated 2024-01-22T22:11:18Z | Author: Erik von Reis

Workstations need to know which channels are excitation.
Currently, this is done by restricting excitations to certain channel number ranges: 0-10k, 20k-30k. This is needlessly restrictive, and an overload of the channel number that makes code changes difficult.
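Purely as an illustration of that overload, the range convention described above expressed as code (the boundaries and function name are assumptions, not taken from the daqd/awgtpman sources):

```c
/* Hypothetical sketch: excitation-ness inferred from the test point
 * number ranges described above, rather than stored as an explicit flag. */
static int is_excitation_by_number(int tp_num)
{
    return (tp_num >= 0 && tp_num < 10000) ||
           (tp_num >= 20000 && tp_num < 30000);
}
```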
Instead, let's add an explicit "excitation" option to the channels, and not limit the channel numbers.

**Issue #608: Dolphin MX Adapter Testing**
https://git.ligo.org/cds/software/advligorts/-/issues/608 | Updated 2024-03-04T17:42:35Z | Author/assignee: Ezekiel Dohmen

# Test Roadmap
| Test | Complete? | Pass? | Issues Found |
| --------------------------- | --------- | ------------------ | ---------- |
| 4Km DTS1 RTT Benchmark | Yes | :white_check_mark: | None |
| Bandwidth Test | Yes | :white_check_mark: | BW is limited by link length, ~same for IX/MX |
| Whole buffer use test | Yes | :white_check_mark: | Need to make sure we don't `clflush()` past the end of the buffer |
| CDSRFM Test | Yes | :x: | Already see IPC errors with 2 FEs |
| Reboot Glitch Testing | No | :grey_question: | Reconfigure HW when done with LR testing |
| Removing Dolphin Drivers Panics Kernel | No | :grey_question: | Looks like it does; check it out a bit more. |
### Other Configurations/Setups to Try
| Change/Test | Complete? | Pass? | Issues Found |
| --------------------------- | --------- | ------------------ | ---------- |
| Dolphin DMA Testing | Yes | :x: | GPL taint, bad max latencies (~25X mean) |
| Swap CDSRFM machine to faster FE | Yes | :x: | Maxes get a bit worse, when outside of RT compatible constraints. |
| Remove fiber rate limit on 4K PCIe bus extender | Yes | :x: | Used with the new FE, no measurable difference. |
| Use the new Adnaco (16 lane?) fiber bus extender | No | :x: | |
# Hardware
- `x2cdsrfm` - Intel(R) Xeon(R) CPU E5-1650 v4 @ 3.60GHz
- Switched to a `Intel(R) Xeon(R) W-3323 CPU @ 3.50GHz` for the 'new FE' tests
- `x2lsc` - Intel(R) Xeon(R) W-3323 CPU @ 3.50GHz
- `x2iscex` - Intel(R) Xeon(R) W-2245 CPU @ 3.90GHz
- `x2cdsrfm` <-> `x2iscex` Is the 4Km link.
- `x2cdsrfm` <-> `x2lsc` Is the short link.
# 4Km DTS1 RTT Benchmark ~72 Hour Test
<details>
<summary> Histogram of Results (Click to Expand) </summary>
```
[254992.404633] dolphin_client: INFO - Histogram Of All Latencies (ns)
[254992.403096] <36000 : 0
[254992.403096] [36000, 38000) : 0
[254992.403097] [38000, 40000) : 0
[254992.403097] [40000, 42000) : 0
[254992.403098] [42000, 44000) : 5800135909
[254992.403098] [44000, 46000) : 505
[254992.403098] [46000, 48000) : 0
[254992.403099] [48000, 50000) : 0
[254992.403099] [50000, 60000) : 0
[254992.403100] [60000, 70000) : 0
[254992.403100] [70000, 80000) : 0
[254992.403100] [80000, 100000) : 0
[254992.403101] >100000 : 0
[254992.403102] rts_cpu_isolator: LIGO code is done, calling regular shutdown code
[254992.403203] dolphin_client: INFO - Count was 5800136416, err_cnt: 0
[254992.404522] dolphin_client: INFO - min: 42712 ns, max: 45926 ns, avg: 43587 ns
```
</details>
> :white_check_mark: **PASS:** Hosts were `x2iscex` and `x2cdsrfm`. No long transfers.
# Bandwidth Test
The question we want answered with this test is how many IPCs the MX adapters can support, how that compares with the IX adapters, and how much headroom we have in the production system.
## Long Range vs Short range links
It appears as though the call to `clflush_cache_range()` can introduce significant latency on MX adapters.
This is the line where old front-ends would lag on PX adapters. Currently it looks as though the link length might be causing the latency with the flush, as the delay corresponds with the expected propagation delay (signal propagation in fiber is roughly 5 us/km, so the 4 km link adds about 20 us each way, or roughly 40 us round trip, consistent with the ~43 us RTT measured above). Maybe the MX adapters have added something that waits for a response every so often, as the delay here occurs on only ~3.5% of flushes.
#### H1 Production Dolphin IPC Use
| IPC Rate (Hz) | PCIE Total | Max PCIE in Model | RFM Total | Max RFM in Model |
| ---------- | ----- | ---------------------- | ----- | ----------------------- |
| 65536 | 1 | 1 - `h1psldbb` | 0 | NA |
| 16384 | 188 | 52 - `h1lsc` | 66 | 8 - `h1alsex, h1alsey` |
| 4096 | 191 | 60 - `h1seiproc` | 18 | 8 - `h1seiproc` |
| 2048 | 139 | 85 - `h1asc` | 16 | 16 - `h1asc` |
## Results Summary MX vs PX vs IX
The test here simulates `IPCs Per Flush` number of IPCs every `Cycle Time` and measures the time it takes the call to `clflush_cache_range()` to return. The maximum number of Dolphin IPCs (65536 Hz) currently on a single DTS1 model is ***8***, which x2omcpi, x2susetmxpi, and x2susetmypi all have; they are all short links.
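A minimal sketch of one iteration of such a flush-timing loop, assuming a kernel-module context (slot size, buffer handling, and the helper name are assumptions; this is not the actual benchmark code):

```c
#include <linux/types.h>
#include <linux/string.h>
#include <linux/ktime.h>
#include <asm/cacheflush.h>

#define IPC_SLOT_BYTES 64   /* assume one cache line per simulated IPC */

/* Write `ipcs_per_flush` fake IPC slots into the (remote/write-combined)
 * buffer, then time how long the cache-line flush of that span takes.
 * The caller runs this once per simulated cycle and histograms the result. */
static u64 time_one_flush(void *ipc_buf, int ipcs_per_flush)
{
    u64 t0, t1;
    int i;

    for (i = 0; i < ipcs_per_flush; i++)
        memset((char *)ipc_buf + i * IPC_SLOT_BYTES, i & 0xff, IPC_SLOT_BYTES);

    t0 = ktime_get_ns();
    clflush_cache_range(ipc_buf, ipcs_per_flush * IPC_SLOT_BYTES);
    t1 = ktime_get_ns();

    return t1 - t0;
}
```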
### 65536 Hz Short Link
![image](/uploads/2187a6c4dc4a574e8471b6f953fe4c20/image.png)
### 65536 Hz 4Km Link
![image](/uploads/1dc561e2b9ee785f70a67100a17e05af/image.png)
### 16384 Hz Short Link
![image](/uploads/dab7afbc6079cad3051da65db8536b44/image.png)
### 16384 Hz 4Km Link
![image](/uploads/4e4352af21576a8a8ca97b3d6d2078a1/image.png)
### Raw Data for Above Tests
The raw data and plotting script can be found in this [repo](https://git.ligo.org/ezekiel.dohmen/benchmark_results/-/tree/main/Dolphin/2023_01_IX_vs_MX?ref_type=heads).
# Whole Buffer Use Test
I think the issue was caused by `clflush_cache_range()` always being called with 64 bytes. We need to make sure we don't flush past the end of the buffer on the last element.
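A minimal sketch of the clamping this implies (names and call shape are assumptions; the point is only that the flushed span must stop at the end of the mapped buffer):

```c
#include <linux/types.h>
#include <asm/cacheflush.h>

/* Flush one entry's worth of data, but never past the end of the buffer. */
static void flush_ipc_entry(void *buf_start, size_t buf_len,
                            size_t offset, size_t entry_len)
{
    size_t remaining = buf_len - offset;
    size_t flush_len = (entry_len < remaining) ? entry_len : remaining;

    clflush_cache_range((char *)buf_start + offset, flush_len);
}
```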
# CDSRFM Test
Configuration is to build/run the RFM and iscex/lsc models for MX, just to make sure everything is backwards compatible. The build went fine, but as expected from the bandwidth benchmark there are some IPC errors, about one every ~30 sec for the worst IPC.
In the below screenshot we only expect RFM0 IPCs between x2lsc0 and x2iscex to work. The `X2:OMC-ETMX_LOCK_L` appears to be the worst offender, having an error about once every 30 seconds.
The IPCs in the `x2pemex` and `x2lsc` models also have IPC errors, although they are more rare.
### PX and More Testing
These errors were replicated with PX adapters on the same front ends. I added more instrumentation to the CDSRFM (that slows down the copy loops) and the errors stopped/became very rare. This is in line with the hypothesis that the slower IX adapters don't have these issues BECAUSE they are slow. The plan is to throttle the CDSRFM so that we don't go into timing glitch territory as described by the above bandwidth tests.
![image](/uploads/45436b33336658548e2ee17376da64ff/image.png)
# Dolphin DMA Testing
Because the DMA calls taint the kernel module using them, including them in our real-time models may prove impossible. However, the test models I have written [here](https://git.ligo.org/ezekiel.dohmen/dma_dolphin) are fine with being tainted, as they are very simple.
Another complication with using the DMA calls from the LIGO isolated threads is that the `startDMA` functions call the Linux `schedule()` function. Implemented a worker thread solution that allows the isolated thread to queue the DMA trigger to be done by Linux worker threads that can call `schedule()` (see the sketch below).
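A minimal sketch of that deferral pattern using a standard Linux workqueue (the Dolphin call is a hypothetical placeholder; this is not the actual implementation in the test models):

```c
#include <linux/workqueue.h>

/* Placeholder for the real Dolphin DMA start call (hypothetical name). */
extern void start_dolphin_dma_transfer(void);

static struct workqueue_struct *dma_wq;
static struct work_struct dma_trigger_work;

/* Runs in a normal kernel worker thread, where calling an API that may
 * sleep/schedule() is allowed. */
static void dma_trigger_fn(struct work_struct *work)
{
    start_dolphin_dma_transfer();
}

/* Called from the isolated real-time thread: queueing work is cheap and
 * does not itself call schedule(). */
static inline void rt_queue_dma_trigger(void)
{
    queue_work(dma_wq, &dma_trigger_work);
}

/* In module init (error handling omitted):
 *   dma_wq = alloc_workqueue("dolphin_dma", WQ_HIGHPRI, 0);
 *   INIT_WORK(&dma_trigger_work, dma_trigger_fn);
 */
```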
### IX 4K Link, 4 IPCs per DMA Trigger (250 Hz)
```
[245454.338120] Histogram Of All DMA Latencies (ns)
[245454.338257] min: 42200, max: 259522, mean: 42732
[245454.338391] <43000 : 98659349
[245454.338519] [43000, 44000) : 2746627
[245454.338646] [44000, 45000) : 2533775
[245454.338772] [45000, 48000) : 462553
[245454.338898] [48000, 50000) : 28272
[245454.339023] [50000, 54000) : 10367
[245454.339148] [54000, 58000) : 1742
[245454.339273] [58000, 62000) : 527
[245454.339398] [62000, 68000) : 482
[245454.339523] [68000, 72000) : 326
[245454.339648] [72000, 78000) : 498
[245454.339772] [78000, 82000) : 425
[245454.339897] [82000, 88000) : 375
[245454.340033] >88000 : 1438
```
### IX Short Link, 4 IPCs per DMA Trigger (250 Hz)
```
[3119557.115838] Histogram Of All DMA Latencies (ns)
[3119557.115976] min: 1347, max: 18470, mean: 3175
[3119557.116118] <1000 : 0
[3119557.116240] [1000, 2000) : 258583
[3119557.116365] [2000, 3000) : 11867
[3119557.116490] [3000, 3500) : 2154
[3119557.116615] [3500, 4000) : 306697
[3119557.116740] [4000, 8000) : 246277
[3119557.116865] [8000, 10000) : 360
[3119557.116990] [10000, 15000) : 255
[3119557.117115] [15000, 20000) : 3
[3119557.117239] >20000 : 0
```
### IX Short Link, 4 IPCs per DMA Trigger (16K Hz)
```
[3121002.953487] Histogram Of All DMA Latencies (ns)
[3121002.953625] min: 1327, max: 50128, mean: 1743
[3121002.953759] <1000 : 0
[3121002.953884] [1000, 2000) : 18049272
[3121002.954023] [2000, 3000) : 56093
[3121002.954148] [3000, 3500) : 6634
[3121002.954272] [3500, 4000) : 3333
[3121002.954397] [4000, 8000) : 2518
[3121002.954522] [8000, 10000) : 36
[3121002.954646] [10000, 15000) : 29
[3121002.954771] [15000, 20000) : 0
[3121002.954895] >20000 : 3
```
# Use new CDSRFM Machine with `Intel(R) Xeon(R) W-3323 CPU @ 3.50GHz`
### Overview and Expected Changes
`dis_diag` reports a maximum payload size of 512 on both adapters with the new FE.
```
Max payload size (MPS) : 512
```
However `lspci -vvv` still lists the max payload as 128:
```
DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
MaxPayload 128 bytes, MaxReadReq 4096 bytes
```
### Old vs New 65536 Hz 4Km Link
![image](/uploads/89a89101691272d15cb57f7f09a63ad5/image.png)
### Old vs New 16384 Hz 4Km Link
![image](/uploads/8567aa7727c56a22ab2dc5019021aa28/image.png)
### Old vs New 65536 Hz Short Link
![image](/uploads/198264be5683756b0c5de6da34623d2e/image.png)
### Old vs New 16384 Hz Short Link
Extended the data here a bit, just to see if anything stood out.
![image](/uploads/47ac3801eee485c36692dade786a6878/image.png)
# True Maximum Payload Size Changes
Because the above shows that `dis_diag` reported an MPS of 512 while `lspci` showed 128, more changes were made to try to get `lspci` to show a larger MPS.
Configuring the `ntb_set_mps` parameter in the `dis_px.conf` file and rebooting the systems allowed an MPS of 512 to be reported on the W-3323 FEs and 256 on the older W-2245 (iscex) machine.
If I set `ntb_set_mps=3` (force 256 MPS) and reboot the FEs, both `dis_diag` and `lspci` show a max payload size of 256. However, after the drivers are configured and the node/speed set, the new PCIe "functions" that show up have a MaxPayload of 128.
```
19:00.0 Bridge: PLX Technology, Inc. PEX 8732 32-lane, 8-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ca)
Subsystem: Dolphin Interconnect Solutions AS PEX 8732 32-lane, 8-Port PCI Express Gen 3 (8.0 GT/s) Switch
...
DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
MaxPayload 256 bytes, MaxReadReq 128 bytes
...
19:00.1 Intelligent controller [0e80]: Dolphin Interconnect Solutions AS Device 0810
Subsystem: Dolphin Interconnect Solutions AS Device 2810
...
DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
MaxPayload 128 bytes, MaxReadReq 4096 bytes
```
#### Results
Same packet size limits and max bandwidth were recorded with the MPS configured as 256 or 512.

**Issue #607: segfault in awgtpman**
https://git.ligo.org/cds/software/advligorts/-/issues/607 | Updated 2024-01-17T22:29:14Z | Author: Erik von Reis

In RPC service. It looks possibly like a double-free.
segfault is in libc_free, buried under rpcStartServer call to svc_run.
Either this is a bug in RPC, or we've erroneously freed a pointer that we don't own.
Problem occurred when testing filters in AWG that were too big, i.e. too many second-order sections.

**Issue #606: nds1 changes datablock size near s-trend frame/live data boundary**
https://git.ligo.org/cds/software/advligorts/-/issues/606 | Updated 2023-12-20T19:34:16Z | Author/assignee: Jonathan Hanks

Looking into issues with reading NDS1 s-trends near the end of the last s-trend frame.
When trending 15 channels (all 5 s-trends of 3 dac kill channels) we have seen nds client errors that have been traced to the server changing the size of the datablock being sent back. When using 3 other channels (PEM, captured via edcu) this is not seen.
<pre>
I got a recording of a session where I ask for 15 channels (all 5 trends of 3 DACKILL channels). What I see it this (output by a script I wrote to deconstruct the data stream).
block = Block(type=<BlockType.DataBlock: (0,)>, blen=84, secs=1, gps=1387053872, nano=0, seqnum=0, data=b'@4\x00\x00\x00\x00\x00\x00\x00\x00\x00\x10A\xa0\x00\x00A\xa0\x00\x00@4\x00\x00\x00\x00\x00\x00?\xf0\x00\x00\x00\x00\x00\x00\x00\x00\x00\x10\x00\x00\x00\x01\x00\x00\x00\x01?\xf0\x00\x00\x00\x00\x00\x00?\xf0\x00\x00\x00\x00\x00\x00\x00\x00\x00\x10\x00\x00\x00\x01\x00\x00\x00\x01?\xf0\x00\x00\x00\x00\x00\x00')
block = Block(type=<BlockType.DataBlock: (0,)>, blen=28, secs=1, gps=1387053873, nano=0, seqnum=1, data=b'@4\x00\x00\x00\x00\x00\x00\x00\x00\x00\x10A\xa0\x00\x00A\xa0\x00\x00@4\x00\x00\x00\x00\x00\x00')
Notice the difference in data length (also reflected in the blen field [block length]). The client is erroring out on this difference and just doesn't have a good vocabulary for expressing this. The daqd/nds sent data for all the seconds that I requested, all other blocks where at the smaller [wrong] data length
</pre>
This doesn't seem to happen when older data is requested. This hints that it is an issue at a boundary between frame and in-memory data.

**Issue #605: awgtpman crash when clearing test points**
https://git.ligo.org/cds/software/advligorts/-/issues/605 | Updated 2023-12-12T02:17:09Z | Author/assignee: Erik von Reis

Awgtpman crashes and restarts when testpoints are cleared.
A debug version EJ installed on X1 gave the following message:
```
Dec 11 10:58:24 x1pxex rts_awgtpman_exec[43834]: /opt/rtcds/rtscore/ej/advligorts/srcawgtpman: ../../src/svc.c:167: __xprt_do_unregister: Assertion `xprt != NULL' failed.
```
That's an error in TRPC.

**Issue #604: We should add a way for the CDSRFM to verify adapter to IFO leg mapping, before it forwards IPCs around.**
https://git.ligo.org/cds/software/advligorts/-/issues/604 | Updated 2023-12-08T20:14:13Z | Author: Ezekiel Dohmen | Milestone: advligorts 5.2.0

**Issue #603: Add timing locked check when using LIGO PCI timing card.**
https://git.ligo.org/cds/software/advligorts/-/issues/603 | Updated 2023-12-08T20:13:27Z | Author: Ezekiel Dohmen | Milestone: advligorts 5.2.0

Code is not checking the timing locked bit, and should check/raise an error when unlocked.
We think we should be able to test this on the test stand.

**Issue #602: Add support for Dolphin MX drivers**
https://git.ligo.org/cds/software/advligorts/-/issues/602 | Updated 2023-12-01T17:23:53Z | Author/assignee: Keith Thorne | Due: 2023-12-04

Add support for using Dolphin MX drivers. Most work has been done: the dolphin-mx-debian repository was created, and the DolphinMXSource and DolphinMX builds in Jenkins create the ligo-dolphin-mx-* packages.
Remaining work is to modify the build of dolphin-proxy-km to add ligo-dolphin-mx to the list.

**Issue #601: Highlight disconnected, not init, not found, dropped on the SDF screen.**
https://git.ligo.org/cds/software/advligorts/-/issues/601 | Updated 2023-11-28T19:04:08Z | Author/assignee: Ezekiel Dohmen

Disconnected and dropped are CA_SDF only.
Suggested red box around the elements when in unexpected states.