To add on to this, it wasn't obvious to me at first but I think you're right and this is connected to #441 (closed). Figured I'd tag it here so both issues have a record of each other.
It will saturate the Sentry quota. If the client is automatically recovering, then at least we should catch and ignore this error with a FIXME comment.
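For reference, a minimal sketch of what "catch and ignore with a FIXME" could look like. The names here (`TransientTimeout`, `listen_once`, `receive`) are hypothetical stand-ins, not the actual client API; the real exception type would be whatever the client raises on timeout.

```python
import logging

log = logging.getLogger(__name__)


class TransientTimeout(Exception):
    """Hypothetical stand-in for the timeout error raised by the client."""


def listen_once(receive):
    """Call ``receive`` once and swallow a transient timeout.

    Returns True if the call succeeded and False if it timed out.
    Any other exception still propagates to the caller.
    """
    try:
        receive()
    except TransientTimeout as exc:
        # FIXME: ignored because the client appears to recover on its own;
        # remove this once the underlying timeout is understood.
        log.warning("Ignoring transient timeout: %s", exc)
        return False
    return True
```

Logging at warning level keeps a record in the worker log while only the one known-noisy exception type is suppressed.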
I modified the error callback function in !949 (merged). I'm holding off on closing this issue, though, until I've confirmed that the error now shows up in the log without triggering an error in Sentry.
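To illustrate the idea (this is not the actual change in !949), here is a rough sketch of an error callback that logs timeouts at warning level instead of error level. It assumes the error object exposes a `name()` method returning a code name such as `_TIMED_OUT`, and it relies on my understanding that Sentry's logging integration, with default settings, only creates events for records at ERROR and above. The function and logger names are made up.

```python
import logging

log = logging.getLogger("kafka_error_sketch")


def on_kafka_error(err):
    """Hypothetical error callback: record the error without raising.

    ``err`` is assumed to expose ``name()`` returning the error code name.
    Returns the logging level that was used, to make the sketch easy to check.
    """
    name = err.name() if hasattr(err, "name") else str(err)
    # Timeouts are expected to recover, so keep them below ERROR level.
    level = logging.WARNING if name == "_TIMED_OUT" else logging.ERROR
    log.log(level, "Kafka error: %s", name)
    return level
```

With this shape the timeout still lands in gwcelery-worker.log, and anything that is not a timeout keeps its error level.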
Turns out this is still coming from the IGWN alert client, not from the Kafka producer bootstep. @deep.chatterjee do you have any idea why the try/except you added before isn't working anymore? I'm wondering if it actually never worked and we just didn't notice. For example, the last instance of a Timeout from Kafka that I see in gwcelery-worker.log is from 2022-09-02, 3 days after !907 (merged) was merged.
I should have checked first whether these problems were actually coming from the Kafka bootstep. Should we revert !949 (merged), since the error is coming from the IGWN alert client instead?
Sorry, I realized after posting that I hadn't stated why I know it's not coming from both, and I was editing when you responded.
I confirmed it's only coming from the IGWN alert client by grepping the logs on playground for _TIMED_OUT. The only log file that contains it is gwcelery-worker.log, and all of the lines I could see came from the IGWNReceiverThread. I then filtered those out with grep -r '_TIMED_OUT' gwcelery-worker.log | grep -v 'IGWNReceiverThread'. The last line of the remaining output was adc.errors.KafkaException: Error communicating with Kafka: code=_TIMED_OUT GroupCoordinator: kb-0.prod.hop.scimma.org:9092: 99 request(s) timed out: disconnect (after 60045ms in state UP). The last time the pattern adc.errors.KafkaException showed up in the log was 2022-09-02, way before !852 (merged) was merged.
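In case anyone wants to repeat the check without chaining greps, the same filter can be sketched in Python. The function name is made up; the two patterns are the ones from the grep command above.

```python
def last_non_receiver_timeout(lines):
    """Return the last line that contains _TIMED_OUT but does not mention
    IGWNReceiverThread, or None if there is no such line."""
    matches = [
        line.rstrip("\n")
        for line in lines
        if "_TIMED_OUT" in line and "IGWNReceiverThread" not in line
    ]
    return matches[-1] if matches else None


# Example usage against the worker log:
# with open("gwcelery-worker.log") as f:
#     print(last_non_receiver_timeout(f))
```

A result of None would mean that every timeout in the file came from the receiver thread.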
Okay, I'm currently working on both the versioned filename fix and the updates you wanted in !945 (merged). Do you have a minute to prepare an MR for this? If not, I'll do it once I finish those.