Memory corruption in the Python SWIG bindings
The ndscope developers have found a situation that reliably triggers a segfault. Run ndscope from git on the clp2 branch as follows:

    python3 -m ndscope --nds lho H1:GRD-ISC_LOCK_STATE_N
The crash occurs when two threads are active at the same time: one thread is running nds2.iterate on H1:GRD-ISC_LOCK_STATE_N while another thread performs a background channel listing.
The segfault happens while the channel list is being processed, at the point where a channel's name field is accessed.
Running the process under gdb produced the following two partial stack traces:

    #0 0x00000000004f2401 in ()
    #1 0x00007fffefd73597 in () at /usr/lib/python3/dist-packages/_nds2.so
The second trace was taken with a local debug build of the client, in the hope of getting more detailed information:

    #0 0x00000000004f2401 in ()
    #1 0x00007fffefcbb868 in SWIG_FromCharPtrAndSize(char const*, unsigned long) () at /home/jonathan.hanks/Documents/Programming/nds2-client/cmake-build-debug/swig-prefix/src/swig-build/python/python3/_nds2.so
    #2 0x00007fffefcbb8a5 in SWIG_From_std_string(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () at /home/jonathan.hanks/Documents/Programming/nds2-client/cmake-build-debug/swig-prefix/src/swig-build/python/python3/_nds2.so
    #3 0x00007fffefcc2ed4 in _wrap_channel_Name () at /home/jonathan.hanks/Documents/Programming/nds2-client/cmake-build-debug/swig-prefix/src/swig-build/python/python3/_nds2.so
    #4 0x000000000052801a in ()
    #5 0x000000000051d89b in _PyObject_MakeTpCall ()
The crash does not appear to occur when the process is run under valgrind.
I wrote a small test program to try to reproduce the issue in a minimal setting:
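The gdb traces above show only C frames. One way to also see which Python line each thread was executing at the moment of the segfault is the standard-library faulthandler module, which dumps Python tracebacks on fatal signals such as SIGSEGV. The snippet below is a suggestion, not something the ndscope or nds2 code already does; it assumes it is placed at the very top of the test program (or of ndscope's entry point).

```python
import faulthandler
import sys

# On SIGSEGV, SIGFPE, SIGABRT, SIGBUS or SIGILL, write the Python traceback
# of every thread (all_threads=True) before the process dies.
# sys.__stderr__ is used so this still works if sys.stderr has been replaced
# by an object without a real file descriptor.
faulthandler.enable(file=sys.__stderr__, all_threads=True)
```

The same effect is available without editing any code by running `python3 -X faulthandler -m ndscope --nds lho H1:GRD-ISC_LOCK_STATE_N` or by setting `PYTHONFAULTHANDLER=1`. With all threads dumped, the output should show where the iterate thread was when the listing thread crashed in `channel.name`.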
    #!/usr/bin/env python3
    import threading

    import nds2

    server = 'nds.ligo-wa.caltech.edu'
    port = 31200

    can_start = threading.Event()
    can_stop = threading.Event()


    class Channel:
        """Simple channel metadata class

        Extracts just the info we care about, with a method to update
        from other instances of the same name.
        """

        def __init__(self, nds_channel):
            """initialize with an nds channel object"""
            self.name = nds_channel.name
            self.online = False
            self.testpoint = False
            self.sample_rate = nds_channel.sample_rate
            if self.sample_rate >= 1:
                self.sample_rate = int(self.sample_rate)
            self.update(nds_channel)

        def update(self, nds_channel):
            """update metadata from another channel instance of the same name"""
            assert nds_channel.name == self.name
            self.online |= nds_channel.channel_type == nds2.channel.CHANNEL_TYPE_ONLINE
            self.testpoint |= nds_channel.channel_type == nds2.channel.CHANNEL_TYPE_TEST_POINT


    def wrap(target, name, pre=None, post=None):
        def f():
            global can_start
            global can_stop
            # can_start.wait()
            print('starting {0}'.format(name))
            target()
            print('done with {0}'.format(name))
            can_stop.set()
        return f


    def find_channels():
        channels = {}
        raw_channels = nds2.find_channels(
            '*',
            channel_type_mask=nds2.channel.CHANNEL_TYPE_RAW | nds2.channel.CHANNEL_TYPE_ONLINE,
            hostname=server,
            port=port)
        print('nds2.find_channels call completed')
        for channel in raw_channels:
            name = channel.name
            try:
                channels[name].update(channel)
            except KeyError:
                channels[name] = Channel(channel)
        return channels


    def read_data():
        global can_start
        global can_stop
        print('Starting read_data')
        first = True
        for buffers in nds2.iterate(['H1:GRD-ISC_LOCK_STATE_N'], hostname=server, port=port):
            if first:
                print('Enabling find_channels')
                can_start.set()
                first = False
            elif can_stop.is_set():
                print('done')
                break
            else:
                print('.')
        print('done with read_data')


    find_thread = threading.Thread(target=wrap(target=find_channels, name='find_channels'))
    find_thread.start()
    read_data()
    find_thread.join()
This is filed as an nds2 issue because the crashes occur in nds2 client code.