In this line in __mt_alloc<_Tp>::deallocate()
__bin._M_first[__thread_id] = __tmp->_M_next;
__tmp is NULL (and remove==-1). Full stack trace follows:
#0 0xb78c0a09 in __gnu_cxx::__mt_alloc<__gnu_cxx::_Hashtable_node<std::pair<DMI::HIID const, DMI::Record::Field> > >::deallocate (this=0xa1f84234, __p=0x8ee2e58, __n=1) at mt_allocator.h:458
#1 0xb78bff76 in __gnu_cxx::hashtable<std::pair<DMI::HIID const, DMI::Record::Field>, DMI::HIID, __gnu_cxx::hash<DMI::HIID>, std::_Select1st<std::pair<DMI::HIID const, DMI::Record::Field> >, std::equal_to<DMI::HIID>, __gnu_cxx::__mt_alloc<DMI::Record::Field> >::_M_put_node (this=0xa1f84234, __p=0x8ee2e58) at hashtable.h:253
#2 0xb78bf7ac in __gnu_cxx::hashtable<std::pair<DMI::HIID const, DMI::Record::Field>, DMI::HIID, __gnu_cxx::hash<DMI::HIID>, std::_Select1st<std::pair<DMI::HIID const, DMI::Record::Field> >, std::equal_to<DMI::HIID>, __gnu_cxx::__mt_alloc<DMI::Record::Field> >::_M_delete_node (this=0xa1f84234, __n=0x8ee2e58) at hashtable.h:544
#3 0xb78bf254 in __gnu_cxx::hashtable<std::pair<DMI::HIID const, DMI::Record::Field>, DMI::HIID, __gnu_cxx::hash<DMI::HIID>, std::_Select1st<std::pair<DMI::HIID const, DMI::Record::Field> >, std::equal_to<DMI::HIID>, __gnu_cxx::__mt_alloc<DMI::Record::Field> >::clear (this=0xa1f84234) at hashtable.h:953
#4 0xb78becda in ~hashtable (this=0xa1f84234) at hashtable.h:327
#5 0xb78bea67 in ~hash_map (this=0xa1f84234) at /home/oms/LOFAR/Timba/DMI/src/Record.cc:16
#6 0xb78a9342 in ~Record (this=0xa1f84208) at /home/oms/LOFAR/Timba/DMI/src/Record.cc:31
#7 0xb76bddf2 in DMI::CountedRefBase::detach (this=0xa2113958) at /home/oms/LOFAR/Timba/DMI/src/CountedRefBase.cc:377
#8 0xb7644b02 in ~CountedRefBase (this=0xa2113958) at CountedRefBase.h:392
#9 0xb7661a5b in ~CountedRef (this=0xa2113958) at Record.h:179
#10 0xb78f2f1a in ~Message (this=0xa21138d8) at /home/oms/LOFAR/Timba/OCTOPUSSY/src/Message.cc:81
#11 0xb76bddf2 in DMI::CountedRefBase::detach (this=0xb221a430) at /home/oms/LOFAR/Timba/DMI/src/CountedRefBase.cc:377
#12 0xb7644b02 in ~CountedRefBase (this=0xb221a430) at CountedRefBase.h:392
#13 0xb764d191 in ~CountedRef (this=0xb221a430) at /home/oms/LOFAR/Timba/OCTOPython/src/DataConv.cc:450
#14 0xb795101d in Octopussy::MTGatewayWP::readerThread (this=0x8646eb0) at /home/oms/LOFAR/Timba/OCTOPUSSY/src/MTGatewayWP2.cc:276
#15 0xb794bb5d in Octopussy::MTGatewayWP::start_readerThread (pwp=0x8646eb0) at /home/oms/LOFAR/Timba/OCTOPUSSY/src/MTGatewayWP.cc:583
#16 0xb7ef2cfd in start_thread () from /lib/tls/libpthread.so.0
#17 0xb7e5a13e in clone () from /lib/tls/libc.so.6
More details:
- Prev stack frame appears sane:
0xb78bff76 in __gnu_cxx::hashtable<std::pair<DMI::HIID const, DMI::Record::Field>, DMI::HIID, __gnu_cxx::hash<DMI::HIID>, std::_Select1st<std::pair<DMI::HIID const, DMI::Record::Field> >, std::equal_to<DMI::HIID>, __gnu_cxx::__mt_alloc<DMI::Record::Field> >::_M_put_node (this=0xa1f84234, __p=0x8ee2e58) at hashtable.h:253
- This frame is OK, deallocating one node:
#0 0xb78c0a09 in __gnu_cxx::__mt_alloc<__gnu_cxx::_Hashtable_node<std::pair<DMI::HIID const, DMI::Record::Field> > >::deallocate (this=0xa1f84234, __p=0x8ee2e58, __n=1) at mt_allocator.h:458
- removed=200, looks like it removes just one too many.
Two threads appear to be detaching themselves from messages at the same time. Is it possible that the delete operation needs to be protected by a mutex too? But we can't do that: the mutex lives in the object we're deleting, so it would be destroyed along with it. But perhaps our ref needs to be cleared before we unlock the mutex? Hmm, here's a thought: what happens if, between unlocking the mutex and deleting the target, somebody tries to do a CountedRefBase::copy() on us? Aha, then the refcount gets incremented again, and then the target is deleted anyway, leaving the new copy pointing at freed memory!!! This is definitely a bug -- but is it THE bug? Fixed, now stress-testing.
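
To make the suspected window concrete, here is a minimal sketch of the ordering problem. It is not the actual CountedRefBase code: Target, Ref, copy(), detach(), copy_dangles(), live_targets and the `window` callback are all hypothetical names invented for illustration. The callback runs between the unlock and the delete, standing in for a second thread that happens to be scheduled at exactly that point, so the interleaving is reproduced deterministically in a single thread. A real multithreaded version would also have to synchronize the read of the ref's pointer inside copy(); the sketch only shows why clearing the ref before unlocking closes this particular window.

```cpp
#include <functional>
#include <mutex>

// Hypothetical stand-ins for illustration; not the real DMI classes.
static int live_targets = 0;  // lets the demo observe deletion without touching freed memory

struct Target {
    std::mutex mutex;  // lives inside the target, so it is destroyed with it
    int refcount = 0;
    Target()  { ++live_targets; }
    ~Target() { --live_targets; }
};

struct Ref {
    Target* target = nullptr;
};

// Attach a new ref to whatever src currently points at.
Ref copy(const Ref& src) {
    Ref out;
    if (src.target) {
        std::lock_guard<std::mutex> lock(src.target->mutex);
        ++src.target->refcount;
        out.target = src.target;
    }
    return out;
}

// Drop ref's hold on its target. `window` runs between unlock and delete,
// simulating another thread that gets scheduled at exactly that point.
void detach(Ref& ref, bool clear_before_unlock,
            const std::function<void()>& window) {
    Target* t = ref.target;
    if (!t) return;
    bool last = false;
    {
        std::lock_guard<std::mutex> lock(t->mutex);
        last = (--t->refcount == 0);
        if (clear_before_unlock) ref.target = nullptr;  // the proposed fix
    }
    window();            // the vulnerable gap
    ref.target = nullptr;
    if (last) delete t;  // decided under the lock, acted on outside it
}

// Returns true if a copy taken inside the window ends up pointing
// at a target that has already been deleted.
bool copy_dangles(bool clear_before_unlock) {
    Ref owner;
    owner.target = new Target;
    owner.target->refcount = 1;
    Ref stolen;
    detach(owner, clear_before_unlock, [&] { stolen = copy(owner); });
    return stolen.target != nullptr && live_targets == 0;
}
```

With the ref cleared only after the unlock, copy_dangles(false) returns true: the copy bumps the refcount back to 1, but the decision to delete was already made. With the ref cleared while the mutex is still held, copy_dangles(true) returns false, because the copy sees an empty ref and attaches to nothing.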
