Overview of the problem

Whenever a lot of publishing is going on, Python eventually segfaults somewhere in garbage collection and/or heap allocation, so something in the heap is evidently being corrupted. As of 13/12/05, I have tried:

1. Running with valgrind. No errors detected apart from a bunch of Addr4 errors in PyObject_Free() and PyObject_Realloc(), which are apparently normal. See http://pxr.openlook.org/pxr/source/Misc/README.valgrind for an explanation.

2. Building an Aug 30 2005 version of Timba. Things seemed stable back then, but that version is no longer stable either.

3. Adding conversion tests on the kernel side. Nothing fails (apart from the usual valgrind errors mentioned in 1).

4. Going through the entire OCTOPython code, cleaning up and commenting all reference handling. Found a potential leak and a potential under-ref problem with None, but fixing these didn't help. Commit of 13/12/05.
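The reference-handling audit in item 4 can be complemented by a check from the Python side. What follows is only a minimal sketch, not the OCTOPython code: refcount_delta, leaky and clean are hypothetical names, and in practice the function under test would be one of the C-level conversion routines. It uses sys.getrefcount() to see whether a call leaves the reference count of an object changed. A positive delta suggests a leaked reference, a negative one an under-ref. Note that on recent CPython versions None itself is immortal and its count is not meaningful, so the probe here is an ordinary object.

```python
import sys

def refcount_delta(obj, func, *args, **kwargs):
    """Return how much func(*args, **kwargs) changes obj's reference count.

    Hypothetical helper: 0 means the call is balanced, > 0 hints at a
    leaked reference, < 0 hints at an under-ref (one DECREF too many).
    """
    before = sys.getrefcount(obj)
    func(*args, **kwargs)          # result is discarded immediately
    after = sys.getrefcount(obj)
    return after - before

# Stand-ins for the routines under test (assumptions, for illustration only).
_kept = []

def leaky(x):
    # Keeps an extra reference alive, like a missing Py_DECREF would.
    _kept.append(x)

def clean(x):
    # Borrows x and gives it back untouched.
    return None

probe = object()
print("clean:", refcount_delta(probe, clean, probe))
print("leaky:", refcount_delta(probe, leaky, probe))
```

Running such a check in a loop over every conversion routine is cheap, and it narrows down which call to inspect before going back to valgrind.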

13/12/05: Rebuilding the toolchain

As of 13/12/05: I'm going to rebuild the entire toolchain from scratch, with gcc-3.4. See ./RebuildingPythonNotes for config/build details.

13/12/05: the rebuild didn't help. Will now rebuild Python with debugging support, and try to valgrind the problem.

13/12/05: now crashes in sip. Going back to Qt-3.3.2, sip-4.1.1 and PyQt-3.13.

14/12/05: with the above setup the browser lasts longer but still falls over eventually. Will try to create a full-debug build of the toolchain (Qt-3.3.5, sip-4.3.2, PyQt-3.15.1):

This aborts when a kernel connects; see ./AbortLog. QCustomEvent() is somewhere up the stack; will look into it later.

Now, let's try a debug build with Qt-3.3.2, sip-4.1.1 and PyQt-3.13.

Next step: trying the same with QScintilla-1.65 built from source:

Further steps: three things to explore.

1. Try a global install of the new qscintilla:

2. Run the browser build against python-2.3.5-debug and the older sip/pyqt under valgrind (addrcheck tool) and look for where the heap is corrupted. Result: no crash yet, but it runs too slowly. If the bug in 2.2 is the cause, valgrind may upset the thread timing so much that it never hits the bug...

3. Run the browser build against python-2.3.5-debug and the newer sip/pyqt under valgrind, and try to figure out what the PyQt problem is, so that we can at least report it. Check whether it is present only with Debian's qt-3.3.4 and not with qt-3.3.5.
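The valgrind runs in steps 2 and 3 differ only in the tool and the toolchain, so it may help to assemble the command line in one place. This is a sketch under stated assumptions: valgrind_command is a hypothetical helper, and browser.py, python2.3-debug and the suppressions path are placeholders rather than the actual names used here. The --tool, --suppressions and --log-file options are standard valgrind options. As far as I know, addrcheck is only available in older valgrind releases, so the tool is left as a parameter. The suppressions file meant here is the valgrind-python.supp that ships in Misc/ of the Python source tree, next to README.valgrind.

```python
def valgrind_command(script, tool="memcheck", suppressions=None,
                     log_file=None, python="python"):
    """Build an argument list for running a Python script under valgrind.

    Hypothetical helper; all file and interpreter names passed in are
    placeholders chosen by the caller.
    """
    cmd = ["valgrind", "--tool=" + tool]
    if suppressions:
        # Silences the known PyObject_Free()/PyObject_Realloc() reports.
        cmd.append("--suppressions=" + suppressions)
    if log_file:
        # Keep valgrind output apart from the program's own output.
        cmd.append("--log-file=" + log_file)
    cmd.extend([python, script])
    return cmd

# Example: step 2 above, with placeholder names.
cmd = valgrind_command("browser.py", tool="addrcheck",
                       suppressions="Misc/valgrind-python.supp",
                       python="python2.3-debug")
print(" ".join(cmd))
```

The resulting list can be handed to subprocess.call() as is, which avoids shell quoting problems when the paths contain spaces.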

OlegSmirnov/PythonCrashes (last edited 2005-12-28 08:47:57 by OlegSmirnov)