I mentioned in my previous post a lock contention issue when accessing the same list from different threads, leading to worse performance the more threads were used. Turns out this is fixed in Python 3.14.0a7 already!
Unfortunately, this version doesn't seem to be available from uv yet (which is why I hadn't originally tried it), but it can be downloaded and installed manually from the Python website (click Customize at the installation step and check the free-threading box). On macOS, the interpreter binaries can then be found at:
/Library/Frameworks/Python.framework/Versions/3.14/bin/python3
/Library/Frameworks/PythonT.framework/Versions/3.14/bin/python3t
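To double-check which of the two binaries you are actually running, a small script along these lines can help. This is only a sketch: the helper name `free_threading_status` is made up for illustration, and it relies on `sysconfig.get_config_var("Py_GIL_DISABLED")` plus `sys._is_gil_enabled()`, the latter being a private function that only exists from Python 3.13 onwards (hence the `getattr` fallback).

```python
import sys
import sysconfig


def free_threading_status():
    """Return (built_free_threaded, gil_enabled_now) for the running interpreter.

    Hypothetical helper for illustration, not part of any library.
    """
    # Py_GIL_DISABLED is 1 on free-threaded builds, and 0 or None otherwise.
    built_free_threaded = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # sys._is_gil_enabled() was added in Python 3.13; on older versions
    # the GIL is always enabled, so fall back to True.
    check = getattr(sys, "_is_gil_enabled", None)
    gil_enabled_now = bool(check()) if check is not None else True
    return built_free_threaded, gil_enabled_now


if __name__ == "__main__":
    built, gil = free_threading_status()
    print(sys.version)
    print(f"free-threaded build: {built}, GIL currently enabled: {gil}")
```

Running it with the `python3t` binary above should report a free-threaded build, while the regular `python3` binary should not. Note that even on a free-threaded build the GIL can be re-enabled at runtime, which is why the two values are reported separately.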
With this version we observe much ...