> Instead, many reach for multiprocessing, but spawning processes is expensive
Agreed.
> and communicating across processes often requires making expensive copies of data
SharedMemory [0] exists. Never understood why this isn’t used more frequently. There’s even a ShareableList which does exactly what it sounds like, and is awesome.
[0]: https://docs.python.org/3/library/multiprocessing.shared_mem...
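For anyone who hasn't tried it, a minimal sketch of what ShareableList gives you: one process creates the list, another attaches to it by name and mutates it in place, with no copying or pickling of the payload. (Using the "fork" start method here to keep the sketch short; it's POSIX-only, and "spawn" works too with the usual `__main__` guard.)

```python
import multiprocessing as mp
from multiprocessing.shared_memory import ShareableList

def double_first(name):
    # Attach to the existing shared list by its name; nothing is copied.
    sl = ShareableList(name=name)
    sl[0] = sl[0] * 2
    sl.shm.close()

ctx = mp.get_context("fork")  # POSIX-only; avoids re-importing this module in the child
parent = ShareableList([21, "hello"])
child = ctx.Process(target=double_first, args=(parent.shm.name,))
child.start()
child.join()
result = parent[0]  # the child's write is visible here
print(result)  # 42
parent.shm.close()
parent.shm.unlink()  # release the underlying shared memory segment
```

One caveat: items in a ShareableList are fixed-size, so a string can't grow past the byte length it was created with.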
Does removal of the GIL have any other effects on multi-threaded Python code (other than allowing it to run in parallel)?
My understanding is that the GIL has lasted this long not because multi-threaded Python depends on it, but because removing it:
- Complicates the implementation of the interpreter
- Complicates C extensions, and
- Causes single-threaded code to run slower
Multi-threaded Python code already has to assume that it can be pre-empted on the boundary between any two bytecode instructions. Does free-threaded Python provide the same guarantees, or does it require multi-threaded Python to be written differently, e.g. to use additional locks?
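Worth noting either way: compound read-modify-write operations were never safe under the GIL, since a thread switch can land between the read and the write. Only individual operations were ever implicitly atomic, so code like this toy counter needs its lock on both builds:

```python
import threading

class Counter:
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Without the lock, the read-modify-write below can interleave
        # between bytecode instructions even under the GIL; the same
        # guard is needed on free-threaded builds.
        with self._lock:
            self.value += 1

counter = Counter()

def work():
    for _ in range(10_000):
        counter.increment()

threads = [threading.Thread(target=work) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.value)  # 40000, deterministically, thanks to the lock
```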
pjmlp
In other news, Microsoft dumped the whole Faster CPython team; apparently the 2025 earnings weren't enough to keep the team around.
https://www.linkedin.com/posts/mdboom_its-been-a-tough-coupl...
Let's see which performance improvements still land in CPython, unless another company sponsors the work.
I guess Facebook (no need to correct me on the name) is still sponsoring part of it.
heybrendan
I am a Python user, but far from an expert. Occasionally, I've used 'concurrent.futures' to kick off some very simple functions at the same time.
How are 'concurrent.futures' users impacted? What will I need to change moving forward?
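As far as I can tell, the concurrent.futures API itself doesn't change, so code like this runs unmodified on a free-threaded build; the difference is that CPU-bound tasks submitted to a thread pool can now actually run in parallel instead of taking turns:

```python
from concurrent.futures import ThreadPoolExecutor

def square(n):
    return n * n

# Same API on GIL and free-threaded builds; no code changes needed.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(square, range(5)))
print(results)  # [0, 1, 4, 9, 16]
```

The usual caveat applies on either build: if your tasks mutate shared state, guard it with a lock.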
AlexanderDhoore
Am I the only one who sort of fears the day when Python loses the GIL? I don't think Python developers know what they’re asking for. I don't really trust complex multithreaded code in any language. Python, with its dynamic nature, I trust least of all.
YouWhy
Hey, I've been developing professionally with Python for 20 years, so wanted to weigh in:
Decent threading is awesome news, but it only affects a small minority of use cases. Threads are only strictly necessary when message passing is prohibitive, and the Python ecosystem these days includes a playbook solution for practically any such case. Considering the multiple major pitfalls of threads (e.g., locking), they are likely to remain useful only in specific libraries/domains and not as a general-purpose tool.
Additionally, with all my love for vanilla Python, anyone who needs to squeeze the juice out of their CPU (which is actually memory bandwidth) has plenty of other tools -- off-the-shelf libraries written in native code. (Honorable mention to PyPy, Numba and such.)
Finally, the one dramatic performance innovation in Python has been async programming - I warmly encourage everyone not familiar with it to consider taking a look.
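The core idea in miniature, for anyone who hasn't looked at it: async lets many I/O waits overlap on a single thread, so ten 0.1-second waits finish in roughly 0.1 seconds instead of one second.

```python
import asyncio
import time

async def fetch(i):
    # Stand-in for an I/O wait (network call, DB query, ...).
    await asyncio.sleep(0.1)
    return i

async def main():
    start = time.perf_counter()
    # gather() runs all ten coroutines concurrently on one thread.
    results = await asyncio.gather(*(fetch(i) for i in range(10)))
    elapsed = time.perf_counter() - start
    return results, elapsed

results, elapsed = asyncio.run(main())
print(results, f"{elapsed:.2f}s")  # all ten waits overlap, ~0.1s total
```

Note this helps with I/O-bound concurrency only; CPU-bound work still blocks the event loop, which is exactly the gap free-threading targets.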
MichaelMoser123
CPython doesn't have a JIT, so why is free-threaded Python a higher priority than developing a just-in-time compiler? The latter would be more resonant with the typical use case for Python and benefit a larger portion of users, wouldn't it? (Wouldn't a backend server project use Golang or Java to begin with?)
0x000xca0xfe
I know it's just an AI image... but a snake with two tails? C'mon!
amelius
The snake in the header image appears to have two tail-ends ...
aitchnyu
What's currently stopping me (apart from library support) from running a single command that starts up WSGI workers and Celery workers in a single process?
pawanjswal
This is some serious groundwork for the next era of performance!
make3
I hate how these threads always devolve into insane discussions about why not using threads is better, while most people who have actually tried to speed up real-world Python code realize how amazing it would be to have proper threads with shared memory, instead of processes with all their limitations: being forced to pickle objects back and forth, fork so often just not working in cloud settings, and spawn being slow in a lot of applications. Using processes is just much heavier and less straightforward.
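A concrete illustration of the pickling limitation (a minimal sketch): threads can share any live object, while anything crossing a process boundary must survive pickle, which plenty of ordinary objects don't, a lambda being the simplest example.

```python
import pickle
from concurrent.futures import ThreadPoolExecutor

square = lambda n: n * n

# Threads share the interpreter's objects directly; nothing is serialized.
with ThreadPoolExecutor() as pool:
    thread_result = pool.submit(square, 6).result()
print(thread_result)  # 36

# A process pool would have to pickle the callable and its arguments to
# ship them across the process boundary -- and a lambda can't be pickled:
try:
    pickle.dumps(square)
    picklable = True
except (pickle.PicklingError, AttributeError):
    picklable = False
print(picklable)  # False
```

The same constraint bites on locks, open sockets, database connections, closures, and so on, which is a big part of why process-based parallelism feels so heavy.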
yunnpp
In 2025, 20 years after multi-core became a thing on consumer devices. Great progress, guys. Can't wait for what the Python community has up its sleeves next.
EGreg
I thought this was mostly a solved problem.
Fibers
Green threads
Coroutines
Actors
Queues (eg GCD)
…
Basically you need to reason about what your thing will do.
Separate concerns. Each thing is a server (microservice?) with its own backpressure.
They schedule jobs on a queue.
The jobs come with some context, I don’t care if it’s a closure on the heap or a fiber with a stack or whatever. Javascript being single threaded with promises wastefully unwinds the entire stack for each tick instead of saving context. With callbacks you can save context in closures. But even that is pretty fast.
Anyway then you can just load-balance the context across machines. Easiest approach is just to have server affinity for each job. The servers just contain a cache of the data so if the servers fail then their replacements can grab the job from an indexed database. The insertion and the lookup is O(log n) each. And jobs are deleted when done (maybe leaving behind a small log that is compacted) so there are no memory leaks.
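A toy sketch of the queue-and-backpressure pattern described above, using only the stdlib (names are mine, not from any particular framework): the bound on the queue is the backpressure, because producers block instead of overwhelming the workers.

```python
import queue
import threading

jobs = queue.Queue(maxsize=8)   # the bound itself provides backpressure
results = []
results_lock = threading.Lock()

def worker():
    while True:
        job = jobs.get()
        if job is None:          # sentinel: shut this worker down
            break
        with results_lock:
            results.append(job * 2)
        jobs.task_done()

workers = [threading.Thread(target=worker) for _ in range(3)]
for w in workers:
    w.start()

for i in range(20):
    jobs.put(i)                  # blocks whenever the queue is full

jobs.join()                      # wait until every job is processed
for _ in workers:
    jobs.put(None)               # one sentinel per worker
for w in workers:
    w.join()

print(sorted(results))
```

The same shape scales out: swap queue.Queue for a durable broker and the workers for separate machines, and you have the server-affinity scheme sketched above.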
Oh yeah, and whatever you store durably should be sharded and indexed properly, so practically unlimited amounts can be stored. Availability in a given shard is a function of replicating the data, and the economics of it is that the client should pay with credits every time they access it. You can even replicate on demand (like BitTorrent re-seeding) to handle spikes.
This is the general framework whether you use Erlang, Go, Python or PHP or whatever. It scales within a company and even across companies (as long as you sign/encrypt payloads cryptographically).
It doesn’t matter so much whether you use php-fpm with threads, or swoole, or the new kid on the block, FrankenPHP. Well, I should say I prefer the shared-nothing architecture of PHP and APC. But in Python, it is the same thing with eg Twisted vs just some SAPI.
You’re welcome.
sylware
Got myself a shiny Python 3.13.3 (the ssl module is still unable to compile against LibreSSL) replacing 3.12.2, and it feels clearly slower.
What's wrong?
hello_computer
Opting to enable low-level parallelism for user code in an imperative, dynamically typed scripting language seems like a regression. It's less bad for LISP because of the pure-functional nature. It's less bad for BEAM languages & Clojure due to immutability. It is less bad for C/C++/Rust because you have a stronger type system, allowing for deeper static analysis. For Python, this is "high priests of a low cult" shitting things up for corporate agendas and/or street cred.
p0w3n3d
Look behind! A free-threaded Python!
bgwalter
This is just an advertisement for the company. Fact is, free-threading is still up to 50% slower, the tail call interpreter isn't much faster at all, and free-threading is still flaky.
Things they won't tell you at PyCon.
henry700
I find it peculiar how, in a language so riddled with simple concurrency architectural issues, the approach is to painstakingly fix every library after fixing the runtime, instead of just using some better language. Why does the community insist on such a bad language when literally even fucking JavaScript has a saner execution model?