Hi!
I wanted to open up a discussion about concurrency. I really want to stay purely in the Laravel ecosystem, even for AI, but this topic makes that feel impossible for a mid-to-large-scale production application.
So the core question is: how does one scale Prism when PHP's I/O is blocking?
I mean, even with quite powerful servers, wouldn't just a handful of concurrent users mean that all threads are blocked by long-running LLM calls?
Am I missing something fundamental in my understanding of PHP here, or is everyone just accepting this?
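To put rough numbers on that worry, here is a back-of-the-envelope sketch in plain PHP. It assumes each request holds one worker for the whole LLM call, so capacity follows Little's law (concurrent requests = arrival rate × call duration). The pool size, call duration, and per-worker memory are made-up illustrative values, and the function names are hypothetical, not part of Prism or Laravel.

```php
<?php
// Back-of-the-envelope capacity model for blocking workers.
// All numbers below are illustrative assumptions, not measurements.

/**
 * Max sustained requests per second when each request holds a worker
 * for the full duration of the LLM call.
 */
function maxThroughputPerSecond(int $workers, float $avgCallSeconds): float
{
    return $workers / $avgCallSeconds;
}

/** Workers needed to sustain a given arrival rate without requests piling up. */
function workersNeeded(float $requestsPerSecond, float $avgCallSeconds): int
{
    return (int) ceil($requestsPerSecond * $avgCallSeconds);
}

$workers = 32;          // hypothetical worker pool size
$avgCallSeconds = 20.0; // hypothetical average LLM call duration
$memPerWorkerMb = 64;   // hypothetical memory footprint per worker

printf("Max throughput: %.1f req/s\n", maxThroughputPerSecond($workers, $avgCallSeconds));
printf("Workers for 5 req/s: %d\n", workersNeeded(5.0, $avgCallSeconds));
printf("Memory for pool: %d MB\n", $workers * $memPerWorkerMb);
```

Under these assumed numbers the limit is the worker count rather than the CPU, which is exactly the part that bothers me.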
The mitigations I can come up with are:
- Run a high number of workers per physical thread and keep the memory available to each worker low.
- Queue the work, but then we just end up with a ton of "inefficient" jobs, since they will spend most of their run time waiting.
- Does SSE handling with yield open up some sort of non-blocking behavior?
Do any old PHP Yodas have tricks up their sleeves to make this CPU-bound instead of thread-count-bound?