For one project (a crawler) we needed many parallel PHP processes to work through a job queue. This led to a set of classes that I'd like to present here, and I'd like to ask whether they would be useful somewhere else, e.g. in eZ Components. These classes could be applied wherever many background tasks need to run, e.g. Gearman tasks, search engine indexing, image and video processing, workflow execution, continuous integration, or mail sending. Does anybody have an idea for a name (theme) for a component holding these classes?
Batch Runner
The batch runner is nothing more than a sophisticated while loop. It is given a callback to execute, and before every execution it checks whether a maximum execution time has passed, whether PHP's internal memory limit is close to being exceeded, and whether the system has enough free memory left. The callback itself can return a boolean to indicate whether more jobs need to be executed or whether the while loop should take a break and sleep for a while.
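To make the loop concrete, here is a minimal sketch of such a batch runner. It is not the code from the project: the class name `BatchRunner`, its constructor options, and the helper `parseBytes` are my own illustrative assumptions, and the check for free system memory is left out because it is platform specific.

```php
<?php
// Hypothetical sketch; names and options are illustrative, not the original code.
final class BatchRunner
{
    private $job;
    private $maxSeconds;
    private $memoryRatio;
    private $idleSleepMicros;

    public function __construct(callable $job, float $maxSeconds, float $memoryRatio = 0.8, int $idleSleepMicros = 500000)
    {
        $this->job = $job;
        $this->maxSeconds = $maxSeconds;
        $this->memoryRatio = $memoryRatio;
        $this->idleSleepMicros = $idleSleepMicros;
    }

    /** Runs jobs until the time or memory budget is used up; returns the number of executions. */
    public function run(): int
    {
        $start = microtime(true);
        $executed = 0;
        while (microtime(true) - $start < $this->maxSeconds && !$this->memoryNearLimit()) {
            $moreWork = ($this->job)();
            $executed++;
            if ($moreWork === false) {
                // The callback reported an empty queue: take a break before asking again.
                usleep($this->idleSleepMicros);
            }
        }
        return $executed;
    }

    private function memoryNearLimit(): bool
    {
        $limit = self::parseBytes((string) ini_get('memory_limit'));
        if ($limit <= 0) {
            return false; // -1 means "no limit"
        }
        return memory_get_usage(true) >= $limit * $this->memoryRatio;
    }

    /** Converts php.ini shorthand such as "128M" into bytes. */
    public static function parseBytes(string $value): int
    {
        $value = trim($value);
        $number = (int) $value;
        switch (strtolower(substr($value, -1))) {
            case 'g': $number *= 1024; // intentional fall-through
            case 'm': $number *= 1024; // intentional fall-through
            case 'k': $number *= 1024;
        }
        return $number;
    }
}
```

A worker would wrap "fetch one job from the queue and process it" in the callback and return false when the queue is empty. Once `run()` returns, the process can exit and be restarted from outside.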
Signal handler
The signal handler helps with registering for and reacting to POSIX signals. Multiple callbacks can be registered for every signal. The user then decides when to call the dispatch method, which executes the callbacks. Our use case is to register a termination signal such as SIGTERM (SIGKILL itself cannot be caught) and call the dispatch method before every iteration of the batch runner. This ensures that the PHP process can be terminated safely between two job executions.
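A rough sketch of how such a handler could look, built on `pcntl_signal()` and `pcntl_signal_dispatch()` from PHP's pcntl extension. The class name `SignalHandler` and its two methods are assumptions for illustration, not the project's actual API. The installed handler only records the signal; the registered callbacks run when `dispatch()` is called.

```php
<?php
// Hypothetical sketch; requires the pcntl extension (CLI only).
final class SignalHandler
{
    /** @var array<int, callable[]> callbacks per signal number */
    private $callbacks = [];

    /** @var int[] signals received since the last dispatch */
    private $pending = [];

    public function register(int $signal, callable $callback): void
    {
        if (!isset($this->callbacks[$signal])) {
            $this->callbacks[$signal] = [];
            // Only record the signal here; the callbacks run later in dispatch().
            pcntl_signal($signal, function (int $signo) {
                $this->pending[] = $signo;
            });
        }
        $this->callbacks[$signal][] = $callback;
    }

    public function dispatch(): void
    {
        // Let PHP hand any queued signals to the recording closure above.
        pcntl_signal_dispatch();

        $pending = $this->pending;
        $this->pending = [];
        foreach ($pending as $signo) {
            foreach ($this->callbacks[$signo] as $callback) {
                $callback($signo);
            }
        }
    }
}
```

In a worker one would register a callback for SIGTERM that sets a stop flag, call `dispatch()` at the top of every loop iteration, and leave the loop once the flag is set. SIGKILL cannot be used for this because the operating system never delivers it to a handler.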
Forks
A fork is represented by an object that holds the fork callback, its parameters, and the start and stop time of the fork. This object is passed to the fork runner, either to simply run it or to run multiple clones of it. After all forks have been set up, the supervise method is run, which watches over the forks and restarts terminated forks if necessary. Since the fork runner is the place where all children are registered, it is also responsible for forwarding signals like SIGTERM to all children.
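The following sketch shows one way this could fit together with `pcntl_fork()`, `pcntl_waitpid()` and `posix_kill()`. The class names `Fork` and `ForkRunner` and all method names are assumptions on my part. To keep the example short, `supervise()` does a single non-blocking pass and would be called repeatedly from a loop.

```php
<?php
// Hypothetical sketch; requires the pcntl and posix extensions (CLI only).
final class Fork
{
    public $callback;
    public $parameters;
    public $startTime;
    public $stopTime;

    public function __construct(callable $callback, array $parameters = [])
    {
        $this->callback = $callback;
        $this->parameters = $parameters;
    }
}

final class ForkRunner
{
    /** @var array<int, Fork> running forks, keyed by child PID */
    private $children = [];

    public function run(Fork $fork, int $clones = 1): void
    {
        for ($i = 0; $i < $clones; $i++) {
            $this->start(clone $fork);
        }
    }

    private function start(Fork $fork): void
    {
        $pid = pcntl_fork();
        if ($pid === -1) {
            throw new RuntimeException('Could not fork');
        }
        if ($pid === 0) {
            // Child: run the callback, then exit so it never continues the parent's code.
            call_user_func_array($fork->callback, $fork->parameters);
            exit(0);
        }
        $fork->startTime = microtime(true);
        $this->children[$pid] = $fork;
    }

    /** Reaps terminated children without blocking; returns how many were reaped. */
    public function supervise(bool $restart = true): int
    {
        $reaped = 0;
        while (($pid = pcntl_waitpid(-1, $status, WNOHANG)) > 0) {
            if (!isset($this->children[$pid])) {
                continue;
            }
            $fork = $this->children[$pid];
            unset($this->children[$pid]);
            $fork->stopTime = microtime(true);
            $reaped++;
            if ($restart) {
                $this->start(clone $fork);
            }
        }
        return $reaped;
    }

    /** Forwards a signal to every registered child. */
    public function signalAll(int $signal): void
    {
        foreach (array_keys($this->children) as $pid) {
            posix_kill($pid, $signal);
        }
    }

    public function count(): int
    {
        return count($this->children);
    }
}
```

Keying the children by PID keeps both restarting and signal forwarding simple: whatever `pcntl_waitpid()` reports can be looked up directly, and `signalAll()` only has to walk the array keys.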
Administration interface
As a simple administrative interface I've implemented a server that I can log in to via telnet to issue status requests. In the future it could also be used to start and stop forks. The server class is called from the fork runner loop to check for new connections and new commands.
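As an illustration of the polling idea, here is a small line-based server using PHP's stream functions. Everything in it (the class name `AdminServer`, the `poll()` method, the command map) is assumed for the example and is not taken from the actual code. It also simplifies by expecting clients to send complete lines, which is what telnet does in line mode.

```php
<?php
// Hypothetical sketch: a line-based status server polled from a supervising loop.
final class AdminServer
{
    private $server;
    private $clients = [];
    private $commands;

    /** @param array<string, callable> $commands command name => callback returning a reply string */
    public function __construct(string $address, array $commands)
    {
        $this->server = stream_socket_server($address, $errno, $errstr);
        if ($this->server === false) {
            throw new RuntimeException("Cannot listen on $address: $errstr");
        }
        $this->commands = $commands;
    }

    public function getAddress(): string
    {
        return stream_socket_get_name($this->server, false);
    }

    /** Handles pending connections and commands, then returns immediately. */
    public function poll(): void
    {
        $read = array_merge([$this->server], $this->clients);
        $write = $except = null;
        // A timeout of 0 makes stream_select() return at once if nothing is ready.
        if (stream_select($read, $write, $except, 0) < 1) {
            return;
        }
        foreach ($read as $stream) {
            if ($stream === $this->server) {
                $client = stream_socket_accept($this->server, 0);
                if ($client !== false) {
                    $this->clients[(int) $client] = $client;
                }
                continue;
            }
            $line = fgets($stream);
            if ($line === false) {
                // The client closed the connection.
                unset($this->clients[(int) $stream]);
                fclose($stream);
                continue;
            }
            $name = trim($line);
            $reply = isset($this->commands[$name])
                ? ($this->commands[$name])()
                : "unknown command: $name";
            fwrite($stream, $reply . "\n");
        }
    }
}
```

The supervising loop would construct the server once, for example on `tcp://127.0.0.1:` plus a port of your choice, and call `poll()` on every iteration next to the fork supervision.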
Callback
This is actually not implemented yet, but I'm thinking about a dedicated class to represent a callback. This class should check whether a callback is valid, provide a __toString method so the callback can appear in log messages, and be usable as an interface for methods that require a callback parameter.
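Since this class does not exist yet, the following is only a guess at what it could look like. The name `Callback`, the use of `__invoke`, and the string format for log messages are my assumptions.

```php
<?php
// Hypothetical sketch of the callback class described above.
final class Callback
{
    private $callable;

    public function __construct($callable)
    {
        // Reject anything PHP could not actually call.
        if (!is_callable($callable)) {
            throw new InvalidArgumentException('Not a valid callback');
        }
        $this->callable = $callable;
    }

    public function __invoke(...$arguments)
    {
        return call_user_func_array($this->callable, $arguments);
    }

    /** Returns a short, log-friendly description of the wrapped callback. */
    public function __toString(): string
    {
        $callable = $this->callable;
        if (is_string($callable)) {
            return $callable; // plain function name or "Class::method"
        }
        if (is_array($callable)) {
            if (is_object($callable[0])) {
                return get_class($callable[0]) . '->' . $callable[1];
            }
            return $callable[0] . '::' . $callable[1];
        }
        return get_class($callable); // closure or invokable object
    }
}
```

A method could then type-hint `Callback $callback` instead of accepting a mixed value, so validation happens once in the constructor and log messages can simply cast the object to a string.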
Daemonization
To run the actual PHP binary, I use the wonderful daemon tool. The PHP process can then stop after it has leaked enough memory, and daemon will restart it. It also takes care of creating a PID file and of the actual daemonization, and it can be configured to pause for a while before attempting a restart.
Update: The code is available on GitHub. It's meant as a proof of concept only for now!
Update: Another neat trick is to load all classes the children may need in the parent process, before forking. The children then inherit the already loaded classes, which reduces execution time and memory usage.
Update: I had to re-enter this blog post (copying it from planet-php) thanks to the unreliability of my (now) former hosting provider dogado. No reason to elaborate on this. Just avoid this company!