Intra-grid/Inter-process communication should use keep-alive and multiplexing
Created by: Zapotek
The ArachniRPC protocol was designed to be lightweight and simple in order to ease integration with 3rd-party systems. It uses one socket per call so that no multiplexing is required, which makes it easy to implement for anyone with access to a serializer (usually YAML, since it's multi-platform) and TLS/SSL sockets.
And that's good; that should remain the 3rd-party-facing interface -- i.e. for Dispatchers and for both simple and master Instances.
However, communication between a master and its slaves is hidden from the user and could use the boost of a more complex, performance-oriented protocol. So, the existing protocol should be amended with a high-performance mode which will use a binary serializer (most likely Marshal), a single connection per master-slave pair, and message multiplexing.
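To make the idea concrete, here is a rough sketch of what Marshal-based framing plus multiplexing over one connection could look like. Nothing here is existing ArachniRPC code: the `Framing` module, the 4-byte length prefix and the `:id`/`:method`/`:args` message layout are all assumptions made for illustration, and `UNIXSocket.pair` merely stands in for a real master-slave TLS connection.

```ruby
require 'socket'

# Hypothetical framing helpers (not part of ArachniRPC).
# Each message is Marshal-dumped and prefixed with its size as a
# 4-byte big-endian integer, so many messages can share one stream.
module Framing
  def self.write(io, message)
    payload = Marshal.dump(message)
    io.write([payload.bytesize].pack('N') + payload)
  end

  def self.read(io)
    header = io.read(4)
    return nil if header.nil? || header.bytesize < 4

    # Marshal.load must only ever be fed data from a trusted peer.
    Marshal.load(io.read(header.unpack1('N')))
  end
end

# A connected socket pair stands in for the single master-slave connection.
master, slave = UNIXSocket.pair

# The "slave" reads three calls and answers them in reverse order,
# to show that replies are matched by ID rather than by arrival order.
slave_thread = Thread.new do
  requests = Array.new(3) { Framing.read(slave) }
  requests.reverse_each do |req|
    Framing.write(slave, { id: req[:id], result: req[:args].sum })
  end
end

# The "master" fires off all calls without waiting for any reply.
calls = [[1, 2], [3, 4], [5, 6]]
calls.each_with_index do |args, id|
  Framing.write(master, { id: id, method: 'add', args: args })
end

# Collect replies and file each one under the ID of the call it answers.
results = {}
calls.size.times do
  reply = Framing.read(master)
  results[reply[:id]] = reply[:result]
end

slave_thread.join
```

The message ID is what makes multiplexing work: since every reply carries the ID of its call, the connection never has to be torn down or serialized per call.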
This isn't strictly necessary yet, but the distributed crawling algorithm (#207 (closed)) will benefit hugely from a super-fast and extra-lightweight RPC protocol (both in message size and in the setup/tear-down cost of messages and connections), as path distribution will require tens or hundreds of thousands of RPC calls.
And since I got going, I might as well mention this too:
Even though the Ruby (MRI) dudes got their heads straight and mapped Ruby threads 1:1 to OS threads, there is still the Global Interpreter Lock (GIL), which only lets one thread run Ruby code at a time. And even if they did provide proper threading, because we're using a single-threaded, async, singleton HTTP interface, proper threads would mean very little to us.
And since workload distribution and message-passing have already been implemented for the Grid, we already have a nice and clean IPC system in place which basically allows parallelism via Ruby processes -- which are proper OS processes and can thus run on multiple cores and CPUs. The ability to truly and easily parallelize scans (even on a single machine) will be a huge asset when we get JS integration (#50 (closed)), which will require some serious processing power.
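As a toy illustration of process-based parallelism, the sketch below forks one worker per chunk of work and collects the results over pipes. It is not how Instances actually talk to each other (that goes over RPC); `parallel_map_chunks` and the sum-of-squares workload are made up for the example, and `fork` assumes a Unix-like platform.

```ruby
# Hypothetical helper: runs the given block on each chunk in its own
# OS process and returns the results in the original chunk order.
def parallel_map_chunks(chunks)
  workers = chunks.map do |chunk|
    reader, writer = IO.pipe
    reader.binmode
    writer.binmode

    pid = fork do
      # Child process: compute, send the result back, exit immediately.
      reader.close
      writer.write(Marshal.dump(yield(chunk)))
      writer.close
      exit!(0)
    end

    # Parent process: keep only the read end of the pipe.
    writer.close
    [pid, reader]
  end

  workers.map do |pid, reader|
    result = Marshal.load(reader.read)
    reader.close
    Process.wait(pid)
    result
  end
end

# Each chunk is processed by a separate process, which the OS is free
# to schedule on a separate core.
totals = parallel_map_chunks([[1, 2, 3], [4, 5, 6]]) do |chunk|
  chunk.sum { |n| n * n }
end
```

Because each worker is a separate process with its own interpreter, the GIL of one does not block the others.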
You can go even further with this and have Grid slaves spawn local slave Instances for themselves; now that would be cool.