Overhaul the multi-process scan architecture
Created by: Zapotek
Remove the current multi-process scan code (that nobody uses anyway) and replace it with generic, all-in-one worker processes.
Architecture
The architecture should be similar to the `BrowserCluster`'s, but with processes instead of threads.
- Use `method(:my_handler)` callbacks rather than `proc {}`s, to help out the GC (see the callback sketch after this list).
  - `proc` closures retain their environment, and we'll need to store a lot of callbacks.
- Take advantage of copy-on-write, so preload as much data as possible prior to forking (see the fork sketch after this list).
- Use `Arachni::RPC` for communication (see the socket sketch after this list).
  - Use UNIX sockets when available, otherwise TCP/IP.
  - Disable SSL.
  - Disable compression.
- Maybe have Dispatchers expose workers.
  - This will allow multiple machines to share one scan's workload when set up in Grid mode.
  - Similar to the existing multi-process system, but much more efficient.
- Should auto-scale by using #695 (closed).
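
A minimal sketch of the callback point, in plain Ruby; the `Worker` class, `on_result` and `handle_result` are hypothetical illustrations, not existing Arachni API:

```ruby
# Hypothetical worker showing why Method objects beat proc closures
# for long-lived callback storage.
class Worker
  def initialize
    @callbacks = []
  end

  def on_result(&callback)
    @callbacks << callback
  end

  def handle_result(result)
    puts "handled: #{result}"
  end

  def dispatch(result)
    @callbacks.each { |cb| cb.call(result) }
  end
end

worker   = Worker.new
big_page = 'x' * 50_000_000 # pretend this is a huge page body

# With a proc, the closure captures this entire scope, so `big_page`
# stays reachable for as long as the callback is stored:
worker.on_result { |r| worker.handle_result(r) }

# A Method object references only its receiver and method name, so the
# surrounding scope (and `big_page`) remains collectible:
worker.on_result(&worker.method(:handle_result))

worker.dispatch(:page_audited)
```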
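A sketch of the copy-on-write point, assuming a plain `fork`-based pool; `load_seed_data` and `run_worker_loop` are stand-ins for whatever the scan actually preloads and runs:

```ruby
def load_seed_data
  # Stand-in for the expensive preload: fingerprint data, check
  # metadata, parsed configuration, etc.
  Array.new(1_000_000) { |i| "entry-#{i}" }
end

# Preload BEFORE forking: children then share the parent's physical
# memory pages and only pay for pages they actually write to.
# (MRI >= 2.0's bitmap-marking GC keeps forked heaps COW-friendly.)
SEED_DATA = load_seed_data

def run_worker_loop(data)
  # Stand-in for the real job loop; merely reading `data` copies nothing.
  data.size
end

pids = 4.times.map { fork { run_worker_loop(SEED_DATA) } }
pids.each { |pid| Process.waitpid(pid) }
```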
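And the transport rule, sketched with only Ruby's stdlib; in reality the chosen endpoint would be handed to `Arachni::RPC` (whose options aren't shown here), and the socket path and port below are illustrative. Skipping SSL and compression avoids per-message overhead for traffic that stays on the host, or on the trusted Grid network:

```ruby
require 'socket'

SOCKET_PATH = '/tmp/arachni-worker.sock' # illustrative path
TCP_PORT    = 7331                       # illustrative port

def start_worker_endpoint
  # Prefer a UNIX domain socket: no TCP/IP stack overhead, and file
  # permissions control access. Fall back to TCP where unsupported.
  File.delete(SOCKET_PATH) if File.exist?(SOCKET_PATH)
  UNIXServer.new(SOCKET_PATH)
rescue NameError, NotImplementedError
  TCPServer.new('127.0.0.1', TCP_PORT)
end

server = start_worker_endpoint
```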
Responsibilities
The workers should perform actions like:
- HTML/XML parsing (see the parsing sketch after this list).
  - Can cause 100% CPU usage when parsing very large documents, thus blocking the scan.
  - The `Trainer` will benefit massively from this, since it does a lot of parsing during page audits.
  - Workers should also perform the subsequent handling of the parsed document and send back the result, rather than the parsed document itself; otherwise there's no point to the offloading.
- `Arachni::Support::Signature` processing.
  - Signature generation, refinement and matching can cause 100% CPU usage when dealing with very large data sets, thus blocking the scan.
- Manage browser processes (see the life-line sketch after this list).
  - The system already launches Ruby life-line processes to ensure that PhantomJS processes don't zombie-out if the parent process disappears for whatever reason.
  - Since we're going to have the workers anyway, let them handle that as well, to keep the overall number of processes to a minimum.
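
A minimal sketch of the parse-and-handle round trip, using `fork` and a pipe in place of the real RPC channel; Nokogiri is assumed as the parser, and the link extraction stands in for whatever handling the scan actually needs. The same ship-work-out, ship-small-result-back pattern applies to the `Arachni::Support::Signature` item above:

```ruby
require 'nokogiri'

html = '<html><body><a href="/login">Login</a> <a href="/admin">Admin</a></body></html>'

reader, writer = IO.pipe

pid = fork do
  reader.close
  doc    = Nokogiri::HTML(html)               # the CPU-heavy part
  result = doc.css('a').map { |a| a['href'] } # subsequent handling, in-worker
  # Send back only the small result, never the parsed document itself.
  writer.write(Marshal.dump(result))
  writer.close
end

writer.close
result = Marshal.load(reader.read)
Process.waitpid(pid)

p result # => ["/login", "/admin"]
```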
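And a sketch of a worker doubling as the PhantomJS life-line, assuming the worker spawns the browser itself; the `phantomjs` arguments and one-second poll interval are illustrative:

```ruby
PARENT_PID  = Process.ppid
browser_pid = Process.spawn('phantomjs', '--webdriver=8910')

at_exit do
  begin
    Process.kill('TERM', browser_pid)
  rescue Errno::ESRCH
    # Browser already gone.
  end
end

loop do
  # If the parent dies we get re-parented (to PID 1 on Linux), which is
  # the cue to clean up instead of leaving a zombied PhantomJS behind.
  break if Process.ppid != PARENT_PID

  # Also notice the browser exiting on its own, so we don't spin forever.
  break if Process.waitpid(browser_pid, Process::WNOHANG)

  sleep 1
end
```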