Arachni - Web Application Security Scanner Framework · arachni · Issues · #221

Closed

Issue created Jul 02, 2012 by Administrator (@root)

Intra-grid/Inter-process communication should use keep-alive and multiplexing

Created by: Zapotek

The ArachniRPC protocol was designed to be lightweight and simple in order to ease integration with 3rd-party systems. It basically uses one socket per call in order not to require multiplexing, which makes it very simple to implement for anyone with access to a serializer (usually YAML, since it's multi-platform) and TLS/SSL sockets.
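
A minimal sketch of what that one-socket-per-call style looks like in Ruby, assuming a length-prefixed YAML message format (the actual ArachniRPC framing may differ):

```ruby
require 'socket'
require 'openssl'
require 'yaml'

# One call == one TLS connection. Request and response are single YAML
# documents, each prefixed with its byte size so the peer knows how much
# to read (the size prefix is an assumption of this sketch).
def simple_rpc_call( host, port, message, args = [] )
    socket = TCPSocket.new( host, port )
    ssl    = OpenSSL::SSL::SSLSocket.new( socket )
    ssl.sync = true
    ssl.connect

    request = YAML.dump( 'message' => message, 'args' => args )
    ssl.write( [request.bytesize].pack( 'N' ) + request )

    size     = ssl.read( 4 ).unpack( 'N' ).first
    response = YAML.load( ssl.read( size ) )

    ssl.close
    socket.close
    response
end
```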

And that's good; that should remain the 3rd-party-facing interface -- i.e. for Dispatchers and for simple and master Instances.

However, communication between a master and its slaves is hidden from the user and could use the boost of a more complex, performance-oriented protocol. So the existing protocol should be amended with a high-performance mode which will utilize a binary serializer (most likely Marshal), a single connection per master-slave pair and message multiplexing.
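
The sort of thing I have in mind -- just a sketch, where the class name, framing and callback-based API are illustrative rather than a final design:

```ruby
require 'socket'
require 'openssl'

# One persistent connection per master/slave pair; every frame carries an ID
# so many calls can be in flight on the same socket and responses can be
# matched up as they arrive, in any order.
class MultiplexedChannel
    def initialize( host, port )
        socket = TCPSocket.new( host, port )
        @ssl   = OpenSSL::SSL::SSLSocket.new( socket )
        @ssl.sync = true
        @ssl.connect

        @next_id = 0
        @pending = {}   # message ID => callback waiting for the response
    end

    # Fires off a call without blocking on the reply.
    def call( message, *args, &callback )
        id = (@next_id += 1)
        @pending[id] = callback
        write_frame( 'id' => id, 'message' => message, 'args' => args )
        id
    end

    # Reads one response frame and hands it to whichever callback is waiting for it.
    def process_response
        frame = read_frame
        @pending.delete( frame['id'] ).call( frame['result'] )
    end

    private

    # Frames are length-prefixed Marshal blobs -- much cheaper to pack/unpack
    # than YAML, which matters when there are tens of thousands of them.
    def write_frame( hash )
        data = Marshal.dump( hash )
        @ssl.write( [data.bytesize].pack( 'N' ) + data )
    end

    def read_frame
        size = @ssl.read( 4 ).unpack( 'N' ).first
        Marshal.load( @ssl.read( size ) )
    end
end
```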

This isn't strictly necessary yet, but the distributed crawling algorithm (#207 (closed)) will make good use of it: path distribution will require tens or hundreds of thousands of RPC calls, so it will hugely benefit from a super-fast and extra-lightweight RPC protocol -- both in message size and in the init/tear-down cost of messages and connections.

And since I got going, I might as well mention this too:

Even though the Ruby (MRI) dudes got their heads straight and mapped Ruby threads 1:1 to OS threads, there still is the Global Interpreter Lock (GIL), which only schedules one thread at a time. And even if they did provide proper threading, because we're using a single-threaded, async, singleton HTTP interface, proper threads would mean very little to us.

And since workload distribution and message-passing have already been implemented for the Grid, we already have a nice and clean IPC system in place which basically allows parallelism via Ruby processes -- which are proper OS processes and can thus run on multiple cores and CPUs. The ability to truly and easily parallelize scans (even on single machines) will be a huge asset when we get JS integration (#50 (closed)), which will require some serious processing power.
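
For illustration, here's the bare-bones version of that process-based parallelism; the squaring task is just a stand-in for real scan work:

```ruby
# Each fork is a real OS process with its own interpreter (and its own GIL),
# so the children genuinely run on separate cores; results come back over a
# pipe -- the same message-passing idea the Grid already uses, just local.
workload = (1..8).to_a

children = workload.map do |unit|
    reader, writer = IO.pipe

    pid = fork do
        reader.close
        result = unit * unit   # stand-in for an actual audit/scan task
        writer.write( Marshal.dump( result ) )
        writer.close
    end

    writer.close
    [pid, reader]
end

results = children.map do |pid, reader|
    Process.wait( pid )
    Marshal.load( reader.read )
end

puts results.inspect    # => [1, 4, 9, 16, 25, 36, 49, 64]
```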

You can go even further with this and have Grid slaves spawn local slave Instances for themselves -- now that would be cool.
