DynamoRIO issue #5063
Issue created Aug 27, 2021 by Derek Bruening (@derekbruening, Contributor)

rare interruptions between instrumentation commit point and app instr execution cause tool inaccuracies

Most tools execute their added instrumentation prior to the execution of the application instruction being observed. If an interruption arrives in between the instrumentation and app execution, the tool can think the app instruction did execute, and when control resumes at the same instruction it can double-count the instruction or corrupt whatever state its instrumentation maintains. This is a general issue with the instrumentation and its app instruction not being one atomic unit.

For always-asynchronous signals, DR always delays delivery until after a block finishes executing. And for synchronous signals, the interrupted instruction is re-executed, so this problem never shows up for "normal" signals. It could happen for a not-normally-asynchronous signal sent asynchronously: e.g., a SIGSEGV sent via SYS_kill. (Some parts of DR's signal code check is_sys_kill(), but the code that decides whether to deliver now does not, today.)
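As a rough illustration of how a sent SIGSEGV can be told apart from a real fault, here is a minimal standalone sketch for Linux. It is not DR's is_sys_kill() and does not use any DR API: `sent_by_kill()` and `demo_async_segv()` are hypothetical names, and the check assumes the kernel reports `si_code == SI_USER` for a signal raised with kill().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Written by the handler: nonzero when the last SIGSEGV was sent with
 * kill() rather than raised by a faulting instruction. */
static volatile sig_atomic_t last_segv_was_sent;

/* Hypothetical helper (not DR's is_sys_kill()): a signal sent with
 * kill() carries si_code == SI_USER, while a real fault carries a
 * fault-specific code such as SEGV_MAPERR. */
static bool
sent_by_kill(const siginfo_t *info)
{
    return info->si_code == SI_USER;
}

static void
on_segv(int sig, siginfo_t *info, void *ucxt)
{
    (void)sig;
    (void)ucxt;
    last_segv_was_sent = sent_by_kill(info);
}

/* Installs the handler, sends SIGSEGV to this process with kill(),
 * and reports whether the handler classified it as sent. */
static bool
demo_async_segv(void)
{
    struct sigaction act;
    memset(&act, 0, sizeof(act));
    act.sa_sigaction = on_segv;
    act.sa_flags = SA_SIGINFO;
    sigemptyset(&act.sa_mask);
    if (sigaction(SIGSEGV, &act, NULL) != 0)
        return false;
    last_segv_was_sent = 0;
    /* In a single-threaded process with SIGSEGV unblocked, the signal
     * is delivered before kill() returns. */
    if (kill(getpid(), SIGSEGV) != 0)
        return false;
    return last_segv_was_sent != 0;
}
```

Calling `demo_async_segv()` from a single-threaded program should return true, since the SIGSEGV here never corresponds to a faulting instruction that could be re-executed.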

This can also happen when DR relocates a thread, if its SIGUSR2 arrives in between instru and app. Here, the client can refuse to relocate.

The same interruptions could happen at any point during a multi-instruction instrumentation sequence and not just right in between the end of the instrumentation and the app instr. For some clients this is fine since they don't "commit" their observation until the final couple of instructions: e.g., a tracing tool not updating its buffer pointer until the end.
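The late-commit pattern can be sketched with a small standalone simulation. This is a hedged model, not code from any DR tool: `trace_buf_t`, `record_instr()`, and the `interrupt_before_commit` flag are hypothetical names, and the "interruption" is simulated by returning early.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TRACE_CAPACITY 8

typedef struct {
    uintptr_t entries[TRACE_CAPACITY];
    size_t next; /* Count of committed entries; advancing it is the commit. */
} trace_buf_t;

/* Models the instrumentation sequence for one app instruction.
 * If interrupt_before_commit is nonzero, the sequence is cut off after
 * the slot is written but before the buffer pointer advances. */
static void
record_instr(trace_buf_t *buf, uintptr_t pc, int interrupt_before_commit)
{
    if (buf->next >= TRACE_CAPACITY)
        return; /* A real tool would flush the buffer here. */
    buf->entries[buf->next] = pc; /* Step 1: fill the slot (uncommitted). */
    if (interrupt_before_commit)
        return; /* Simulated interruption mid-instrumentation. */
    buf->next++; /* Step 2: commit the observation. */
}

/* Interrupts the first attempt before the commit, then re-executes the
 * whole sequence, and returns how many entries were recorded. */
static size_t
demo_interrupted_then_reexecuted(void)
{
    trace_buf_t buf = { { 0 }, 0 };
    record_instr(&buf, 0x1000, 1); /* Interrupted: slot written, no commit. */
    record_instr(&buf, 0x1000, 0); /* Re-executed: same slot is overwritten. */
    return buf.next;
}
```

The re-executed sequence overwrites the uncommitted slot, so the instruction is recorded once. This only covers interruptions before the commit; an interruption after the pointer update but before the app instruction would still be double-counted on re-execution.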

For the signal case: DR still needs to call the client's restore_state event. The client could conceivably roll back its instru actions, if it can tell whether it already executed them. It could decode the raw cache instructions to figure this out, though this gets hacky and is not always foolproof; or if we implement i#3801 it could look at its own IR metadata.

The action items here are to document this more clearly, and perhaps implement handling in our provided tool restore_state events? This adds complexity though. We could look into whether we could disallow this from ever happening: add the is_sys_kill() check (which for modern Linux kernels should be solid) to record_pending_signal(), and for relocation we can have DR detect this and re-try -- though that may lead to a lot of re-tries for heavy-instrumentation clients; most signals are likely to arrive in the middle of instrumentation.
