DynamoRIO issue #5036
Issue created Aug 04, 2021 by Abhinav Anil Sharma (@abhinav92003, Contributor)

Expand A64 SVE scatter/gather memory access instructions

ARM has a rich set of SVE instructions [1]. For clients that need to instrument memrefs (e.g. drcachesim), we need to expand them into scalar loads and stores, as we did for x86 scatter/gather in #2985 (closed). These instructions have many variants; I'm summarising my understanding below. [1] and the Scalable Vector Extension part of the Arm manual [2] have a more detailed discussion.

There are multiple ways in which the memory address can be specified. These are all predicated loads/stores, meaning that each element may be active or inactive, as determined by a predicate register. Their mnemonics have the LD1* or ST1* prefix.

  • Scalar + immediate: For contiguous access. Memory address is generated by a 64-bit scalar base and immediate index.
  • Scalar + scalar: For contiguous access. Memory address is generated by a 64-bit scalar base and scalar index which is added to the base address.
  • Scalar + vector: For possible non-contiguous access, also known as gather load/scatter store. Memory addresses are generated by a 64-bit scalar base plus vector index.
  • Vector + immediate: For possible non-contiguous access, also known as gather load/scatter store. Memory addresses are generated by a vector base plus immediate index.
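To make the four schemes concrete, here is a minimal sketch of per-element address generation. All type and function names are hypothetical (they are not DynamoRIO or Arm identifiers), and the model is simplified: it assumes the immediate or scalar index has already been scaled to a byte offset, and that any sign/zero extension of vector indices has already been applied.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical addressing-mode tags for illustration only. */
typedef enum {
    ADDR_SCALAR_IMM,    /* contiguous: scalar base + immediate */
    ADDR_SCALAR_SCALAR, /* contiguous: scalar base + scalar index */
    ADDR_SCALAR_VECTOR, /* gather/scatter: scalar base + vector index */
    ADDR_VECTOR_IMM,    /* gather/scatter: vector base + immediate */
} sve_addr_mode_t;

typedef struct {
    sve_addr_mode_t mode;
    uint64_t scalar_base;  /* base register value for scalar-base forms */
    int64_t byte_offset;   /* immediate or scalar index, assumed pre-scaled to bytes */
    const uint64_t *vec;   /* per-element indices (scalar+vector) or bases (vector+imm) */
    unsigned index_shift;  /* left shift applied to each vector index (assumed < 64) */
    size_t mem_elem_bytes; /* bytes accessed in memory per element */
} sve_memop_t;

/* Returns the address accessed by element number e. */
static uint64_t
sve_element_address(const sve_memop_t *op, size_t e)
{
    switch (op->mode) {
    case ADDR_SCALAR_IMM:
    case ADDR_SCALAR_SCALAR:
        /* Contiguous: consecutive elements are mem_elem_bytes apart. */
        return op->scalar_base + (uint64_t)op->byte_offset + e * op->mem_elem_bytes;
    case ADDR_SCALAR_VECTOR:
        /* Each element supplies its own index. */
        return op->scalar_base + (op->vec[e] << op->index_shift);
    case ADDR_VECTOR_IMM:
        /* Each element supplies its own base. */
        return op->vec[e] + (uint64_t)op->byte_offset;
    }
    assert(0 && "unknown addressing mode");
    return 0;
}
```

The two contiguous schemes share one formula here only because the offset is assumed to be pre-scaled; a real decoder would need to apply the instruction-specific scaling first.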

There are variants with different element sizes (unsigned double-word; signed and unsigned byte, halfword, word).

Faults for non-active elements are always suppressed. There are different load instruction variants based on how faults for active elements are treated: besides the usual faulting behaviour, there are “first fault” (only the first active element can fault) and “non fault” variants.
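The three faulting behaviours can be sketched as a small model that decides whether a load would trap. This is an illustration under assumptions, not how any real implementation works: the names are hypothetical, and `accessible[e]` stands in for whatever check determines that element `e`'s address can be read. As I understand the architecture, a suppressed fault is reported through the first-fault register rather than a trap; the sketch only records the index of the offending element.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names for the three load flavours described above. */
typedef enum { FAULT_NORMAL, FAULT_FIRST, FAULT_NONE } fault_mode_t;

typedef struct {
    bool trapped;     /* true if the instruction would raise a fault */
    size_t first_bad; /* first active inaccessible element, or n if none */
} load_result_t;

static load_result_t
model_predicated_load(fault_mode_t mode, const bool *active, const bool *accessible,
                      size_t n)
{
    assert(active != NULL && accessible != NULL);
    load_result_t res = { false, n };
    bool seen_active = false;
    for (size_t e = 0; e < n; e++) {
        if (!active[e])
            continue; /* inactive elements never fault */
        bool is_first_active = !seen_active;
        seen_active = true;
        if (accessible[e])
            continue;
        res.first_bad = e;
        /* Regular loads always trap; first-fault loads trap only when the
         * first active element is the one that faults; non-fault loads never trap. */
        if (mode == FAULT_NORMAL || (mode == FAULT_FIRST && is_first_active))
            res.trapped = true;
        return res;
    }
    return res;
}
```

An expansion into scalar loads would need to preserve this behaviour: a scalar load emitted for a later element of a first-fault instruction must not introduce a trap that the original instruction would have suppressed.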

For “scalar plus scalar” and “scalar plus immediate” addressing, there are load/store variants that access 2/3/4 contiguous elements at a time, each going to (or coming from) the same element number in 2/3/4 vector registers. These have the LDN* or STN* prefix, where N=2/3/4.
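The layout described above implies a simple address formula for the multi-register forms: memory holds consecutive N-element structures, and member `r` of structure `e` belongs to element number `e` of register `r`. A minimal sketch, with a hypothetical function name and with `base` assumed to already include the immediate or scalar offset:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Address of element number e of register r (0-based) for an N-register
 * structure load/store, where num_regs is N. */
static uint64_t
sve_struct_element_address(uint64_t base, size_t num_regs, size_t elem_bytes, size_t e,
                           size_t r)
{
    assert(num_regs >= 2 && num_regs <= 4 && r < num_regs);
    return base + (e * num_regs + r) * elem_bytes;
}
```

For example, with N=3 and 4-byte elements, element 2 of the second register (r=1) sits 28 bytes past the base.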

There are also some un-predicated instructions (LDR and STR) that use the "scalar + immediate" scheme to load/store vectors or predicate registers.

The x86 scatter/gather that we handled in #2985 (closed) is the "scalar + vector" variant with regular faulting behaviour. More work will be required to adapt drx_expand_scatter_gather to these other variants.

For the contiguous access variants, we could model them as a single memory access with a larger size. But this is not a correct model, because each element can be active or inactive based on the predicate register, so the memory addresses that end up being accessed can be non-contiguous. It would be correct to model them as scatter/gather, using multiple element-sized accesses.
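A minimal sketch of that scatter/gather-style model for a contiguous predicated access follows. The names `memref_t` and `expand_predicated_contiguous` are hypothetical and unrelated to drx_expand_scatter_gather's actual interface; the predicate is represented as one `bool` per element for simplicity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One element-sized memory reference, as a tracing client might record it. */
typedef struct {
    uint64_t addr;
    size_t size;
} memref_t;

/* Writes one memref per active element into out (which must have room for
 * num_elems entries) and returns how many were written. */
static size_t
expand_predicated_contiguous(uint64_t base, size_t elem_bytes, const bool *active,
                             size_t num_elems, memref_t *out)
{
    assert(active != NULL && out != NULL);
    size_t n = 0;
    for (size_t e = 0; e < num_elems; e++) {
        if (!active[e])
            continue; /* inactive elements access no memory */
        out[n].addr = base + e * elem_bytes;
        out[n].size = elem_bytes;
        n++;
    }
    return n;
}
```

With 8-byte elements and a predicate of active, inactive, active, active, this yields three accesses at offsets 0, 16 and 24 from the base, which a single 32-byte access would misrepresent.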
