Use LDAPR on AArch64 when available to speed up synchronization code
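The LDAPR instruction was added by FEAT_LRCPC, which is mandatory from Armv8.3 and optional in Armv8.2 (the RCpc extension). It can be faster than LDAR while still giving enough ordering guarantees for most uses. Both are load-acquires. LDAR has RCsc semantics, while LDAPR has the weaker RCpc semantics: LDAPR does not have to be ordered after an earlier STLR to a different address, so the hardware can issue it sooner. We should look into using LDAPR in DR's atomic sequences when the underlying processor supports it.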
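A rough sketch of the "when the underlying processor supports it" part follows. It assumes a standalone Linux program that can call `getauxval(AT_HWCAP)` and test `HWCAP_LRCPC`; DR itself may obtain hardware capabilities differently. It then picks between the 64-bit LDAR and LDAPR encodings for an acquire load `Xt <- [Xn]`. The helpers `cpu_has_lrcpc` and `encode_acquire_load64` are hypothetical and are not existing DR functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#if defined(__aarch64__) && defined(__linux__)
#include <sys/auxv.h> /* getauxval, AT_HWCAP; the libc headers also define HWCAP_* */
#endif

/* Hypothetical helper: true if the kernel reports FEAT_LRCPC (LDAPR). */
static bool cpu_has_lrcpc(void)
{
#if defined(__aarch64__) && defined(__linux__) && defined(HWCAP_LRCPC)
    return (getauxval(AT_HWCAP) & HWCAP_LRCPC) != 0;
#else
    return false; /* Not AArch64 Linux, or no HWCAP_LRCPC: assume only LDAR. */
#endif
}

/* Hypothetical helper: encode a 64-bit acquire load "Xt <- [Xn]".
 *   LDAR  Xt, [Xn] = 0xC8DFFC00 | Rn << 5 | Rt   (RCsc load-acquire)
 *   LDAPR Xt, [Xn] = 0xF8BFC000 | Rn << 5 | Rt   (RCpc load-acquire, FEAT_LRCPC)
 * Register numbers are 0..31 (Rn == 31 means SP for these instructions).
 */
static uint32_t encode_acquire_load64(bool use_ldapr, unsigned rn, unsigned rt)
{
    assert(rn < 32 && rt < 32);
    uint32_t base = use_ldapr ? 0xF8BFC000u : 0xC8DFFC00u;
    return base | ((uint32_t)rn << 5) | (uint32_t)rt;
}
```

A code generator would probably query `cpu_has_lrcpc()` once at startup and pass the cached result to each acquire-load emission site. It would keep LDAR wherever a sequence relies on RCsc ordering against an earlier STLR.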