ASSERT: Race/crash within DR dispatch when performing unlink_flush during app thread creation
Created by: nextsilicon-itay-bookstein
Describe the bug
I wrote a piece of minimally adaptive instrumentation for indirect branches. To adapt to dynamically discovered indirect branch targets, the implementation calls dr_unlink_flush_region (I also tried delay_flush) from a clean call, flushing the fragment to which it's going to return so that my instrumentation code gets dynamically reconstructed with the newly discovered target.
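For concreteness, the flush site looks roughly like this (a minimal sketch, not the actual client; the name and the assumption that the instrumentation passes in the app pc of the fragment being returned to are mine):

```c
#include "dr_api.h"

/* Sketch: on discovering a new indirect-branch target, unlink-flush the
 * fragment this clean call is about to return to, so DR rebuilds it and
 * the instrumentation is regenerated with the new target.  frag_pc (the
 * app pc inside the flushed fragment) is an assumed argument supplied by
 * the instrumentation. */
static void
on_new_target(app_pc frag_pc)
{
    /* Unlike dr_flush_region, the unlink variant may be called from a
     * clean call out of the code cache. */
    dr_unlink_flush_region(frag_pc, 1);
}
```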
A series of spurious crashes in the fragment-unlink flow, application register corruption, and other fun things led me to debug the problem and narrow it down, and I traced it to the use of dr_unlink_flush_region while the application is creating many threads. Because the failure was flaky/racy/non-deterministic, I tried to force/stress it by unlinking much more aggressively (I had previously used a high threshold before deciding to unlink).
In addition, I switched to a debug DR build from the most recent master. The assert I encountered is this:
Internal Error: DynamoRIO debug check failure: ../core/dispatch.c:757 wherewasi == DR_WHERE_FCACHE || wherewasi == DR_WHERE_TRAMPOLINE || wherewasi == DR_WHERE_APP || (dcontext->go_native && wherewasi == DR_WHERE_DISPATCH)
Adding a print revealed that the relevant values for this assert were as follows:
wherewasi = 2 (DR_WHERE_DISPATCH), dcontext->go_native = 0
When I tried delay_flush instead of unlink_flush, I encountered this assert:
Internal Error: DynamoRIO debug check failure: ../core/vmareas.c:9502 false && "stale multi-init entry on frags list"
Because both the application and the DR client are reasonably complex, I narrowed this down to a minimal repro with a tiny DR client and a tiny application. The tiny application simply creates a lot of threads, each calling printf 100 times. The client inserts a clean call before every indirect call and indirect jump, and that clean call invokes dr_unlink_flush_region once every few times it runs. I had to tune the thread count and the unlink_flush call ratio to arrive at a good deterministic repro; both pieces are sketched below.
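To make the shape of the repro concrete, here is a minimal sketch of such a client (this is not the attached source; the flush ratio, the flushed region, and all names are my assumptions, and as noted above the real values needed tuning):

```c
#include "dr_api.h"

#define FLUSH_RATIO 4 /* assumed ratio; the real one had to be tuned */

static void
clean_call(app_pc branch_pc)
{
    /* A racy counter is fine here: the goal is just to stress flushing. */
    static uint count;
    if (++count % FLUSH_RATIO == 0) {
        /* Unlink-flush the region containing the branch itself.  The
         * delay variant, i.e. dr_delay_flush_region(branch_pc, 1, 0, NULL),
         * hit the vmareas.c assert instead. */
        dr_unlink_flush_region(branch_pc, 1);
    }
}

static dr_emit_flags_t
event_bb(void *drcontext, void *tag, instrlist_t *bb, bool for_trace,
         bool translating)
{
    instr_t *instr;
    for (instr = instrlist_first_app(bb); instr != NULL;
         instr = instr_get_next_app(instr)) {
        /* Indirect calls and indirect jumps, but not returns. */
        if (instr_is_mbr(instr) && !instr_is_return(instr)) {
            dr_insert_clean_call(drcontext, bb, instr, (void *)clean_call,
                                 false /* no fp state save */, 1,
                                 OPND_CREATE_INTPTR(instr_get_app_pc(instr)));
        }
    }
    return DR_EMIT_DEFAULT;
}

DR_EXPORT void
dr_client_main(client_id_t id, int argc, const char *argv[])
{
    dr_register_bb_event(event_bb);
}
```

And a sketch of the kind of application that triggers it (thread count is again an assumption that had to be tuned):

```c
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 64 /* assumed; tuned for a deterministic repro */
#define NITERS 100

static void *
worker(void *arg)
{
    for (int i = 0; i < NITERS; i++)
        printf("thread %ld iter %d\n", (long)arg, i);
    return NULL;
}

int
main(void)
{
    pthread_t threads[NTHREADS];
    for (long t = 0; t < NTHREADS; t++)
        pthread_create(&threads[t], NULL, worker, (void *)t);
    for (long t = 0; t < NTHREADS; t++)
        pthread_join(threads[t], NULL);
    return 0;
}
```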
I've attached the code for the repro.
The attached tar.gz file contains build.sh, CMakeLists.txt, src/repro_client.c, and src/repro_app.c. Note that build.sh nukes <script_dir>/build and re-creates it by invoking cmake.
unlink_flush_repro.tar.gz
A plain drrun with the provided client and app should trigger the issue. I haven't tested it on multiple machines, so I can't rule out a dependence on core count or the like.
I can potentially try to debug this further, but at this point I thought asking here would be a good idea :)
Expected behavior
Application should successfully run to completion (albeit slowly).
Versions
- What version of DR are you using? Top of the master branch as of the day of posting
- What operating system version are you running on? Debian GNU/Linux 10 (buster)
- Is your application 32-bit or 64-bit? 64-bit