drcachesim -record_heap is *very* slow on Windows due to symbol lookup
Debug build:
% /usr/bin/time bin64/drrun -t drcachesim -offline -- suite/tests/bin/common.fib.exe only_5
0.00user 0.01system 0:06.06elapsed 0%CPU (0avgtext+0avgdata 3996maxresident)k
% /usr/bin/time bin64/drrun -t drcachesim -offline -record_function 'fib|1' -- suite/tests/bin/common.fib.exe only_5
0.00user 0.01system 0:19.47elapsed 0%CPU (0avgtext+0avgdata 3996maxresident)k
% /usr/bin/time bin64/drrun -t drcachesim -offline -record_heap -- suite/tests/bin/common.fib.exe only_5
0.00user 0.01system 3:35.25elapsed 0%CPU (0avgtext+0avgdata 3972maxresident)k
After adding timestamps for diagnostics, it looks like each system module takes 12s-14s, which really adds up.
This is a known issue for large apps where dbghelp.dll takes a long time to query symbols, but it is a little surprising for smaller system libraries.
Possible solutions:
- Only support dynsym (dr_get_proc_address) by default. Require passing an extra option to do drsym lookups? Xref #4187 (closed) where we may want the same thing for ELF.
- Only do one drsym lookup: today it does both a mangled and a demangled lookup. This would presumably cut the time in half. But it would still be too slow!
- Do a single enumeration of symbols and check each one against the function names in func_trace, instead of querying drsyms once per name, in case the per-query dbghelp lookup is what's slow.
- Add drsymcache.
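
To make the single-enumeration option concrete, here is a minimal self-contained sketch of the idea. It is only a model: `Symbol`, `enumerate_symbols`, and `resolve_in_one_pass` are hypothetical names invented for illustration and are not the drsyms or func_trace code. The point is that the symbol table is walked once per module and each symbol is checked against a hash set of requested names, rather than issuing one lookup per requested name.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for one entry in a module's symbol table.
struct Symbol {
    std::string name;
    size_t modoffs; // Offset of the symbol from the module base.
};

// Callback invoked once per symbol; returning false stops the walk early.
using EnumerateCb = std::function<bool(const std::string &name, size_t modoffs)>;

// Hypothetical enumerator: visits every symbol in the module exactly once.
// In a real tool this would be the symbol library's enumeration call.
void
enumerate_symbols(const std::vector<Symbol> &module_syms, const EnumerateCb &cb)
{
    for (const Symbol &sym : module_syms) {
        if (!cb(sym.name, sym.modoffs))
            break;
    }
}

// Resolves all requested names with a single pass over the module's symbols.
// Names that are not present in the module are simply absent from the result.
std::unordered_map<std::string, size_t>
resolve_in_one_pass(const std::vector<Symbol> &module_syms,
                    const std::unordered_set<std::string> &wanted)
{
    std::unordered_map<std::string, size_t> found;
    enumerate_symbols(module_syms, [&](const std::string &name, size_t modoffs) {
        if (wanted.count(name) != 0)
            found.emplace(name, modoffs); // Keeps the first match for a name.
        // Stop as soon as every requested name has been resolved.
        return found.size() < wanted.size();
    });
    return found;
}
```

Whether this actually helps depends on where the time goes: if the cost is in loading the symbol data for each module rather than in each individual query, a single enumeration would not save much, and a persistent cache would be the better fit.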