[Feature request] Allow MAX_FRAGMENT_SIZE to take on values greater than USHRT_MAX
Created by: andrewbartolo
Hi again,
I've been developing a fairly heavy-duty instrumentation client that, among other things, wraps and dumps all memory load and store instructions, à la memtrace_simple. Recently, I compiled the SPEC CPU2017-rate suite for x86-64 with gcc 10.2 at -O3, and saw the following on cam4_r, parest_r, and wrf_r:
Basic block or trace instrumentation exceeded maximum size. Try lowering -max_bb_instrs and/or -max_trace_bbs.
I ran DR with -debug -loglevel 3, and discovered that, as expected, gcc was doing (an admittedly incredible amount of) inlining and loop unrolling to produce basic blocks whose instrumented size exceeds 64KiB.
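To give a flavor of the pattern (illustrative only, not code from SPEC): a small helper with a compile-time-known trip count, once inlined into its caller, can be unrolled into one long straight-line block, and a client that instruments every memory access then inflates that block several-fold:

```c
/* Illustrative sketch, not taken from SPEC.  With inlining plus
 * unrolling (gcc -O3; more aggressively with -funroll-loops or PGO),
 * the loop body below can be replicated many times into one
 * straight-line basic block.  Each iteration carries two loads and one
 * store, so a client that expands every memory access into several
 * instrumentation instructions grows the block by a large constant
 * factor. */
static double accum(const double *a, const double *b, double *out, int n)
{
    double s = 0.0;
    for (int i = 0; i < n; i++) {
        out[i] = a[i] * b[i];   /* two loads + one store per iteration */
        s += out[i];
    }
    return s;
}

double step(const double *a, const double *b, double *out)
{
    /* The constant trip count lets the compiler inline and unroll. */
    return accum(a, b, out, 256);
}
```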
So, it seems the alternatives at this point are either to lower -max_bb_instrs/-max_trace_bbs, which entails missing a large and relevant portion of the workload, or to increase MAX_FRAGMENT_SIZE (and probably increase -max_bb_instrs/-max_trace_bbs after that) in order to support larger fragments.
Currently, MAX_FRAGMENT_SIZE is capped by the ushort type of the fragment size field. I took a rudimentary stab at changing this field and all related ones from ushort to uint, mostly by grepping through the codebase. I've attached a patch that attempts to do this, at least for Linux: max_fragment_size_diff.txt. Something is still wrong, though: the workloads now progress well past the point where they previously asserted out, but hit a meta-instruction fault assertion later on.
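For concreteness, the direction of the change looks roughly like the sketch below. The struct layout and field names are approximations, not DynamoRIO's exact fragment_t, and the closing comment is my guess at why a grep-based pass wasn't sufficient:

```c
#include <limits.h>

typedef unsigned short ushort;
typedef unsigned int uint;

/* Approximate shape only -- not the exact fragment_t layout. */
typedef struct _fragment_t {
    void *tag;    /* application address this fragment translates */
    uint  flags;
    uint  size;   /* was: ushort size; -- widened so a fragment's
                   * code-cache footprint can exceed USHRT_MAX */
    /* ... */
} fragment_t;

/* The derived cap has to move in step with the field type: */
#define MAX_FRAGMENT_SIZE UINT_MAX   /* was capped at USHRT_MAX */

/* The hard part is consistency: anything else that stores a
 * fragment-relative offset or size in 16 bits (instruction offsets
 * used for fault translation, exit-stub offsets, free-list bucketing
 * by size, etc.) silently overflows once fragments can exceed 64KiB.
 * Grepping for "ushort" finds the declarations but not truncating
 * casts, which could explain the later meta-instruction fault
 * assertion. */
```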
On one hand, this is a pretty heavyweight use case. On the other hand, the trend of aggressive inlining probably isn't going away, and two extra bytes per fragment (?) for a larger size field seems like a small amount of overhead.
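On the "two extra bytes (?)" point: whether widening the field actually grows the struct depends on alignment and padding. With generic stand-in structs (again, not the real fragment_t), it's easy to check:

```c
#include <stdio.h>

/* Generic stand-ins: depending on what neighbors the size field,
 * widening ushort -> uint may cost 0, 2, or more bytes once the
 * compiler pads for alignment. */
struct frag_before { void *tag; unsigned int flags; unsigned short size; };
struct frag_after  { void *tag; unsigned int flags; unsigned int   size; };

int main(void)
{
    /* On a typical LP64 target both print 16: the two extra bytes are
     * absorbed by padding the struct already carried. */
    printf("before: %zu\n", sizeof(struct frag_before));
    printf("after:  %zu\n", sizeof(struct frag_after));
    return 0;
}
```

So, depending on where the field sits, the per-fragment cost may well be zero on a 64-bit build.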
Any thoughts? And happy to help any way I can. Thanks!