DynamoRIO issue #3614
Issue created May 16, 2019 by Derek Bruening (@derekbruening)

online drcachesim is too slow (8x slower than cachegrind): regression?

This issue is about improving the overall drcachesim online analysis performance. Xref #1738 on optimizing the cache simulation code. Xref #2001 on optimizing the tracer.

Running SPEC2006 mcf on the test input, online drcachesim is currently very slow: about 8x slower than cachegrind (3:21 versus 0:24 elapsed in the runs below). I believe this is a big regression: I recall running this same performance test years ago and seeing times in the 20-second range. Even the basic_counts tool is slow (1:45 elapsed), which makes this a separate problem from #1738. I have not yet profiled this; that would be the first step.

This is a 64-bit release build:

$ /usr/bin/time ./mcf ../../data/test/input/inp.in
MCF SPEC CPU2006 version 1.10
Copyright (c) 1998-2000 Zuse Institut Berlin (ZIB)
Copyright (c) 2000-2002 Andreas Loebel & ZIB
Copyright (c) 2003-2005 Andreas Loebel

nodes                      : 5985
active arcs                : 102404
simplex iterations         : 63475
objective value            : 3180065918
new implicit arcs          : 2393292
active arcs                : 2495696
simplex iterations         : 118645
objective value            : 2060055866
erased arcs                : 2387557
checksum                   : 2997477
done
1.58user 0.04system 0:01.70elapsed 95%CPU (0avgtext+0avgdata 159752maxresident)k

$ /usr/bin/time /work/dr/git/build_x64_rel_tests/bin64/drrun -- ./mcf ../../data/test/input/inp.in
<...>
1.66user 0.03system 0:01.71elapsed 98%CPU (0avgtext+0avgdata 162812maxresident)k

$ /usr/bin/time /work/dr/git/build_x64_rel_tests/bin64/drrun -t drcachesim -- ./mcf ../../data/test/input/inp.in
<...>
---- <application exited with code 0> ----
Cache simulation results:
Core #0 (1 thread(s))
  L1I stats:
    Hits:                    3,367,529,621
    Misses:                          1,830
    Invalidations:                       0
    Miss rate:                        0.00%
  L1D stats:
    Hits:                    1,002,703,388
    Misses:                    369,946,321
    Invalidations:                       0
    Prefetch hits:              47,349,279
    Prefetch misses:           322,597,042
    Miss rate:                       26.95%
Core #1 (0 thread(s))
Core #2 (0 thread(s))
Core #3 (0 thread(s))
LL stats:
    Hits:                      356,111,139
    Misses:                     13,837,012
    Invalidations:                       0
    Prefetch hits:             254,939,620
    Prefetch misses:            67,657,422
    Local miss rate:                  3.74%
    Child hits:              4,417,582,288
    Total miss rate:                  0.29%
224.86user 16.13system 3:21.27elapsed 119%CPU (0avgtext+0avgdata 167692maxresident)k

$ /usr/bin/time /work/dr/git/build_x64_rel_tests/bin64/drrun -t drcachesim -simulator_type basic_counts -- ./mcf ../../data/test/input/inp.in
<...>
---- <application exited with code 0> ----
Basic counts tool results:
Total counts:
  3223195651 total (fetched) instructions
         433 total non-fetched instructions
           0 total prefetches
  1178112041 total data loads
   194412943 total data stores
           1 total threads
    28369030 total scheduling markers
           0 total transfer markers
           0 total function id markers
           0 total function return address markers
           0 total function argument markers
           0 total function return value markers
           0 total other markers
Thread 18131 counts:
  3223195651 (fetched) instructions
         433 non-fetched instructions
           0 prefetches
  1178112041 data loads
   194412943 data stores
    28369030 scheduling markers
           0 transfer markers
           0 function id markers
           0 function return address markers
           0 function argument markers
           0 function return value markers
           0 other markers
124.72user 13.88system 1:44.85elapsed 132%CPU (0avgtext+0avgdata 167700maxresident)k

$ /usr/bin/time valgrind --tool=cachegrind -- ./mcf ../../data/test/input/inp.in
==18032== Cachegrind, a cache and branch-prediction profiler
==18032== Copyright (C) 2002-2017, and GNU GPL'd, by Nicholas Nethercote et al.
==18032== Using Valgrind-3.14.0 and LibVEX; rerun with -h for copyright info
==18032== Command: ./mcf ../../data/test/input/inp.in
==18032== 
--18032-- warning: L3 cache found, using its data for the LL simulation.

<...>
==18032== 
==18032== I   refs:      3,223,256,389
==18032== I1  misses:            1,764
==18032== LLi misses:            1,741
==18032== I1  miss rate:          0.00%
==18032== LLi miss rate:          0.00%
==18032== 
==18032== D   refs:      1,346,785,609  (1,178,206,897 rd   + 168,578,712 wr)
==18032== D1  misses:      437,406,928  (  423,864,952 rd   +  13,541,976 wr)
==18032== LLd misses:       81,227,588  (   73,769,939 rd   +   7,457,649 wr)
==18032== D1  miss rate:          32.5% (         36.0%     +         8.0%  )
==18032== LLd miss rate:           6.0% (          6.3%     +         4.4%  )
==18032== 
==18032== LL refs:         437,408,692  (  423,866,716 rd   +  13,541,976 wr)
==18032== LL misses:        81,229,329  (   73,771,680 rd   +   7,457,649 wr)
==18032== LL miss rate:            1.8% (          1.7%     +         4.4%  )
24.04user 0.05system 0:24.20elapsed 99%CPU (0avgtext+0avgdata 180484maxresident)k
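
To make the comparison explicit, here is a minimal sketch that computes the slowdown ratios from the `/usr/bin/time` summary lines above. The labels, the `elapsed_seconds` and `slowdown` helpers, and the choice of elapsed (wall-clock) time as the metric are illustrative assumptions, not part of DynamoRIO or of the original measurements.

```python
import re

# Summary lines copied from the /usr/bin/time output above.
# The dictionary keys are illustrative labels, not tool names.
TIME_LINES = {
    "native": "1.58user 0.04system 0:01.70elapsed 95%CPU",
    "drrun (no tool)": "1.66user 0.03system 0:01.71elapsed 98%CPU",
    "drcachesim": "224.86user 16.13system 3:21.27elapsed 119%CPU",
    "basic_counts": "124.72user 13.88system 1:44.85elapsed 132%CPU",
    "cachegrind": "24.04user 0.05system 0:24.20elapsed 99%CPU",
}


def elapsed_seconds(line):
    """Return the wall-clock time in seconds from a GNU time summary line.

    Handles the [h:]m:ss.ss form that GNU time prints before 'elapsed'.
    """
    match = re.search(r"([\d:.]+)elapsed", line)
    if match is None:
        raise ValueError("no elapsed field in: %r" % line)
    seconds = 0.0
    for field in match.group(1).split(":"):
        seconds = seconds * 60 + float(field)
    return seconds


def slowdown(label, baseline):
    """Ratio of elapsed time for `label` over elapsed time for `baseline`."""
    return elapsed_seconds(TIME_LINES[label]) / elapsed_seconds(TIME_LINES[baseline])


if __name__ == "__main__":
    for label in TIME_LINES:
        print("%-16s %8.2fs  %7.2fx native  %6.2fx cachegrind" % (
            label,
            elapsed_seconds(TIME_LINES[label]),
            slowdown(label, "native"),
            slowdown(label, "cachegrind"),
        ))
```

On these numbers, drcachesim comes out at roughly 8.3x cachegrind's elapsed time and basic_counts at roughly 4.3x, while plain drrun with no tool is essentially at native speed. That is consistent with the overhead coming from the tracing and analysis path rather than from DynamoRIO itself.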