Closed
Issue created Jul 19, 2019 by Administrator@rootContributor

-trace_after_instrs unable to skip ~trillion instructions

Created by: rkgithubs

I'm running the NCF deep learning workload (https://github.com/hexiangnan/neural_collaborative_filtering) and trying to collect offline traces for the Neural Matrix Factorization (NeuMF) Python application. The workload first loads data, and I'd like to skip that phase when tracing, so I'm specifying -trace_after_instrs 1000000M (loading the data takes roughly that many instructions according to perf stat). Both drrun without the cache simulator tool and drrun -t drcachesim with -trace_after_instrs 1000M run successfully, but the run with -t drcachesim -trace_after_instrs 1000000M crashes. I decided not to turn on debug because it might skew the instruction count, and we want to start tracing at the correct region. With DR each iteration takes about 75 sec (without DR it takes 44 sec).

a) after skipping 1000M instructions

root@r2d2-hal5000:/home/daenerys/workloads/neural_collaborative_filtering# /home/daenerys/tools/DynamoRIO/DynamoRIO-x86_64-Linux-7.90.17998-0/bin64/drrun -t drcachesim -offline -trace_after_instrs 1000M -exit_after_tracing 1000M -- /home/daenerys/tools/ncf/bin/python /home/daenerys/workloads/neural_collaborative_filtering/NeuMF.py --dataset ml-1m --epochs 20 --batch_size 256 --num_factors 8 --layers [64,32,16,8] --num_neg 4 --lr 0.001 --learner adam --verbose 1 --out 1 --mf_pretrain /home/daenerys/workloads/neural_collaborative_filtering/Pretrain/ml-1m_GMF_8_1501651698.h5 --mlp_pretrain /home/daenerys/workloads/neural_collaborative_filtering/Pretrain/ml-1m_MLP_[64,32,16,8]_1501652038.h5
WARNING (theano.tensor.blas): Using NumPy C-API based implementation for BLAS functions.
Hit delay threshold: enabling tracing.
Using Theano backend.
NeuMF arguments: Namespace(batch_size=256, dataset='ml-1m', epochs=20, layers='[64,32,16,8]', learner='adam', lr=0.001, mf_pretrain='/home/daenerys/workloads/neural_collaborative_filtering/Pretrain/ml-1m_GMF_8_1501651698.h5', mlp_pretrain='/home/daenerys/workloads/neural_collaborative_filtering/Pretrain/ml-1m_MLP_[64,32,16,8]_1501652038.h5', num_factors=8, num_neg=4, out=1, path='Data/', reg_layers='[0,0,0,0]', reg_mf=0, verbose=1)
Exiting process after ~1048577095 references.
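As a rough check on the sizes involved: run (a) exits after ~1048577095 references for -exit_after_tracing 1000M, which is consistent with the M suffix being a binary multiplier (2^20) rather than 10^6. The sketch below assumes that interpretation (it is inferred from the log, not confirmed from the DynamoRIO option parser) and uses a hypothetical helper, suffixed_count, to expand both skip values.

```python
# Assumption: K/M/G suffixes are binary multipliers (2**10, 2**20, 2**30).
# This is inferred from the "~1048577095 references" exit message for 1000M,
# not taken from the DynamoRIO sources.
MULTIPLIERS = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def suffixed_count(value):
    """Expand a string such as '1000M' into a plain integer count."""
    suffix = value[-1].upper()
    if suffix in MULTIPLIERS:
        return int(value[:-1]) * MULTIPLIERS[suffix]
    return int(value)


short_skip = suffixed_count("1000M")     # the value that works in run (a)
long_skip = suffixed_count("1000000M")   # the value that crashes in run (b)

print(short_skip)              # 1048576000
print(long_skip)               # 1048576000000
print(long_skip > 2**32 - 1)   # True: the larger count does not fit in 32 bits
```

Under that assumption the requested skip is about 1.05 trillion instructions, a value that needs more than 32 bits to represent. That is an observation about the number only, not a diagnosis of the crash.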

b) skipping 1T instructions

Although the error messages point to Theano code, this program has run successfully several times both without DR and with DR (without drcachesim, and with -trace_after_instrs 1000M).

/home/daenerys/tools/DynamoRIO/DynamoRIO-x86_64-Linux-7.90.17998-0/bin64/drrun -t drcachesim -offline -trace_after_instrs 1000000M -exit_after_tracing 1000M -- /home/daenerys/tools/ncf/bin/python /home/daenerys/workloads/neural_collaborative_filtering/NeuMF.py --dataset ml-1m --epochs 20 --batch_size 256 --num_factors 8 --layers [64,32,16,8] --num_neg 4 --lr 0.001 --learner adam --verbose 1 --out 1 --mf_pretrain /home/daenerys/workloads/neural_collaborative_filtering/Pretrain/ml-1m_GMF_8_1501651698.h5 --mlp_pretrain /home/daenerys/workloads/neural_collaborative_filtering/Pretrain/ml-1m_MLP_[64,32,16,8]_1501652038.h5
WARNING (theano.tensor.blas): Using NumPy C-API based implementation for BLAS functions.
Using Theano backend.
NeuMF arguments: Namespace(batch_size=256, dataset='ml-1m', epochs=20, layers='[64,32,16,8]', learner='adam', lr=0.001, mf_pretrain='/home/daenerys/workloads/neural_collaborative_filtering/Pretrain/ml-1m_GMF_8_1501651698.h5', mlp_pretrain='/home/daenerys/workloads/neural_collaborative_filtering/Pretrain/ml-1m_MLP_[64,32,16,8]_1501652038.h5', num_factors=8, num_neg=4, out=1, path='Data/', reg_layers='[0,0,0,0]', reg_mf=0, verbose=1)
Load data done [64.3 s]. #user=6040, #item=3706, #train=994169, #test=6040
Load pretrained GMF (/home/daenerys/workloads/neural_collaborative_filtering/Pretrain/ml-1m_GMF_8_1501651698.h5) and MLP (/home/daenerys/workloads/neural_collaborative_filtering/Pretrain/ml-1m_MLP_[64,32,16,8]_1501652038.h5) models done.

You can find the C code in this temporary file: /tmp/theano_compilation_error_QEfVMn
ERROR (theano.gof.opt): Optimization failure due to: constant_folding
ERROR (theano.gof.opt): node: InplaceDimShuffle{x,x}(TensorConstant{0.5})
ERROR (theano.gof.opt): TRACEBACK:
ERROR (theano.gof.opt): Traceback (most recent call last):
  File "/home/daenerys/tools/ncf/local/lib/python2.7/site-packages/theano/gof/opt.py", line 2034, in process_node
    replacements = lopt.transform(node)
  File "/home/daenerys/tools/ncf/local/lib/python2.7/site-packages/theano/tensor/opt.py", line 6516, in constant_folding
    no_recycling=[], impl=impl)
  File "/home/daenerys/tools/ncf/local/lib/python2.7/site-packages/theano/gof/op.py", line 955, in make_thunk
    no_recycling)
  File "/home/daenerys/tools/ncf/local/lib/python2.7/site-packages/theano/gof/op.py", line 858, in make_c_thunk
    output_storage=node_output_storage)
  File "/home/daenerys/tools/ncf/local/lib/python2.7/site-packages/theano/gof/cc.py", line 1217, in make_thunk
    keep_lock=keep_lock)
  File "/home/daenerys/tools/ncf/local/lib/python2.7/site-packages/theano/gof/cc.py", line 1157, in compile
    keep_lock=keep_lock)
  File "/home/daenerys/tools/ncf/local/lib/python2.7/site-packages/theano/gof/cc.py", line 1624, in cthunk_factory
    key=key, lnk=self, keep_lock=keep_lock)
  File "/home/daenerys/tools/ncf/local/lib/python2.7/site-packages/theano/gof/cmodule.py", line 1189, in module_from_key
    module = lnk.compile_cmodule(location)
  File "/home/daenerys/tools/ncf/local/lib/python2.7/site-packages/theano/gof/cc.py", line 1527, in compile_cmodule
    preargs=preargs)
  File "/home/daenerys/tools/ncf/local/lib/python2.7/site-packages/theano/gof/cmodule.py", line 2396, in compile_str
    (status, compile_stderr.replace('\n', '. ')))
Exception: ('Compilation failed (return status=255): Fatal error: failed to write trace. ', '[InplaceDimShuffle{x,x}(TensorConstant{0.5})]')
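The exception text suggests that "Fatal error: failed to write trace" arrived on the stderr of the subprocess Theano launched to compile its C code, since compile_str formats the child's return status and stderr into the message. The sketch below is a simplified stand-in for that reporting pattern, not Theano's actual code; run_and_report and the child command are hypothetical, and the child merely imitates a process that prints the same text and exits with status 255.

```python
import subprocess
import sys


def run_and_report(cmd):
    """Run cmd and, on failure, raise with the child's exit status and stderr."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                            universal_newlines=True)
    _, err = proc.communicate()
    if proc.returncode != 0:
        # Same shape as the message in the traceback: newlines become '. '.
        raise RuntimeError("Compilation failed (return status=%d): %s"
                           % (proc.returncode, err.replace("\n", ". ")))
    return proc.returncode


# Hypothetical child that imitates the failing process: it writes the error
# text to stderr and exits with status 255.
child = [sys.executable, "-c",
         "import sys; sys.stderr.write('Fatal error: failed to write trace\\n'); "
         "sys.exit(255)"]

try:
    run_and_report(child)
    message = None
except RuntimeError as exc:
    message = str(exc)

print(message)
```

If that reading is right, the failure originates in a child process rather than in the Python interpreter itself, which would be worth confirming before looking for a fault in Theano.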

^CTraceback (most recent call last):
  File "/home/daenerys/workloads/neural_collaborative_filtering/NeuMF.py", line 202, in <module>
    (hits, ndcgs) = evaluate_model(model, testRatings, testNegatives, topK, evaluation_threads)
  File "/home/daenerys/workloads/neural_collaborative_filtering/evaluate.py", line 48, in evaluate_model
    (hr,ndcg) = eval_one_rating(idx)
  File "/home/daenerys/workloads/neural_collaborative_filtering/evaluate.py", line 63, in eval_one_rating
    batch_size=100, verbose=0)
  File "/home/daenerys/tools/ncf/local/lib/python2.7/site-packages/keras/engine/training.py", line 1177, in predict
    self._make_predict_function()
  File "/home/daenerys/tools/ncf/local/lib/python2.7/site-packages/keras/engine/training.py", line 735, in _make_predict_function
    **kwargs)
  File "/home/daenerys/tools/ncf/local/lib/python2.7/site-packages/keras/backend/theano_backend.py", line 667, in function
    return Function(inputs, outputs, updates=updates, **kwargs)
  File "/home/daenerys/tools/ncf/local/lib/python2.7/site-packages/keras/backend/theano_backend.py", line 653, in __init__
    **kwargs)
  File "/home/daenerys/tools/ncf/local/lib/python2.7/site-packages/theano/compile/function.py", line 317, in function
    output_keys=output_keys)
  File "/home/daenerys/tools/ncf/local/lib/python2.7/site-packages/theano/compile/pfunc.py", line 486, in pfunc
    output_keys=output_keys)
  File "/home/daenerys/tools/ncf/local/lib/python2.7/site-packages/theano/compile/function_module.py", line 1839, in orig_function
    name=name)
  File "/home/daenerys/tools/ncf/local/lib/python2.7/site-packages/theano/compile/function_module.py", line 1519, in __init__
    optimizer_profile = optimizer(fgraph)
  File "/home/daenerys/tools/ncf/local/lib/python2.7/site-packages/theano/gof/opt.py", line 108, in __call__
    return self.optimize(fgraph)
  File "/home/daenerys/tools/ncf/local/lib/python2.7/site-packages/theano/gof/opt.py", line 97, in optimize
    ret = self.apply(fgraph, *args, **kwargs)
  File "/home/daenerys/tools/ncf/local/lib/python2.7/site-packages/theano/gof/opt.py", line 251, in apply
    sub_prof = optimizer.optimize(fgraph)
  File "/home/daenerys/tools/ncf/local/lib/python2.7/site-packages/theano/gof/opt.py", line 97, in optimize
    ret = self.apply(fgraph, *args, **kwargs)
  File "/home/daenerys/tools/ncf/local/lib/python2.7/site-packages/theano/gof/opt.py", line 2540, in apply
    sub_prof = gopt.apply(fgraph)
  File "/home/daenerys/tools/ncf/local/lib/python2.7/site-packages/theano/gof/opt.py", line 2143, in apply
    nb += self.process_node(fgraph, node)
  File "/home/daenerys/tools/ncf/local/lib/python2.7/site-packages/theano/gof/opt.py", line 2034, in process_node
    replacements = lopt.transform(node)
  File "/home/daenerys/tools/ncf/local/lib/python2.7/site-packages/theano/tensor/opt.py", line 6516, in constant_folding
    no_recycling=[], impl=impl)
  File "/home/daenerys/tools/ncf/local/lib/python2.7/site-packages/theano/gof/op.py", line 955, in make_thunk
    no_recycling)
  File "/home/daenerys/tools/ncf/local/lib/python2.7/site-packages/theano/gof/op.py", line 858, in make_c_thunk
    output_storage=node_output_storage)
  File "/home/daenerys/tools/ncf/local/lib/python2.7/site-packages/theano/gof/cc.py", line 1217, in make_thunk
    keep_lock=keep_lock)
  File "/home/daenerys/tools/ncf/local/lib/python2.7/site-packages/theano/gof/cc.py", line 1157, in compile
    keep_lock=keep_lock)
  File "/home/daenerys/tools/ncf/local/lib/python2.7/site-packages/theano/gof/cc.py", line 1624, in cthunk_factory
    key=key, lnk=self, keep_lock=keep_lock)
  File "/home/daenerys/tools/ncf/local/lib/python2.7/site-packages/theano/gof/cmodule.py", line 1189, in module_from_key
    module = lnk.compile_cmodule(location)
  File "/home/daenerys/tools/ncf/local/lib/python2.7/site-packages/theano/gof/cc.py", line 1527, in compile_cmodule
    preargs=preargs)
  File "/home/daenerys/tools/ncf/local/lib/python2.7/site-packages/theano/gof/cmodule.py", line 2351, in compile_str
    p_out = output_subprocess_Popen(cmd)
  File "/home/daenerys/tools/ncf/local/lib/python2.7/site-packages/theano/misc/windows.py", line 80, in output_subprocess_Popen
    out = p.communicate()
  File "/usr/lib/python2.7/subprocess.py", line 800, in communicate
    return self._communicate(input)
  File "/usr/lib/python2.7/subprocess.py", line 1417, in _communicate
    stdout, stderr = self._communicate_with_poll(input)
  File "/usr/lib/python2.7/subprocess.py", line 1471, in _communicate_with_poll
    ready = poller.poll()
KeyboardInterrupt
