SIGBUS in thread-private fcache_return
From bruen...@google.com on June 19, 2013 17:19:11
Observed here: https://code.google.com/p/drmemory/issues/detail?id=1265 Showed up as an assert that the stack is not readable: 13:clone -----dynamorio/core/synch.c:194 res 15:pthread_test -----dynamorio/core/synch.c:194 res
The assert fires b/c esp points at the very base of the dstack, so naturally reading at a higher address fails.
This was seen on FC18, and also on FC15.
Here is info from earlier debugging:
Assert is here, while processing the SIGBUS:
Jun 12 15:46:59 #6
0xf71d099c in is_at_do_syscall (dcontext=0x4d4cd180, pc=0x4d551f30 "\234[\211\035\240\321LMj",
Jun 12 15:47:01 esp=0x4d52f000 "") at /home/branden/Repositories/DrMem/drmemory/dynamorio/core/synch.c:196
Jun 12 15:47:02 #7
0xf729f82b in is_sys_kill (dcontext=0x4d4cd180, pc=0x4d551f30 "\234[\211\035\240\321LMj", xsp=0x4d52f000 "",
Jun 12 15:47:04 info=0x4d54ec8c) at /home/branden/Repositories/DrMem/drmemory/dynamorio/core/unix/signal.c:3669
Jun 12 15:47:05 #8
0xf72a119f in master_signal_handler_C (xsp=0x4d54ec7c "EW%\367\a")
Jun 12 15:47:07 at /home/branden/Repositories/DrMem/drmemory/dynamorio/core/unix/signal.c:4217
SIGBUS is hit here:
Jun 12 16:01:35 (gdb) x/10i sc->eip
Jun 12 16:01:36 0x55c69f30: pushf
Jun 12 16:01:38 0x55c69f31: pop %ebx
Jun 12 16:01:39 0x55c69f32: mov %ebx,0x55be51a0
Jun 12 16:01:41 0x55c69f38: push $0x0
Jun 12 16:01:43 0x55c69f3a: popf
Jun 12 16:01:44 0x55c69f3b: mov %eax,0x55be52e4
Jun 12 16:01:45 0x55c69f40: push $0x55be5180
Jun 12 16:01:47 0x55c69f45: call 0xf70d9801
Jun 12 16:01:48 0x55c69f4a: lea 0x4(%esp),%esp
Jun 12 16:01:50 0x55c69f4e: jmp 0xf7255568 <unexpected_return>
That's thread-private fcache_return.
esp is 0x55c47000
/proc/self/maps: Jun 12 16:26:54 55c3a000-55c47000 rw-p 00000000 00:00 0 [stack:21398] Jun 12 16:26:56 55c47000-55c49000 ---p 00000000 00:00 0 [stack:21398]
Jun 12 16:21:30 <bruening_> p /x ((dcontext_t*)0x55be5180)->dstack Jun 12 16:21:51 $9 = 0x55c47000
Jun 12 16:08:01 <bruening_> "p heapmgt->vmheap" Jun 12 16:08:28 (gdb) p heapmgt->vmheap Jun 12 16:08:30 $6 = {start_addr = 0x55be0000 "", end_addr = 0x65be0000 <Address 0x65be0000 out of bounds>, alloc_start = 0x0,
Jun 12 17:02:39 (gdb) p /x sig Jun 12 17:02:40 $23 = 0x7 Jun 12 17:02:42 (gdb) p /x *siginfo Jun 12 17:02:43 $24 = {si_signo = 0x7, si_errno = 0x0, si_code = 0x80, _sifields = {_pad = {0x0, 0x0, Jun 12 17:02:45 0xabababab <repeats 27 times>}, _kill = {si_pid = 0x0, si_uid = 0x0}, _timer = {si_tid = 0x0, Jun 12 17:02:46 si_overrun = 0x0, si_sigval = {sival_int = 0xabababab, sival_ptr = 0xabababab}}, rt = {si_pid = 0x0, Jun 12 17:02:48 si_uid = 0x0, si_sigval = {sival_int = 0xabababab, sival_ptr = 0xabababab}}, sigchld = {si_pid = 0x0, Jun 12 17:02:49 si_uid = 0x0, si_status = 0xabababab, si_utime = 0xabababab, si_stime = 0xabababab}, sigfault = { Jun 12 17:02:51 si_addr = 0x0}, sigpoll = {si_band = 0x0, si_fd = 0x0}}} Jun 12 17:19:03 <bruening> 7 is SIGBUS Jun 12 17:21:37 <bruening> so si_code = 0x80 => just SI_KERNEL, no BUS value Jun 12 17:24:43 <bruening> so take a step back: we got SIGBUS on pushf when esp=0x55c47000 base of normal-looking dstack
Jun 12 18:01:08 this time it happens to be right before 'i=250000' but it has happened at just about every point along execution Jun 12 18:01:27 in relation to 'i=nnn' being printed
(separate run) Jun 12 18:05:34 (gdb) p/x dcontext->next_tag Jun 12 18:05:35 $45 = 0x4c3f6280 Jun 12 18:32:39 (gdb) x/10i 0x4dc56280 Jun 12 18:32:41 0x4dc56280: syscall Jun 12 18:32:42 0x4dc56282: mov %eax,0x4dbd119c Jun 12 18:32:44 0x4dc56287: mov $0xf72d95cc,%eax Jun 12 18:32:45 0x4dc5628c: jmp 0x4dc55ec0 Jun 12 18:36:12 <bruening_> ok this is a different run so DR's mem must cover 0x4dc now
Jun 12 18:40:39 (gdb) p/x dcontext->asynch_target Jun 12 18:40:40 $11 = 0xf77d3425 Jun 12 18:40:42 (gdb) x/5i 0xf77d3425 Jun 12 18:40:43 0xf77d3425 <__kernel_vsyscall+5>: mov $0x2b,%ecx Jun 12 18:40:45 0xf77d342a <__kernel_vsyscall+10>: mov %ecx,%ss Jun 12 18:40:46 0xf77d342c <__kernel_vsyscall+12>: mov %ebp,%ecx Jun 12 18:40:48 0xf77d342e <__kernel_vsyscall+14>: pop %ebp Jun 12 18:40:49 0xf77d342f <__kernel_vsyscall+15>: ret
SIGBUS on pushf of anon mmap makes no sense: a race of some kind where that address was a file mmap at the time?
Original issue: http://code.google.com/p/dynamorio/issues/detail?id=1193