HANG: A pending signal is not handled after unmasking leading to a hang in some contexts
Created by: afarallax
A brief description
If a signal is delivered to a thread in which it is masked, it becomes a pending signal. Under DynamoRIO, that signal is not handled after it is unmasked.
Reproduction
Below is a test program with comments and a description of its behavior.
/*******************************************************************************
 * Reproduction test for the signal deadlock issue.
 *
 * Compiling:
 *   g++ -std=c++14 -g3 -Og -o signal_hang signal_hang.cpp -pthread
 *
 * Usage:
 *   signal_hang <suspend delay> <alarm timer in seconds>
 *
 * Reproduction command:
 *   dynamorio/bin64/drrun -- ./signal_hang 1 30
 *
 * If the hang is not reproduced, increase the suspend delay parameter.
 * An alarm timer aborts the program in case of a hang.
 *
 * Test description:
 * Threads 1 and 2 first reach a synchronized state. Thread 1 then sends
 * signal 30 to thread 2 while that signal is masked in thread 2. Thread 2
 * waits for a delay and then calls sigsuspend() with an empty mask, so it
 * should receive any signal.
 * If sigsuspend() has already started executing and applied the empty
 * signal mask before the signal reaches thread 2, there is no deadlock.
 * But if the signal arrives before sigsuspend() unmasks it, the signal is
 * not delivered after the unmasking, and the thread stays suspended forever.
 * Expected behaviour: the signal sent to thread 2 becomes pending and is
 * handled once it is unmasked, so that the thread resumes.
 *
 ******************************************************************************/
#include <pthread.h>
#include <semaphore.h>
#include <signal.h>
#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
#include <inttypes.h>
#include <assert.h>

// Semaphores used to synchronize the main thread (thread 1) and the test
// thread (thread 2).
sem_t thread1, thread2;
volatile size_t g_delay = 0;

////////////////////////////////
// Signal handlers
//
static void sig_handler(int signum)
{
    assert(signum == 30);
}

static void sig_alarm(int signum)
{
    assert(signum == SIGALRM);
    printf("*** Alarm signal was received\n*** Abort.\n");
    exit(1);
}

////////////////////////////////
// Thread 2
//
// Busy-wait for roughly g_delay * 1000000 loop iterations.
static void delay()
{
    auto tmp = g_delay;
    g_delay *= 1000000;
    while (g_delay--) {}
    g_delay = tmp;
}

static void* test_thread(void* ptr)
{
    stack_t sigstack;
    sigstack.ss_sp = malloc(SIGSTKSZ);
    sigstack.ss_size = SIGSTKSZ;
    sigstack.ss_flags = 0;
    if (sigaltstack(&sigstack, NULL) == -1)
    {
        printf("*** sigaltstack() failed. Abort.\n");
        exit(2);
    }

    // Mask signal 30 in this thread.
    sigset_t mask;
    sigemptyset(&mask);
    sigaddset(&mask, 30);
    if (pthread_sigmask(SIG_SETMASK, &mask, NULL) != 0)
    {
        printf("*** pthread_sigmask() failed. Abort.\n");
        exit(2);
    }

    sem_post(&thread2);
    sem_wait(&thread1);
    delay();

    // We expect signal 30 to be pending at this point.
    sigemptyset(&mask);
    sigsuspend(&mask); // <------------------- Thread 2 hangs here.
    sem_post(&thread2);
    return NULL;
}

////////////////////////////////
// Thread 1
//
int main(int argc, char **argv)
{
    if (argc != 3)
    {
        printf("*** Usage: signal_hang <suspend delay> <alarm timer in seconds>\n Abort.\n");
        exit(3);
    }
    g_delay = (size_t) strtoull(argv[1], NULL, 0); // Some undefined time units.
    uint32_t alarmCount = (uint32_t) strtoul(argv[2], NULL, 0); // Seconds

    // Signal handlers
    //
    struct sigaction act = {}; // Zero-initialize the fields that are not set below.
    act.sa_handler = sig_handler;
    act.sa_flags = SA_ONSTACK;
    sigemptyset(&act.sa_mask);
    sigaction(30, &act, NULL);

    // SIGALRM is used to exit the program in case of a hang.
    act.sa_handler = sig_alarm;
    sigaddset(&act.sa_mask, 30);
    sigaction(SIGALRM, &act, NULL);

    // Create the second thread
    //
    pthread_t pid;
    sem_init(&thread1, 0, 0);
    sem_init(&thread2, 0, 0);
    pthread_create(&pid, NULL, test_thread, NULL);
    sem_wait(&thread2);
    printf("==== Ready to send the signal to the second thread. "
           "Alarm: %" PRIu64 " seconds. Suspend delay: %" PRIu64 "\n",
           (uint64_t) alarmCount, (uint64_t) g_delay);
    alarm(alarmCount);
    sem_post(&thread1);

    // Now the threads are synchronized.
    pthread_kill(pid, 30);
    sem_wait(&thread2); // <------------------- Thread 1 hangs here.
    printf("==== Completed ====\n");
    return 0;
}
Output:
$ DynamoRIO-Linux-8.0.18494/bin64/drrun -- ./signal_hang 1 30
==== Ready to send the signal to the second thread. Alarm: 30 seconds. Suspend delay: 1
*** Alarm signal was received
*** Abort.
The bug is reproduced without any client, regardless of the -debug option.
Versions
The bug is reproduced on DynamoRIO versions for Linux x86-64: 8.0.0 and 8.0.18494.
Host: Ubuntu 20.04.1 LTS (GNU/Linux 5.4.0-42-generic x86_64)
The bug was also reproduced on a custom DynamoRIO version for Android AArch64.
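For comparison, the expected behaviour (a masked signal becomes pending and is delivered once sigsuspend() unmasks it) can be checked in a single thread. The following is only a minimal sketch, not part of the reproduction program: the names pending_signal_wakes_sigsuspend, on_signal and g_handled are hypothetical, SIGUSR1 is assumed in place of signal 30, and sigprocmask() is used because the sketch is single-threaded (the test program above uses pthread_sigmask()).

```cpp
#include <cassert>
#include <cerrno>
#include <signal.h>

// Set by the handler so the caller can tell that the signal was delivered.
static volatile sig_atomic_t g_handled = 0;

static void on_signal(int) { g_handled = 1; }

// Hypothetical helper: returns true if a signal that became pending while
// masked is delivered as soon as sigsuspend() installs an empty mask.
bool pending_signal_wakes_sigsuspend(int sig)
{
    struct sigaction act = {};
    act.sa_handler = on_signal;
    sigemptyset(&act.sa_mask);
    int rc = sigaction(sig, &act, nullptr);
    assert(rc == 0);
    (void) rc;

    // Mask the signal, then send it to ourselves so that it stays pending.
    sigset_t block, old_mask, pending, empty;
    sigemptyset(&block);
    sigaddset(&block, sig);
    sigprocmask(SIG_BLOCK, &block, &old_mask);
    g_handled = 0;
    raise(sig);

    // The signal must be pending and the handler must not have run yet.
    sigpending(&pending);
    bool was_pending = sigismember(&pending, sig) == 1 && g_handled == 0;

    // sigsuspend() with an empty mask unmasks the pending signal; after the
    // handler runs, it returns -1 with errno set to EINTR.
    sigemptyset(&empty);
    int ret = sigsuspend(&empty);
    bool interrupted = (ret == -1 && errno == EINTR);

    // Restore the original signal mask.
    sigprocmask(SIG_SETMASK, &old_mask, nullptr);
    return was_pending && interrupted && g_handled == 1;
}
```

Calling pending_signal_wakes_sigsuspend(SIGUSR1) from main() should return true when run natively. If the same call never returns under drrun, that would point to the same behaviour as described in this report, although this single-threaded variant has not been confirmed to reproduce it.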