DynamoRIO issue #4431
Issue created Sep 01, 2020 by Administrator (@root)

HANG: A pending signal is not handled after unmasking leading to a hang in some contexts

Created by: afarallax

A brief description

If a signal is delivered to a thread that has it masked, the signal becomes pending. Under DynamoRIO, that pending signal is not handled after it is unmasked.
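For reference, the expected native semantics can be shown in a reduced, single-threaded form. This is only a sketch under assumptions: it uses SIGUSR1 as a stand-in for signal 30, uses sigprocmask() instead of pthread_sigmask(), and the function name is hypothetical. The report does not confirm whether this reduced form also hangs under DynamoRIO.

```cpp
#include <signal.h>

// Set by the handler; volatile sig_atomic_t is safe to write from a handler.
static volatile sig_atomic_t g_handled = 0;

static void on_signal(int) { g_handled = 1; }

// Hypothetical helper (not part of the original test).
// Returns true if a signal raised while masked was reported as pending and
// was then delivered once sigsuspend() installed an empty mask.
// SIGUSR1 stands in for signal 30 used by the original test.
bool pending_signal_wakes_sigsuspend()
{
    g_handled = 0;

    struct sigaction act = {};
    act.sa_handler = on_signal;
    sigemptyset(&act.sa_mask);
    if (sigaction(SIGUSR1, &act, nullptr) != 0)
        return false;

    // Mask SIGUSR1, remembering the previous mask.
    sigset_t block, old;
    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    if (sigprocmask(SIG_BLOCK, &block, &old) != 0)
        return false;

    // The signal is masked, so it must become pending instead of running
    // the handler.
    raise(SIGUSR1);

    sigset_t pending;
    sigemptyset(&pending);
    if (sigpending(&pending) != 0)
        return false;
    bool was_pending = sigismember(&pending, SIGUSR1) == 1 && g_handled == 0;

    // sigsuspend() atomically installs the empty mask and waits. The pending
    // signal should be delivered immediately, the handler should run, and
    // sigsuspend() should return.
    sigset_t empty;
    sigemptyset(&empty);
    sigsuspend(&empty);

    sigprocmask(SIG_SETMASK, &old, nullptr);
    return was_pending && g_handled == 1;
}
```

Run natively, the function should return true. A hang inside sigsuspend() would correspond to the behavior described in this report.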

Reproduction

Below is a test program with comments and a description of its behavior.

/*******************************************************************************
 * Reproduction test for the signal hang issue.
 *
 * Compiling:
 * g++ -std=c++14 -g3 -Og -o signal_hang signal_hang.cpp -pthread
 *
 * Usage:
 * signal_hang <suspend delay> <alarm timer in seconds>
 *
 * Reproduction command:
 * dynamorio/bin64/drrun -- ./signal_hang 1 30
 * 
 * If the hang does not reproduce, increase the suspend delay parameter.
 * An alarm timer aborts the program if it hangs.
 *
 * The test description:
 *   At some point threads 1 and 2 reach a synchronized state.
 *   Thread 1 then sends signal 30 to thread 2 while that signal is masked
 *   there. Thread 2 waits for a delay and then calls sigsuspend() with an
 *   empty mask, so it should receive any signal.
 *   If sigsuspend() has already started and applied the empty signal mask
 *   when the signal reaches thread 2, there is no hang.
 *   But if the signal arrives before sigsuspend() unmasks it, the signal is
 *   not delivered after the unmasking, and the thread stays suspended
 *   indefinitely.
 *   Expected behaviour: the signal reaches thread 2, becomes pending, and is
 *   handled once it is unmasked, so that the thread resumes.
 *
 ******************************************************************************/

#include <pthread.h>
#include <semaphore.h>
#include <signal.h>
#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
#include <inttypes.h>
#include <assert.h>

sem_t thread1, thread2;
volatile size_t g_delay = 0;

////////////////////////////////
//
static void sig_handler(int signum)
{
    assert(signum == 30);
}

static void sig_alarm(int signum)
{
    assert(signum == SIGALRM);

    printf("*** Alarm signal was received\n*** Abort.\n");
    exit(1);
}

////////////////////////////////
//
static void delay()
{
    auto tmp = g_delay;
    g_delay *= 1000000;
    while (g_delay--) {}
    g_delay = tmp;
}

static void* test_thread(void* ptr)
{
    stack_t sigstack;

    sigstack.ss_sp = malloc(SIGSTKSZ);
    sigstack.ss_size = SIGSTKSZ;
    sigstack.ss_flags = 0;
    if (sigaltstack(&sigstack, NULL) == -1)
    {
        printf("*** sigaltstack() failed. Abort.\n");
        exit(2);
    }

    sigset_t mask;
    sigemptyset(&mask);
    sigaddset(&mask, 30);

    if (pthread_sigmask(SIG_SETMASK, &mask, NULL) != 0)
    {
        printf("*** pthread_sigmask() failed. Abort.\n");
        exit(2);
    }

    sem_post(&thread2);
    sem_wait(&thread1);

    delay();

    // We expect to have the pending signal 30 at this point.

    sigemptyset(&mask);
    sigsuspend(&mask); // <------------------- Here the thread 2 hangs.

    sem_post(&thread2);
    return NULL;
}

////////////////////////////////
//
int main(int argc, char **argv)
{
    if (argc != 3)
    {
        printf("*** Usage: signal_hang <suspend delay> <alarm timer in seconds>\n    Abort.\n");
        exit(3);
    }

    g_delay             = (size_t) strtoull(argv[1], NULL, 0);  // Some undefined time units.
    uint32_t alarmCount = (uint32_t) strtoul(argv[2], NULL, 0); // Seconds


    // Signal handlers
    //
    struct sigaction act;
    act.sa_handler = sig_handler;
    act.sa_flags = SA_ONSTACK;

    sigemptyset(&act.sa_mask);
    sigaction(30, &act, NULL);

    // SIGALRM is used to exit the program if it hangs.
    act.sa_handler = sig_alarm;
    sigaddset(&act.sa_mask, 30);
    sigaction(SIGALRM, &act, NULL);


    // Create the second thread
    //
    pthread_t pid;
    sem_init(&thread1, 0, 0);
    sem_init(&thread2, 0, 0);

    pthread_create(&pid, NULL, test_thread, NULL);
    sem_wait(&thread2);

    printf("==== Ready to send the signal to the second thread. "
           "Alarm: %" PRIu64 " seconds. Suspend delay: %" PRIu64 "\n",
           (uint64_t) alarmCount, (uint64_t) g_delay);

    alarm(alarmCount);

    sem_post(&thread1);

    // Now the threads are synchronized.

    pthread_kill(pid, 30);

    sem_wait(&thread2); // <------------------- Here the thread 1 hangs.

    printf("==== Completed ====\n");
    return 0;
}

Output:

$ DynamoRIO-Linux-8.0.18494/bin64/drrun -- ./signal_hang 1 30
==== Ready to send the signal to the second thread. Alarm: 30 seconds. Suspend delay: 1
*** Alarm signal was received
*** Abort.

The bug reproduces without any client, regardless of the -debug option.
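The test program relies on a busy-wait delay to make it likely that signal 30 is already pending when sigsuspend() is called. A possible way to make that precondition explicit is to poll sigpending() instead. The helper below is a hypothetical sketch, not part of the original test, and it assumes sigpending() reports the signal accurately in the environment under test.

```cpp
#include <signal.h>
#include <time.h>

// Hypothetical helper (not part of the original test).
// Polls until `signum` is pending for the calling thread, checking at most
// `max_polls` times and sleeping about 1 ms after each unsuccessful check.
// Returns true once the signal is pending, false on timeout or error.
bool wait_until_pending(int signum, int max_polls)
{
    const struct timespec interval = {0, 1000000}; // 1 ms

    for (int i = 0; i < max_polls; ++i) {
        sigset_t pending;
        sigemptyset(&pending);
        if (sigpending(&pending) != 0)
            return false;
        if (sigismember(&pending, signum) == 1)
            return true;
        nanosleep(&interval, nullptr);
    }
    return false;
}
```

In test_thread(), a call such as wait_until_pending(30, 5000) could replace delay() before sigsuspend(), so that the run always exercises the case where the signal is pending before it is unmasked.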

Versions

The bug reproduces on DynamoRIO 8.0.0 and 8.0.18494 for Linux x86-64. Host: Ubuntu 20.04.1 LTS (GNU/Linux 5.4.0-42-generic x86_64)

The bug was also reproduced on a custom DynamoRIO version for Android AArch64.
