Possible deadlock on Failed jobs
Created by: xdc0
I'm noticing that several failed jobs remain locked.
Monitoring the TTL of the lock, I notice that it is renewed exactly every LOCK_RENEW_TIME / 2.
The only place where the lock is renewed is: https://github.com/OptimalBits/bull/blob/master/lib/queue.js#L570
By inspecting the logic, I think there is a race condition that is causing a deadlock to take place:
1. A job starts being processed and a lock renew loop kicks off.
2. The job fails and gets moved to failed.
3. At that moment, the lock renew function calls itself again and attempts to renew the lock.
4. Before step 3 completes, the timer is cleared by the finally clause in https://github.com/OptimalBits/bull/blob/27321806a95d07439412ffc666088cdd9ebadaa4/lib/queue.js#L627
5. The lock renew function completes and sets the timer again in https://github.com/OptimalBits/bull/blob/27321806a95d07439412ffc666088cdd9ebadaa4/lib/queue.js#L570
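
To make the suspected interleaving concrete, here is a minimal, synchronous model of the sequence above. It is only a sketch and not bull's code: simulate, renewLock, clearRenewTimer and the guarded flag are hypothetical names, and the asynchronous Redis call is modelled as a callback invoked by hand so the ordering is deterministic. The guarded variant shows one possible mitigation (checking a "job finished" flag before re-arming the timer); I am assuming that would be enough, I have not verified it against the real code.

```javascript
// Simplified model of the suspected race. NOT bull's actual code:
// every name here is hypothetical, and the asynchronous Redis call is
// modelled as a callback that we invoke by hand to control the ordering.
function simulate(guarded) {
  const log = [];
  let timerArmed = false;   // stands in for the setTimeout handle
  let jobFinished = false;  // only consulted when `guarded` is true
  let completeRenew = null; // completion of the in-flight renewal

  function renewLock() {
    timerArmed = false; // the timer that triggered this call has fired
    log.push('renew started');
    completeRenew = () => {
      log.push('renew completed');
      if (guarded && jobFinished) return; // assumed mitigation: do not re-arm
      timerArmed = true; // models setTimeout(renewLock, LOCK_RENEW_TIME / 2)
      log.push('timer armed');
    };
  }

  function clearRenewTimer() {
    jobFinished = true;
    timerArmed = false; // models clearTimeout(...) in the finally clause
    log.push('timer cleared');
  }

  log.push('job moved to failed'); // the job fails
  renewLock();                     // a renewal starts and is in flight
  clearRenewTimer();               // finally runs before the renewal completes
  completeRenew();                 // the renewal completes afterwards
  return { timerArmed, log };
}

console.log(simulate(false).timerArmed); // unguarded: the timer is armed again
console.log(simulate(true).timerArmed);  // guarded: the timer stays cleared
```

If this model matches what really happens, the re-armed timer would keep renewing the lock of a job that is already in failed, which would fit the TTL behaviour described above.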