REDIS Timeout Issue
Created by: elucidsoft
I have no idea why this happens, but it does. When the Redis server gets rebooted and your Queue loses its connection, it doesn't reconnect. We have discussed this before, and you believe it's an issue with IORedis. I am not so sure anymore: after spending several hours playing with this, I can now reproduce the issue consistently.
Some observations:
- The only method that throws an exception is queue.add
- All other methods seem to work fine, reconnect to Redis fine and return data fine.
- queue.client.ping() returns 'PONG' fine, but queue.add fails with an exception.
I created the following rather ugly, horribly ugly code for my healthcheck as a temporary workaround until I can find the root cause of this issue:
const job = await this.queueService.queue.add(null, {removeOnComplete: true, delay: 9999}); await job.remove();
This code will ALWAYS throw an exception in this scenario, so it's a pretty good check to see...
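For anyone who wants to reuse the workaround, here is a slightly more defensive sketch of the same add-then-remove probe. The names checkQueueWritable, withTimeout, ProbeQueue and ProbeJob are hypothetical helpers I made up for illustration, not part of Bull. The interfaces only assume that queue.add resolves to a job with a remove() method, as in the one-liner above. The timeout guard is also an assumption on my part, in case the call hangs instead of throwing.

```typescript
// Hypothetical minimal shapes for the probe; assumed to match the
// queue.add(...) / job.remove() calls used in the workaround above.
interface ProbeJob {
  remove(): Promise<void>;
}

interface ProbeQueue {
  add(
    data: unknown,
    opts?: { removeOnComplete?: boolean; delay?: number },
  ): Promise<ProbeJob>;
}

// Reject if `work` does not settle within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`queue probe timed out after ${ms} ms`)),
      ms,
    );
    work.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}

// Add a delayed throwaway job and remove it again right away.
async function addAndRemove(queue: ProbeQueue): Promise<void> {
  const job = await queue.add(null, { removeOnComplete: true, delay: 9999 });
  await job.remove();
}

// Resolves to true if the queue accepted the probe job, and to false
// if the probe threw or did not finish in time.
async function checkQueueWritable(
  queue: ProbeQueue,
  timeoutMs = 2000,
): Promise<boolean> {
  try {
    await withTimeout(addAndRemove(queue), timeoutMs);
    return true;
  } catch {
    return false;
  }
}
```

In my healthcheck this would be called as checkQueueWritable(this.queueService.queue), with a false result reported as unhealthy so that Kubernetes restarts the pod. It is still only a workaround and does not address the root cause.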
Environment:
Kubernetes cluster. I have tried the following environments, and it seems to happen on each, with different errors:
Single Redis Instance: Same behavior, but you get an ECONNREFUSED error from IORedis.
Sentinel Redis with 3 instances (master/slave): If you kill all of them simultaneously, you get an ALL SENTINELS are down error from IORedis.
Both appear to behave exactly the same. If you reboot your app, everything works again.