I am seeing issues with our Sidekiq process where batch jobs get stuck as Pending. According to the Sidekiq wiki, this can happen when Sidekiq isn’t shut down correctly: Batches · mperham/sidekiq Wiki · GitHub
- If you find that batches are stuck with Pending jobs, especially right around a deployment, verify you are gracefully restarting Sidekiq as designed: send TSTP as early as possible, TERM as late as possible, and never use kill -9.
- Seeing "positive pending" batches but can't find those pending jobs? They are likely in a super_fetch private queue. This can happen if your deploys are misconfigured and creating orphaned jobs. Check your -t shutdown timeout value (default: 25) and make sure your deploy tool is giving Sidekiq at least N+5 (i.e. 30) seconds before killing the process.
Hey kcoleman_hb!
When currently running containers are shut down, we use the Docker defaults to stop the container. This means we run a `docker stop` command, which issues a SIGTERM to PID 1, followed by a SIGKILL to all running processes after 10 seconds if the container still has not stopped.
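To make that concrete, the default behavior is roughly equivalent to the following (the container name is just an illustration):

```bash
# Default Docker stop semantics: SIGTERM to PID 1, then SIGKILL if the
# container is still running after the timeout (10 seconds by default).
docker stop --time 10 sidekiq-worker   # "sidekiq-worker" is a hypothetical container name
```

Note that Sidekiq’s own shutdown timeout (`-t`, 25 seconds by default) is longer than that 10-second window, so long-running jobs can be killed before Sidekiq has a chance to re-queue them.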
If your jobs are taking more than 10 seconds to stop and re-queue, we’ve seen another client successfully use the `before_release` commands in `.aptible.yml` to quiet their Sidekiq workers ahead of the step in a deploy where any container would be asked to stop. This means you can take up to 30 minutes (the timeout on `before_release`) to make sure your jobs have stopped properly, and then continue with the deploy. The only downside of that approach is that you have to be careful to un-quiet your workers if your `before_release` commands fail after quieting. If your jobs are being orphaned or you need more reliable execution, you may also want to take a look at Sidekiq Pro’s `super_fetch`.
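As a rough sketch, the quieting step in `before_release` could look something like this (the script name, the use of the Sidekiq API for quieting, and the wait loop are all assumptions about your app, not a drop-in recipe; it also assumes the `before_release` container can reach your Redis via your normal Sidekiq configuration):

```bash
# Hypothetical script run from before_release in .aptible.yml, e.g.:
#   before_release:
#     - bin/quiet_sidekiq
#
# Quiet every running Sidekiq process via the Sidekiq API so it stops
# picking up new jobs, then wait for in-flight jobs to drain.
bundle exec ruby -e 'require "sidekiq/api"; Sidekiq::ProcessSet.new.each(&:quiet!)'

# Poll until nothing is in flight (bounded overall by the 30-minute
# before_release timeout).
bundle exec ruby -e 'require "sidekiq/api"; sleep 5 while Sidekiq::Workers.new.size > 0'
```

Quieting through the API rather than signals matters here because `before_release` runs in its own container and can’t signal the worker processes directly; and as noted above, if the deploy fails after this point you’ll want to restart or un-quiet those workers.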
— Michael