Graceful exit on deploy


#1

During a deployment (especially for apps that are not serving web requests), how should I ensure that an app/container gracefully exits? There exists documentation for when a container exceeds memory limits. Does that apply for container exits during deployments, too?


#2

Right now, we send SIGTERM to PID 1 in your container, which is slightly different to what we do when your app runs out of memory (where we send it to all processes in the container).

Unfortunately, this isn’t necessarily easy to handle, since Enclave runs a shell as PID 1 to interpret your Procfile line, and shells don’t normally forward signals to subprocesses. If you want to run your own process as PID 1, then you need something like this:

# using exec here ensures your shell is replaced with your app as PID 1
# this ensures your app process is the one that receives the SIGTERM
app: exec some command --with arguments

This may change over time (if it does, we may expose some level of configuration in .aptible.yml).

That being said, during a deploy, we first drain web traffic going to app containers before terminating them. In other words, by the time your container receives SIGTERM, it’s no longer receiving traffic.

This means that, as far as web apps are concerned, exiting gracefully is often not very important. By comparison, exiting gracefully matters more in, say, a background worker process, since you might want to push running jobs back to the queue, or at the very least stop accepting new jobs.
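As a rough illustration of the "stop accepting new jobs" idea for a background worker, here is a minimal Python sketch. The `Worker` class, its in-memory queue, and the `handle` callback are hypothetical stand-ins for a real job system, not anything Enclave provides. The SIGTERM handler only sets a flag, and the loop checks that flag between jobs, so the job in flight finishes and everything else stays in the queue.

```python
import queue
import signal


class Worker:
    """Hypothetical worker: pulls jobs from a queue until asked to stop."""

    def __init__(self, jobs, handle):
        self.jobs = jobs        # queue.Queue of pending jobs
        self.handle = handle    # callable that processes one job
        self.stopping = False

    def request_stop(self, signum=None, frame=None):
        # Keep the signal handler tiny: just record that we should stop.
        self.stopping = True

    def run(self):
        done = []
        while not self.stopping:
            try:
                job = self.jobs.get_nowait()
            except queue.Empty:
                break
            self.handle(job)    # the in-flight job is allowed to finish
            done.append(job)
        # Jobs never taken from the queue are left there for another worker.
        return done


if __name__ == "__main__":
    jobs = queue.Queue()
    for n in range(5):
        jobs.put(n)

    def handle(job):
        if job == 1:
            # Simulate the platform sending SIGTERM while a job is running.
            signal.raise_signal(signal.SIGTERM)

    worker = Worker(jobs, handle)
    signal.signal(signal.SIGTERM, worker.request_stop)
    print("processed:", worker.run())
    print("left in queue:", jobs.qsize())
```

With a real broker, the same flag would typically guard the call that fetches the next message, and none of this helps unless the worker process actually receives the SIGTERM.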

Let me know what you think!

Cheers,


#3

Thanks, Thomas.

So it sounds like I don’t need to worry about web apps. For background workers I’ll use the Procfile change you mention.


#4

Combining this with restarting the app after a crash, would it look like this?

app: while true; do exec some command --with arguments; sleep 1; done

#5

No; that won’t work at all. To understand why, you’ll need to understand how processes work in Unix in general and Aptible in particular.

When you’re deployed on Enclave, the process hierarchy looks like this:

- /bin/sh (PID 1) (running your Procfile line) 
    \ - some command --with arguments (whatever your Procfile line runs)

But when you use the exec instruction in a shell, it replaces the shell process with whatever you are calling. In other words, the subcommand takes over from the shell (at that point, the shell stops existing).

This means that if you use while true; do exec some command --with arguments; sleep 1; done in your Procfile, you’ll enter the while loop once, then run exec and replace your shell with some command.

This means you’ll be left with a much simpler process hierarchy:

- some command --with arguments (PID 1)

This is good because your process is now running as PID 1 so it’ll receive SIGTERM from Enclave, but unfortunately, it also means that there isn’t a shell to run your loop anymore, so your auto restarting won’t work.
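If you want to see this behavior for yourself, the following Python sketch runs two shell lines under /bin/sh and compares their output. It assumes a Unix-like system with /bin/sh and an echo binary on the PATH; `echo started` stands in for the app, and a bounded for loop replaces `while true` so the comparison terminates. The `run_in_shell` helper is a name made up for this example.

```python
import subprocess

# Same shape as the Procfile loop, but bounded so the demo finishes.
WITH_EXEC = "for i in 1 2 3; do exec echo started; done"
WITHOUT_EXEC = "for i in 1 2 3; do echo started; done"


def run_in_shell(line):
    """Run a Procfile-style line under /bin/sh and return its output lines."""
    result = subprocess.run(
        ["/bin/sh", "-c", line],
        capture_output=True,
        text=True,
        timeout=10,
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    # With exec, the shell is replaced on the first iteration, so the
    # loop never gets a chance to run again.
    print("with exec:   ", len(run_in_shell(WITH_EXEC)), "start(s)")
    print("without exec:", len(run_in_shell(WITHOUT_EXEC)), "start(s)")
```

The line with exec starts the command once, while the line without it starts the command on every iteration.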

So, in your case, what you need is a PID 1 process that does two things:

  • Handles SIGTERM by sending it to your app
  • Automatically restarts your app.
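To make those two responsibilities concrete, here is a minimal Python sketch of such a process. It is an illustration only, not how Enclave, Tini, or any real supervisor is implemented. The `supervise` function and its `max_starts` and `delay` parameters are hypothetical names, and the sketch skips things a real init process has to handle, such as reaping orphaned processes and forwarding signals other than SIGTERM.

```python
import signal
import subprocess
import sys
import time


def supervise(cmd, max_starts=None, delay=1.0):
    """Run cmd, restart it whenever it exits, and forward SIGTERM to it.

    max_starts exists only so demos and tests can terminate.
    Returns the number of times cmd was started.
    """
    state = {"child": None, "stopping": False}

    def forward(signum, frame):
        # Responsibility 1: pass SIGTERM on to the app, and stop restarting.
        state["stopping"] = True
        child = state["child"]
        if child is not None:
            child.send_signal(signum)

    previous = signal.signal(signal.SIGTERM, forward)
    starts = 0
    try:
        # Responsibility 2: restart the app until we are told to stop.
        while not state["stopping"]:
            if max_starts is not None and starts >= max_starts:
                break
            child = subprocess.Popen(cmd)
            state["child"] = child
            starts += 1
            if state["stopping"]:
                # SIGTERM arrived while the child was being started.
                child.terminate()
            child.wait()
            if not state["stopping"]:
                time.sleep(delay)
    finally:
        if previous is not None:
            signal.signal(signal.SIGTERM, previous)
    return starts


if __name__ == "__main__":
    app = [sys.executable, "-c", "print('app ran')"]
    print("starts:", supervise(app, max_starts=2, delay=0))
```

In practice, an existing tool is a much better choice than maintaining something like this yourself.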

One way to do this is to have PID 1 proxy SIGTERM signals to all processes in your container, then have that process spawn your shell. As it turns out, you can use Tini (of which I’m the author) to do this.

To do so, add the following file at the root of your repo as entrypoint.sh:

#!/bin/sh
# Start tini in process group mode and exec so that it runs as PID 1 and
# forwards signals to all processes in the container. Have it spawn a shell and
# passthrough all arguments received by the entrypoint to said shell so the
# Procfile line is interpreted properly.
exec /tini-static -g -- sh -c "$*"

Then, add the following to your Dockerfile:

ENV TINI_VERSION v0.14.0
ADD https://github.com/krallin/tini/releases/download/${TINI_VERSION}/tini-static /tini-static
RUN chmod +x /tini-static

ADD entrypoint.sh /entrypoint.sh
RUN chmod +x /entrypoint.sh

ENTRYPOINT ["/entrypoint.sh"]

With this, you can continue using your existing Procfile while loop (i.e. without the exec change), but signals will now be sent to all processes in your container.

Cheers,


#6

Thanks again, Thomas!


#7

When the container receives SIGTERM, is it already disconnected from the log drain? I’ve just set up a Node process to trap SIGTERM, and I’m not seeing this message in my logs.

  process.on('SIGTERM', () => {
    logger.info('Received SIGTERM. Gracefully exiting.')
    process.exit()
  })

#8

Currently, both operations (shutting down app containers and disconnecting logging) happen at roughly the same time. Technically, we do wait for the app containers to be shut down before disconnecting logging, but due to buffering this doesn’t necessarily mean all logs will be captured.

This isn’t very intuitive, so we’re investigating ways to ensure all logs are routed before shutting down (this would potentially make deploys / restarts take a little longer).


#9

I added the recommended tini-static code and the changes to my Dockerfile. My Procfile looks like the following:

event: while true; do NEW_RELIC_CONFIG_FILE=newrelic.ini newrelic-admin run-program python kombu_app.py; sleep 1; done
web: while true; do NEW_RELIC_CONFIG_FILE=newrelic.ini newrelic-admin run-program gunicorn -w 4 -b 0.0.0.0:8080 flask_app:app --log-config logging.conf; sleep 1; done

In my Python app, I’ve added a signal handler that hits a webhook and files a ticket when the process shuts down, to check whether this is working, but I’m not catching the SIGTERM. I’ve confirmed that running the same command on my system does catch the SIGTERM, but on Aptible it’s not submitting the ticket. Any thoughts on how to go about debugging this?


#10

@jesse : unfortunately, this might be a little racy (something I overlooked earlier). It’s likely your shell and underlying Python app are racing to exit.

I don’t know how you tested this locally, but on Aptible, what’s probably happening is:

  • your shell exits
  • as a result, Tini exits
  • as a result, the entire container exits before your Python app has time to notify you

If you’d like, we can confirm this. Just turn up logging in Tini by adding -vvv as an argument, then restart your app and let me know — I’ll get you the logs so you can see exactly what happened.

At this point, I recommend you use an actual process supervisor to make this work; toying with shells and signals will probably not be a good use of your time. To do so, I recommend using supervisord. This is what we use in places where we’ve needed process supervision.

The following configuration file might help you get started:

[supervisord]
nodaemon=true

[program:redis]
command = newrelic-admin run-program python kombu_app.py
autostart = true
autorestart = true
stdout_logfile = /dev/stdout
stdout_logfile_maxbytes = 0
stderr_logfile = /dev/stderr
stderr_logfile_maxbytes = 0

Your Procfile will need to look like app: NEW_RELIC_CONFIG_FILE=newrelic.ini exec supervisord -c "/etc/supervisord.conf" (assuming you dropped the configuration file in /etc/supervisord.conf).


#11

I’m running Python 3 and don’t want to deal with having two environments, so I decided to go with Circus rather than supervisord, and that totally did the trick. Signals are being handled and cleanup is working properly now. Thanks for all the help.


#12

Happy to hear it! :slight_smile:


#13

@jesse @alansempre: I wanted to give you a heads up that while loops are no longer necessary (or even encouraged) on Enclave: as of today, we restart containers automatically when they exit, at the platform level!

This is documented here: https://www.aptible.com/support/topics/enclave/how-to-auto-restart-your-app/


#14

Thanks @thomas

I ended up using the PM2 process manager and confirmed that signals are forwarded correctly. Here’s how it’s set up:

In pm2.config.js

module.exports = {
  apps: [{
    name: 'app',
    script: './build/index.js',
    interpreter_args: '-r newrelic',
    env_production: {
      'NODE_ENV': 'production'
    },
  }]
}

In Dockerfile

CMD ["/usr/src/app/node_modules/.bin/pm2-docker", "--raw", "pm2.config.js", "--only", "app", "--env", "production"]

In Procfile:

web: /usr/src/app/node_modules/.bin/pm2-docker --raw pm2.config.js --only app --env production


#15

@alansempre :+1:, pm2-docker is indeed a great choice!


#16

@alansempre as an aside, if you have a single service, note that you don’t need a Procfile anymore: Enclave falls back to your CMD in the absence of a Procfile!