Datadog container for APM traces and log collection (an update)

I’ve recently updated the container I had running to use Datadog Agent v6.12.1, and in so doing, I was able to document how the process works. I’m posting this in case it helps others.

ALSO, DataDog now signs BAAs! Let those logs rip!

NOTE: This information does not apply to container metrics! Metrics are set up through a push-button metric drain that can drain to DataDog without a container!

  1. Basic principles
    The dockerized DataDog Agent, under normal (that is, non-Aptible Deploy) circumstances, attaches to the Docker daemon via the Unix socket. In Aptible, this isn’t possible, and so the standard features of the Agent won’t work.
    That said, you can still get trace data and logs forwarded by the agent via TCP forwarding from various containers/log drains: the trace and log collection agents listen on specific ports and then forward what they receive to DataDog’s API.
    The agent lives in a container with two endpoints, a TCP endpoint and a TLS endpoint. The TLS endpoint is necessary because Deploy log drains only send over TLS, whereas the Agent is listening in plaintext.

  2. Trace agent
    The trace agent requires configuration within your application itself to send the traces. In our case, since we have a Ruby app, we use a gem to send these traces to the TCP endpoint. (You can see my config in an earlier post.)
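
As a rough illustration of what pointing the application at the TCP endpoint involves, here is a minimal Ruby sketch that uses only the standard library. It assumes the app takes the agent location from the DD_AGENT_HOST and DD_TRACE_AGENT_PORT environment variables (which the Datadog tracing libraries conventionally read) and falls back to port 8126, the trace port used later in this post. The helper names are made up for this example. It does not configure the tracing gem itself; it only resolves the target and checks that the port accepts connections.

```ruby
require 'socket'

DEFAULT_TRACE_PORT = 8126

# Resolve the trace agent's host and port from an environment-like hash.
# Hypothetical helper: the defaults here are assumptions for illustration.
def trace_agent_target(env = ENV)
  host = env.fetch('DD_AGENT_HOST', 'localhost')
  port = Integer(env.fetch('DD_TRACE_AGENT_PORT', DEFAULT_TRACE_PORT.to_s))
  [host, port]
end

# Return true if a TCP connection to host:port opens within the timeout.
def reachable?(host, port, timeout: 3)
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue StandardError
  false
end

if __FILE__ == $PROGRAM_NAME
  host, port = trace_agent_target
  puts "trace agent #{host}:#{port} reachable: #{reachable?(host, port)}"
end
```

Running this inside the app container, with DD_AGENT_HOST set to the hostname of the agent's TCP endpoint, gives a quick yes/no answer before you start debugging the tracer configuration.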

  3. Log collection
    Log collection from other Deploy containers requires the setup of a custom collection agent.

  4. Building the Docker container, part one
    Because we need to configure this custom collection agent, we need to build our own Docker image.

  • Clone this repo
  • Download the .deb datadog-agent package for the build. (Check your versions, architectures, etc.)
  • cd to the directory Dockerfiles/agent and move the binary package to that location.
  • It’s probably not a bad idea to attempt to build it just as is, to see that it does build.
  5. Building the Docker container, part two
    Now that we’ve got the potential to build a container, we can make our modifications.
  • You need to add port 10518 to the EXPOSE line
  • You need to COPY in the relevant configuration file for the custom log collection agent
    The documentation on this isn’t terribly clear. It describes putting the same configuration file in two places. Based on the log output, I suspect that it is only needed as a conf.yaml file under conf.d, but I did put it in both places and I haven’t bothered to check if it works without both of them.

Here are the two lines I added to the Dockerfile:

COPY logs.conf.yaml /etc/datadog-agent/welldapp.d/conf.yaml
COPY logs.conf.yaml /etc/datadog-agent/conf.d/aptible.d/conf.yaml

And here is the collection file that I placed in those locations:

init_config:

instances:
    
logs:
  - type: tcp
    port: 10518
    service: welldapp
    source: aptible

Note how the service and source names match the names of the destination folders in which the file is placed in the Docker image.
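
To make that naming relationship concrete, here is a small Ruby sketch that parses the collection file and derives the two destination paths used in the COPY lines above. The path pattern simply mirrors the layout described in this post; it is an assumption drawn from that layout, not a documented Datadog rule, and the helper name is hypothetical.

```ruby
require 'yaml'

AGENT_ROOT = '/etc/datadog-agent'

# Derive the two destination paths from the first entry under `logs:`.
# Assumption: folder names follow the service and source values, as in this post.
def destination_paths(config_yaml)
  entry = YAML.safe_load(config_yaml).fetch('logs').first
  [
    "#{AGENT_ROOT}/#{entry.fetch('service')}.d/conf.yaml",
    "#{AGENT_ROOT}/conf.d/#{entry.fetch('source')}.d/conf.yaml"
  ]
end

# The same collection file as shown above.
SAMPLE_CONFIG = <<~CONF
  init_config:

  instances:

  logs:
    - type: tcp
      port: 10518
      service: welldapp
      source: aptible
CONF

puts destination_paths(SAMPLE_CONFIG)
```

If you rename the service or source, a check like this makes it easy to spot when the folder names in the Dockerfile have drifted from the values in the file.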

  6. Configuring the Agent via ENV variables
    I was able to achieve all the configuration I needed using ENV variables.
APTIBLE_PRIVATE_REGISTRY_USERNAME=<removed>
APTIBLE_PRIVATE_REGISTRY_PASSWORD=<removed>
DD_API_KEY=<removed>
DD_APM_ENABLED=true
DD_APM_ENV=enclave
DD_APM_NON_LOCAL_TRAFFIC=true
DD_HOSTNAME=dd-container-on-enclave
DD_LOG_LEVEL=off
DD_LOGS_CONFIG_LOGS_DD_URL=tcp-encrypted-intake.logs.datadoghq.com:10516
DD_LOGS_ENABLED=true
DD_LOGS_CONFIG_CONTAINER_COLLECT_ALL=false
DD_PROCESS_AGENT_ENABLED=disabled
  • The DD_APM_ENV variable is new since 6.1.1; previously all my metrics were env:none, so it’s nice to be able to set it to something.
  • Note that we use the URL for encrypted log intake because we have a BAA.
  • I have turned off the last two variables since, without a properly connected Agent, that functionality isn’t available anyway. (They are likely off by default, but I wanted to be clear.)
  7. Configuring the endpoints
    The agent’s container needs two endpoints.
  • TCP endpoint. This specifies BOTH ports that the Agent is listening for traffic on. In my case, that’s 8126 and 10518.
  • TLS endpoint. This specifies only the port that the log drain will be connected to. In my case, that’s 10518.
  • You should set these up using the CLI
  8. Set up log drains
    Each of the Aptible environments from which you want DataDog to collect logs needs its own log drain. The log drain must point at the TLS endpoint.
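
Once the endpoints exist, one way to check the log path before wiring up real drains is to push a single test line through it. The Ruby sketch below uses only the standard library and assumes the agent's TCP listener accepts newline-terminated lines on port 10518. The script name, the tag, and the hostname you pass in are placeholders, not values from this setup. It connects over TLS by default (as a log drain would) and over plaintext when given `plain`, which is useful for testing the image locally.

```ruby
require 'socket'
require 'openssl'
require 'time'

# Build one newline-terminated test line with a UTC timestamp.
def build_test_line(tag, now: Time.now.utc)
  "#{now.iso8601} #{tag} log pipeline smoke test\n"
end

# Send a line to host:port, wrapping the socket in TLS when tls is true.
def send_line(host, port, line, tls: true)
  sock = TCPSocket.new(host, port)
  if tls
    ctx = OpenSSL::SSL::SSLContext.new
    ctx.set_params(verify_mode: OpenSSL::SSL::VERIFY_PEER)
    sock = OpenSSL::SSL::SSLSocket.new(sock, ctx)
    sock.sync_close = true           # closing the TLS socket also closes the TCP socket
    sock.hostname = host             # send SNI so the endpoint presents the right certificate
    sock.connect
    sock.post_connection_check(host) # verify the certificate matches the hostname
  end
  sock.write(line)
ensure
  sock&.close
end

# Usage: ruby smoke_test.rb HOST PORT [plain]
if __FILE__ == $PROGRAM_NAME && ARGV.length >= 2
  host = ARGV[0]
  port = Integer(ARGV[1])
  tls = ARGV[2] != 'plain'
  send_line(host, port, build_test_line('aptible-check'), tls: tls)
  puts "sent test line to #{host}:#{port} (tls=#{tls})"
end
```

If the line shows up in DataDog with the service and source from the collection file, the TLS endpoint, the agent listener, and the forwarding to DataDog are all working.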

Questions? Feel free to ask.

And if there is anyone out there proficient in the right code base, I’d love to work with you on customizing the DataDog agent itself to have an Aptible Deploy mode built in so that we could override the expectation of the Docker socket connection and thus get rid of the constant error stream that this lack of connection generates. There are probably enough DataDog/Aptible users that they’d be willing to consider a pull request.