Dockerfile: Creating Images with Best Practices

Cloud Native applications are delivered as containers. Containers, and more specifically container images, encapsulate the application and all its dependencies in a portable format. Containerizing an application involves creating a Dockerfile, building the image, and pushing it to an image registry.

Create a Dockerfile

To containerize an application, create a Dockerfile alongside the application’s source code. The Dockerfile has instructions describing how to build an image for your application.

The following is an example Dockerfile that describes how to containerize a Node.js application. It includes some of the most common Dockerfile instructions:

FROM node:10

WORKDIR /usr/src/app

COPY package*.json ./

RUN npm install

COPY . .

EXPOSE 8080

CMD [ "node", "server.js" ]

Note: The above Dockerfile is kept simple for brevity. It does not follow all best practices; in particular, the image retains build tools and other unnecessary dependencies. See Use Multi-stage Docker Builds below for more information.

For more information about the Dockerfile, see the official reference documentation.

Work with Container Images

Once you have a Dockerfile, you can build, push, pull and run the image.

To build the image:

docker build . -t <image-name>:<image-tag>

To push the image:

docker push <image-name>:<image-tag>

To pull the image:

docker pull <image-name>:<image-tag>

To run the image:

docker run <image-name>:<image-tag>

Tag Your Image

Container images are identified by a name and a tag. For example, for the nginx:1.17.2 image, nginx is the name and 1.17.2 is the tag.

Image tags convey version information about the image. When containerizing an application, document the tagging strategy that you will follow.

Although it is possible, do not reuse (or move) image tags. Ensure that your tags are immutable and unique throughout the lifetime of the image.

The following are recommended tagging strategies:

  • Semantic Versioning: Use a version number that follows the Semantic Versioning Specification. Under this specification, version numbers are of the form x.y.z, where x is the major version, y is the minor version, and z is the patch version. For example, the image that contains version 1.1.15 of your application would be tagged my-app:v1.1.15. See the full specification for more details.
  • Git commit hash: Use the Git commit SHA-1 hash of the source code as the version number. This ensures that your tag is unique and references a specific version of your application. For example, the container built from commit c41fa1d in your source code repository would result in an image called my-app:c41fa1d.
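
As a rough illustration of the two strategies above, the following shell sketch assembles an image reference from either a version number or a Git commit hash. The function names is_semver_tag, image_ref and git_commit_tag are hypothetical helpers, not part of Docker or Git, and the version check is deliberately simplified (it accepts an optional v prefix, as in my-app:v1.1.15, and ignores pre-release and build metadata).

```shell
#!/bin/sh
# Hypothetical helpers for illustration only.

# Succeeds if the tag looks like X.Y.Z or vX.Y.Z with numeric parts.
# Simplified: pre-release and build metadata are not handled.
is_semver_tag() {
  printf '%s\n' "$1" | grep -Eq '^v?[0-9]+\.[0-9]+\.[0-9]+$'
}

# Joins an image name and a tag into a reference such as my-app:v1.1.15.
image_ref() {
  printf '%s:%s\n' "$1" "$2"
}

# Prints the abbreviated hash of the current commit.
# Must be run inside a Git repository.
git_commit_tag() {
  git rev-parse --short HEAD
}

image_ref my-app v1.1.15
```

In a build script, this could be combined with the build command, for example docker build . -t "$(image_ref my-app "$(git_commit_tag)")".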

Choose a Base Image

Container images are composed of multiple layers. Instructions that modify the filesystem, such as RUN, COPY and ADD, each add a layer to the image, while other instructions only record metadata. The FROM instruction specifies the base image, which provides the initial layers. Subsequent instructions in the Dockerfile are applied on top of the base image.

Selecting a base image is a crucial decision for your application. The application itself will dictate most of the decision, but whenever possible, prefer images that:

  • Are published by a reputable vendor
  • Are updated continuously with the latest patches
  • Have an open source Dockerfile that you can inspect
  • Do not have unnecessary tools or libraries

Distroless Base Image

Distroless is an open source set of base images that enables you to build slim images that contain only your application and its runtime dependencies. There are multiple images that you can choose from, according to your application stack.

Scratch Base Image

If you can build your application as a statically-linked binary, consider using scratch as the base image. The FROM scratch instruction tells the build process that you want to use an empty image as a starting point. Using the scratch base image enables you to build a minimal container that only includes your application.

For example, the following Dockerfile produces a container image that has a single, statically-linked binary:

FROM scratch
COPY my-app /my-app
CMD ["/my-app"]

Use Multi-stage Docker Builds

Application container images should be as minimal as possible. They should not include any tools or libraries that are not needed to run the application. This not only reduces the image size on disk, but also reduces the attack surface of your application.

Docker’s multi-stage build enables you to produce minimal container images for your applications. A multi-stage Dockerfile has multiple FROM instructions that can reference different images. Typically, the first stage references a larger image that contains all the tooling required to build the application. Once the application is built, the build artifacts are copied into a new stage with a smaller base image. The build stage is discarded along with all the build tooling (which is unnecessary at runtime).

The following Dockerfile is an example of a multi-stage build for a Go application. Note the image in stage 1 contains tools required for building the application. The final output, from stage 2, only contains what is needed at runtime, achieving a more minimal final image.

# Build stage
FROM golang:1.12.7 AS build

WORKDIR /my-app

COPY go.mod .
RUN go mod download

COPY main.go .
ENV CGO_ENABLED=0
RUN go build -o my-app

# Final stage
FROM scratch
COPY --from=build /my-app/my-app /my-app
CMD ["/my-app"]

Leverage the Cache

Docker maintains a cache to speed up the build of container images. When building an image, each instruction in the Dockerfile results in an image layer that can be reused in later builds if no changes are made. If a layer changes, the cache is invalidated, and all the following layers must be built from scratch.

The cache also helps minimize data transfer during pull operations. When pulling an image, the container runtime only downloads image layers that it does not already have in the cache. This can have a significant impact in Cloud Native platforms such as Kubernetes, where multiple hosts download the same image at the same time.

When building images, Docker uses the following checks to determine if something has changed in the image layer:

  • For ADD and COPY Dockerfile instructions, calculate the checksum of the files referenced in the instruction. Compare the resulting checksum against the checksum of the files that exist in the cached layer.
  • For the rest of the Dockerfile instructions, compare the instruction itself against the instruction used to produce the cached layer.

To reap the benefits of the cache, place Dockerfile instructions that change often at the bottom of your Dockerfile. This ensures that layers that do not change often can be reused from the cache.

For example, the following Dockerfile downloads the application dependencies in a separate layer before copying the application source code. This results in a faster build time because the build process downloads dependencies only when they change.

FROM golang:1.12.7 AS build

WORKDIR /my-app

# Download dependencies
COPY go.mod .
RUN go mod download

# Copy source code and build
COPY main.go .
ENV CGO_ENABLED=0
RUN go build -o my-app
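
The effect of the checksum comparison on this layout can be approximated outside of Docker. The sketch below is only an analogy: file_checksum is a hypothetical helper built on cksum, whereas Docker uses its own checksum over file contents and metadata. It shows that editing main.go leaves the checksum of go.mod untouched, which is why the dependency layers above can still be served from the cache.

```shell
#!/bin/sh
# Simplified stand-in for Docker's file checksum; the real algorithm differs.
file_checksum() {
  cksum < "$1"
}

# Create a throwaway source tree with the two files used in the Dockerfile.
src=$(mktemp -d)
printf 'module my-app\n' > "$src/go.mod"
printf 'package main\n' > "$src/main.go"

gomod_before=$(file_checksum "$src/go.mod")
main_before=$(file_checksum "$src/main.go")

# Edit only the application source code.
printf 'func main() {}\n' >> "$src/main.go"

gomod_after=$(file_checksum "$src/go.mod")
main_after=$(file_checksum "$src/main.go")

# Unchanged checksum: the layer could be reused. Changed checksum: rebuild.
if [ "$gomod_before" = "$gomod_after" ]; then echo "COPY go.mod: cache hit"; fi
if [ "$main_before" != "$main_after" ]; then echo "COPY main.go: cache miss"; fi
```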

Combine RUN Instructions

When possible, combine multiple RUN instructions into one. This reduces the number of layers in your image and ensures that commands that depend on each other (for example, apt-get update and apt-get install) run as a single unit. If they are split, a cached apt-get update layer can leave a later apt-get install working from a stale package index.

For example, instead of:

RUN apt-get update
RUN apt-get install -y curl

Do:

RUN apt-get update && apt-get install -y curl

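When a RUN instruction contains a pipe, the /bin/sh -c interpreter only evaluates the exit code of the last command in the pipeline. The instruction therefore succeeds even if an earlier command in the pipeline fails.

To avoid unexpected behavior caused by errors in any stage of the pipe, prepend set -o pipefail && to commands that include pipes. Some /bin/sh implementations, such as dash on Debian-based images, may not support this option, in which case the command needs to run in a shell that does, such as bash. For example: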

RUN set -o pipefail && wget -O - https://somesite.com/records.txt | wc -l > /number
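
The difference in exit codes can be checked outside of a Docker build. The following sketch assumes bash is installed and uses false | true as a stand-in for a pipeline whose first command fails.

```shell
#!/bin/sh
# Without pipefail, the pipeline reports the status of its last command (true).
status_without=0
bash -c 'false | true' || status_without=$?

# With pipefail, the failure of the first command (false) is propagated.
status_with=0
bash -c 'set -o pipefail; false | true' || status_with=$?

echo "without pipefail: $status_without"
echo "with pipefail: $status_with"
```

The first status is zero and the second is non-zero, which is the behavior that makes a RUN instruction fail when any stage of the pipe fails.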

Containerize a Single Process

The container and the process it contains share the same life cycle. For this reason, a container should encapsulate a single process.

Consider the LAMP (Linux, Apache, MySQL, PHP) stack as an example. Instead of creating a single container that includes the Apache Web Server, MySQL and the PHP application, create a separate container for each component. Decoupling the life cycles of the application components allows them to be managed and scaled independently.

If you must run multiple processes in a single container, consider using a process manager such as supervisord. Note, however, that this is not recommended over the single process per container approach.

Handle Linux Signals

Container engines use Linux signals to manage the life cycle of containers. Thus, it is important that your application handles Linux signals properly. Doing so allows you to gracefully terminate your application and avoid user-facing errors or data corruption.

In addition to handling Linux signals, your application is responsible for reaping any orphaned or zombie processes that it might create. Failing to do so can result in PID or memory starvation.
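
For applications started from a shell script, signal handling might look like the sketch below. It is a minimal illustration, not a complete entrypoint: run_service is a hypothetical placeholder for the real workload, and the script plays both roles by starting the service in the background and then sending it SIGTERM, the signal that docker stop sends before falling back to SIGKILL.

```shell
#!/bin/sh
# Hypothetical long-running service that exits cleanly on SIGTERM.
run_service() {
  # Run cleanup and exit with status 0 when SIGTERM arrives.
  trap 'echo "SIGTERM received, shutting down"; exit 0' TERM
  echo "service started"
  while :; do
    sleep 1
  done
}

log=$(mktemp)

# Simulate the container engine: start the service, then ask it to stop.
run_service > "$log" &
service_pid=$!
sleep 2
kill -TERM "$service_pid"

# Collect the exit status of the service.
status=0
wait "$service_pid" || status=$?

echo "exit status: $status"
cat "$log"
```

A real application would replace the echo in the trap with its own cleanup, such as closing connections or flushing data to disk.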

Init Systems in Containers

If you are unable to handle Linux signals or reap zombie processes in your application, you can use an init system to do it for you. Typical init systems such as systemd or SysV, however, are not designed to run inside containers and can be large and complex.

Use tini if you need to include an init system in your container. Tini is a lightweight init system specifically designed for running within containers. It handles Linux signals properly and reaps zombie processes on behalf of your application.

Annotate Your Container Images

Container image names and tags provide limited information about an image. Thus, images should be annotated with additional metadata, such as the maintainers, creation date, version, and more.

You can annotate images using two methods:

Static labels defined in the Dockerfile:

FROM scratch
LABEL org.opencontainers.image.authors="Jane Smith <jsmith@example.com>"

Dynamic labels defined using the --label flag of the docker build command:

docker build . -t my-app:v0.1.0 --label "org.opencontainers.image.version=0.1.0"

The following is a subset of the pre-defined annotations proposed by the Open Container Initiative (OCI) Image Format Specification:

org.opencontainers.image.created: date and time on which the image was built (string, date-time in RFC 3339)

org.opencontainers.image.authors: contact details of the people or organization responsible for the image (string)

org.opencontainers.image.version: version of the packaged software (may match a label or tag in the source code repository)

For the full list of annotations, see the Annotations document of the OCI Image Format specification.
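
Dynamic labels are often generated at build time. The sketch below shows one possible way to produce an RFC 3339 timestamp for org.opencontainers.image.created and pass it to docker build. The function names rfc3339_now and build_with_labels are hypothetical, and build_with_labels is only defined here, not called, because it requires Docker and a Dockerfile in the current directory.

```shell
#!/bin/sh
# Prints the current UTC time in RFC 3339 format (YYYY-MM-DDTHH:MM:SSZ).
rfc3339_now() {
  date -u +%Y-%m-%dT%H:%M:%SZ
}

# Hypothetical wrapper: builds the image from the current directory and
# attaches the creation time and version as labels. Requires Docker.
build_with_labels() {
  image_name=$1
  version=$2
  docker build . \
    -t "${image_name}:${version}" \
    --label "org.opencontainers.image.created=$(rfc3339_now)" \
    --label "org.opencontainers.image.version=${version}"
}

rfc3339_now
```

After a build, the labels can be read back with docker inspect --format '{{ json .Config.Labels }}' <image-name>:<image-tag>.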

Use ENV Variables

You can use ENV instead of hard-coding values in your Dockerfile. This allows you to make changes in a single location in your Dockerfile and keep it efficient and clean:

ENV NGINX_VERSION=1.18.0
# tar -C requires a target directory; /tmp is used here for illustration
RUN curl -SL http://nginx.org/download/nginx-$NGINX_VERSION.tar.gz | tar -zxC /tmp && ...

Note that each ENV instruction is recorded in the image, so the variable persists in the final container even if a later instruction unsets it. To avoid this, set, use and unset the variable within a single RUN instruction.
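
This works because each RUN instruction executes in its own shell process, so a variable exported inside one instruction is gone by the time the next one runs. The sketch below simulates two consecutive RUN instructions with two separate sh -c invocations. The variable name ADMIN_USER and the file it writes are illustrative assumptions.

```shell
#!/bin/sh
workdir=$(mktemp -d)

# First simulated RUN: set, use and unset the variable in one command chain.
sh -c 'export ADMIN_USER="mark" && echo "$ADMIN_USER" > "$1/admin" && unset ADMIN_USER' sh "$workdir"

# Second simulated RUN: the variable is not visible in a later step.
later_value=$(sh -c 'echo "${ADMIN_USER:-unset}"')

echo "file content: $(cat "$workdir/admin")"
echo "variable in later step: $later_value"
```

In a Dockerfile, the first command chain would form the body of a single RUN instruction.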

Add Non-root User

Containers should run as a non-root user. This practice limits the impact of an exploited vulnerability and makes it harder for an attacker to gain root-level access (or privileges) on the host.

Containers that use scratch as the base image can leverage multi-stage Docker builds to create the user in an earlier stage and copy over the relevant files. In the following example, the user and group are created in an Alpine-based stage, and the resulting account files are copied into the final image.

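FROM  alpine:3.11 AS base
# Create a group and an unprivileged user without a password (-D)
RUN   addgroup devgroup && \
      adduser -D -u 10001 -G devgroup developer

FROM scratch
# USER resolves the user and group names from these files
COPY  --from=base /etc/passwd /etc/passwd
COPY  --from=base /etc/group /etc/group
USER  developer:devgroup
COPY  docker-entrypoint.sh /usr/local/bin/

ENTRYPOINT ["docker-entrypoint.sh"]
CMD ["custom-process"]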

The end result of the above is the container running as the developer user.
