Updated on 2025-04-23 GMT+08:00

Writing a Quality Dockerfile

This document walks you through how to write an efficient Dockerfile, using the containerization of an application as an example. Based on the practices of SWR, it shows how to create images with fewer layers and a smaller size to speed up the image build process.

The following figure shows a common architecture of an enterprise portal website. This website consists of a web server that provides web services, and a database that stores user data. Normally, the website is deployed on a single server.

To containerize the application, a Dockerfile may be written as follows:

FROM ubuntu

ADD . /app

RUN apt-get update  
RUN apt-get upgrade -y  
RUN apt-get install -y nodejs ssh mysql  
RUN cd /app && npm install

# this should start three processes, mysql and ssh
# in the background and node app in foreground
# isn't it beautifully terrible? <3
CMD mysql & sshd & npm start

However, the preceding Dockerfile is problematic in several ways, most obviously its CMD instruction, which tries to start three processes in a single container.

To rectify and optimize the Dockerfile, here are some tips:

Run Only One Process in Each Container

Technically, multiple processes, such as a database, frontend, backend, and SSH, can run in the same Docker container. However, this is not what containers are built for. Stuffing all the processes into one container makes the image extremely large, prolongs the build time, and wastes resources during horizontal scaling: the whole image has to be rebuilt every time you make a small change, and when you scale out you can only add copies of the entire container, even if only one component needs more capacity.

Therefore, an application is usually split into microservices before containerization. A microservice architecture brings the following benefits:

  • Independent scaling: After an application is split into independent microservices, you can adjust the number of pods for each microservice separately.
  • Faster development: Because microservices are decoupled, they can be developed independently of each other.
  • Security through isolation: In a monolithic application, an attacker who exploits a single vulnerability can gain access to all functions of the application. In a microservice architecture, an attacker who compromises one service gains access only to that service and cannot intrude into the others.
  • More stable services: If one microservice breaks down, the other microservices keep running.

To optimize the preceding sample Dockerfile, run the web application and MySQL in different containers.

The two can then be modified separately. In the following example, MySQL and SSH are removed from the sample Dockerfile, and only Node.js is installed.

FROM ubuntu

ADD . /app

RUN apt-get update  
RUN apt-get upgrade -y

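# On Ubuntu, npm is packaged separately from nodejs
RUN apt-get install -y nodejs npm
RUN cd /app && npm install

# No WORKDIR is set, so change to /app before starting the app
CMD cd /app && npm start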
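The MySQL part of the original Dockerfile then moves into a container of its own. The following is a minimal sketch, assuming the official mysql image from Docker Hub; the 5.7 tag, the database name portal, and the schema.sql file are hypothetical examples. The official image creates the database named in MYSQL_DATABASE and runs any .sql files placed in /docker-entrypoint-initdb.d/ the first time the container starts with an empty data directory.

```dockerfile
# Separate image for the database tier (sketch; tag and names are hypothetical).
FROM mysql:5.7

# Database created on first start by the official image's entrypoint.
ENV MYSQL_DATABASE=portal

# Initialization scripts in this directory run once, on first start.
COPY schema.sql /docker-entrypoint-initdb.d/

# The root password is deliberately not baked into the image; pass it at
# run time instead, for example: docker run -e MYSQL_ROOT_PASSWORD=<password> <image>
```

The web container then reaches the database over the network (for example, a user-defined Docker network) rather than through a process in its own container, and each image can be rebuilt and scaled on its own.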

Do Not Upgrade Packages During Image Build

To reduce image complexity, dependency, size, and build time, do not install any unnecessary packages in your images. For example, do not include a text editor in a database image.

If packages in the base image are out of date, contact the maintainers of the base image. If you know that a specific package, for example foo, needs to be upgraded, run apt-get install -y foo, which upgrades it automatically.
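A variant of the targeted upgrade is sketched below. It uses the apt-get option --only-upgrade, which upgrades a package only if it is already installed and never installs new ones; this is an alternative to the plain apt-get install -y foo described above, and foo remains a placeholder package name.

```dockerfile
FROM ubuntu:16.04

# Upgrade only the package known to be outdated ("foo" is a placeholder),
# instead of running apt-get upgrade across every installed package.
# --only-upgrade skips the package if it is not already installed.
RUN apt-get update \
    && apt-get install -y --only-upgrade foo
```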

apt-get upgrade makes image builds unpredictable. Because you cannot tell which packages it upgraded during a particular build, images built from the same Dockerfile at different times may differ. Therefore, remove apt-get upgrade from the Dockerfile.

The following is the sample Dockerfile without apt-get upgrade:

FROM ubuntu

ADD . /app

RUN apt-get update

RUN apt-get install -y nodejs npm
RUN cd /app && npm install

CMD cd /app && npm start

Merge RUN Commands of Similar Update Frequency

Like an onion, a Docker image consists of many layers. To change an inner layer, all the layers on top of it must be rebuilt. Docker images have the following features:

  • Each RUN, COPY, and ADD instruction in a Dockerfile creates an image layer. Other instructions only record metadata and do not increase the image size.
  • Image layers are cached and reused.
  • A cached layer is invalidated when its instruction changes, when the files it copies change, or when the build variables it uses change.
  • When a cached layer is invalidated, all the cached layers after it are invalidated as well.
  • Image layers are immutable. If a file is added in one layer and deleted in the next, the file still exists in the image; it is only hidden from the container's file system and still takes up space.

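Therefore, merge commands that change at a similar frequency, and keep commands that change at different frequencies in separate instructions. For example, the following Dockerfile merges the Node.js installation and npm install into one RUN instruction. Because this instruction comes after ADD . /app, any change to the source code invalidates it, so Node.js is reinstalled each time the code is modified, which wastes time and resources.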

FROM ubuntu

ADD . /app

RUN apt-get update \
    && apt-get install -y nodejs npm \
    && cd /app \
    && npm install

CMD cd /app && npm start

It is better to install Node.js in its own layer before the source code is added, so that this layer is reused from the cache when only the code changes:

FROM ubuntu

RUN apt-get update && apt-get install -y nodejs npm
ADD . /app
RUN cd /app && npm install

CMD cd /app && npm start

Specify an Image Tag

If no tag is specified for an image, the latest tag is used by default. For example, FROM ubuntu is equivalent to FROM ubuntu:latest. When a new version of the image is released, latest is moved to point to it, so the same Dockerfile may suddenly build on a different base image, and the build may fail or produce an image that behaves differently.

In the sample Dockerfile, specify the 16.04 tag for the ubuntu image as follows:

FROM ubuntu:16.04

RUN apt-get update && apt-get install -y nodejs npm
ADD . /app
RUN cd /app && npm install

CMD cd /app && npm start
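If the pinned tag occasionally needs to be changed on purpose, it can be turned into a build argument with the pinned value as its default. This sketch assumes Docker 17.05 or later, which allows an ARG declared before FROM to be used in the FROM line; UBUNTU_TAG is a hypothetical argument name.

```dockerfile
# Build argument usable in FROM (Docker 17.05+); defaults to the pinned tag.
ARG UBUNTU_TAG=16.04
FROM ubuntu:${UBUNTU_TAG}
# The remaining instructions are the same as in the sample above.
```

A plain docker build still uses 16.04; docker build --build-arg UBUNTU_TAG=18.04 . selects another tag explicitly, so the base image never changes silently.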

Delete Unnecessary Files

When you run apt-get update, apt downloads package index files into the /var/lib/apt/lists/ directory.

These files are not needed to run the application. To keep the Docker image lightweight, delete them, and do so in the same RUN instruction that created them: because layers are immutable, deleting them in a later instruction would not reduce the image size.

Therefore, in the sample Dockerfile, the files in the /var/lib/apt/lists/ directory are deleted.

FROM ubuntu:16.04

RUN apt-get update \
    && apt-get install -y nodejs npm \
    && rm -rf /var/lib/apt/lists/*

ADD . /app
RUN cd /app && npm install

CMD cd /app && npm start
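The effect of layer immutability on image size can be checked directly. The sketch below is a hypothetical two-stage demo: in the separate stage a 100 MB file (a stand-in for downloaded package data) is created and deleted in two RUN instructions, and in the combined stage the same work happens in one RUN instruction. Build each stage with --target and compare the layer sizes with docker history.

```dockerfile
# Stage "separate": the first RUN layer stores the 100 MB file;
# the second layer only records its deletion, so the image stays larger.
FROM ubuntu:16.04 AS separate
RUN dd if=/dev/zero of=/tmp/big.bin bs=1M count=100
RUN rm /tmp/big.bin

# Stage "combined": the file is created and deleted within one layer,
# so nothing is left behind in the image.
FROM ubuntu:16.04 AS combined
RUN dd if=/dev/zero of=/tmp/big.bin bs=1M count=100 \
    && rm /tmp/big.bin
```

For example, docker build --target separate -t layer-demo:separate . followed by docker history layer-demo:separate shows a layer of about 100 MB that the combined stage does not have.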

Select a Suitable Base Image

In our sample Dockerfile, ubuntu is selected as the base image. However, because you only need to run a Node.js program, there is no need for a general-purpose base image. A node image is a better choice.

A node image with an alpine tag is recommended. Alpine is a lightweight Linux distribution whose base image is only a few megabytes in size.

FROM node:7-alpine

ADD . /app
RUN cd /app && npm install

CMD cd /app && npm start
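Alpine uses the apk package manager instead of apt-get. If the application needs an extra system package, apk add --no-cache installs it without keeping the package index in the image, which plays the same role as deleting /var/lib/apt/lists/* on Ubuntu. In this sketch, curl is only an example package.

```dockerfile
FROM node:7-alpine

# --no-cache fetches the package index on the fly and does not store it
# in the layer, so no separate cleanup step is needed.
RUN apk add --no-cache curl

ADD . /app
RUN cd /app && npm install

CMD cd /app && npm start
```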

Set WORKDIR and CMD

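WORKDIR sets the working directory for the RUN, CMD, ENTRYPOINT, COPY, and ADD instructions that follow it. If the directory does not exist, it is created.

CMD provides the default command to be executed when a container is started from the image. Write it in exec form, that is, as a JSON array, so that the command runs directly instead of through a /bin/sh -c wrapper.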

FROM node:7-alpine

WORKDIR /app  
ADD . /app  
RUN npm install

CMD ["npm", "start"]
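The two forms of CMD are easiest to compare side by side. In shell form, Docker wraps the command in /bin/sh -c, and the Dockerfile reference notes that such a wrapper does not pass signals on to the command; in exec form, npm is started directly. Only the last CMD in a Dockerfile takes effect, so the shell-form line in this sketch is commented out.

```dockerfile
FROM node:7-alpine

WORKDIR /app
COPY . /app
RUN npm install

# Shell form: Docker runs it as /bin/sh -c "npm start".
# CMD npm start

# Exec form: npm is executed directly as the container's main process,
# without a wrapping shell; no shell processing such as $VAR expansion occurs.
CMD ["npm", "start"]
```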

(Optional) Use ENTRYPOINT

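ENTRYPOINT is optional, and because it adds complexity, use it only when you need it. ENTRYPOINT specifies the executable, often a script, that always runs when the container starts. The CMD values, or the arguments passed to docker run, are appended to it as parameters. It is typically used to make an image behave like an executable.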

FROM node:7-alpine

WORKDIR /app  
ADD . /app  
RUN npm install

ENTRYPOINT ["./entrypoint.sh"]  
CMD ["start"]

Use exec in ENTRYPOINT

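In the ENTRYPOINT script, exec is used to run the Node.js application. Without exec, the shell running the script remains the container's main process (PID 1) and the application runs as its child. When docker stop sends SIGTERM, the shell does not pass it on to the application, so the application cannot shut down gracefully and is killed with SIGKILL after a timeout (10 seconds by default). exec replaces the shell process with the application, so the application becomes PID 1 and receives signals directly.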
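The entrypoint.sh script used in the preceding samples is not listed in this document. The following is a minimal sketch of what such a script could look like, written inline with a Dockerfile heredoc so that the example stays in one file. Assumptions: BuildKit with the dockerfile:1.4 (or later) frontend, which the # syntax line requests; server.js is a hypothetical entry file; and the script uses /bin/sh because Alpine images do not ship bash. When the first argument is start (the default supplied by CMD), the script uses exec to replace itself with node; any other arguments are executed as a command, also via exec.

```dockerfile
# syntax=docker/dockerfile:1.4
FROM node:7-alpine

WORKDIR /app
COPY . /app
RUN npm install

# Inline entrypoint script (heredoc requires BuildKit, dockerfile:1.4+).
# The quoted "EOF" delimiter stops the builder from expanding $1 and $@.
COPY <<"EOF" /app/entrypoint.sh
#!/bin/sh
set -e
if [ "$1" = "start" ]; then
    # exec replaces this shell, so node becomes PID 1 and receives SIGTERM.
    # server.js is a hypothetical entry file.
    exec node server.js
fi
# Any other arguments are run as a command, again replacing the shell.
exec "$@"
EOF
RUN chmod +x /app/entrypoint.sh

ENTRYPOINT ["./entrypoint.sh"]
CMD ["start"]
```

With this layout, docker run <image> starts the application, while docker run <image> node --version runs a one-off command in place of it.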

Use COPY Preferentially

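COPY simply copies local files into the image. ADD is more complex: it can also download remote files from URLs and automatically extract local tar archives. Because this extra behavior is implicit, use COPY unless you specifically need what ADD offers.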
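The difference shows up with archives. In this sketch, vendor.tar.gz is a hypothetical tar archive in the build context: COPY places the archive file into the image unchanged, whereas ADD unpacks it into the destination directory. ADD does not unpack archives that it downloads from a URL.

```dockerfile
FROM node:7-alpine

# COPY: the archive is copied as a single file; nothing is extracted.
COPY vendor.tar.gz /tmp/

# ADD: a local tar archive (gzip-compressed here) is unpacked into the directory.
ADD vendor.tar.gz /opt/vendor/
```

Because the extraction is implicit, a reader cannot tell from an ADD line alone whether a file will be unpacked, which is one reason to reserve ADD for cases that need it.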

FROM node:7-alpine

WORKDIR /app

COPY . /app
RUN npm install

ENTRYPOINT ["./entrypoint.sh"]  
CMD ["start"]

Change the Order of COPY and RUN

Place instructions that change infrequently near the top of your Dockerfile to make the most of the image cache.

In the sample Dockerfile, the source code changes frequently. Because COPY . /app comes before RUN npm install, every source change invalidates the cache, and all npm dependencies are reinstalled each time the image is built. To avoid this, copy package.json first, run npm install, and then copy the rest of the source code. Changes to the source code then no longer trigger a reinstallation of the dependencies, as long as package.json is unchanged.

FROM node:7-alpine

WORKDIR /app

COPY package.json /app  
RUN npm install  
COPY . /app

ENTRYPOINT ["./entrypoint.sh"]  
CMD ["start"]

Set Default Environment Variables, Ports, and Data Volumes

Environment variables may be required when running a Docker container, and setting their default values in the Dockerfile is a good practice. You can also declare the ports the container listens on and its data volumes in the Dockerfile. Example:

FROM node:7-alpine

ENV PROJECT_DIR=/app

WORKDIR $PROJECT_DIR

COPY package.json $PROJECT_DIR  
RUN npm install  
COPY . $PROJECT_DIR

ENTRYPOINT ["./entrypoint.sh"]  
CMD ["start"]

Variables set with ENV persist in the image and are available in running containers. If a variable is needed only during the image build, use ARG instead.
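The following sketch contrasts the two. NPM_REGISTRY and APP_LOG_LEVEL are hypothetical names: the registry URL is needed only while npm install runs, so it is an ARG, while the log level is meant to be read by the running application (an assumption about the app), so it is an ENV default.

```dockerfile
FROM node:7-alpine

# Build-time only: usable by RUN during the build, not set in running
# containers. Override with: docker build --build-arg NPM_REGISTRY=<url> .
ARG NPM_REGISTRY=https://registry.npmjs.org/

# Run-time default: stored in the image and visible in every container.
# Override with: docker run -e APP_LOG_LEVEL=debug <image>
ENV PROJECT_DIR=/app \
    APP_LOG_LEVEL=info

WORKDIR $PROJECT_DIR
COPY package.json $PROJECT_DIR
RUN npm install --registry "$NPM_REGISTRY"
COPY . $PROJECT_DIR
```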

Use EXPOSE to Set Listening Ports

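EXPOSE describes which ports your container listens on, for example, EXPOSE 80 for an Apache image and EXPOSE 27017 for a MongoDB image. EXPOSE serves as documentation and does not publish the port by itself.

For external access, publish the port with a flag when running docker run, for example, -p 8080:3000 to map port 8080 on the host to port 3000 in the container, or -P to publish all exposed ports to random host ports.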

FROM node:7-alpine

ENV PROJECT_DIR=/app

WORKDIR $PROJECT_DIR

COPY package.json $PROJECT_DIR  
RUN npm install  
COPY . $PROJECT_DIR

ENV APP_PORT=3000
EXPOSE $APP_PORT

ENTRYPOINT ["./entrypoint.sh"]  
CMD ["start"]

Use VOLUME to Manage Data Volumes

VOLUME declares a mount point for data that should be kept outside the container's writable layer, such as database storage files, configuration files, or files and directories created by the container. Use VOLUME for any parts of the image that are mutable or that users are expected to modify.

In the following sample Dockerfile, a media directory is declared as a volume.
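One ordering rule is worth knowing when you declare volumes: the Dockerfile reference states that if build steps change the data in a volume after the volume has been declared, those changes are discarded. The sketch below therefore prepares the directory before the VOLUME instruction; the uploads subdirectory is a hypothetical example. At run time, a named volume or host directory can be mounted over the path with docker run -v.

```dockerfile
FROM node:7-alpine

ENV MEDIA_DIR=/media

# Create and populate the directory BEFORE declaring it as a volume.
RUN mkdir -p $MEDIA_DIR/uploads

VOLUME $MEDIA_DIR

# Per the Dockerfile reference, changes made to $MEDIA_DIR after the
# VOLUME instruction (such as the line below) may be discarded.
# RUN touch $MEDIA_DIR/uploads/placeholder
```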

FROM node:7-alpine

ENV PROJECT_DIR=/app

WORKDIR $PROJECT_DIR

COPY package.json $PROJECT_DIR  
RUN npm install  
COPY . $PROJECT_DIR

ENV MEDIA_DIR=/media \  
    APP_PORT=3000

VOLUME $MEDIA_DIR  
EXPOSE $APP_PORT

ENTRYPOINT ["./entrypoint.sh"]  
CMD ["start"]

Use Labels to Configure Image Metadata

Add labels to help organize images, record licensing information, and support automation. For each label, add a line that begins with LABEL followed by one or more key-value pairs.

If a value contains spaces, enclose it in double quotation marks ("") or escape each space with a backslash. If the value itself contains double quotation marks, escape them with backslashes (\").

FROM node:7-alpine  
LABEL com.example.version="0.0.1-beta"
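The quoting rules above can be illustrated as follows. All keys and values other than com.example.version are hypothetical examples.

```dockerfile
FROM node:7-alpine

# One key-value pair per LABEL line.
LABEL com.example.version="0.0.1-beta"

# A value with spaces: quote it...
LABEL vendor="ACME Incorporated"
# ...or escape each space with a backslash.
LABEL com.example.team=Web\ Portal\ Team

# Quotation marks inside a quoted value are escaped with backslashes.
LABEL com.example.description="The \"portal\" web front end"

# Several pairs in one instruction, continued across lines.
LABEL com.example.is-beta="true" \
      com.example.is-production="false"
```

The resulting labels can be listed with docker inspect --format '{{json .Config.Labels}}' <image>.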

Use HEALTHCHECK

When running a container, you can specify the --restart always option so that the Docker daemon restarts the container whenever it stops, for example after a crash. This option is useful for containers that need to run for a long time. But what if a container is still running yet unable to serve requests? HEALTHCHECK enables Docker to periodically check the health status of a container by running a command that you specify. The command must return 0 if the container is healthy and 1 if it is unhealthy. In the following example, curl --fail returns a non-zero exit code if the request fails or the server returns an HTTP error, and || exit 1 converts any such failure into exit code 1. Example:

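FROM node:7-alpine
LABEL com.example.version="0.0.1-beta"

# Alpine-based images do not include curl, which HEALTHCHECK uses below
RUN apk add --no-cache curl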

ENV PROJECT_DIR=/app  
WORKDIR $PROJECT_DIR

COPY package.json $PROJECT_DIR  
RUN npm install  
COPY . $PROJECT_DIR

ENV MEDIA_DIR=/media \  
    APP_PORT=3000

VOLUME $MEDIA_DIR  
EXPOSE $APP_PORT  
HEALTHCHECK CMD curl --fail http://localhost:$APP_PORT || exit 1

ENTRYPOINT ["./entrypoint.sh"]  
CMD ["start"]
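HEALTHCHECK also accepts options that control how often and how strictly the check runs. The values in this sketch are illustrative, not recommendations: --interval sets the time between checks, --timeout how long a single check may take, and --retries how many consecutive failures mark the container as unhealthy (Docker's defaults are 30s, 30s, and 3). The current status appears in the STATUS column of docker ps and in docker inspect.

```dockerfile
FROM node:7-alpine

# Alpine-based images do not include curl; install it for the check.
RUN apk add --no-cache curl

ENV APP_PORT=3000
EXPOSE $APP_PORT

# Check every 30 s, allow 3 s per attempt, mark unhealthy after 3 failures.
# "|| exit 1" maps any curl error to 1, the exit code Docker treats as
# unhealthy (exit code 2 is reserved and should not be used).
HEALTHCHECK --interval=30s --timeout=3s --retries=3 \
    CMD curl --fail http://localhost:$APP_PORT/ || exit 1
```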

Write a .dockerignore File

The functions and syntax of the .dockerignore file are similar to those of the .gitignore file. Use it to exclude unnecessary files from the build context, which speeds up the image build and reduces the image size.

Before building an image, Docker collects the build context, that is, the set of files the build can access, and sends it to the Docker daemon. By default, the context contains all files in the build directory, usually the directory that contains the Dockerfile, including its subdirectories. However, files in directories such as .git are not needed.

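Example (node_modules/ is also excluded so that a local node_modules directory is not copied over the dependencies installed by RUN npm install):

.git/
node_modules/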