Writing a Quality Dockerfile
This document walks you through how to write an efficient Dockerfile, using the containerization of an application as an example. Based on the practices of SWR, it shows how to create images with fewer layers and a smaller size to speed up the image build process.
The following figure shows a common architecture of an enterprise portal website. This website consists of a web server that provides web services, and a database that stores user data. Normally, the website is deployed on a single server.
To containerize the application, a Dockerfile may be written as follows:
FROM ubuntu
ADD . /app
RUN apt-get update
RUN apt-get upgrade -y
RUN apt-get install -y nodejs ssh mysql
RUN cd /app && npm install
# this should start three processes, mysql and ssh
# in the background and node app in foreground
# isn't it beautifully terrible? <3
CMD mysql & sshd & npm start
However, the preceding Dockerfile, including the CMD command, is problematic.
To rectify and optimize the Dockerfile, here are some tips:
- Run Only One Process in Each Container
- Do Not Upgrade the Tag During Image Build
- Merge RUN Commands of Similar Update Frequency
- Specify an Image Tag
- Delete Unnecessary Files
- Select a Suitable Base Image
- Set WORKDIR and CMD
- (Optional) Use ENTRYPOINT
- Use exec in ENTRYPOINT
- Use COPY Preferentially
- Change the Order of COPY and RUN
- Set Default Environment Variables, Mapping Ports, and Data Volumes
- Use EXPOSE to Set Listening Ports
- Use VOLUME to Manage Data Volumes
- Use Labels to Configure Image Metadata
- Use HEALTHCHECK
- Compile the .dockerignore File
Run Only One Process in Each Container
Technically, multiple processes, including the database, frontend, backend, and SSH, can run in the same Docker container. However, this is not what containers are built for. Stuffing all the processes into one container makes the image extremely large, prolongs the build time, and wastes resources during horizontal scaling. This is because the whole image has to be rebuilt every time you make a small adjustment, and when you scale out, every process in the container is replicated together even if only one of them needs more instances.
Therefore, usually, an application is split into microservices before containerization. You will benefit a lot from the microservice architecture:
- Independent scaling: After an application is split into independent microservices, you can adjust the number of pods for each microservice separately.
- Faster development: Since microservices are decoupled, they can be coded independently from each other.
- Security assurance through isolation: In a monolithic application, attackers who exploit a security vulnerability can obtain permissions for all functions of the application. In a microservice architecture, attackers who compromise one service obtain access to that service only and cannot intrude into other services.
- More stable services: If one microservice breaks down, other microservices can still run properly.
To optimize the preceding sample Dockerfile, run the web application and MySQL in different containers.
You can modify them separately. As shown in the following example, MySQL is deleted from the sample Dockerfile. Only Node.js is installed.
FROM ubuntu
ADD . /app
RUN apt-get update
RUN apt-get upgrade -y
RUN apt-get install -y nodejs
RUN cd /app && npm install
CMD npm start
Do Not Upgrade the Tag During Image Build
To reduce image complexity, dependency, size, and build time, do not install any unnecessary packages in your images. For example, do not include a text editor in a database image.
If a package in the base image is out of date, contact the maintainers of the base image. To upgrade a specific package, for example, foo, run the apt-get install -y foo command, which updates that package automatically.
apt-get upgrade brings great uncertainty to image build. Images built at different times might be inconsistent because you cannot be sure which packages apt-get upgrade has changed during the build. Therefore, apt-get upgrade is usually removed from the Dockerfile.
The following is the sample Dockerfile without apt-get upgrade:
FROM ubuntu
ADD . /app
RUN apt-get update
RUN apt-get install -y nodejs
RUN cd /app && npm install
CMD npm start
Merge RUN Commands of Similar Update Frequency
Like an onion, a Docker image consists of many layers. To modify an inner layer, you need to delete all outer layers. Docker images have the following features:
- Each command in a Dockerfile creates an image layer.
- Image layers are cached and reused.
- Cached image layers expire when the files they copy or variables specified in image build change.
- When a cached image layer expires, its subsequent cached image layers expire accordingly.
- Image layers are immutable. If a file is added in one layer and then deleted in the next layer, the file still exists in the image. The file is just no longer visible in the Docker container.
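The cache rules in the list above can be modeled with a few lines of shell. This is a simplified sketch and not Docker's actual cache algorithm: it assumes that a layer's cache key is a checksum of its parent's key plus its own instruction text. The function name layer_keys and the src=v1 and src=v2 annotations (stand-ins for the content of the copied files) are hypothetical.

```shell
#!/bin/sh
# Toy model of the build cache (an assumption for illustration, not Docker's
# real algorithm): a layer's key is a checksum of its parent's key plus its
# own instruction text, so a change in one layer changes every later key.
layer_keys() {
  parent=""
  while IFS= read -r instruction; do
    parent=$(printf '%s\n%s\n' "$parent" "$instruction" | cksum | cut -d ' ' -f 1)
    printf '%s  %s\n' "$parent" "$instruction"
  done
}

# "src=v1" and "src=v2" are hypothetical stand-ins for the content of the
# copied source files before and after a code change.
build_v1=$(printf '%s\n' \
  'FROM ubuntu' \
  'ADD . /app src=v1' \
  'RUN apt-get update && apt-get install -y nodejs' | layer_keys)

build_v2=$(printf '%s\n' \
  'FROM ubuntu' \
  'ADD . /app src=v2' \
  'RUN apt-get update && apt-get install -y nodejs' | layer_keys)

echo "Build 1:"; echo "$build_v1"
echo "Build 2:"; echo "$build_v2"
```

In this model, the FROM key is identical in both builds, while the ADD key and the key of the unchanged RUN instruction both differ. This mirrors why a step placed after a frequently changing step is rebuilt even when its own text stays the same.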
Therefore, merge multiple commands that are likely to change at a similar frequency to avoid unnecessary costs. In the following Dockerfile, Node.js and the npm packages are installed in the same RUN command. That means Node.js is reinstalled each time the source code is modified, which wastes time and resources.
FROM ubuntu
ADD . /app
RUN apt-get update \
    && apt-get install -y nodejs \
    && cd /app \
    && npm install
CMD npm start
It would be better to write the Dockerfile as follows:
FROM ubuntu
RUN apt-get update && apt-get install -y nodejs
ADD . /app
RUN cd /app && npm install
CMD npm start
Specify an Image Tag
If no tag is specified for an image, the latest tag is used by default. For example, the FROM ubuntu command is equivalent to FROM ubuntu:latest. When the image is updated, latest points to a different version, and the image build may fail.
In the sample Dockerfile, pin the ubuntu base image to the 16.04 tag as follows:
FROM ubuntu:16.04
RUN apt-get update && apt-get install -y nodejs
ADD . /app
RUN cd /app && npm install
CMD npm start
Delete Unnecessary Files
Assume that you have updated the apt-get sources and installed some software packages. The package lists downloaded by apt-get update are saved in the /var/lib/apt/lists/ directory.
However, these files are not required to run applications. To make the Docker image more lightweight, it is advised to delete these unnecessary files.
Therefore, in the sample Dockerfile, the files in the /var/lib/apt/lists/ directory are deleted. They are deleted in the same RUN command that created them, because files deleted in a later layer still exist in the image.
FROM ubuntu:16.04
RUN apt-get update \
    && apt-get install -y nodejs \
    && rm -rf /var/lib/apt/lists/*
ADD . /app
RUN cd /app && npm install
CMD npm start
Select a Suitable Base Image
In our sample Dockerfile, ubuntu is selected as the base image. However, as you only need to run a node program, there is no need to use a general base image. A node image would be a better choice.
A node image tagged with alpine is recommended. Alpine is a lightweight Linux distribution with a size of only 4 MB.
FROM node:7-alpine
ADD . /app
RUN cd /app && npm install
CMD npm start
Set WORKDIR and CMD
WORKDIR can be used to set a default directory where RUN, CMD, and ENTRYPOINT commands will be run.
CMD provides the default command to be executed when a container is run from an image. Write the command in the exec form, that is, as a JSON array.
FROM node:7-alpine
WORKDIR /app
ADD . /app
RUN npm install
CMD ["npm", "start"]
(Optional) Use ENTRYPOINT
ENTRYPOINT is optional because it increases complexity. ENTRYPOINT specifies a script that is executed by default. The script takes the values of CMD as its parameters. It is usually used to create executable Docker images.
FROM node:7-alpine
WORKDIR /app
ADD . /app
RUN npm install
ENTRYPOINT ["./entrypoint.sh"]
CMD ["start"]
Use exec in ENTRYPOINT
In the ENTRYPOINT script (entrypoint.sh in the preceding Dockerfile), use exec to run the node application. If exec is not used, the container cannot be stopped gracefully because the SIGTERM signal is intercepted by the shell process and is not forwarded to the application. The process started by exec replaces the shell process, so it receives all signals directly.
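The effect of exec can be checked without Docker. The following is a minimal sketch of what an entrypoint.sh might look like, not the script used by the original sample: the start branch would normally run something like exec node index.js (a hypothetical file name), and a stand-in command is used here so that the demo runs without Node.js. The demo prints the process ID before and after exec.

```shell
#!/bin/sh
# Sketch of an entrypoint script and a check that exec keeps the process ID.
# Assumption: a real image would run something like "exec node index.js"
# (hypothetical file name); a stand-in command is used so the demo runs
# without Node.js or Docker.
set -e
demo_dir=$(mktemp -d)

cat > "$demo_dir/entrypoint.sh" <<'EOF'
#!/bin/sh
set -e
case "$1" in
  start)
    echo "shell pid: $$"
    # exec replaces the shell with the application process, so the
    # application itself receives SIGTERM when the container is stopped.
    exec sh -c 'echo "app pid: $$"'
    ;;
  *)
    # Run any other command line unchanged.
    exec "$@"
    ;;
esac
EOF

# Call the script with the default CMD value from the Dockerfile ("start").
output=$(sh "$demo_dir/entrypoint.sh" start)
echo "$output"
rm -rf "$demo_dir"
```

Both lines of the output carry the same process ID, because the stand-in application takes over the process of the shell instead of running as its child.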
Use COPY Preferentially
COPY simply copies files to images. ADD is more complex and can also download remote files and extract local archives. Use COPY unless you need these extra features.
FROM node:7-alpine
WORKDIR /app
COPY . /app
RUN npm install
ENTRYPOINT ["./entrypoint.sh"]
CMD ["start"]
Change the Order of COPY and RUN
Place the parts that change infrequently at the beginning of your Dockerfile to make the most of the image cache.
In the sample Dockerfile, the source code changes frequently. Every time the image is built after a code change, the npm dependencies are reinstalled. To avoid this issue, copy package.json first, then run npm install, and finally copy the rest of the source code. In this way, changes to the source code do not trigger a repeated npm install.
FROM node:7-alpine
WORKDIR /app
COPY package.json /app
RUN npm install
COPY . /app
ENTRYPOINT ["./entrypoint.sh"]
CMD ["start"]
Set Default Environment Variables, Mapping Ports, and Data Volumes
Environment variables may be required when running a Docker container. Setting default environment variables in the Dockerfile is a good choice. In addition, you can declare listening ports and data volumes in the Dockerfile. Example:
FROM node:7-alpine
ENV PROJECT_DIR=/app
WORKDIR $PROJECT_DIR
COPY package.json $PROJECT_DIR
RUN npm install
COPY . $PROJECT_DIR
ENTRYPOINT ["./entrypoint.sh"]
CMD ["start"]
Environment variables specified by ENV can be used in containers. If you only need to specify variables for image build, you can use ARG instead.
Use EXPOSE to Set Listening Ports
EXPOSE is used to describe which ports your containers will listen on. For example, set EXPOSE 80 for an Apache image and EXPOSE 27017 for a MongoDB image.
EXPOSE alone does not make a port reachable from outside. For external access, map the ports with the -p flag when executing docker run, for example, -p 8080:3000 to map host port 8080 to container port 3000.
FROM node:7-alpine
ENV PROJECT_DIR=/app
WORKDIR $PROJECT_DIR
COPY package.json $PROJECT_DIR
RUN npm install
COPY . $PROJECT_DIR
ENV APP_PORT=3000
EXPOSE $APP_PORT
ENTRYPOINT ["./entrypoint.sh"]
CMD ["start"]
Use VOLUME to Manage Data Volumes
VOLUME is used to expose database storage files, configuration files, or files and directories created by containers. You are advised to use VOLUME for the parts of an image that can change or that users can modify.
In the sample Dockerfile, a media directory is added.
FROM node:7-alpine
ENV PROJECT_DIR=/app
WORKDIR $PROJECT_DIR
COPY package.json $PROJECT_DIR
RUN npm install
COPY . $PROJECT_DIR
ENV MEDIA_DIR=/media \
    APP_PORT=3000
VOLUME $MEDIA_DIR
EXPOSE $APP_PORT
ENTRYPOINT ["./entrypoint.sh"]
CMD ["start"]
Use Labels to Configure Image Metadata
Add labels to help organize images, record licensing information, and automate image build. For each label, add a line that starts with LABEL and contains one or more key-value pairs.

If your string contains spaces, put the string in quotation marks ("") or escape the spaces. If the string itself contains quotation marks, escape them as well.
FROM node:7-alpine
LABEL com.example.version="0.0.1-beta"
Use HEALTHCHECK
When running a container, you can enable the --restart always option. In this case, the Docker daemon restarts the container when the container crashes. This option is useful for containers that need to run for a long time. But what if a container is running but unavailable? HEALTHCHECK enables Docker to periodically check the health status of containers. You only need to specify a command that returns 0 if the container is healthy and 1 otherwise. In the following example, curl --fail returns a non-zero exit status when the request fails, and || exit 1 converts that status to 1. Example:
FROM node:7-alpine
LABEL com.example.version="0.0.1-beta"
ENV PROJECT_DIR=/app
WORKDIR $PROJECT_DIR
COPY package.json $PROJECT_DIR
RUN npm install
COPY . $PROJECT_DIR
ENV MEDIA_DIR=/media \
    APP_PORT=3000
VOLUME $MEDIA_DIR
EXPOSE $APP_PORT
HEALTHCHECK CMD curl --fail http://localhost:$APP_PORT || exit 1
ENTRYPOINT ["./entrypoint.sh"]
CMD ["start"]
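The role of || exit 1 in the health check command can be tried out in a plain shell. The following sketch uses two hypothetical stand-in functions, failing_probe and passing_probe, in place of curl so that it runs without a web server. The value 22 imitates the exit status that curl --fail uses for an HTTP error response.

```shell
#!/bin/sh
# Stand-ins for "curl --fail http://localhost:$APP_PORT" (hypothetical names).
# curl --fail exits with 22 when the server answers with an HTTP error status.
failing_probe() { return 22; }
passing_probe() { return 0; }

# Mirrors the "<probe> || exit 1" pattern of the health check command.
# The subshell keeps "exit" from ending this demo script.
health_status() {
  ( "$1" || exit 1 )
  echo "$?"
}

unhealthy=$(health_status failing_probe)
healthy=$(health_status passing_probe)
echo "failing probe -> $unhealthy"
echo "passing probe -> $healthy"
```

Whatever non-zero status the probe returns, the pattern reports 1, which is the status Docker interprets as unhealthy, and a successful probe reports 0.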
Compile the .dockerignore File
The functions and syntax of the .dockerignore file are similar to those of the .gitignore file. You can ignore unnecessary files to accelerate image build and reduce image size.
Before an image build, Docker prepares the build context by collecting all the files in it and passing them to the build process. By default, the context contains all files in the directory where the Dockerfile is located. However, files in directories such as .git are unnecessary.
Example:
.git/
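A slightly longer file for a Node.js project might look as follows. This is a sketch under assumptions: apart from .git/, the entries are guesses about a typical project layout and are not taken from the sample application. The script writes the file into a scratch directory so that it can be inspected safely.

```shell
#!/bin/sh
# Write a .dockerignore sketch for a Node.js project into a scratch directory.
# The entries other than .git/ are assumptions about a typical project layout.
set -e
project_dir=$(mktemp -d)

cat > "$project_dir/.dockerignore" <<'EOF'
# Version control data is not needed inside the image.
.git/
# Dependencies are installed by "RUN npm install" during the build.
node_modules/
# Local log files.
npm-debug.log
EOF

cat "$project_dir/.dockerignore"
```

Excluding node_modules/ also prevents a locally installed dependency tree from being copied over the one that npm install creates in the image.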