How to copy files into a Docker container without using the COPY command
If you’ve ever written a Dockerfile, you’ve likely used the COPY command to transfer files and folders into your container.
While useful, there are times when you need to add files to a running container without stopping it or rebuilding the image. Let’s explore alternative methods for transferring files into your Docker container.
This guide is for data practitioners and software engineers who want to develop inside a container, or update code and data in it, without stopping the container.
Let’s go!
Prerequisites
Before starting, make sure you have:
- Basic understanding of Docker (this tutorial can help)
- Docker installed on your machine
Getting Started
Here are 3 methods to copy files into a running container without using the COPY command in your Dockerfile:
- Using the docker cp command
- Using bind mounts
- Using wget or curl
Preparations
Before diving into the methods, let’s do some preparation:
- Download this CSV data, which we'll use throughout, and save it to your workspace
- Run a FastAPI container that we'll push the data into; a dedicated endpoint lets us check that the files arrived.
Pull the Docker image for this tutorial
docker pull ghcr.io/bricefotzo/docker-init-example:main
Run the container
docker run -d --name my-api -p 8000:8000 ghcr.io/bricefotzo/docker-init-example:main
Navigate to localhost:8000 in your browser and you'll see the endpoints the API exposes.
The one we'll use to check data is the second (/tree). Try this endpoint now to see what is inside the app's folder, app.
Listing the project directory (/app) shows 4 files and 1 folder. After each of the following methods, we'll check that the data is available in the container.
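The same check can also be done from a terminal rather than the browser. The sketch below is a minimal illustration: list_container_dir is a hypothetical helper name, and it assumes the image ships the ls utility, which the article does not confirm.

```shell
# Hypothetical helper: list a directory inside a running container.
# Assumption: the image provides `ls` (not confirmed for the tutorial image).
list_container_dir() {
    container=$1
    dir=${2:-/app}   # default to the project directory used in this guide
    # -A shows dotfiles but skips "." and ".."
    docker exec "$container" ls -A "$dir"
}
```

For example, `list_container_dir my-api /app` should print the same entries that the /tree endpoint reports for /app.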
Let’s start transferring files!
Method 1: Using docker cp Command
With the my-api container running, we’ll add data to it without stopping it or rebuilding the image.
The docker cp command allows you to copy files or directories between the host and a container's filesystem, even while the container is running.
How to use docker cp:
Usage:
docker cp [OPTIONS] CONTAINER:SRC_PATH DEST_PATH|-
docker cp [OPTIONS] SRC_PATH|- CONTAINER:DEST_PATH
Copy a file from host to container:
docker cp ./iris.csv my-api:/app
#Successfully copied 6.66kB to my-api:/app
Copy a directory from host to container:
docker cp ./data my-api:/app
#Successfully copied 12.8kB to my-api:/app
Checking /app again, 2 items were added: iris.csv and data.
Checking /app/data, we have 2 files: iris.csv and iris2.csv.
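Whether docker cp nests the source directory or merges its contents depends on how the source path ends. When the destination directory already exists, a source of ./data is copied in as a subdirectory, while ./data/. copies only what is inside it. A small sketch of the second form, where copy_dir_contents is a hypothetical helper name:

```shell
# Hypothetical helper: copy the *contents* of a host directory into a
# container directory, without nesting the source directory itself.
copy_dir_contents() {
    src=${1%/}          # drop one trailing slash, if any
    container=$2
    dest=$3
    if [ ! -d "$src" ]; then
        echo "not a directory: $src" >&2
        return 1
    fi
    # A source path ending in "/." tells docker cp to copy contents only.
    docker cp "$src/." "$container:$dest"
}
```

With the files from this guide, `copy_dir_contents ./data my-api /app/data` would place iris.csv and iris2.csv directly in /app/data rather than creating /app/data/data.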
Copy a file from container to host:
Let’s copy main.py to the host so we can inspect its contents.
docker cp my-api:/app/main.py ./
#Successfully copied 2.56kB to /Users/<username>/workspace/docker-init-example/data/./
When to use docker cp:
- When you need to update files in a running container without rebuilding the image.
- For temporary changes or debugging purposes.
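The - in the usage synopsis stands for a tar archive on standard input or output, which makes it possible to send a hand-picked set of files in a single call. The sketch below assumes tar is available on the host and that the destination directory already exists in the container; stream_into is a hypothetical name.

```shell
# Hypothetical helper: pack selected files from one host directory into a
# tar stream and let docker cp unpack it inside the container.
# Usage: stream_into SRC_DIR CONTAINER DEST_DIR FILE...
stream_into() {
    src_dir=$1
    container=$2
    dest=$3
    shift 3
    # -C changes into SRC_DIR first, so the archive holds relative paths only.
    tar -cf - -C "$src_dir" "$@" | docker cp - "$container:$dest"
}
```

For instance, `stream_into ./data my-api /app/data iris.csv iris2.csv` would send just those two files, leaving anything else in ./data on the host.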
Method 2: Using Bind Mounts
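Bind mounts allow you to mount a directory from the host filesystem into the container. This method provides a direct mapping between the host and container, so changes in the host directory are immediately reflected in the container and vice versa. Unlike docker cp, a bind mount has to be declared when the container is created, so it cannot be added to a container that is already running.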
How to use bind mounts:
Create a workspace on our host and add some files:
mkdir api-data
touch api-data/data1.csv
touch api-data/data2.csv
Start another API container with a bind mount:
docker run -d --name my-api2 -v ./api-data:/app/data/api -p 8001:8000 ghcr.io/bricefotzo/docker-init-example:main
The container is named my-api2, and the bind mount maps the local folder api-data that we just created to the container folder /app/data/api.
Note: The port binding is changed to avoid conflict with the previous container. You can clean up the previous one and continue using the same container name and port if you want.
Now let’s add another file to the local directory api-data and check the content of /app/data/api.
touch api-data/data3.csv
Checking /app/data/api again, the new file data3.csv should now be listed alongside data1.csv and data2.csv.
When to use bind mounts:
- For development environments where you want to reflect code changes in real-time without rebuilding the image.
- When you need persistent storage that directly maps to the host filesystem.
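The -v flag above uses a relative host path, which older Docker CLI versions may reject. An alternative sketch uses the more explicit --mount syntax with an absolute path and an optional read-only flag. Here bind_spec is a hypothetical helper, and the container name my-api3 and port 8002 in the comment are made up for illustration.

```shell
# Hypothetical helper: build a --mount specification for a bind mount,
# resolving the host directory to an absolute path first.
bind_spec() {
    host_dir=$(cd "$1" >/dev/null && pwd) || return 1
    spec="type=bind,source=$host_dir,target=$2"
    # Pass "ro" as a third argument to stop the container writing to the host.
    if [ "${3:-}" = "ro" ]; then
        spec="$spec,readonly"
    fi
    printf '%s\n' "$spec"
}

# Example (not run here; my-api3 and port 8002 are illustrative):
#   docker run -d --name my-api3 -p 8002:8000 \
#     --mount "$(bind_spec ./api-data /app/data/api ro)" \
#     ghcr.io/bricefotzo/docker-init-example:main
```

A read-only mount suits cases where the container should only consume the data, since files created inside the container would otherwise land in the host folder too.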
Method 3: Using wget or curl
Sometimes, it’s more efficient to download files directly from the internet into your container using tools like wget or curl.
How to use wget or curl:
Download a file using docker exec and wget:
docker exec -it my-api2 /bin/sh -c "wget https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data -O /app/data/api/iris.csv"
Output:
--2024-06-12 18:29:52-- https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data
Resolving archive.ics.uci.edu (archive.ics.uci.edu)... 128.195.10.252
Connecting to archive.ics.uci.edu (archive.ics.uci.edu)|128.195.10.252|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified
Saving to: '/app/data/api/iris.csv'
/app/data/api/iris.csv [ <=> ] 4.44K --.-KB/s in 0.001s
2024-06-12 18:29:54 (3.36 MB/s) - '/app/data/api/iris.csv' saved [4551]
Download a file using docker exec and curl:
docker exec -it my-api2 /bin/sh -c "curl -o /app/data/api/iris2.csv https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
Output:
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 4551 0 4551 0 0 8538 0 --:--:-- --:--:-- --:--:-- 8586
Then let’s check that both files were downloaded to the target folder, /app/data/api. Because that folder is bind-mounted in my-api2, the files also appear in api-data on the host.
When to use wget or curl:
- When you need to download files from a remote server directly into the container.
- For small files or configuration scripts needed at runtime.
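Both commands above only work if the tool exists inside the image, and -it is not needed when no interactive terminal is involved. The sketch below, built around a hypothetical helper called fetch_into, uses whichever of curl or wget is present in the container. It assumes the image has sh; the URL and destination are handed to the inner shell as positional parameters so they are not re-parsed as shell code.

```shell
# Hypothetical helper: download URL to DEST inside a container, using
# curl if the image has it and falling back to wget otherwise.
fetch_into() {
    container=$1
    url=$2
    dest=$3
    # The script is single-quoted so the host shell does not expand it;
    # inside the container, $1 is the URL and $2 is the destination.
    docker exec "$container" sh -c '
        if command -v curl >/dev/null 2>&1; then
            curl -fsSL -o "$2" "$1"
        elif command -v wget >/dev/null 2>&1; then
            wget -q -O "$2" "$1"
        else
            echo "neither curl nor wget is installed" >&2
            exit 127
        fi
    ' sh "$url" "$dest"
}
```

A call such as `fetch_into my-api2 https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data /app/data/api/iris.csv` mirrors the wget example above. The -f flag makes curl exit with an error on HTTP failures instead of saving an error page as the data file.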
Conclusion
While the COPY command in a Dockerfile is useful for static file transfers during image build time, these alternative methods offer greater flexibility for various scenarios:
- docker cp for on-the-fly file transfers.
- Bind mounts for real-time development and persistent storage.
- wget or curl for downloading files directly into the container.
By utilizing these methods, you can manage file transfers more effectively in your Docker workflows, making your tasks as a data engineer more efficient and streamlined.
Let’s connect to continue the discussion!