When working on a development project, it is often necessary to have a pre-populated database for testing and development purposes. This process is known as seeding the database.

Seeding populates a database with an initial data set. This data serves as a foundation for testing functionality, demonstrating application behavior, and providing a starting point for development. Seeding is particularly useful during:

  • Application Development: Seeding allows developers to test application logic against a populated database, ensuring functionalities interact with data as intended.

  • Integration Testing: During integration testing, seed data can be used to simulate real-world scenarios and verify proper data flow between different components of the system.

  • Demonstration Environments: Seeding a database with sample data can be beneficial for showcasing application features and functionalities in a demonstration environment.

Prerequisites

Before we begin, ensure you have the following:

  1. Docker and Docker Compose are installed on your machine.

  2. A MongoDB image pulled from Docker Hub.

  3. A JSON file containing the data you want to seed into the database.

Creating a Docker Compose File

The first step is to create a Docker Compose file that defines the services required for our project. In this case, we need a MongoDB service. Create a file named docker-compose.yml and add the following code:

version: '3'
services:
  mongodb:
    image: mongo
    container_name: mongodb
    ports:
      - "27017:27017"
    volumes:
      - ./data:/data/db
      - ./seed.js:/docker-entrypoint-initdb.d/seed.js
    environment:
      MONGO_INITDB_ROOT_USERNAME: root
      MONGO_INITDB_ROOT_PASSWORD: example
    command: ["mongod", "--auth"]

In the above code, we define a service named mongodb that uses the official MongoDB image from Docker Hub. We also specify a container name, publish port 27017 to the host, and mount two volumes. The first volume mounts the ./data directory on the host machine to the /data/db directory inside the container, which persists data between container restarts. The second volume mounts the seed.js file into the /docker-entrypoint-initdb.d directory inside the container. The MongoDB image executes the scripts in this directory when the container starts with an empty data directory, so the seed script runs only on first initialization. To re-run it, remove the ./data directory before starting the container again.

The command starts mongod with the --auth flag, which enables MongoDB's built-in authentication. The root user is created from the MONGO_INITDB_ROOT_USERNAME and MONGO_INITDB_ROOT_PASSWORD environment variables. Hard-coding credentials like this is not recommended for production environments, but it simplifies our setup for development purposes.

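Creating the Seed Script

Next, create a file named seed.js in the same directory as docker-compose.yml with the following content:

// seed.js
// Switch to the target database; MongoDB creates it on the first write.
db = db.getSiblingDB('mydb');

// insertOne() replaces the deprecated insert() helper.
db.mycollection.insertOne({
  "name": "John Doe",
  "age": 30,
  "city": "New York",
  "has_children": false,
  "children": null
});

// This message appears in the container log.
print("Data has been written to the collection");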

Explanation:

  1. Select the database: db.getSiblingDB('mydb') switches to the mydb database. MongoDB creates it on the first write.

  2. Prepare the data: The document is a plain JavaScript object that holds the fields you want to store.

  3. Insert the document: The script writes the document to mycollection.

    • The collection is created automatically if it does not exist.

    • MongoDB adds a unique _id field to the document.

    • print() writes a confirmation message to the container log.
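Since the prerequisites mention a JSON file, a seed script could also load its documents from that file instead of hard-coding them. The following is a minimal sketch under stated assumptions: loadSeedDocs, seedCollection, and the /seed/data.json path are hypothetical names that are not part of the setup above, and the file is assumed to contain a JSON array of documents. The script inserts only when the collection is empty, so running it twice does not duplicate data.

```javascript
// Sketch only: loadSeedDocs, seedCollection, and /seed/data.json are
// hypothetical names, not part of the original setup.
const fs = require('fs');

// Read an array of documents from a JSON file.
function loadSeedDocs(path) {
  const docs = JSON.parse(fs.readFileSync(path, 'utf8'));
  if (!Array.isArray(docs)) {
    throw new Error(`${path} must contain a JSON array of documents`);
  }
  return docs;
}

// Insert docs only when the collection is empty, so re-running is harmless.
// Returns the number of documents inserted.
function seedCollection(database, collectionName, docs) {
  const collection = database.getCollection(collectionName);
  if (docs.length === 0 || collection.countDocuments({}) > 0) {
    return 0;
  }
  const result = collection.insertMany(docs);
  return Object.keys(result.insertedIds).length;
}

// Inside mongosh (where `db` is defined), the script could end with:
// const docs = loadSeedDocs('/seed/data.json');
// seedCollection(db.getSiblingDB('mydb'), 'mycollection', docs);
```

To use a script like this, the JSON file would need its own volume entry in docker-compose.yml so that it is visible inside the container at the assumed path.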

Starting the MongoDB Service

Now that we have our Docker Compose file, seed script, and JSON data file, we can start our MongoDB service. Run the following command in the same directory as your docker-compose.yml file:

docker-compose up -d

This will start the MongoDB service in detached mode, which means it will run in the background.

Checking the Database

To verify that our seed script has populated the database with data, we can use the MongoDB shell to connect to our instance and query the mycollection collection. Run the following command:

docker exec -it mongodb mongosh -u root -p example

This will open the MongoDB shell, authenticated as the root user. Once connected, run the following commands to switch to the mydb database and query the mycollection collection:

use mydb;
db.mycollection.find().pretty();

This should output something like the following (the _id value will differ):

[
  {
    _id: ObjectId('667936bf4690147f33a26a13'),
    name: 'John Doe',
    age: 30,
    city: 'New York',
    has_children: false,
    children: null
  }
]

This confirms that our seed script has successfully populated the database with data.
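The same check can be scripted rather than typed interactively. The helper below is a small sketch, with checkSeeded as a hypothetical name; it only counts documents and does not inspect their contents.

```javascript
// Sketch only: checkSeeded is a hypothetical helper.
// Returns true when the collection holds at least `minCount` documents.
function checkSeeded(database, collectionName, minCount = 1) {
  const count = database.getCollection(collectionName).countDocuments({});
  return count >= minCount;
}

// Inside mongosh (where `db` is defined):
// checkSeeded(db.getSiblingDB('mydb'), 'mycollection');
```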

Handling Errors

It is important to handle errors that may occur during the seeding process. The seed script shown earlier does not handle errors itself; a more robust script should exit with a non-zero exit code if an error occurs. This can be useful for integrating the seeding process into a larger build or deployment pipeline.

For example, the && operator can chain commands so that the MongoDB service is only started if the seed script completes successfully. A container command of the form npm install && npm run seed && exec mongod first installs any dependencies required by the seed script, then executes the seed script, and finally starts the MongoDB service. This assumes an image that includes Node.js and npm, which the official mongo image does not.
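For the seed script itself to report failure through its exit code, the seeding logic can be wrapped in error handling. The following is a minimal sketch, not the original author's code: runSeed is a hypothetical helper, and the commented usage assumes that mongosh's quit() accepts an exit code.

```javascript
// Sketch only: runSeed is a hypothetical helper, not part of the original script.
// Runs the given seeding function and converts any failure into an exit code.
function runSeed(seedFn, log = console.error) {
  try {
    seedFn();
    return 0; // success
  } catch (err) {
    log(`Seeding failed: ${err.message}`);
    return 1; // non-zero signals failure to the caller
  }
}

// Inside mongosh (where `db` is defined), the script could finish with:
// const code = runSeed(() => {
//   db.getSiblingDB('mydb').mycollection.insertOne({ name: 'John Doe' });
// });
// if (code !== 0) quit(code); // assumption: quit() accepts an exit code
```

Returning the code instead of exiting inside the helper keeps the function easy to test outside the shell.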

Seeding a MongoDB database using Docker Compose offers several significant advantages for development and testing workflows. Here are some key benefits:

Automated Initialization

Docker Compose allows you to automate the database initialization process by defining seed data and executing it alongside other containerized services. This ensures that the database is populated with the necessary data every time a new environment is spun up, which is particularly useful when setting up new development environments or testing scenarios.

Consistent Environments

By using Docker Compose, you can ensure consistent database states across different environments (such as development and testing) by seeding data as part of your container setup. This consistency is crucial for predictable behavior during application development and testing phases.

Highly Repeatable and Quick Environment Setup

Docker Compose simplifies the process of setting up a MongoDB container and seeding it with initial data. This makes it easy to quickly spin up new development environments with the exact same configuration, reducing the time spent on setup and ensuring that all team members work with the same data.

Reduced Development Setup Time

By automating the seeding process, Docker Compose significantly reduces the time spent on setting up a development environment. This allows developers to focus on writing code rather than manually populating the database every time they start a new project or environment.

Default Data in Repository

When using Docker Compose, the seed data is stored in the repository, ensuring that new environments automatically receive the default data. This eliminates the need for manual data population and ensures that all environments start with the same baseline data.

Conclusion

We have explored how to seed a MongoDB database using Docker Compose. We have created a Docker Compose file that defines a MongoDB service, a seed script that populates the database with data, and a JSON data file containing the data we want to seed.

By following these steps, you can easily pre-populate your MongoDB database for testing and development purposes. It is important to note that the above approach is suitable for development and testing environments only. In production environments, it is recommended to use a more secure and robust method for seeding data, such as using a dedicated seeding service or using MongoDB's built-in tools for data import and export.

It is important to ensure that the data being seeded is properly sanitized and validated to prevent any potential security vulnerabilities or data inconsistencies. This can be achieved by using a data validation library or by implementing custom validation logic in the seed script.
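Custom validation logic of this kind could look like the sketch below. The function names and the rules are hypothetical; they are modeled on the sample document used in this article and would need to be adapted to your own data.

```javascript
// Sketch only: validatePerson, partitionValid, and the rules below are
// hypothetical, modeled on the sample document used in this article.

// Returns a list of problems found in one document (empty when valid).
function validatePerson(doc) {
  const errors = [];
  if (typeof doc.name !== 'string' || doc.name.trim() === '') {
    errors.push('name must be a non-empty string');
  }
  if (!Number.isInteger(doc.age) || doc.age < 0) {
    errors.push('age must be a non-negative integer');
  }
  if (typeof doc.city !== 'string') {
    errors.push('city must be a string');
  }
  if (typeof doc.has_children !== 'boolean') {
    errors.push('has_children must be a boolean');
  }
  if (doc.children !== null && !Array.isArray(doc.children)) {
    errors.push('children must be null or an array');
  }
  return errors;
}

// Split documents into those safe to insert and those that were rejected.
function partitionValid(docs) {
  const valid = [];
  const rejected = [];
  for (const doc of docs) {
    const errors = validatePerson(doc);
    if (errors.length === 0) {
      valid.push(doc);
    } else {
      rejected.push({ doc, errors });
    }
  }
  return { valid, rejected };
}
```

A seed script would then insert only the valid documents and log or fail on the rejected ones, depending on how strict the environment needs to be.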