Boost GitHub Actions Speed with Effective Dependency Caching

As DevOps engineers, we live in the world of automation. From writing scripts in Bash or Python to building CI/CD pipelines that automate everything from development to deployment, automation runs through the whole job. CI/CD (Continuous Integration and Continuous Delivery) is an essential part of the software development process, and GitHub Actions has become one of the most popular tools for automating it.

In this blog, we will explore GitHub Actions and look at an effective caching strategy that applies to many kinds of applications. Caching plays an important role in making pipeline runs faster, which in turn speeds up software delivery. Before moving forward, check out my previous blogs:

Building a Serverless Web Application with AWS Lambda, API Gateway, DynamoDB, S3

End-to-End DevOps for a Golang Web App: Docker, EKS, AWS CI/CD

Deploying Your Website on AWS S3 with Terraform

Learn How to Deploy Scalable 3-Tier Applications with AWS ECS

Introduction to GitHub Actions and Dependency Caching

Nowadays, many companies are migrating from Jenkins to GitHub Actions. Common reasons are security concerns and the fact that Jenkins is usually self-hosted, which means maintaining the servers in our own data center. GitHub Actions is a CI/CD platform where developers can automate their workflows directly within their GitHub repositories.

GitHub Actions has many features that improve security and remove some pipeline bottlenecks, but users still often run into performance problems, especially when installing dependencies or repeating the same tasks across multiple jobs. Dependency caching is a technique that reduces pipeline build time, which also reduces resource consumption. By caching artifacts and libraries, we can reuse them in later builds instead of downloading or rebuilding them every time.

GitHub Actions has a built-in mechanism for this, the actions/cache action, which lets users cache dependencies from within the pipeline.

Best Practices for Implementing Dependency Caching

  1. Use Effective Cache Keys: Hash dependency files such as package-lock.json or requirements.txt so the cache stays relevant.

  2. Avoid Cache Conflicts: Use unique keys for different environments or branches to prevent mix-ups.

  3. Implement Cache Invalidation: Change cache keys when dependencies update.

  4. Manage Cache Size: Big caches can slow things down, so stick to caching only the essential files.
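
The four practices above can be combined in a single cache step. The following is a minimal sketch, assuming an npm project with a package-lock.json and the npm download cache in ~/.npm; the step name and the key prefix are illustrative and not taken from the repository.

```yaml
- name: Cache npm downloads   # hypothetical step name
  uses: actions/cache@v3
  with:
    # Practice 4: cache only the download cache, not the whole workspace
    path: ~/.npm
    # Practices 1 and 3: the key changes whenever the lockfile changes
    # Practice 2: the OS prefix keeps caches from different runners apart
    key: ${{ runner.os }}-npm-${{ hashFiles('**/package-lock.json') }}
    # Fall back to the most recent cache with the same prefix
    restore-keys: |
      ${{ runner.os }}-npm-
```

When the lockfile changes, the exact key misses, the restore-keys prefix restores the most recent older cache, and a new cache is saved under the new key at the end of the job.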

How Caching Works in GitHub Actions

Now we will implement caching in a NodeJs application, where the workflow caches Docker image layers in the GitHub Actions cache (gha), the default cache storage. But the GitHub Actions cache provides only 10 GB of storage per repository by default, so most corporate environments use a shared remote store instead: Nexus or AWS CodeArtifact for artifacts, and Harbor or DockerHub for Docker builds.

If we want to store node_modules for NodeJs or a virtual environment built from requirements.txt for Python, we can use the GitHub Actions cache or S3-compatible storage such as MinIO. For now, let’s implement Docker layer caching on the GitHub Actions cache and check the build speed.

GitHub Code: github.com/amitmaurya07/DevSecOps-GHA

.github/workflows/pipeline.yml:

name: NodeJs Application and Caching
on:
    workflow_dispatch:
    push:
        branches:
            - master
jobs:
    docker-build:
        runs-on: ubuntu-latest
        steps:
            - name: Checkout Code
              uses: actions/checkout@v3

            - name: Cache Docker layers
              uses: actions/cache@v3
              with:
                path: /tmp/.buildx-cache
                key: ${{ runner.os }}-buildx-${{ hashFiles('**/Dockerfile') }}
                restore-keys: |
                  ${{ runner.os }}-buildx-

            - name: Login to docker-hub
              uses: docker/login-action@v1
              with:
                username: ${{ vars.DOCKER_USERNAME }}
                password: ${{ secrets.docker_pat }}

            - name: Set up Docker Buildx
              uses: docker/setup-buildx-action@v3

            - name: Build and Push Image
              uses: docker/build-push-action@v6
              with:
                context: .
                push: true
                tags: "amaurya07/devsecops_app:${{ github.ref_name }}"
                cache-from: type=local,src=/tmp/.buildx-cache
                cache-to: type=local,dest=/tmp/.buildx-cache,new=true

Let’s break this pipeline.yml file into two parts to understand it properly.

name: NodeJs Application and Caching
on:
    workflow_dispatch:
    push:
        branches:
            - master

In the first part, we give the pipeline the name “NodeJs Application and Caching” and then define its triggers under on. The pipeline runs automatically whenever anything is pushed to the master branch, and workflow_dispatch means we can also trigger it manually whenever we want.

jobs:
    docker-build:
        runs-on: ubuntu-latest
        steps:
            - name: Checkout Code
              uses: actions/checkout@v3

            - name: Cache Docker layers
              uses: actions/cache@v3
              with:
                path: /tmp/.buildx-cache
                key: ${{ runner.os }}-buildx-${{ hashFiles('**/Dockerfile') }}
                restore-keys: |
                  ${{ runner.os }}-buildx-

            - name: Login to docker-hub
              uses: docker/login-action@v1
              with:
                username: ${{ vars.DOCKER_USERNAME }}
                password: ${{ secrets.docker_pat }}

            - name: Set up Docker Buildx
              uses: docker/setup-buildx-action@v3

            - name: Build and Push Image
              uses: docker/build-push-action@v6
              with:
                context: .
                push: true
                tags: "amaurya07/devsecops_app:${{ github.ref_name }}"
                cache-from: type=local,src=/tmp/.buildx-cache
                cache-to: type=local,dest=/tmp/.buildx-cache,new=true

In the second part, we define the job that runs after the trigger, docker-build. The runs-on key specifies where the job's steps are executed; here it is ubuntu-latest, a runner hosted by GitHub. To see the available runners, open the Actions tab of the repository, navigate to the Runners section, and look under Standard GitHub-hosted runners.

Next come the steps defined in the job. There are 5 steps (Checkout Code, Cache Docker Layers, Login to DockerHub, Docker Buildx, then Docker Build and Push):

  1. Checkout Code: The action is actions/checkout@v3, which checks out the source code from the repository.

  2. Cache Docker Layers: The action is actions/cache@v3, which caches the Docker layers stored locally in /tmp/.buildx-cache. The key is ${{ runner.os }}-buildx-${{ hashFiles('**/Dockerfile') }}, so it includes a hash of the Dockerfile contents: whenever the Dockerfile changes, the key changes and a new cache is saved. The restore-keys prefix ${{ runner.os }}-buildx- lets the job fall back to the most recent older cache when there is no exact match.

  3. Login to DockerHub: The action is docker/login-action@v1, which uses a variable and a secret defined in the repository to log in to DockerHub so the image can be pushed.

  4. Docker Buildx: The action is docker/setup-buildx-action@v3, which sets up the Buildx builder needed by the build-and-push action in the next step. It is also what allows the image layer cache to be exported, and it can build independent stages or multiple architectures in parallel, reducing build time for complex Dockerfiles.

  5. Docker Build and Push: This step sets context to the current working directory, where the Dockerfile is located, and sets push to true along with the image tags. cache-from uses type=local with src set to /tmp/.buildx-cache, the directory restored by the cache step earlier. cache-to also uses type=local and writes to the same directory. The new=true parameter is meant to force the creation of a new cache rather than updating the existing one; it is not listed among the documented options of BuildKit's local cache exporter, so verify its effect before relying on it.
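
One caveat with the local cache type is that the cache directory tends to grow with every run, because old entries are not removed. A workaround described in Docker's documentation is to export to a second directory and swap it in after the build. The sketch below assumes the same paths as the workflow above; the -new directory name is only a convention.

```yaml
- name: Build and Push Image
  uses: docker/build-push-action@v6
  with:
    context: .
    push: true
    tags: "amaurya07/devsecops_app:${{ github.ref_name }}"
    # Read layers from the directory restored by actions/cache
    cache-from: type=local,src=/tmp/.buildx-cache
    # Write the fresh cache to a separate directory; mode=max also exports
    # the layers of intermediate build stages
    cache-to: type=local,dest=/tmp/.buildx-cache-new,mode=max

- name: Move cache
  # Replace the old cache with the fresh one so only current layers are saved
  run: |
    rm -rf /tmp/.buildx-cache
    mv /tmp/.buildx-cache-new /tmp/.buildx-cache
```

The move step has to run after the build and before the job ends, since actions/cache saves the directory in its post-job phase.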

In the next build, the log shows the cache being imported from the GitHub Actions cache, and the unchanged layers are reused instead of being rebuilt.
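
BuildKit also offers a dedicated gha cache backend that talks to the GitHub Actions cache directly, which removes the need for a separate actions/cache step and the /tmp directory. This is an alternative to the local cache type used in pipeline.yml above; a minimal sketch, assuming the builder created by docker/setup-buildx-action:

```yaml
- name: Set up Docker Buildx
  uses: docker/setup-buildx-action@v3

- name: Build and Push Image
  uses: docker/build-push-action@v6
  with:
    context: .
    push: true
    tags: "amaurya07/devsecops_app:${{ github.ref_name }}"
    # Import layers straight from the GitHub Actions cache service
    cache-from: type=gha
    # Export all layers, including intermediate stages, back to it
    cache-to: type=gha,mode=max
```

Both approaches count against the same per-repository cache storage, so the choice is mostly about how much of the cache handling you want to manage yourself.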

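The same actions/cache action works for language-level dependencies. Typical configurations for three ecosystems:

Node.js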

- uses: actions/cache@v3
  with:
    path: node_modules
    key: node-${{ hashFiles('package-lock.json') }}

Python

- uses: actions/cache@v3
  with:
    path: ~/.cache/pip
    key: pip-${{ hashFiles('requirements.txt') }}

Java (Maven)

- uses: actions/cache@v3
  with:
    path: ~/.m2/repository
    key: maven-${{ hashFiles('pom.xml') }}
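
The snippets above only restore and save the cache; the install step still has to run after them. Here is a minimal sketch for the Node.js case, assuming the project installs with npm ci and defines a test script (the step id and step names are illustrative). Because npm ci deletes and recreates node_modules, the install is skipped when the cache step reports an exact key match.

```yaml
- name: Cache node_modules
  id: node-cache              # hypothetical id, used to read the step output
  uses: actions/cache@v3
  with:
    path: node_modules
    key: node-${{ hashFiles('package-lock.json') }}

- name: Install dependencies
  # cache-hit is 'true' only when the exact key was found
  if: steps.node-cache.outputs.cache-hit != 'true'
  run: npm ci

- name: Run tests
  run: npm test
```

The same pattern applies to the Python and Maven snippets, although pip and Maven reuse their download caches on their own, so the install command can simply run every time.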

Conclusion: The Future of Fast CI/CD with GitHub Actions

Dependency caching is a great way to speed up your CI/CD pipelines with GitHub Actions. By using the right caching strategies, you can cut down build times, get faster feedback, and save on resources. Give caching a try today to create workflows that are more efficient and scalable.

GitHub Code: https://github.com/amitmaurya07/DevSecOps-GHA

Twitter: x.com/amitmau07

LinkedIn: linkedin.com/in/amit-maurya07

If you have any queries, you can drop me a message on LinkedIn or Twitter.
