GitLab Container Registry is a convenient place to store Docker images when using GitLab CI. When every pipeline produces a new Docker image tag, you will want to clean up old image tags periodically. Out of the box, GitLab only offers a simple cleanup policy, which relies on regular expressions to select old image tags. That approach does not take into account which image tags were recently deployed to your environments.

In this blog post we outline an alternative image tag cleanup mechanism. We query the GitLab API to see which image tags were recently deployed to our environments, and retain these image tags in case we want to roll back.

Tag Docker images

First, we tag our Docker images consistently, so that every tag ends with the short commit SHA.

Listing 1. .gitlab-ci.yaml snippet to tag Docker images consistently with commit SHA suffix
# https://gitlab.com/gitlab-org/gitlab/-/blob/master/lib/gitlab/ci/templates/Docker.gitlab-ci.yml
build_image:
  stage: build
  services:
    - docker:dind
  before_script:
    - export TAG=$(git show -s --format=%as | awk -F '-' '{print $1"."$2"."$3}').${CI_PIPELINE_IID}-${CI_COMMIT_SHORT_SHA}
    - docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" $CI_REGISTRY
  script:
    - docker build --pull --network host -t "$CI_REGISTRY_IMAGE:$TAG" -t "$CI_REGISTRY_IMAGE:$CI_PIPELINE_IID-$CI_COMMIT_SHORT_SHA" .
    - docker push "$CI_REGISTRY_IMAGE:$TAG"
    - docker push "$CI_REGISTRY_IMAGE:$CI_PIPELINE_IID-$CI_COMMIT_SHORT_SHA"
  rules:
    - if: $CI_COMMIT_BRANCH
      exists:
        - Dockerfile

Each image gets two tags: a short one made up of the pipeline IID and the short commit SHA, and a longer one that also starts with the commit date, as shown below.

Listing 2. Sample image tags.
registry.gitlab.com/acme/frontend:2022.10.22.176-4a52ccd4
registry.gitlab.com/acme/frontend:2022.10.24.179-42ee0ccb
registry.gitlab.com/acme/frontend:2022.10.25.182-a38b3425
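
To make the tag format explicit, here is a minimal Java sketch of the same construction. The class `ImageTags` and its methods are hypothetical helpers for illustration only; they are not part of the pipeline, which builds the tag in shell as shown in Listing 1.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: mirrors the shell expression from Listing 1.
class ImageTags {

  private static final DateTimeFormatter DOTTED_DATE = DateTimeFormatter.ofPattern("yyyy.MM.dd");

  // Builds <commit date>.<pipeline IID>-<short SHA>
  static String build(LocalDate commitDate, long pipelineIid, String shortSha) {
    return DOTTED_DATE.format(commitDate) + "." + pipelineIid + "-" + shortSha;
  }

  // A tag belongs to a commit when it ends with that commit's short SHA.
  static boolean belongsToCommit(String tag, String shortSha) {
    return tag.endsWith(shortSha);
  }

  public static void main(String[] args) {
    String tag = build(LocalDate.of(2022, 10, 22), 176, "4a52ccd4");
    System.out.println(tag); // prints 2022.10.22.176-4a52ccd4
    System.out.println(belongsToCommit(tag, "4a52ccd4")); // prints true
  }
}
```

Because the short SHA is always the suffix, a cleanup job can match a tag to a commit without parsing the date or the pipeline IID.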

Track deployment environments

Second, we track every deployment to our target environments, using GitLab CI environments.

Listing 3. Track deployments to environments
variables:
  APPNAME: $CI_PROJECT_NAME

deploy_test:
  stage: test
  needs: ["build_image"]
  resource_group: test
  variables:
    NAMESPACE: test
  before_script:
    - export TAG=$(git show -s --format=%as | awk -F '-' '{print $1"."$2"."$3}').${CI_PIPELINE_IID}-${CI_COMMIT_SHORT_SHA} # (1)
  script:
    - "/deploy.sh $APPNAME $NAMESPACE $CI_REGISTRY_IMAGE:$TAG"
  environment:
    name: test # (2)
    deployment_tier: testing

deploy_production:
  stage: deploy
  only:
    - main
  when: manual
  resource_group: production
  variables:
    NAMESPACE: production
  before_script:
    - export TAG=$(git show -s --format=%as | awk -F '-' '{print $1"."$2"."$3}').${CI_PIPELINE_IID}-${CI_COMMIT_SHORT_SHA}
  script:
    - "/deploy.sh $APPNAME $NAMESPACE $CI_REGISTRY_IMAGE:$TAG"
  after_script:
    - 'echo "$GITLAB_USER_NAME deployed $CI_PROJECT_NAME commit _$CI_COMMIT_TITLE_ to $NAMESPACE."'
  environment:
    name: production # (3)
    deployment_tier: production
(1) Notice how we construct the same image tag once again, as an argument to our deployment script.
(2) We track deployments to a test environment separately.
(3) And we track deployments to the production environment.

This way GitLab tracks our deployments, and exposes this information through its Environments and Deployments APIs.
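
As a rough sketch of what those API calls look like at the HTTP level, the snippet below builds the two REST requests with the JDK's `java.net.http` classes. The class `DeploymentQueries`, the host name, and the IDs are made up for illustration; the application later in this post uses the gitlab4j-api library instead of raw HTTP.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds requests against the GitLab REST API (v4).
class DeploymentQueries {

  // A single environment; its JSON response includes a last_deployment object.
  static URI environment(String gitlabHost, long projectId, long environmentId) {
    return URI.create(gitlabHost + "/api/v4/projects/" + projectId + "/environments/" + environmentId);
  }

  // All deployments of a project, filtered by environment name.
  static URI deployments(String gitlabHost, long projectId, String environmentName) {
    String encoded = URLEncoder.encode(environmentName, StandardCharsets.UTF_8);
    return URI.create(gitlabHost + "/api/v4/projects/" + projectId + "/deployments?environment=" + encoded);
  }

  // GET request authenticated with a personal or project access token.
  static HttpRequest get(URI uri, String accessToken) {
    return HttpRequest.newBuilder(uri)
        .header("PRIVATE-TOKEN", accessToken)
        .GET()
        .build();
  }

  public static void main(String[] args) {
    // Assumed values for illustration only
    URI uri = deployments("https://gitlab.example.com", 42, "production");
    System.out.println(get(uri, "secret-token").uri());
  }
}
```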

Query the GitLab API

Finally we need a small application to query the GitLab API.

We will create a Spring Boot application using Maven.

pom.xml file with dependency on gitlab4j-api
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <parent>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-parent</artifactId>
    <version>2.7.5</version>
    <relativePath />
  </parent>
  <groupId>acme</groupId>
  <artifactId>cleanup-unused-image-tags</artifactId>
  <version>0.0.1-SNAPSHOT</version>
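  <properties>
    <!-- Stream.toList(), used in the Java classes below, requires Java 16 or newer -->
    <java.version>17</java.version>
  </properties>
  <dependencies>
    <!-- Provides ApplicationRunner, component scanning and SLF4J logging, all used below -->
    <dependency>
      <groupId>org.springframework.boot</groupId>
      <artifactId>spring-boot-starter</artifactId>
    </dependency>
    <dependency>
      <groupId>org.gitlab4j</groupId>
      <artifactId>gitlab4j-api</artifactId>
      <version>5.0.1</version>
    </dependency>
  </dependencies>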
</project>

We invoke the Spring Boot application directly from GitLab CI.

Listing 4. .gitlab-ci.yaml
report:
  image: eclipse-temurin:17.0.4_8-jdk
  only:
    - schedules
  variables:
    app_gitlabhost: $CI_SERVER_URL
  script:
    - './mvnw spring-boot:run'

We need a bit of plumbing to connect to the GitLab API and query active projects.

GitLabClient.java
import static java.time.Instant.now;

import java.time.Duration;
import java.util.Date;
import java.util.List;

import org.gitlab4j.api.Constants.ProjectOrderBy;
import org.gitlab4j.api.Constants.SortOrder;
import org.gitlab4j.api.GitLabApi;
import org.gitlab4j.api.GitLabApiException;
import org.gitlab4j.api.models.GroupProjectsFilter;
import org.gitlab4j.api.models.Project;
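import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Component;

@Component
class GitLabClient implements AutoCloseable {

  private final GitLabApi gitlabApi;

  // Spring cannot inject plain String constructor arguments without a hint.
  // app.gitlabhost binds to the app_gitlabhost variable set in the CI job;
  // app.accesstoken is an assumed property name, supplied for example as a
  // masked CI/CD variable named app_accesstoken.
  public GitLabClient(@Value("${app.gitlabhost}") String gitlabhost,
      @Value("${app.accesstoken}") String accesstoken) {
    this.gitlabApi = new GitLabApi(gitlabhost, accesstoken);
  }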

  public List<Project> getProjects(String namespace, boolean includeSubGroups) throws GitLabApiException {
    return getGitlabApi().getGroupApi()
        .getProjectsStream(namespace, new GroupProjectsFilter()
            .withArchived(false)
            .withSimple(true)
            .withOrderBy(ProjectOrderBy.PATH)
            .withSortOder(SortOrder.ASC)
            .withIncludeSubGroups(includeSubGroups))
        // Only projects with activity in the last 365 days
        .filter(project -> project.getLastActivityAt().after(Date.from(now().minus(Duration.ofDays(365)))))
        .toList();
  }

  @Override
  public void close() throws Exception {
    this.gitlabApi.close();
  }

  public GitLabApi getGitlabApi() {
    return gitlabApi;
  }

}
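
The activity filter in `getProjects` can be tried out in isolation. The sketch below applies the same `Date` comparison to a plain map of project paths and last-activity dates; the class `RecentActivity`, the project paths, and the dates are invented for illustration, and the clock is passed in so the result is reproducible.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Date;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper: same cutoff logic as GitLabClient.getProjects, without the API call.
class RecentActivity {

  // Returns the paths whose last activity is after (now - maxAge), sorted by path.
  static List<String> recentlyActive(Map<String, Date> lastActivityByPath, Instant now, Duration maxAge) {
    Date cutoff = Date.from(now.minus(maxAge));
    return lastActivityByPath.entrySet().stream()
        .filter(entry -> entry.getValue().after(cutoff))
        .map(Map.Entry::getKey)
        .sorted()
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    Map<String, Date> projects = Map.of(
        "acme/frontend", Date.from(Instant.parse("2022-10-25T12:00:00Z")),
        "acme/legacy", Date.from(Instant.parse("2020-01-15T12:00:00Z")));
    Instant now = Instant.parse("2022-11-01T00:00:00Z");
    System.out.println(recentlyActive(projects, now, Duration.ofDays(365))); // prints [acme/frontend]
  }
}
```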

Finally, the class below cleans up the unused image tags.

Listing 5. CleanUpImageTags.java
import java.util.List;
import java.util.Objects;
import java.util.Set;

import org.gitlab4j.api.ContainerRegistryApi;
import org.gitlab4j.api.EnvironmentsApi;
import org.gitlab4j.api.GitLabApiException;
import org.gitlab4j.api.models.Deployment;
import org.gitlab4j.api.models.Project;
import org.gitlab4j.api.models.RegistryRepository;
import org.gitlab4j.api.models.RegistryRepositoryTag;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.boot.ApplicationArguments;
import org.springframework.boot.ApplicationRunner;
import org.springframework.stereotype.Component;

@Component
class CleanUpImageTags implements ApplicationRunner {

  private static final Logger log = LoggerFactory.getLogger(CleanUpImageTags.class);

  private final GitLabClient gitlabClient;

  public CleanUpImageTags(GitLabClient gitlabClient) {
    this.gitlabClient = gitlabClient;
  }

  @Override
  public void run(ApplicationArguments args) throws Exception {
    EnvironmentsApi environmentsApi = gitlabClient.getGitlabApi().getEnvironmentsApi();
    for (Project project : gitlabClient.getProjects("acme", true)) {

      // Retrieve the commit currently deployed to each environment, so we never delete its image
      List<String> currentDeployments = environmentsApi
          .getEnvironmentsStream(project.getId())
          .map(env -> {
            try {
              // The list call leaves lastDeployment null; the per-environment call populates it
              return environmentsApi.getEnvironment(project.getId(), env.getId()).getLastDeployment();
            } catch (GitLabApiException e) {
              throw new RuntimeException(e);
            }
          })
          .filter(Objects::nonNull)
          .map(Deployment::getSha)
          .map(sha -> sha.substring(0, 8))
          .toList();
      if (currentDeployments.isEmpty()) {
        log.warn("No current deployments found for {}; Skipping container registry tag deletion as a precaution",
            project.getPathWithNamespace());
        continue;
      }

      // Retrieve the commit SHAs of deployments to the production environment, to retain for rollbacks
      List<String> productionShasToRetain = gitlabClient.getGitlabApi().getDeploymentsApi()
          .getProjectDeploymentsStream(project.getId())
          .filter(deployment -> Set.of("production").contains(deployment.getEnvironment().getName()))
          .map(Deployment::getSha)
          .map(sha -> sha.substring(0, 8))
          .toList();
      if (productionShasToRetain.isEmpty()) {
        log.warn("No production deployments found for {}; Skipping container registry tag deletion as a precaution",
            project.getPathWithNamespace());
        continue;
      }

      // Delete container registry tags that are neither currently deployed nor ever deployed to production
      ContainerRegistryApi containerRegistryApi = gitlabClient.getGitlabApi().getContainerRegistryApi();
      for (RegistryRepository repository : containerRegistryApi.getRepositories(project.getId())) {
        List<String> shaContainerTagsToRemove = containerRegistryApi
            .getRepositoryTagsStream(project, repository.getId())
            .map(RegistryRepositoryTag::getName)
            .filter(tag -> currentDeployments.stream().noneMatch(tag::endsWith))
            .filter(tag -> productionShasToRetain.stream().noneMatch(tag::endsWith))
            .toList();
        for (String tag : shaContainerTagsToRemove) {
          log.info("Delete {} container tag {} from {}",
              project.getPathWithNamespace(), tag, repository.getLocation());
          containerRegistryApi.deleteRepositoryTag(project.getId(), repository.getId(), tag);
        }
      }
    }
  }
}
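
The selection logic at the heart of Listing 5 is easy to check without touching a registry. The sketch below extracts the two `noneMatch` filters into a pure function and runs it against the sample tags from Listing 2; the class `TagSelection` and the choice of which SHAs count as deployed are assumptions made for the example.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: the tag filters from Listing 5 as a pure function.
class TagSelection {

  // Keeps only tags whose suffix matches neither a currently deployed SHA
  // nor a SHA that was deployed to production.
  static List<String> tagsToRemove(List<String> tags, List<String> currentShas, List<String> productionShas) {
    return tags.stream()
        .filter(tag -> currentShas.stream().noneMatch(tag::endsWith))
        .filter(tag -> productionShas.stream().noneMatch(tag::endsWith))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<String> tags = List.of(
        "2022.10.22.176-4a52ccd4",
        "2022.10.24.179-42ee0ccb",
        "2022.10.25.182-a38b3425");
    // Assumption for this example: a38b3425 is live on test, 42ee0ccb went to production
    List<String> currentShas = List.of("a38b3425");
    List<String> productionShas = List.of("42ee0ccb");
    System.out.println(tagsToRemove(tags, currentShas, productionShas)); // prints [2022.10.22.176-4a52ccd4]
  }
}
```

Note that any tag without a SHA suffix, such as `latest`, matches neither list and would therefore be selected for removal.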

Conclusion

We have seen how the default cleanup policy options in GitLab are typically insufficient to retain deployed image tags. By tracking our deployments in GitLab CI, we can query that information through the API. This allows us to remove only those image tags that are not currently deployed and were never deployed to production.
