Solving Helm Repository Performance Issues with Automated Chart Cleanup in ChartMuseum

A real-world DevOps troubleshooting guide on reducing Helm repository size, improving CI/CD performance, and automating old chart cleanup in ChartMuseum using retention policies.

Sowmya N | TechGalary

5/7/2026 | 3 min read

Introduction

If your Jenkins pipelines suddenly start failing during Helm operations with errors like:

fatal error: runtime: out of memory

while executing commands such as:

helm fetch --untar

then your Helm repository itself might be the hidden problem.

We recently hit this exact issue while using ChartMuseum as our internal Helm repository. After investigating, we found the root cause: an oversized index.yaml file, bloated by years of accumulated Helm chart versions.

In this post, I’ll explain:

  • What caused the issue

  • Why Helm repositories become slow over time

  • How this affected Jenkins pipelines

  • The cleanup strategy we implemented

  • Best practices for Helm chart retention

The Problem

Our CI/CD pipelines published a new Helm chart version on every deployment.

Example:

my-app-1.0.1.tgz

my-app-1.0.2.tgz

my-app-1.0.3.tgz

Since we were using ChartMuseum, every published chart version was added to the repository metadata file: index.yaml

Over time:

  • thousands of old chart versions accumulated

  • index.yaml became extremely large

  • Helm repository operations became slower

  • Jenkins agents started consuming excessive memory

Eventually, some Jenkins builds failed with:

fatal error: runtime: out of memory

during:

helm fetch --untar

Root Cause Analysis

Initially, it looked like a Jenkins memory issue.

But after deeper analysis, the real problem turned out to be:

  • oversized Helm repository metadata

  • excessive old chart versions

  • high memory usage, because Helm loads and parses the whole repository index in memory

This especially impacts:

  • Kubernetes-based Jenkins agents

  • low-memory CI/CD containers

  • parallel build environments

The Helm repository itself had become bloated over time.

Existing Repository Structure

We were storing:

  • development charts

  • QA releases

  • production releases

  • snapshot builds

inside ChartMuseum.

Example:

my-app:
1.0.1
1.0.2
1.0.3
...
1.0.742

Most historical versions were no longer required, but they still remained inside the repository metadata.
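To find out which charts contribute most to the bloat, the version count per chart can be read from the chart listing API. The following is a minimal sketch, assuming jq is installed, the ChartMuseum API is reachable without authentication, and CHARTMUSEUM_URL is a placeholder for your own endpoint. The function name is hypothetical.

```shell
# Hypothetical base URL; replace with your own ChartMuseum endpoint.
CHARTMUSEUM_URL="${CHARTMUSEUM_URL:-https://chartmuseum.example.com}"

# Reads the JSON returned by GET /api/charts on stdin and prints
# "<version count><TAB><chart name>", charts with most versions first.
# Requires jq.
count_versions_per_chart() {
  jq -r 'to_entries[] | "\(.value | length)\t\(.key)"' | sort -rn
}

# Usage (needs network access, so it is left commented out):
#   curl -fsS "$CHARTMUSEUM_URL/api/charts" | count_versions_per_chart | head
```

Charts at the top of this list are the first candidates for a retention policy.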

The Solution

To solve this issue, we decided to implement an automated Helm chart cleanup policy using the ChartMuseum DELETE API.

The plan was simple:

  1. Read all chart versions

  2. Keep only the latest required versions

  3. Delete older unused versions

  4. Reduce the size of index.yaml

Understanding ChartMuseum DELETE API

ChartMuseum provides APIs to manage chart versions.

List all charts

curl https://chartmuseum.example.com/api/charts

Get chart versions

curl https://chartmuseum.example.com/api/charts/my-app

Delete specific chart version

curl -X DELETE \
https://chartmuseum.example.com/api/charts/my-app/1.0.12

This API became the foundation of our cleanup automation.
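In an automated job it helps to know whether each delete actually succeeded. One way to wrap the DELETE call is sketched below, assuming bash, no authentication (add curl's -u option if your instance uses basic auth), and a placeholder CHARTMUSEUM_URL. The function name delete_chart_version is made up for this example.

```shell
# Hypothetical endpoint; replace with your own ChartMuseum URL.
CHARTMUSEUM_URL="${CHARTMUSEUM_URL:-https://chartmuseum.example.com}"

# Deletes one chart version and reports failure on any non-2xx response.
# Arguments: <chart name> <version>
delete_chart_version() {
  local name="$1" version="$2" status
  # -o /dev/null discards the body; -w prints only the HTTP status code.
  status=$(curl -s -o /dev/null -w '%{http_code}' \
    -X DELETE "$CHARTMUSEUM_URL/api/charts/$name/$version")
  case "$status" in
    2??) echo "deleted $name $version" ;;
    *)   echo "failed to delete $name $version (HTTP $status)" >&2
         return 1 ;;
  esac
}

# Usage:
#   delete_chart_version my-app 1.0.12
```

Returning a non-zero status lets the calling pipeline stop or log the failure instead of silently continuing.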

Cleanup Strategy

We introduced retention policies based on environments.

Environment          Retention Policy
Dev Charts           Keep latest 5
QA Charts            Keep latest 10
Production Charts    Keep latest 20

This helped us:

  • reduce repository size

  • improve Helm performance

  • stabilize Jenkins pipelines

  • reduce memory usage
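The retention table above could be expressed in a script roughly as follows. This is a sketch, not our exact implementation: the environment labels dev, qa, and prod and both function names are assumptions for illustration, and sort -V requires GNU coreutils.

```shell
# Maps an environment label to the number of versions to keep.
# The labels "dev", "qa", and "prod" are assumed names.
keep_count_for_env() {
  case "$1" in
    dev)  echo 5 ;;
    qa)   echo 10 ;;
    prod) echo 20 ;;
    *)    echo "unknown environment: $1" >&2; return 1 ;;
  esac
}

# Reads one version per line on stdin and prints every version
# that falls outside the newest <keep> entries.
# Argument: <keep>
versions_to_delete() {
  local keep="$1"
  # Newest first (GNU version sort), then skip the first <keep> lines.
  sort -rV | tail -n "+$((keep + 1))"
}

# Usage:
#   versions_to_delete "$(keep_count_for_env dev)" < versions.txt
```

If a chart has fewer versions than the retention count, the function prints nothing, so nothing is deleted.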

Cleanup Pipeline Design

Instead of manually deleting charts, we created a separate cleanup pipeline.

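Pipeline Flow

The pipeline followed the plan described earlier: read all chart versions, keep the latest required versions, and delete the older ones.

Running the cleanup as its own pipeline allowed us to safely clean historical data without affecting active deployments.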

Important Safeguards

Before deleting any charts, we added several safety checks.

1. Dry Run Mode

Initially, the cleanup pipeline ran in dry-run mode and only printed what would be deleted:

DRY_RUN=true

This let us validate the cleanup logic before any chart was actually removed.
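A dry-run switch like this can be implemented with a small guard around the delete call. The sketch below assumes bash and a placeholder CHARTMUSEUM_URL. The function name delete_or_report is hypothetical. It treats anything other than DRY_RUN=false as a dry run, so the safe behaviour is the default.

```shell
# DRY_RUN defaults to true, so nothing is deleted unless it is
# switched off explicitly with DRY_RUN=false.
DRY_RUN="${DRY_RUN:-true}"
# Hypothetical endpoint; replace with your own ChartMuseum URL.
CHARTMUSEUM_URL="${CHARTMUSEUM_URL:-https://chartmuseum.example.com}"

# Arguments: <chart name> <version>
delete_or_report() {
  local name="$1" version="$2"
  if [ "$DRY_RUN" != "false" ]; then
    echo "[dry-run] would delete $name $version"
    return 0
  fi
  # -f makes curl fail on HTTP errors; -sS hides progress but shows errors.
  curl -fsS -X DELETE "$CHARTMUSEUM_URL/api/charts/$name/$version" >/dev/null \
    && echo "deleted $name $version"
}
```

Reviewing the dry-run output for a few scheduled runs before switching to DRY_RUN=false gives a reasonable check that the retention logic selects the versions you expect.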

2. Protect Active Versions

We ensured the cleanup never deleted:

  • currently deployed versions

  • stable production releases

  • tagged release builds

For example, to see which chart versions are currently deployed across all namespaces:

helm list -A
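The output of helm list can also be used to protect deployed versions automatically. The sketch below assumes jq is installed and uses the JSON output of helm list, where the chart field has the form name-version. Both function names are hypothetical. Note that helm list only sees the cluster of the current kube context, so multi-cluster setups would need to collect versions from each cluster.

```shell
# Reads `helm list -A -o json` output on stdin and prints the deployed
# versions of one chart. Helm reports the chart as "<name>-<version>",
# so the "<name>-" prefix is stripped. A chart such as "my-app-extra"
# also matches the prefix "my-app-", but the leftover text is not a
# valid version of my-app, so it protects nothing by mistake.
# Argument: <chart name>
deployed_versions() {
  jq -r --arg prefix "$1-" \
    '.[].chart | select(startswith($prefix)) | ltrimstr($prefix)' | sort -u
}

# Reads deletion candidates on stdin (one version per line) and drops
# every version listed in the file given as the first argument.
# Argument: <file with protected versions>
drop_protected() {
  # -v invert match, -x whole line, -F fixed strings, -f patterns from file.
  # grep exits with 1 when no line is left, which is not an error here.
  grep -vxF -f "$1" || [ "$?" -eq 1 ]
}

# Usage:
#   helm list -A -o json | deployed_versions my-app > protected.txt
#   drop_protected protected.txt < candidates.txt
```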

3. Semantic Version Sorting

Simple string sorting can produce incorrect results. For example, it places 1.0.10 before 1.0.2, even though 1.0.10 is the newer version:

1.0.10
1.0.2

Instead, we used semantic version sorting to correctly identify older versions.
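The difference is easy to reproduce on the command line. The sketch below uses the -V (version sort) option of GNU sort as a stand-in for semantic version sorting. It handles plain numeric versions like the ones in this post, but it may not follow SemVer precedence rules for pre-release identifiers such as 1.0.0-rc.1, so check that case separately if you publish pre-releases. The function names are illustrative.

```shell
# Plain string sort compares character by character, so 1.0.10
# is placed before 1.0.2.
string_sort() {
  LC_ALL=C sort
}

# Version sort treats each numeric part as a number, so 1.0.10
# is placed after 1.0.9. The -V flag is a GNU coreutils extension.
version_sort() {
  sort -V
}

printf '1.0.2\n1.0.10\n1.0.9\n' | string_sort
# prints 1.0.10, 1.0.2, 1.0.9 (one per line)

printf '1.0.2\n1.0.10\n1.0.9\n' | version_sort
# prints 1.0.2, 1.0.9, 1.0.10 (one per line)
```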

Why We Did Not Integrate Cleanup Directly Into the Publish Pipeline

Initially, we considered deleting old versions immediately after publishing new charts.

Example:

Publish chart

Delete oldest version

However, we decided against this approach.

Reasons

  • rollback versions may still be required

  • if a deployment fails after older versions are already gone, recovery becomes riskier

  • debugging becomes difficult if versions disappear immediately

Instead, we implemented:

  • scheduled cleanup jobs

  • configurable retention policies

  • safer operational maintenance

Results After Cleanup

After implementing automated chart cleanup:

  • Helm repository size dropped significantly

  • index.yaml became much smaller

  • Jenkins OOM failures disappeared

  • Helm fetch operations became faster

  • CI/CD stability improved

Best Practices

If you are managing Helm repositories at scale, I strongly recommend:

Use Retention Policies

Never allow unlimited chart accumulation.

Separate Cleanup Jobs

Avoid cleanup logic inside deployment pipelines.

Use:

  • Jenkins scheduled jobs

  • Kubernetes CronJobs

  • maintenance pipelines

Keep Production Releases Longer

Development charts can be aggressively cleaned, but production releases should have longer retention periods.

Monitor Repository Growth

Track:

  • index.yaml size

  • repository response time

  • Helm fetch latency

  • chart count growth
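Two of these metrics, index.yaml size and download time, can be collected with a single curl call. The following is a rough sketch, assuming bash, an index served at /index.yaml under a placeholder CHARTMUSEUM_URL, and a size limit that you choose yourself. The function names are hypothetical.

```shell
# Hypothetical endpoint; replace with your own ChartMuseum URL.
CHARTMUSEUM_URL="${CHARTMUSEUM_URL:-https://chartmuseum.example.com}"

# Prints "<bytes downloaded> <seconds taken>" for index.yaml.
index_stats() {
  curl -fsS -o /dev/null -w '%{size_download} %{time_total}\n' \
    "$CHARTMUSEUM_URL/index.yaml"
}

# Returns non-zero when index.yaml is larger than <max bytes>,
# so a scheduled job can fail or alert when the index grows too much.
# Argument: <max bytes>
check_index_size() {
  local max_bytes="$1" size
  size=$(index_stats | cut -d' ' -f1)
  if [ -z "$size" ]; then
    echo "could not fetch index.yaml" >&2
    return 2
  fi
  if [ "$size" -gt "$max_bytes" ]; then
    echo "index.yaml is $size bytes (limit $max_bytes)" >&2
    return 1
  fi
  echo "index.yaml is $size bytes"
}

# Usage:
#   check_index_size 5000000
```

Running a check like this on a schedule and recording the numbers makes repository growth visible well before pipelines start to fail.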

Final Thoughts

Helm repositories are often ignored until they start affecting CI/CD performance.

A simple automated cleanup strategy can:

  • improve Jenkins stability

  • reduce memory usage

  • speed up Helm operations

  • keep repositories manageable

For teams using ChartMuseum in enterprise environments, implementing Helm chart retention policies should be considered an essential operational practice rather than an optional optimization.