
GitOps for Network Engineers - Deploying Nautobot

Deploying Our First Network Automation App - Nautobot!


Previous Articles in the Series

Bridging the Gap: GitOps for Network Engineers - Part 1 (Deploying ArgoCD)

Bridging the Gap: GitOps for Network Engineers - Part 2 (Deploying Critical Infrastructure with ArgoCD)

Intro

Here we go! Time to deploy something network automation engineers actually use: Nautobot. For those who are unfamiliar, Nautobot is an open-source Network Source of Truth and automation platform. It gives you a clean API, GraphQL, plugins, and jobs for modeling your network and driving intent-based automation. In a GitOps workflow, Nautobot becomes the living database of network intent and inventory, while Argo CD ensures the platform itself is deployed and maintained declaratively. It’s one of my favorite tools because you can’t have a solid network automation foundation without a solid source of truth (okay, “source of intent” if you prefer). Either way, Nautobot is among the best; kudos to the Network to Code team for a great product. Before we dive in, let’s quickly recap previous GitOps for Network Engineers posts. If you haven’t read those yet, I’d recommend starting there first. The links are posted above.

Part 1 established the groundwork: why GitOps matters for network engineers (intent-as-code, reviews, rollbacks), installing Argo CD, connecting it to Git, and proving the reconcile loop with a simple, Git-managed deployment.

Part 2 leveled that foundation into a production-ready platform. We declaratively integrated:

  • MetalLB for external service IPs

  • Traefik for ingress routing and TLS

  • Rook-Ceph for durable, cluster-native storage

  • A secrets stack using External Secrets backed by HashiCorp Vault, all continuously managed by ArgoCD.

As a result, the platform can now:

  • Expose apps securely via external IPs and ingress rules

  • Persist data with Ceph-backed volumes

  • Manage secrets without committing them to Git

  • Treat infrastructure the same as applications: defined in code, reconciled by Argo CD

Instead of stamping this post as “Part 3,” I’m branching it off from the foundation posts. That gives me room to play with future installments while still keeping them under the GitOps for Network Engineers umbrella when it makes sense. The goal here is simple: bring a basic Nautobot deployment online, fully managed by ArgoCD, using the same GitOps patterns we established earlier. Specifically, we will:

  • Add the main Nautobot Helm chart to ArgoCD

  • Define (or confirm) a StorageClass for Nautobot’s persistent needs

  • Allocate a MetalLB IP for Traefik to serve Nautobot externally

  • Create Secrets for DB, Redis, and an initial Nautobot superuser

  • Compose Kustomize resources to wrap Helm and environment overlays

  • Author custom values.yml for your environment

  • Deploy the App

When we are done, our deployment will include five pods:

  • Nautobot Web (frontend/API) - serves the UI plus REST/GraphQL endpoints

  • Nautobot Celery Worker - executes background jobs and plugin tasks

  • Nautobot Celery Beat - schedules periodic tasks for the worker

  • PostgreSQL - primary application database for Nautobot objects/state

  • Redis - cache and message broker backing Celery queues

This deployment will not include any building of custom container images, Nautobot plugins, or custom Nautobot configurations. I’m planning that for a future post.

Let’s dive in.

Adding Nautobot’s Helm Chart

First things first: let’s add the Nautobot Helm chart to Argo CD. If you followed the earlier posts, this will feel familiar. In the examples below, I’m using my prod-home Argo CD Project; you’ll see that name throughout. Your Project name can (and likely will) be different; substitute your own wherever you see prod-home.

Step 1: Add the Helm Repo

  • Helm Repo URL:
    https://nautobot.github.io/helm-charts/

In the ArgoCD UI:

  • Go to Settings → Repositories

  • Click + CONNECT REPO

  • Enter the Helm repo URL

  • Choose Helm as the type

  • Give the repo a name (Optional)

  • Choose the project you created earlier to associate with this repo (mine was ‘prod-home’)

  • No authentication is needed for this public repo

  • When done, click CONNECT

Once added, ArgoCD can now pull charts from this source.

Note: As seen in Part 2, you’ll also need to add the GitHub repo that contains your custom configuration files, like Helm values.yml files and Kustomize overlays.

  • If you're using my example repo, add https://github.com/leothelyon17/kubernetes-gitops-playground.git as another source, of type Git.

  • If you're using your own repo, just make sure it's added in the same way so ArgoCD can pull your values and overlays when syncing.
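If you'd rather keep repository registration in Git too, Argo CD also accepts repositories declaratively as Secrets labeled `argocd.argoproj.io/secret-type: repository` in its own namespace. A minimal sketch (the Secret name is illustrative, and I'm assuming Argo CD lives in the `argocd` namespace; adjust to match your install):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: nautobot-helm-repo            # illustrative name
  namespace: argocd                   # wherever Argo CD is installed
  labels:
    argocd.argoproj.io/secret-type: repository
stringData:
  name: nautobot
  type: helm
  url: https://nautobot.github.io/helm-charts/
  project: prod-home                  # your Argo CD Project
```

Either route gets you the same result; the UI steps above are just the click-ops version.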

Step 2: Create the ArgoCD Application

Head to the Applications tab and click + NEW APP to start the deployment.

Here’s how to fill it out:

  • Application Name: nautobot (or in my case nautobot-prod)

  • Project: Select your project (e.g., prod-home)

  • Sync Policy: Manual for now (we’ll automate later)

  • Repository URL: Select the Helm repo you just added

  • Chart Name: nautobot

  • Target Revision: Use the latest or specify a version (latest is recommended)

  • Cluster URL: Use https://kubernetes.default.svc if deploying to the same cluster (mine may differ from the default; don’t worry if yours does too)

  • Namespace: nautobot or nautobot-prod to match the ArgoCD application name. Check the box to create the namespace if it doesn’t already exist in your Kubernetes cluster

Click CREATE when finished.

If everything is in order you should see the App created like the screenshot below, though yours will show an all-yellow status and ‘OutOfSync’ -

Just like before, ArgoCD will immediately show you all the Kubernetes objects it plans to create. Don’t hit Sync yet. We haven’t configured the databases, secrets, or persistent storage, so a deploy right now would fail: the databases wouldn’t be able to mount their volumes. We’ll get there.

For this first section, the goal was simple: pull in the main Nautobot Helm chart, which we’ve done. In previous posts, we’d usually fine-tune the ArgoCD Application to point at our Kustomize overlays or custom helm values. We’ll come back to that once all those pieces exist; if you do this in the Application now ArgoCD will fail on the missing paths. Onward.

Overview for Nautobot’s Helm Values

Here we’ll take a quick pass over Nautobot’s default Helm values so we know exactly where our overrides will land later.

Defaults can be found here:
https://github.com/nautobot/helm-charts/blob/develop/charts/nautobot/values.yaml

For this deployment, we’ll customize these core sections:

  • superuser - bootstrap admin (username/email/password).

  • postgresql - point at our Postgres (in-cluster or external), version, storage, and connection settings.

  • redis - enable/disable and wire the cache/queue endpoint (persistence optional).

A few optional knobs worth calling out:

  • Replicas: under both nautobot and celery, you can set replicas: 1 for dev or tight clusters; bump later as you scale. I will be setting the replicas to ‘1’.

  • Image: under nautobot.image set a specific tag (or a custom image) if you don’t want “latest.” Unless you know what you are doing, leave the defaults for this deployment.

  • Ingress: the chart can create it, but we’re keeping that off and handling exposure via our Kustomize IngressRoute pattern.

That’s it for the big call-outs. We’ll circle back and set those values once the rest of the pieces (storage, secrets, and ingress) are in place later in the post.

Add Persistent Storage

For Nautobot, the one thing that absolutely needs persistence is the primary database, by default that’s PostgreSQL, and it should live on durable storage. Redis handles caching/queuing, and persistence there is optional: if you need cached data to survive pod restarts or rolling updates, back it with a PVC; otherwise keep it ephemeral and let it rebuild as needed.

In Part 2 we created two CephFS storage classes with Rook-Ceph. For this post I’m using the rook-cephfs-retain class for Postgres and rook-cephfs-delete for Redis (optional), as we’ll see later in our custom Helm values.

CephFS StorageClass (Retain)

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  # Name you’ll reference from PVCs (spec.storageClassName)
  name: rook-cephfs-retain
# CSI driver that provisions CephFS-backed volumes via Rook
provisioner: rook-ceph.cephfs.csi.ceph.com

parameters:
  # ----- Tell the CSI driver which Ceph cluster/filesystem to use -----

  # Namespace where your Rook-Ceph cluster runs (operator, mons/osds, etc.)
  # If your cluster is in a different namespace, update this and the secrets below.
  clusterID: rook-ceph

  # Name of the CephFS filesystem (created during CephFS setup)
  # You can confirm with `ceph fs ls`.
  fsName: k8s-ceph-fs

  # Ceph pool backing the filesystem (required when provisionVolume is true)
  # Must match the pool configured for your fsName.
  pool: k8s-ceph-fs-replicated

  # ----- CSI secrets for provisioning/expansion/node-stage (auto-created by Rook) -----

  # Secret used by the provisioner sidecar to create volumes
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-cephfs-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph

  # Secret used by the controller for volume expansion operations
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-cephfs-provisioner
  csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph

  # Secret used on the node to stage/mount volumes
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-cephfs-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph

  # ----- Optional: choose the client implementation for CephFS mounts -----
  # If omitted, CSI auto-detects. Kernel client is typical in prod.
  # mounter: kernel

# Keep PVs (and data) when PVCs are deleted—safer for DBs and long-lived data
reclaimPolicy: Retain

# Allow growing PVCs in place (kubectl patch ... size: 40Gi, etc.)
allowVolumeExpansion: true

# Mount-time options passed to the client (uncomment the block to enable)
# mountOptions:
#   # Verbose client debug logs during troubleshooting
#   - debug

Why choose Retain vs Delete

  • Retain keeps the PV (and data) when its PVC is deleted.
    Use it for anything you don’t want accidentally destroyed (databases, long-lived app data, easy rollbacks). The trade-off is manual cleanup later.

  • Delete removes the PV and backend data when the PVC goes away.
    Great for ephemeral/dev workloads where you don’t care about the data. Trade-off: once it’s gone, it’s gone.

Why allowVolumeExpansion is important

  • Lets you grow PVCs in place as your data grows (no migrate-and-restore dance).

  • With CephFS + CSI, online expansion is supported; Kubernetes handles the resize.

  • You still need available capacity in the Ceph cluster. This just makes growth operationally simple.

Use this class for your Nautobot Postgres PVC. Redis persistence is optional. Enable it only if you truly need cache durability.

Add this storage class (or both classes) to your Rook-Ceph deployment if you haven’t already, and let’s move forward.
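If you want to sanity-check the class before handing it to Nautobot, a throwaway PVC works well (names here are illustrative):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cephfs-smoke-test                  # illustrative name; delete when done
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: rook-cephfs-retain     # the class defined above
  resources:
    requests:
      storage: 1Gi
```

`kubectl get pvc cephfs-smoke-test` should show Bound shortly after apply. Remember that with Retain, deleting the PVC leaves the PV behind for manual cleanup.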

MetalLB IP Pool for Traefik

Before we can expose apps to the outside world, Traefik needs an externally reachable IP from MetalLB. “Public” here just means outside the cluster (it can still be RFC1918). Since we already set up MetalLB in the earlier posts, this is a quick tweak.

1) Give MetalLB an address to hand out

Add a single IP (or a range) to your existing IPAddressPool. I like dedicating a single /32 for Traefik so DNS stays stable.

apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: prod-traefik-pool
  namespace: metallb-prod
spec:
  addresses:
    - 192.168.101.161/32  # Traefik LB IP

If you don’t already have one, pair the pool with an L2Advertisement (MetalLB won’t announce addresses without it):

apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: prod-traefik-l2adv
  namespace: metallb-prod
spec:
  ipAddressPools:
    - prod-traefik-pool

Note: Pick an unused IP in your LAN (outside DHCP scope). Then sync your Argo CD app for MetalLB.

2) Pin that IP on the Traefik Service

In your Traefik Helm values, set the Service to LoadBalancer and assign the static IP:

service:
  enabled: true
  type: LoadBalancer
  spec:
    loadBalancerIP: 192.168.101.161
  # optional, preserves client source IP if you care about logs:
  externalTrafficPolicy: Local

Sync your Traefik app. You should see the EXTERNAL-IP appear:

kubectl -n kube-system get svc

traefik-prod  LoadBalancer   10.233.23.76   192.168.101.161   32400:30228/TCP,80:31007/TCP,443:32150/TCP   108d

3) (Optional) DNS now or later

Once the IP is live, create a DNS A record (e.g., nautobot.example.local → 192.168.101.161). We’ll wire the IngressRoute host to match this in the next steps.

That’s it. Traefik now has a stable, outside-facing address; we can safely publish Nautobot behind it.

Exposing Nautobot using an Ingress Route

With Traefik now holding an external IP, we can move on to exposing Nautobot to the outside world; time to configure the IngressRoute so users and devices can reach it.

This part is straightforward if you already have an ingress controller. If not, jump back to the Part 2 post for deploying Traefik in-cluster. By default, the Nautobot Helm chart does not create any Ingress/IngressRoute resources.

You can use the Nautobot chart values to let it create ingress, but we’re leaving those at the defaults. Instead, we’ll handle exposure in the overlay with a Traefik IngressRoute. I prefer this split: Helm owns the app; Kustomize owns how it’s exposed. It’s a repeatable, cookie-cutter pattern across apps and keeps odd edge cases out of chart values. Goal here is simple, publish the web UI outside the cluster. Nothing fancy.

A working IngressRoute example is below, and can also be found in my GitOps Playground repository in the apps/nautobot/overlays/prod folder -

---
# --> (Example) Create an IngressRoute for your service...
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: nautobot-prod-ingressroute  # <-- Replace with your IngressRoute name
  namespace: nautobot-prod  # <-- Replace with your namespace
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`nautobot.home.nerdylyonsden.io`)  # <-- Replace with your FQDN
      kind: Rule
      services:
        - name: nautobot-prod-default  # <-- Replace with your service name
          port: 80
  # --> (Optional) Add certificate secret
  tls:
    secretName: prod-apps-certificate-secret  # <-- cert-manager will store the created certificate in this secret
  # <--

The main points to cover here are:

  • Namespace - Make sure the manifest’s namespace matches where Nautobot will live.

  • EntryPoints - Use only websecure so traffic is encrypted at least up to Traefik inside the cluster.

  • Host rule - routes.match must match the public DNS A record users will hit for Nautobot.

  • Service wiring - services.name and services.port must match the Nautobot Service. In my setup the name is <namespace>-default; adjust if yours differs.

  • Port - Defaults to 80 unless you’ve changed it in the Service.

  • TLS / certs - If you have a cluster cert solution (e.g., cert-manager), wire it here. If not, leave the TLS section out for now; I’ll cover this in an advanced post.

Note: To check the Service name and port, you can click into the app in ArgoCD (whether fully deployed or not), then open the Service → Summary tab → Desired Manifest, as shown below -

Note (again): The Service also exposes port 443, but we’re not using it. Nautobot needs additional app-level config to terminate HTTPS directly. For now we’ll keep TLS at Traefik and speak HTTP to the Service. End-to-end HTTPS on Nautobot itself is out of scope for this post (maybe a future one).

Once the IngressRoute is set the way you want, drop it into your environment overlay (e.g., apps/nautobot/overlays/prod/ingress-route.yml) and commit it. That’s it for this piece; on to the next section.
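Once everything is deployed and synced at the end of this post, a quick reachability check from outside the cluster (assuming your DNS record, or an /etc/hosts entry, resolves the FQDN) looks like:

```shell
# Expect a 200 (or a 302 redirect to /login/) from Nautobot via Traefik.
# -k skips certificate verification, handy with a private CA or self-signed cert.
curl -k -s -o /dev/null -w "%{http_code}\n" https://nautobot.home.nerdylyonsden.io/
```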

Deploying Securely - Creating Our Secrets

For a starter implementation of Nautobot with some basic security we are going to need the following secrets stored in Vault -

  • Super User Login Credentials (which will include a password and API token)

  • Postgres DB Credentials

  • Redis DB Credentials

We’re going to keep credentials out of Git and let External Secrets (ESO) fetch them from HashiCorp Vault at deploy time. The two things we need to cover here are: (1) enabling Kubernetes authentication in Vault with a role dedicated to Nautobot, and (2) adding the actual secrets into Vault under the /secret path.

You should hopefully have an existing instance of Hashicorp Vault already if you’ve been following along with the previous posts.

Kubernetes Authentication + Nautobot Role

Why we need it:
External Secrets runs inside your cluster. It needs a secure, short-lived way to prove to Vault, “I’m allowed to read only the Nautobot secrets.” Vault’s Kubernetes auth method does exactly that by validating a pod’s service account token against the cluster API and mapping it to a least-privilege policy.

What the role does:

  • Binds a specific ServiceAccount + Namespace (e.g., the one where Nautobot lives) to a read-only policy for your Nautobot secret paths.

  • Issues short-lived Vault tokens to ESO when it presents the Kubernetes JWT. No root tokens or static creds in manifests.

  • Scopes access to exactly the secret paths you choose (nothing more).

The first piece that has to be done (if never done previously) is to enable the Kubernetes authentication method. To enable it through the GUI, follow the steps below:

  1. In the left-hand pane, click Access.

  2. Under Authentication, click Enable New Method (top right).

  3. Under Infra, choose Kubernetes.

  4. Leave the options at their defaults and click Enable Method.

  5. Back on the Authentication Methods list, you should now see kubernetes/ and token/. Click kubernetes/.

  6. Click Configuration (top area), then Configure (right side).

  7. In Configuration, set Kubernetes host to your API URL (I use the Kube-VIP URL from earlier posts). If you don’t have one, you can use https://kubernetes.default.svc.

Kubernetes auth is now configured. Next, create the Nautobot role.
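If you prefer the CLI over the GUI, the equivalent commands are below (run from a host where `vault` is logged in against your instance; the API URL is a placeholder, use your Kube-VIP URL if you have one):

```shell
# Enable the Kubernetes auth method at the default mount path (kubernetes/)
vault auth enable kubernetes

# Point the auth method at your cluster's API server
vault write auth/kubernetes/config \
    kubernetes_host="https://kubernetes.default.svc"
```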


Create the Nautobot role:

  1. From Authentication Methods, select kubernetes/.

  2. Click Create role (right side).

  3. Use these values (adjust as needed for your environment):

    • Name: nautobot

    • Alias name source: serviceaccount_name

    • Bound service account names: nautobot-prod

    • Bound service account namespaces: nautobot-prod

    • Under Tokens → Generated Token’s Policies: add nautobot (we’ll create this policy next)

  4. Leave other token settings at their defaults; other fields can remain blank.

  5. Click Save.

That’s all we need for the Nautobot role. The referenced ServiceAccount will be created by our Helm deployment a bit later.
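The CLI equivalent of the role above, for anyone scripting this (values mirror the GUI steps; adjust the ServiceAccount and namespace to your environment):

```shell
# Bind the nautobot-prod ServiceAccount in the nautobot-prod namespace
# to the 'nautobot' policy; issued tokens are short-lived by design.
vault write auth/kubernetes/role/nautobot \
    bound_service_account_names=nautobot-prod \
    bound_service_account_namespaces=nautobot-prod \
    policies=nautobot \
    ttl=1h
```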


Create the ACL policy:

  1. In the left-hand pane, click Policies.

  2. Click Create ACL policy.

  3. Enter a policy name (e.g., nautobot).

  4. Paste in the policy content. Note: for simplicity, I start from the default policy and add read and list capabilities for the upcoming secrets paths (shown below).

# Allow tokens to look up their own properties
path "auth/token/lookup-self" {
    capabilities = ["read"]
}

# Allow tokens to renew themselves
path "auth/token/renew-self" {
    capabilities = ["update"]
}

# Allow tokens to revoke themselves
path "auth/token/revoke-self" {
    capabilities = ["update"]
}

# Allow a token to look up its own capabilities on a path
path "sys/capabilities-self" {
    capabilities = ["update"]
}

# Allow a token to look up its own entity by id or name
path "identity/entity/id/{{identity.entity.id}}" {
  capabilities = ["read"]
}
path "identity/entity/name/{{identity.entity.name}}" {
  capabilities = ["read"]
}


# Allow a token to look up its resultant ACL from all policies. This is useful
# for UIs. It is an internal path because the format may change at any time
# based on how the internal ACL features and capabilities change.
path "sys/internal/ui/resultant-acl" {
    capabilities = ["read"]
}

# Allow a token to renew a lease via lease_id in the request body; old path for
# old clients, new path for newer
path "sys/renew" {
    capabilities = ["update"]
}
path "sys/leases/renew" {
    capabilities = ["update"]
}

# Allow looking up lease properties. This requires knowing the lease ID ahead
# of time and does not divulge any sensitive information.
path "sys/leases/lookup" {
    capabilities = ["update"]
}

# Allow a token to manage its own cubbyhole
path "cubbyhole/*" {
    capabilities = ["create", "read", "update", "delete", "list"]
}

# Allow a token to wrap arbitrary values in a response-wrapping token
path "sys/wrapping/wrap" {
    capabilities = ["update"]
}

# Allow a token to look up the creation time and TTL of a given
# response-wrapping token
path "sys/wrapping/lookup" {
    capabilities = ["update"]
}

# Allow a token to unwrap a response-wrapping token. This is a convenience to
# avoid client token swapping since this is also part of the response wrapping
# policy.
path "sys/wrapping/unwrap" {
    capabilities = ["update"]
}

# Allow general purpose tools
path "sys/tools/hash" {
    capabilities = ["update"]
}
path "sys/tools/hash/*" {
    capabilities = ["update"]
}

# Allow checking the status of a Control Group request if the user has the
# accessor
path "sys/control-group/request" {
    capabilities = ["update"]
}

# Allow a token to make requests to the Authorization Endpoint for OIDC providers.
path "identity/oidc/provider/+/authorize" {
    capabilities = ["read", "update"]
}

# Allow a token to access nautobot db secrets
path "secret/nautobot-prod-db-credentials" {
    capabilities = ["read", "list"]
}

# Allow a token to access nautobot superuser secrets
path "secret/nautobot-prod-superuser-credentials" {
    capabilities = ["read", "list"]
}
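If you're working from the CLI instead, save the policy above to a file (say, `nautobot-policy.hcl`; the name is up to you) and load it with:

```shell
# Create or update the 'nautobot' ACL policy from the saved file
vault policy write nautobot nautobot-policy.hcl
```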

That’s it. Kubernetes Auth, the Nautobot Role, and the policy are set. Let’s finally add our actual secrets to Vault.

Add Secrets to Vault (under secret/)

We’ll store the credentials and app secrets that Nautobot (and its dependencies) need under a clear, predictable hierarchy in the /secret (KV) mount.

What to store for a “basic but secure” deploy:

  • Superuser: password and API token (for first login and automation).

  • Database Passwords: Postgres and Redis

If the KV (Key/Value) secrets engine isn’t enabled yet, start here. Otherwise, skip to Create the secrets.

Enable the KV secrets engine

  1. In the left navigation, click Secrets Engines.

  2. Click Enable new engine + (top right).

  3. Choose KV under “Generic.”

  4. Set the Path to secret; leave other options at defaults.

  5. Click Enable Engine.

If this is a fresh Vault and KV wasn’t previously enabled, you should now see it listed alongside the existing engines.

Create the secrets

  1. In the left navigation, click Secrets Engines.

  2. Select the new secret (KV) engine.

  3. Click Create secret + (right side).

  4. For Path, enter nautobot-prod-db-credentials (or your preferred name).

  5. Under Secret data, add a key postgres-pass with its value.

  6. Click Add and create a second key redis-pass with its value.

  7. Click Save.

If completed correctly it should look like below -

Repeat the process for the Superuser secret. Create a new secret with two keys (for example, password and api-token) and save.
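The same two secrets can be created from the CLI with `vault kv put` (replace the placeholder values, obviously; the key names match what our ExternalSecrets will reference later):

```shell
# Database credentials (Postgres + Redis passwords)
vault kv put secret/nautobot-prod-db-credentials \
    postgres-pass='REPLACE_ME' \
    redis-pass='REPLACE_ME'

# Nautobot superuser bootstrap credentials
vault kv put secret/nautobot-prod-superuser-credentials \
    superuser-pass='REPLACE_ME' \
    superuser-api-token='REPLACE_ME'
```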

How ESO and Vault Work Together (high-level)

Once the role and secrets exist and ArgoCD goes to deploy the app, ESO will:

  • Use the Kubernetes auth role to obtain a short-lived Vault token (via its ServiceAccount).

  • Read the exact keys under /secret/nautobot... as defined by your policy.

  • Materialize a single Kubernetes Secret (or multiple, your call) in the Nautobot namespace with the names/keys your Helm chart expects.

With Vault and External Secrets in place, we now have a clean, Git-free path for credentials: a Kubernetes auth role that scopes exactly who can read what, a tidy set of KV paths for Nautobot’s superuser + databases, and ESO ready to materialize those values as Kubernetes Secrets when Argo CD reconciles. That closes the loop on “secure by default” for this deployment. Next up, we’ll use everything we’ve built so far (storage classes, ingress patterns, secrets) to assemble our Kustomize resources and configure the Nautobot Helm chart the GitOps way.

The Rest of the Kustomize Resources

Earlier we created the IngressRoute Kustomize file to publish Nautobot through Traefik. Now we’ll add the rest of the overlay, mostly focused on integrating the work from the Secrets section. We’ll also add a top-level kustomization.yml to bundle these pieces so the cluster can build them as a single unit. Once this overlay is in place, everything we’ve prepared (storage, secrets, and ingress) comes together as one declarative package.

The first file will be the ClusterSecretStore - cluster-secret-store.yml

ClusterSecretStore: ESO’s shortcut to Vault

A ClusterSecretStore is a cluster-wide connection profile that tells External Secrets (ESO) how to reach Vault, which KV (“/secret”) mount to read, and how to auth (Kubernetes auth + Vault role). Use ClusterSecretStore to share one Vault setup across namespaces; use SecretStore if you want it namespace-scoped. For a simpler deployment, I chose to use a ClusterSecretStore.

What this sets:

  • server – Vault URL reachable from the cluster

  • path/version – your KV mount (e.g., secret, v2)

  • auth.kubernetes – use SA token login; role maps SA+namespace → read-only policy

  • serviceAccountRef – which SA ESO uses to authenticate

Repo Example (with comments):

apiVersion: external-secrets.io/v1
kind: ClusterSecretStore
metadata:
  name: vault-backend                 # cluster-wide handle ESO will reference
spec:
  provider:
    vault:
      # Where Vault is reachable from the cluster using a cluster internal URL
      # (Many setups use http://vault.vault.svc:8200 or https with proper CA)
      server: "http://hashi-vault-prod-0.hashi-vault-prod-internal.hashi-vault-prod.svc.cluster.local:8200"

      # The KV mount path and version you enabled in Vault
      path: "secret"                  # e.g., 'secret', 'kv', etc.
      version: "v2"                   # be explicit to avoid surprises

      # Authenticate to Vault using the Kubernetes auth method
      auth:
        kubernetes:
          mountPath: "kubernetes"     # must match your Vault auth mount path
          role: "nautobot"            # Vault role bound to SA+namespace with read-only policy
          serviceAccountRef:
            name: nautobot-prod       # SA whose token ESO will use for login
            namespace: nautobot-prod  # namespace where that SA lives

How it flows: ESO reads this store → logs into Vault with the SA token → gets a short-lived token for the nautobot role → pulls only the allowed keys → renders Kubernetes Secrets for Helm/Kustomize.

The next pair of files are for the database and superuser secret creation.

ExternalSecrets: mapping Vault data into Kubernetes Secrets

Why these exist: An ExternalSecret tells ESO which Vault keys to read and how to materialize them as a plain Kubernetes Secret that Helm/Kustomize can mount. We’ll use two: one for database/redis creds and one for the Nautobot superuser. An ExternalSecret is namespace-scoped.

Database & Redis ExternalSecret (commented)

apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
  name: nautobot-prod-db-external-secret   # ESO resource name
  namespace: nautobot-prod                 # where the resulting K8s Secret will live
spec:
  refreshInterval: "1h"                    # re-sync cadence from Vault
  secretStoreRef:
    name: vault-backend                    # points to our (Cluster)SecretStore
    kind: ClusterSecretStore
  target:
    name: nautobot-prod-db-secrets         # name of the K8s Secret ESO will create/update
    creationPolicy: Owner                  # ESO owns and reconciles this Secret
  data:
    - secretKey: postgres-password         # key inside the K8s Secret
      remoteRef:
        key: secret/data/nautobot-prod-db-credentials  # Vault path (KV v2 HTTP style)
        property: postgres-pass            # field inside that Vault doc

    - secretKey: password                  # duplicate key for charts expecting 'password'
      remoteRef:
        key: secret/data/nautobot-prod-db-credentials
        property: postgres-pass

    - secretKey: redis-password            # Redis password (optional if Redis is unauthenticated)
      remoteRef:
        key: secret/data/nautobot-prod-db-credentials
        property: redis-pass

Superuser ExternalSecret (commented)

apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
  name: nautobot-prod-superuser-external-secret
  namespace: nautobot-prod
spec:
  refreshInterval: "1h"
  secretStoreRef:
    name: vault-backend
    kind: ClusterSecretStore
  target:
    name: nautobot-prod-superuser-secrets  # K8s Secret with Nautobot bootstrap creds
    creationPolicy: Owner
  data:
    - secretKey: password                  # superuser password
      remoteRef:
        key: secret/data/nautobot-prod-superuser-credentials
        property: superuser-pass

    - secretKey: api_token                 # superuser API token
      remoteRef:
        key: secret/data/nautobot-prod-superuser-credentials
        property: superuser-api-token

Notes

  • Key naming: The secretKey entries become keys in your Kubernetes Secret. Align them with whatever your Helm values or manifests expect.

  • KV v2 pathing: Some setups prefer the logical path (e.g., nautobot-prod-db-credentials) rather than the HTTP-style secret/data/.... Use the style that matches how your ClusterSecretStore is configured.

  • Duped mappings: Having both postgres-password and password mapped to the same Vault value is fine if different consumers expect different key names.

  • Refresh: refreshInterval controls how quickly rotations in Vault propagate to Kubernetes. Pick something that fits your rotation policy.
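Once Argo CD syncs the overlay later in the post, you can confirm ESO did its job with a couple of quick checks:

```shell
# READY=True means ESO authenticated to Vault and pulled the keys
kubectl -n nautobot-prod get externalsecrets

# Inspect the materialized Secret's keys (values stay hidden)
kubectl -n nautobot-prod describe secret nautobot-prod-db-secrets
```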

Kustomize: Bundling our Resources Together

Time to bundle everything we’ve created into a single overlay Kustomize can build (and Argo CD can track). Keep this file in your environment overlay (e.g., overlays/prod/).

What this overlay does:

  • Registers Vault access for ESO via the ClusterSecretStore

  • Pulls database + superuser creds via ExternalSecret objects

  • Publishes Nautobot through Traefik with our IngressRoute

## kustomization.yml

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

# The building blocks we created earlier
resources:
  - cluster-secret-store.yml        # ESO → Vault connection (cluster-scoped; namespace here is ignored)
  - external-secrets-db.yml         # Database & Redis credentials from Vault → K8s Secret
  - external-secret-superuser.yml   # Nautobot superuser creds from Vault → K8s Secret
  - ingress-route.yml               # Traefik exposure for Nautobot

Notes

  • Order of operations: Kustomize doesn’t enforce ordering, but Argo CD will reconcile until everything is healthy. If you want strict sequencing later, you can add Argo CD sync waves via annotations.

  • Where this fits: Your Argo CD Application will point at this folder (done in the next section). Once synced, ESO will authenticate to Vault, create the Kubernetes Secrets, and Traefik will expose the app host defined in your IngressRoute.

Commit this file alongside the four resources, and you’ve got a clean, declarative package ready for Argo CD to manage.
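Before committing, it's worth rendering the overlay locally; this catches missing files and YAML typos before Argo CD ever sees them:

```shell
# Render the overlay to stdout without applying anything
kubectl kustomize apps/nautobot/overlays/prod
```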

The Final Pieces - Custom Helm Values + ArgoCD App Manifest

Custom Helm values (values-prod.yml)

This file wires Nautobot to the secrets that will be deployed, dials replicas down for a tidy first deploy, and pins persistence to the CephFS StorageClass(es). Drop it next to your overlay (e.g., apps/nautobot/values-prod.yml) and reference it from your Argo CD Application (next section).

# values-prod.yml
nautobot:
  # Keep it small for the first sync; scale later.
  replicaCount: 1

  # Probes off for initial bring-up (migrations can make probes flap).
  # Once stable, consider enabling these.
  livenessProbe:
    enabled: false
  readinessProbe:
    enabled: false

  # Bootstrap superuser from our ExternalSecret-backed K8s Secret.
  superUser:
    existingSecret: "nautobot-prod-superuser-secrets"   # created by ESO
    existingSecretPasswordKey: "password"               # key in that Secret
    existingSecretApiTokenKey: "api_token"              # key in that Secret
    username: "jeff"                                    # static bootstrap username

celery:
  # One worker to start; bump if you run jobs/heavy plugins.
  replicaCount: 1

serviceAccount:
  # Leave token mounted, used for ESO/ClusterSecretStore
  automountServiceAccountToken: true

postgresql:
  # Using the chart’s built-in PostgreSQL with CephFS persistence.
  primary:
    persistence:
      enabled: true
      size: "2Gi"                          # starter size; expand later
      storageClass: "rook-cephfs-retain"   # keep data if PVC is deleted
      accessModes: ['ReadWriteOnce']       # DB should be single-writer
  auth:
    # Pull the password from the ExternalSecret-created Secret.
    existingSecret: nautobot-prod-db-secrets

redis:
  # Enable persistence if you want cache/queue data to survive restarts.
  master:
    persistence:
      enabled: true
      size: "1Gi"
      storageClass: "rook-cephfs-delete"   # okay to delete for cache data
      accessModes: ['ReadWriteOnce']
  auth:
    enabled: true
    existingSecret: nautobot-prod-db-secrets

Why these choices

  • Probes disabled (initially): first runs often include migrations; turning probes off avoids noisy restarts. Re-enable once everything is healthy.

  • CephFS everywhere: aligns with the storage classes you built earlier.

    • rook-cephfs-retain for Postgres so accidental PVC deletes don’t nuke data.

    • rook-cephfs-delete for Redis because it’s cache/queue data.

  • ReadWriteOnce for DB/Redis: even though CephFS supports RWX, keeping databases single-writer reduces foot-guns (performance issues, data corruption, or scalability bottlenecks).

  • Secrets via ESO: existingSecret keys point at the Kubernetes Secrets materialized from Vault, so nothing sensitive lives in Git or in the helm values.
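
Once the app is stable, re-enabling the probes is just a values change. A sketch using the same chart keys shown above; the initialDelaySeconds values are placeholder numbers to tune for your environment:

```yaml
# values-prod.yml (after the first successful sync)
nautobot:
  livenessProbe:
    enabled: true
    initialDelaySeconds: 60    # headroom for slow startups; tune to taste
  readinessProbe:
    enabled: true
    initialDelaySeconds: 30
```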

Rounding Out the Argo CD Application

Now that the Helm app and the Kustomize overlay (secrets + ingress) are defined, and your custom Helm values file exists, we just need to finish the Argo CD Application so it points at both sources and deploys them to the right place (below).

project: prod-home
destination:
  server: https://prod-kube-vip.jjland.local:6443
  namespace: nautobot-prod
syncPolicy:
  syncOptions:
    - CreateNamespace=true
sources:
  - repoURL: https://nautobot.github.io/helm-charts/
    targetRevision: 2.5.5
    helm:
      valueFiles:
        - $values/apps/nautobot/values-prod.yml
    chart: nautobot
  - repoURL: https://github.com/leothelyon17/kubernetes-gitops-playground.git
    path: apps/nautobot/overlays/prod
    targetRevision: HEAD
    ref: values

Paste the spec above into the Argo CD GUI or apply it manually, the same way we configured similar Application manifests in previous posts.
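
For reference, here is that same spec wrapped in a complete Application manifest. The metadata name and namespace are assumptions; adjust them to your environment.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: nautobot-prod                 # assumed app name
  namespace: argocd                   # namespace where Argo CD itself runs
spec:
  project: prod-home
  destination:
    server: https://prod-kube-vip.jjland.local:6443
    namespace: nautobot-prod
  syncPolicy:
    syncOptions:
      - CreateNamespace=true
  sources:
    # Source 1: the upstream Helm chart, with values pulled from Git via $values
    - repoURL: https://nautobot.github.io/helm-charts/
      chart: nautobot
      targetRevision: 2.5.5
      helm:
        valueFiles:
          - $values/apps/nautobot/values-prod.yml
    # Source 2: our Git repo, holding the Kustomize overlay and the values file
    - repoURL: https://github.com/leothelyon17/kubernetes-gitops-playground.git
      path: apps/nautobot/overlays/prod
      targetRevision: HEAD
      ref: values
```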

Deploying and Syncing the App

With everything bundled via Kustomize and correctly referenced by ArgoCD, it’s time to deploy.

Open the Argo CD Application and click Sync. You should see the Helm release create a batch of Kubernetes objects. To focus on what we built in this post, look for:

  • PVCs bound to your CephFS StorageClasses and mounted by the pods

  • PostgreSQL and Redis pods coming up Healthy

  • Secrets flow: ClusterSecretStore and ExternalSecret resources showing Synced, and the resulting Kubernetes Secrets present in the namespace

  • IngressRoute created and admitted by Traefik (host matches your DNS A record)

If all of the above is green, the Argo CD app should land in Synced / Healthy. Screenshots below show an example of what you should see.

Storage

Secrets

IngressRoute/Traefik

The Application Pods

Note: It can take a little while for the app to show Healthy and become reachable. On the first deploy, once Nautobot connects to Postgres it will run initial database migrations to create tables, which adds extra time on top of the normal startup. If you’re curious, watch the nautobot-init logs for migration progress.

If everything’s green in Argo CD and the pods look steady, open the host defined in your IngressRoute. You should land on the Nautobot login page. Sign in with the superuser credentials you stored in Vault (surfaced via External Secrets and referenced in your custom Helm values). If login fails, check the logs for the nautobot-init container. On first start it runs migrations and bootstraps the superuser. You’ll see log messages confirming the account creation (not the raw secrets), which is a quick way to verify the secret wiring end to end.

If you can log in, CONGRATULATIONS! You’ve just deployed Nautobot on Kubernetes, fully managed the GitOps way.

Troubleshooting Tips

If your deployment isn’t landing cleanly, work through these quick checks, organized by the same pieces we built in this post.


1) Argo CD & Kustomize

What to look for

  • App stuck in OutOfSync or Progressing.

  • Sync fails immediately.

  • Resources missing from the tree.

Checks

  • Open the Argo CD git diff for the app: look for bad paths/filenames in kustomization.yml.

  • Verify the repo folder the Application points to contains:

    • cluster-secret-store.yml

    • external-secrets-db.yml

    • external-secret-superuser.yml

    • ingress-route.yml

    • values-prod.yml (referenced by your Helm app)

  • Confirm file paths in the Argo CD App manifest.

  • Double-check all YAML syntax.


2) Secrets pipeline: Vault → ESO → K8s Secret

Symptoms

  • ExternalSecrets show Not Synced, or Nautobot init fails with missing env/creds.

Checks

  • ClusterSecretStore:

    • Server URL reachable inside the cluster?

    • auth.kubernetes.mountPath matches your Vault auth mount?

    • role name matches the role you created in Vault?

  • ExternalSecret:

    • Conditions should be Ready=True; if not, describe it for a clear error (auth denied, key not found, etc.).

    • Verify Vault paths/field names match exactly (KV v2 pathing trips people up).

  • ServiceAccount binding:

    • The SA referenced in the store exists in the right namespace, and your Vault role binds to that SA+namespace.
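
As a reference point while you check, a ClusterSecretStore’s Vault/Kubernetes auth block typically looks roughly like this. The server URL, mount path, role, and ServiceAccount names here are placeholders:

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ClusterSecretStore
metadata:
  name: vault-backend                          # placeholder name
spec:
  provider:
    vault:
      server: "https://vault.jjland.local:8200"  # must be reachable from inside the cluster
      path: "secret"                           # KV mount point
      version: "v2"                            # KV v2 pathing
      auth:
        kubernetes:
          mountPath: "kubernetes"              # must match the Vault auth mount
          role: "external-secrets"             # must match the role created in Vault
          serviceAccountRef:
            name: "external-secrets"           # SA your Vault role binds to
            namespace: "external-secrets"      # SA's namespace, also bound in the role
```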

3) Storage: CephFS StorageClass & PVCs

Symptoms

  • PVCs stuck in Pending; pods can’t mount volumes.

Checks

  • StorageClass name in Helm values matches your CephFS SC (e.g., rook-cephfs-retain).

  • Access modes fit usage:

    • Postgres/Redis: ReadWriteOnce (single writer).

    • Nautobot media/static (if used): ReadWriteMany.

  • Rook-Ceph health:

    • OSDs/MONs healthy, pool/FS exists, quota not exceeded.
  • If PVC deleted but PV persists:

    • That’s expected with reclaimPolicy: Retain; either reuse or manually clean it up before recreating.

4) Postgres & Redis (built-in charts)

Symptoms

  • DB pod CrashLoopBackOff; app can’t connect.

Checks

  • Secrets:

    • The existingSecret names line up with what the subcharts expect, and key names (password, postgres-password, redis-password) match your ExternalSecret outputs.
  • Persistence:

    • Correct StorageClass; PVC bound.
  • Logs:

    • Postgres: authentication/permissions, initdb errors.

    • Redis: refuses connections or auth errors if auth.enabled=true.


5) Nautobot app (web/worker/beat)

Symptoms

  • Web never becomes Ready, 502 via Traefik, or superuser not created.

Checks

  • nautobot-init logs:

    • Confirms migrations and superuser bootstrap; errors here usually mean secret keys missing/wrong.
  • Probes:

    • We disabled probes initially, which is fine. If you enabled them early, they can flap during migrations; disable them, sync, let things settle, then re-enable.
  • Environment wiring:

    • Confirm the Helm values reference the K8s Secret keys you created (names and casing must match).

6) Ingress, Traefik & DNS

Symptoms

  • 404/503 at the browser, TLS errors, or wrong host.

Checks

  • IngressRoute:

    • routes.match host matches your DNS A record exactly.

    • entryPoints: ["websecure"] and Traefik has that entrypoint enabled.

  • Traefik Service:

    • Has an EXTERNAL-IP from MetalLB; DNS A record points to it.
  • If using certs later:

    • Don’t reference cert-manager resources yet if you haven’t set them up; keep TLS simple at Traefik.
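
A minimal IngressRoute to sanity-check against; the hostname, service name, and port are placeholders. Recent Traefik releases use the traefik.io API group, while older ones use traefik.containo.us:

```yaml
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: nautobot
  namespace: nautobot-prod
spec:
  entryPoints:
    - websecure                                # must be enabled in Traefik
  routes:
    - match: Host(`nautobot.example.local`)    # must match your DNS A record exactly
      kind: Rule
      services:
        - name: nautobot                       # placeholder service name
          port: 80
```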

7) MetalLB (external reachability)

Symptoms

  • Traefik never gets an external IP; no traffic into the cluster.

Checks

  • IPAddressPool contains the IP/range; it’s unused on your LAN.

  • L2Advertisement exists for that pool.

  • Traefik Service type: LoadBalancer and (optionally) loadBalancerIP matches your chosen IP.
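
For reference, a minimal MetalLB pool plus L2 advertisement; the names and address range are placeholders for an unused slice of your LAN:

```yaml
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: prod-pool
  namespace: metallb-system
spec:
  addresses:
    - 192.168.1.240-192.168.1.250    # placeholder range; must be free on your LAN
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: prod-l2
  namespace: metallb-system
spec:
  ipAddressPools:
    - prod-pool                      # advertise the pool above via ARP
```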


8) Resources & scheduling

Symptoms

  • Pods Pending or OOMKilled.

Checks

  • Nodes have capacity; Ceph/DB pods especially need memory/CPU.

  • Start small (single replicas) then scale up.

  • If OOMs, raise limits/requests or add memory.
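
If you do hit OOMs, requests and limits are set through the Helm values. A starter sketch, assuming the chart exposes a standard resources block under the nautobot key (the numbers are placeholders to tune for your nodes):

```yaml
# values-prod.yml
nautobot:
  resources:
    requests:
      cpu: "500m"
      memory: "1Gi"
    limits:
      memory: "2Gi"    # raise this first if pods are OOMKilled
```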


9) Naming & key mismatches (sneaky but common)

What to verify

  • Secret names and keys in:

    • ExternalSecret → target Secret

    • Helm values (e.g., existingSecret, existingSecretPasswordKey, etc.)

  • Namespace consistency across all manifests (nautobot-prod vs something else).


10) Quick sanity commands (lightweight)

  • Objects at a glance: kubectl -n nautobot-prod get all

  • ESO health: kubectl -n nautobot-prod get externalsecret,secretstore,clustersecretstore

  • PVCs: kubectl -n nautobot-prod get pvc

  • Describe failures: kubectl -n nautobot-prod describe <kind> <name>

  • App logs: kubectl -n nautobot-prod logs deploy/nautobot -c nautobot-init --tail=100

Ultimately, if you don’t know where to start, USE THE CONTAINER LOGS. Argo CD makes them easy to view, and you can usually find the issue right there.

Summary

Well, we did it. We didn’t just get Nautobot running, we established a repeatable pattern for network-automation apps, or any containerized app: Argo CD for reconciliation, Kustomize for environment shaping, Vault + External Secrets for credentials, Traefik + MetalLB for reachability, and CephFS for persistence. That stack gives a stable runway to ship changes the same way every time, through Git, without snowflakes or manual tweaks. The same method works on-prem, in the cloud, or across a mix of the two.

Why this helps your automation journey

  • Trustable intent: Nautobot becomes the system of record for sites, devices, IPAM, and custom models exposed via REST/GraphQL for pipelines and tools.

  • Safe, auditable change: Every tweak (charts, values, secrets wiring, ingress) goes through Git reviews and rolls back cleanly. Drift is visible; fixes are deterministic.

  • Fewer blockers: Secrets are handled with least-privilege, storage/ingress are standardized, so you can focus on workflows, not plumbing.

  • From dev to prod: The same pattern scales to new apps (observability, chatops, CI/CD helpers) with minimal friction. Copy the overlay, adjust values, and commit.

Where I’m going next

  • An advanced Nautobot deployment (plugins, app config, HTTPS/certs, SSO).

  • Integrations with other GitOps-deployed apps.

  • A NetBox deployment for folks who prefer that app. Love it too!

This is the moment where GitOps stops being theory and starts accelerating real network automation and manageable application delivery.

Thanks for reading!