Auto-Discover GitHub Repos in Backstage

I already have Backstage running on Ubuntu 24.04 with:

  • Keycloak OIDC login
  • Users & groups synced from LDAP (OpenDJ) via ldapOrg
  • Backstage backend running directly from source (no Docker) in production mode

The next step is:

I don’t want to click “Register existing component” for every new repo. Backstage should just find repos that have catalog-info.yaml and import them automatically.

In this post I:

  • Configure Backstage to scan all GitHub repos under maksonlee.
  • Make it automatically import any repo that has a catalog-info.yaml in its default branch.

No TechDocs here. Just discovery and the Software Catalog.


Environment

Backstage:

  • URL: https://backstage.maksonlee.com
  • Running from ~/homelab-backstage (no Docker)
  • Production start command:
cd ~/homelab-backstage

NODE_ENV=production \
AUTH_SESSION_SECRET='your-32-byte-random-hex' \
AUTH_OIDC_CLIENT_ID='backstage' \
AUTH_OIDC_CLIENT_SECRET='your-real-keycloak-client-secret' \
LDAP_BIND_PASSWORD='your-ldap-bind-password' \
yarn --cwd packages/backend start \
  --config ../../app-config.yaml \
  --config ../../app-config.production.yaml

Identity:

  • Keycloak: https://keycloak.maksonlee.com (realm maksonlee.com)
  • LDAP base DN: dc=maksonlee,dc=com (users/groups from OpenDJ)

GitHub:

  • Account: github.com/maksonlee
  • Several private repos, including maksonlee/beepbeep

  1. Create a fine-grained GitHub PAT for Backstage

Backstage needs credentials to read private repos. For that I use a fine-grained Personal Access Token (PAT).

Short version:

  • PAT = a token that GitHub issues to tools (like Backstage) instead of using your username/password.
  • Fine-grained PAT = newer style where you can limit which repos and what permissions it has.

Logged in as maksonlee on GitHub:

  • Go to
    Settings → Developer settings → Personal access tokens → Fine-grained tokens → Generate new token.
  • Basic info:
    • Token name: backstage-read-repos
    • Resource owner: maksonlee
    • Expiration: pick something sane (e.g. 90 days). For a lab you can choose “No expiration”.
  • Repository access:
    • Choose All repositories.
  • Repository permissions → Repositories:
    • Contents: Read-only
    • Metadata: Read-only
  • Click Generate token, copy the token string, and store it safely.

On the Backstage server I’ll expose it as GITHUB_TOKEN when starting the backend.
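
Before exporting the token, it can be worth checking that the value I copied really is a fine-grained PAT and not an older classic one. GitHub prefixes fine-grained tokens with github_pat_ and classic tokens with ghp_. The helper below is a hypothetical sketch of that check (classifyGithubToken is my own name, not part of Backstage or GitHub tooling):

```typescript
// Hypothetical helper, not part of Backstage: classify a GitHub token by its prefix.
type TokenKind = 'fine-grained' | 'classic' | 'unknown';

function classifyGithubToken(token: string): TokenKind {
  // Fine-grained PATs start with "github_pat_".
  if (token.startsWith('github_pat_')) return 'fine-grained';
  // Classic PATs start with "ghp_".
  if (token.startsWith('ghp_')) return 'classic';
  // Anything else (empty string, OAuth token, typo) is reported as unknown.
  return 'unknown';
}
```

This only inspects the prefix. It says nothing about whether the token is still valid or has the right permissions.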


  2. Configure GitHub integration in Backstage

Tell Backstage to use GITHUB_TOKEN whenever it talks to github.com.

In ~/homelab-backstage/app-config.production.yaml, add (or edit) the top-level integrations.github block:

integrations:
  github:
    - host: github.com
      token: ${GITHUB_TOKEN}

This means:

  • At runtime, Backstage reads the GitHub token from the GITHUB_TOKEN env var.
  • All calls to github.com from the backend use that token.

No extra configuration is needed for public (non-Enterprise) github.com.
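
To make the ${GITHUB_TOKEN} placeholder less magical, here is a minimal sketch of environment-variable substitution in a config string. It is illustrative only and is not Backstage's config loader. The function name substituteEnv is hypothetical, and I assume that an unset variable leaves the whole value undefined rather than an empty string:

```typescript
// Illustrative sketch of ${NAME} substitution; not Backstage's actual config loader.
function substituteEnv(
  value: string,
  env: Record<string, string | undefined>,
): string | undefined {
  const missing: string[] = [];
  const result = value.replace(
    /\$\{([A-Za-z_][A-Za-z0-9_]*)\}/g,
    (_match: string, name: string) => {
      const found = env[name];
      // Remember unset variables so the caller gets undefined instead of a half-filled value.
      if (found === undefined) missing.push(name);
      return found ?? '';
    },
  );
  // Assumption: any unset variable makes the whole value undefined.
  return missing.length > 0 ? undefined : result;
}
```

The practical takeaway is the same either way: if GITHUB_TOKEN is not exported in the shell that starts the backend, the integration has no usable token.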


  3. Add the GitHub catalog backend module

Next, I add the GitHub entity provider so the catalog can scan my repos.

  • Install the module

On the Backstage host:

cd ~/homelab-backstage
yarn --cwd packages/backend add @backstage/plugin-catalog-backend-module-github

  • Register it in packages/backend/src/index.ts

My backend uses createBackend() and already has catalog + LDAP:

// catalog plugin
backend.add(import('@backstage/plugin-catalog-backend'));
backend.add(
  import('@backstage/plugin-catalog-backend-module-scaffolder-entity-model'),
);

// LDAP org provider: sync Users & Groups from LDAP into the catalog
backend.add(import('@backstage/plugin-catalog-backend-module-ldap'));

I insert the GitHub module between the catalog core and LDAP provider:

// catalog plugin
backend.add(import('@backstage/plugin-catalog-backend'));
backend.add(
  import('@backstage/plugin-catalog-backend-module-scaffolder-entity-model'),
);

// GitHub discovery provider: auto-import entities from GitHub repos
backend.add(import('@backstage/plugin-catalog-backend-module-github'));

// LDAP org provider: sync Users & Groups from LDAP into the catalog
backend.add(import('@backstage/plugin-catalog-backend-module-ldap'));

All other plugins (app, proxy, auth, permission, search, kubernetes, notifications, signals, etc.) stay as they were.


  4. Configure GitHub discovery in app-config.production.yaml

Now I configure a catalog provider that scans all repos under the maksonlee account.

In app-config.production.yaml I already had catalog configured for LDAP. I extend it with a github provider section:

catalog:
  locations:
    # Example demo locations, can be removed later
    - type: file
      target: ./examples/entities.yaml

    - type: file
      target: ./examples/template/template.yaml
      rules:
        - allow: [Template]

  providers:
    ldapOrg:
      default:
        target: ldaps://ldap.maksonlee.com

        bind:
          dn: "uid=backstage,ou=system,dc=maksonlee,dc=com"
          secret: ${LDAP_BIND_PASSWORD}

        schedule:
          frequency: PT1H
          timeout: PT15M
          initialDelay: PT3M

        users:
          - dn: "ou=people,dc=maksonlee,dc=com"
            options:
              scope: sub
              filter: "(&(objectClass=inetOrgPerson)(uid=*))"
            map:
              rdn: uid
              name: uid
              displayName: cn
              email: mail
              memberOf: isMemberOf
            set:
              metadata.namespace: default

        groups:
          - dn: "ou=organization,ou=groups,dc=maksonlee,dc=com"
            options:
              scope: sub
              filter: "(objectClass=groupOfNames)"
            map:
              rdn: cn
              name: cn
              displayName: cn
              description: description
              members: member
            set:
              metadata.namespace: default
              spec.type: team

    github:
      maksonlee:
        organization: 'maksonlee'
        catalogPath: '/catalog-info.yaml'
        filters:
          repository: '.*'
        schedule:
          frequency: { minutes: 30 }
          timeout: { minutes: 3 }

What this means:

  • github: – we’re configuring GitHub catalog providers.
  • maksonlee: – this is just an ID for this provider (shows up in logs).
  • organization: 'maksonlee' – scan all repos under github.com/maksonlee.
  • catalogPath: '/catalog-info.yaml' – in each repo’s default branch, look for /catalog-info.yaml.
  • filters.repository: '.*' – include all repo names (can be narrowed later).
  • schedule.frequency: 30 minutes – refresh every 30 minutes.
  • schedule.timeout: 3 minutes – give the job up to 3 minutes to run.

Important: I don’t set filters.branch, so the provider uses each repo’s default branch (whether that’s main or master).
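
To picture what the provider is looking for, here is a rough sketch of how organization, the default branch, and catalogPath combine into one file location per repo. The Repo type and catalogFileUrl function are hypothetical, and the exact URL shape the provider uses internally is my assumption:

```typescript
// Hypothetical types and helper; the exact URL format used by the provider is an assumption.
interface Repo {
  name: string;
  defaultBranch: string;
}

function catalogFileUrl(
  owner: string,
  repo: Repo,
  catalogPath: string,
  branch?: string,
): string {
  // With no branch filter configured, fall back to the repo's default branch.
  const ref = branch ?? repo.defaultBranch;
  return `https://github.com/${owner}/${repo.name}/blob/${ref}${catalogPath}`;
}
```

For maksonlee/beepbeep with default branch master, this gives https://github.com/maksonlee/beepbeep/blob/master/catalog-info.yaml.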


  5. Restart the backend and confirm discovery is running

Now I restart Backstage with GITHUB_TOKEN set:

cd ~/homelab-backstage

GITHUB_TOKEN='your-fine-grained-pat' \
NODE_ENV=production \
AUTH_SESSION_SECRET='your-32-byte-random-hex' \
AUTH_OIDC_CLIENT_ID='backstage' \
AUTH_OIDC_CLIENT_SECRET='your-real-keycloak-client-secret' \
LDAP_BIND_PASSWORD='your-ldap-bind-password' \
yarn --cwd packages/backend start \
  --config ../../app-config.yaml \
  --config ../../app-config.production.yaml

In the logs I see:

{"level":"info","message":"Registered scheduled task: github-provider:maksonlee:refresh, {\"version\":2,\"cadence\":\"PT30M\",\"timeoutAfterDuration\":\"PT3M\"}","plugin":"catalog","service":"backstage","task":"github-provider:maksonlee:refresh"}
...
{"class":"GithubEntityProvider","level":"info","message":"Read 74 GitHub repositories (74 matching the pattern)","plugin":"catalog","service":"backstage","target":"github-provider:maksonlee","taskId":"github-provider:maksonlee:refresh","taskInstanceId":"..."}

This tells me:

  • The scheduled task github-provider:maksonlee:refresh is registered.
  • It successfully enumerated my repos (Read 74 GitHub repositories in my case).
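
If I want to check this from a script instead of reading logs by eye, the JSON log lines are easy to parse. This is a small sketch that pulls the two numbers out of the "Read N GitHub repositories" message. parseDiscoveryLog is a hypothetical helper, and it assumes the message wording stays exactly as in the log line above:

```typescript
// Hypothetical helper; assumes the log message wording shown in the sample log line.
interface DiscoverySummary {
  read: number;
  matching: number;
}

function parseDiscoveryLog(line: string): DiscoverySummary | undefined {
  let parsed: unknown;
  try {
    parsed = JSON.parse(line);
  } catch {
    return undefined; // not a JSON log line
  }
  if (typeof parsed !== 'object' || parsed === null) return undefined;
  const message = (parsed as { message?: unknown }).message;
  if (typeof message !== 'string') return undefined;
  const m = /Read (\d+) GitHub repositories \((\d+) matching the pattern\)/.exec(message);
  if (!m) return undefined;
  return { read: Number(m[1]), matching: Number(m[2]) };
}
```

Fed the GithubEntityProvider line above, it returns { read: 74, matching: 74 }. Any other line returns undefined.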

At this point Backstage is scanning GitHub correctly. Now I just need catalog-info.yaml files in repos that I want in the Catalog.


  6. Use one repo as the concrete example

I use maksonlee/beepbeep (a private Android app) as the test case.

Goal: once I add catalog-info.yaml to its default branch, it should show up in the Catalog automatically, without any UI registration.

In the root of the default branch (master for this repo), I create:

apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: beepbeep
  title: Beep Beep
  description: Minimal periodic reminder app with time-based sounds and quiet hours.
  annotations:
    github.com/project-slug: maksonlee/beepbeep
spec:
  type: service
  owner: user:default/maksonlee
  lifecycle: production

Notes:

  • metadata.name: beepbeep → entity reference becomes component:default/beepbeep.
  • github.com/project-slug: maksonlee/beepbeep → ties this entity back to the GitHub repo.
  • owner: user:default/maksonlee → I own the app as a user entity (synced from LDAP).
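
The entity reference format is kind:namespace/name, lowercased, with default as the namespace when none is set. Here is a hand-rolled sketch of that rule. toEntityRef and EntityName are hypothetical names of mine. In real Backstage code, the stringifyEntityRef helper from @backstage/catalog-model does this job:

```typescript
// Hand-rolled sketch; in real code, prefer stringifyEntityRef from @backstage/catalog-model.
interface EntityName {
  kind: string;
  namespace?: string;
  name: string;
}

function toEntityRef({ kind, namespace = 'default', name }: EntityName): string {
  // Refs are conventionally written in lowercase: kind:namespace/name.
  return `${kind}:${namespace}/${name}`.toLowerCase();
}
```

With kind Component and name beepbeep this yields component:default/beepbeep, and the owner field above follows the same pattern for a User entity.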

Commit:

git add catalog-info.yaml
git commit -m "chore(backstage): add catalog-info.yaml for Beep Beep"
git push

No Backstage UI action required. I just wait for the provider to refresh (or restart the backend once more if I’m impatient).


  7. Verify that repo appears in the Catalog

In the Backstage UI:

  • Go to https://backstage.maksonlee.com.
  • Click Catalog → Components.

I see:

  • Name: Beep Beep
  • Type: service
  • Owner: user:default/maksonlee

This confirms that:

  • GitHub discovery is reading my repos.
  • It found catalog-info.yaml in the default branch.
  • It created component:default/beepbeep automatically.

No “Register component” button needed.


  8. Scaling to all repos

With this setup, the rule across my account is now:

If a repo under github.com/maksonlee has a catalog-info.yaml at the root of its default branch, and its name matches repository: '.*', it will appear in the Backstage Catalog automatically.

To onboard another repo:

In that repo’s default branch, add catalog-info.yaml, for example:

apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: my-service
  title: My Service
  description: Something useful.
  annotations:
    github.com/project-slug: maksonlee/my-service
spec:
  type: service
  owner: user:default/maksonlee
  lifecycle: production

Commit and push:

git add catalog-info.yaml
git commit -m "chore(backstage): add catalog-info.yaml for My Service"
git push

Wait for the GitHub provider to refresh, or restart the backend.

The new component appears automatically.

If I want to limit which repos are scanned, I can adjust:

filters:
  repository: '.*'

For example, only repos starting with android-:

filters:
  repository: '^android-.*'

Or I can define multiple providers (e.g. one per pattern or per org) under catalog.providers.github.
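
Before changing the filter in production config, I can dry-run a pattern against a list of repo names. The sketch below assumes the pattern is matched against the whole repo name, so it anchors it explicitly. matchingRepos is a hypothetical helper, and apart from beepbeep the repo names in the usage note are made up:

```typescript
// Hypothetical dry-run helper; assumes the pattern must match the whole repo name.
function matchingRepos(names: string[], pattern: string): string[] {
  // Wrap in a non-capturing group so alternations like "a|b" are anchored as a unit.
  const re = new RegExp(`^(?:${pattern})$`);
  return names.filter(name => re.test(name));
}
```

For example, with the names android-beepbeep, beepbeep and homelab-backstage, the pattern '.*' keeps all three, while '^android-.*' keeps only android-beepbeep.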


  9. Summary

With a small amount of config, Backstage now auto-discovers GitHub repos using the default branch and a fine-grained PAT:

  • Created a fine-grained PAT backstage-read-repos with Contents: Read-only and Metadata: Read-only for all repos under maksonlee.
  • Configured integrations.github to use ${GITHUB_TOKEN} for github.com.
  • Installed and registered @backstage/plugin-catalog-backend-module-github in the backend.
  • Added catalog.providers.github.maksonlee to scan all repos under maksonlee, looking for /catalog-info.yaml in the default branch.
  • Onboarded a private repo (maksonlee/beepbeep) by adding a single catalog-info.yaml file and committing it.
  • Verified that Beep Beep appears in the Catalog automatically, with no manual registration.

From now on, onboarding a service into Backstage for this account is just:

Add catalog-info.yaml to the default branch → wait for refresh → done.
