Security-as-Code: Why We Treat Provider Configuration Like Infrastructure Code

There is a class of infrastructure problems that only reveal themselves after something breaks in production. Not during development, not during code review, not even during staging. You find them at 2am when agents start returning 401s and nobody knows what changed.

Nexus provider configuration was one of those problems for us. The Broker holds refresh tokens and client credentials for every OAuth provider in your workspace. Before nexus-cli shipped in 0.2.0, the only way to add, update, or delete a provider was to call the REST API directly. A curl command, a Postman collection, a script someone wrote and saved locally. No version history. No peer review. No record of what ran or when.

We built nexus-cli to fix that. The pattern it follows is deliberate — it is the same approach the infrastructure ecosystem converged on for cloud resources over the last decade.

The Failure Mode That Prompted This

The specific incident that made this urgent involved a cleanup script. A developer on the team wrote a script to remove providers that looked unused based on a naming convention. The naming convention was applied inconsistently. The script deleted a Salesforce provider that was active. Every agent in the workspace that depended on Salesforce connections immediately started failing.

The diagnosis took longer than it should have because there was no record of the deletion. The API call had no caller identity attached to it, no timestamp in a log that could be correlated with the agent failures, and no diff showing what the state looked like before versus after. We were reading error logs backwards trying to infer what had changed.

The solution was not to make the API harder to call. It was to make the correct path easier than the ad-hoc path — and to make the ad-hoc path produce a record of what happened.

The Manifest

Provider configuration in nexus-cli lives in a YAML file you commit to your infrastructure repository:

providers:
  - name: google-workspace
    auth_type: oauth2
    client_id: "${GOOGLE_CLIENT_ID}"
    client_secret: "${GOOGLE_CLIENT_SECRET}"
    issuer: "https://accounts.google.com"
    enable_discovery: true
    scopes:
      - openid
      - email
      - profile
      - offline_access

  - name: github
    auth_type: oauth2
    client_id: "${GITHUB_CLIENT_ID}"
    client_secret: "${GITHUB_CLIENT_SECRET}"
    auth_url: "https://github.com/login/oauth/authorize"
    token_url: "https://github.com/login/oauth/access_token"
    scopes:
      - read:user
      - user:email

The ${VAR} references are expanded at runtime using Go's os.Expand. If any referenced environment variable is unset, the CLI fails immediately and prints the complete list of missing variables before touching the Broker. This fail-fast behavior is intentional: a partial manifest expansion that silently substitutes empty strings for credentials would be worse than a failed run.
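As a rough illustration of that fail-fast expansion, here is a minimal sketch built on os.Expand. The function name expandStrict and the error wording are our own assumptions for this example, not the CLI's actual code; the lookup is injected so the demo can run against a map instead of the real environment.

```go
package main

import (
	"fmt"
	"os"
	"sort"
	"strings"
)

// expandStrict (hypothetical name) expands ${VAR} references in s with
// os.Expand. Instead of substituting empty strings for unresolved names,
// it collects them and returns one error listing all of them.
func expandStrict(s string, lookup func(string) (string, bool)) (string, error) {
	missing := map[string]bool{}
	out := os.Expand(s, func(name string) string {
		v, ok := lookup(name)
		if !ok {
			missing[name] = true
		}
		return v
	})
	if len(missing) > 0 {
		names := make([]string, 0, len(missing))
		for n := range missing {
			names = append(names, n)
		}
		sort.Strings(names) // stable, readable error message
		return "", fmt.Errorf("missing environment variables: %s", strings.Join(names, ", "))
	}
	return out, nil
}

func main() {
	manifest := "client_id: \"${GOOGLE_CLIENT_ID}\"\nclient_secret: \"${GOOGLE_CLIENT_SECRET}\""

	// A real run would pass os.LookupEnv; a map keeps this demo deterministic.
	env := map[string]string{"GOOGLE_CLIENT_ID": "abc123"}
	lookup := func(k string) (string, bool) { v, ok := env[k]; return v, ok }

	if _, err := expandStrict(manifest, lookup); err != nil {
		fmt.Println("error:", err)
	}
}
```

Passing os.LookupEnv as the lookup distinguishes an unset variable from one that is set to an empty string, which matches the "unset" wording above.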

The manifest file is the source of truth for what providers should exist in the target environment. Everything else is drift.

Plan: What Would Change

nexus-cli plan fetches the current live state from the Broker in parallel (up to five concurrent profile requests), computes a field-by-field diff against your manifest using computeDrift, and prints an execution plan without making any changes:

nexus-cli plan
nexus-cli plan --file ./infra/nexus-providers.prod.yaml

Read 2 providers from nexus-providers.yaml

--- Execution Plan ---
+ CREATE : github
~ UPDATE : google-workspace
    scopes: [openid, email] -> [openid, email, profile, offline_access]
= OK     : internal-api (no changes)
! ORPHAN : old-slack-provider (would be deleted if --prune was passed)

Plan complete. Run 'nexus-cli apply' to perform these actions.

The symbols are exactly what they look like: + creates, ~ updates with a field-level diff, = means no change, ! is an orphan. When an update affects client_secret or client_id, those fields are always shown as *** → *** in the plan output. The diff shows you that a secret field changed without revealing either value. Non-secret fields like scopes show the actual before and after.
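To make the masking rule concrete, here is a small sketch of a field-by-field comparison. It is a stand-in written for this post, not the real computeDrift: the Provider struct, the diffFields name, and the order-sensitive scope comparison are all assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// Provider is a hypothetical, trimmed-down view of a provider record.
type Provider struct {
	ClientID     string
	ClientSecret string
	TokenURL     string
	Scopes       []string
}

// FieldChange is one drifted field as it would be printed in a plan.
type FieldChange struct {
	Field, Before, After string
}

const mask = "***"

// diffFields compares live and desired field by field. Credential fields are
// reported as changed, but both sides are masked so no secret is printed.
func diffFields(live, desired Provider) []FieldChange {
	var changes []FieldChange
	if live.ClientID != desired.ClientID {
		changes = append(changes, FieldChange{"client_id", mask, mask})
	}
	if live.ClientSecret != desired.ClientSecret {
		changes = append(changes, FieldChange{"client_secret", mask, mask})
	}
	if live.TokenURL != desired.TokenURL {
		changes = append(changes, FieldChange{"token_url", live.TokenURL, desired.TokenURL})
	}
	if !sameStrings(live.Scopes, desired.Scopes) {
		changes = append(changes, FieldChange{"scopes", list(live.Scopes), list(desired.Scopes)})
	}
	return changes
}

// sameStrings reports whether a and b hold the same items in the same order.
func sameStrings(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

func list(items []string) string { return "[" + strings.Join(items, ", ") + "]" }

func main() {
	live := Provider{ClientSecret: "old", Scopes: []string{"openid", "email"}}
	desired := Provider{
		ClientSecret: "new",
		Scopes:       []string{"openid", "email", "profile", "offline_access"},
	}
	for _, c := range diffFields(live, desired) {
		fmt.Printf("    %s: %s -> %s\n", c.Field, c.Before, c.After)
	}
}
```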

Updates use PATCH semantics — only the drifted fields are sent to the Broker, not a full replace. A scope list change does not re-submit your client credentials.
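One way to get that behavior in Go is a request struct whose unset fields are omitted from the JSON body. The sketch below assumes a hypothetical ProviderPatch type and JSON field names that mirror the manifest keys; the Broker's real request schema may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ProviderPatch is a hypothetical PATCH body. Fields left nil or empty are
// dropped from the JSON, so only drifted fields reach the Broker.
type ProviderPatch struct {
	ClientID     *string  `json:"client_id,omitempty"`
	ClientSecret *string  `json:"client_secret,omitempty"`
	TokenURL     *string  `json:"token_url,omitempty"`
	Scopes       []string `json:"scopes,omitempty"`
}

// marshalPatch renders the patch as the JSON string that would be sent.
func marshalPatch(p ProviderPatch) (string, error) {
	b, err := json.Marshal(p)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	// Only the scope list drifted, so only scopes is set.
	patch := ProviderPatch{
		Scopes: []string{"openid", "email", "profile", "offline_access"},
	}
	body, err := marshalPatch(patch)
	if err != nil {
		fmt.Println("marshal failed:", err)
		return
	}
	fmt.Println(body) // {"scopes":["openid","email","profile","offline_access"]}
}
```

A limitation of this particular sketch: with omitempty, an empty scope list cannot be sent, so clearing a field would need a different encoding.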

Apply: Confirm Then Execute

nexus-cli apply shows the same plan and then requires you to type yes before proceeding. The confirmation is not a Y/n prompt; it requires the full word. This deliberately matches the Terraform UX: an accidental press of Enter does not trigger a provider deletion.
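The confirmation check itself is small. The sketch below shows one way to write it; the prompt text, the function names, and the choice to trim surrounding whitespace while staying case-sensitive are assumptions made for the example.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// isConfirmed reports whether a typed line is the full word "yes".
// An empty line (a bare Enter), "y", and "Y" are all refusals.
func isConfirmed(line string) bool {
	return strings.TrimSpace(line) == "yes"
}

// confirm prompts on out, reads one line from in, and applies isConfirmed.
func confirm(in io.Reader, out io.Writer) bool {
	fmt.Fprint(out, "Type 'yes' to apply these changes: ")
	line, err := bufio.NewReader(in).ReadString('\n')
	if err != nil && line == "" {
		return false // nothing was typed before the input ended
	}
	return isConfirmed(line)
}

func main() {
	for _, input := range []string{"\n", "y\n", "yes\n"} {
		ok := confirm(strings.NewReader(input), io.Discard)
		fmt.Printf("%q -> %v\n", input, ok)
	}
}
```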

If you pass --prune, providers that exist in the live Broker state but not in your manifest will be deleted. We surface orphans in every plan run so you can see them, but we do not delete them by default. An accidental provider deletion is an immediate, workspace-wide outage for every connection that references that provider. The --prune flag is an explicit opt-in, not a default behavior.
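The orphan logic amounts to a set difference plus a flag check. A minimal sketch, assuming providers are matched by name and using a hypothetical planDeletes helper:

```go
package main

import (
	"fmt"
	"sort"
)

// planDeletes returns the orphans (present in the live state, absent from the
// manifest) and the subset that would be deleted. Without prune, the second
// slice is always empty: orphans are reported, never removed.
func planDeletes(live, manifest []string, prune bool) (orphans, toDelete []string) {
	declared := make(map[string]bool, len(manifest))
	for _, name := range manifest {
		declared[name] = true
	}
	for _, name := range live {
		if !declared[name] {
			orphans = append(orphans, name)
		}
	}
	sort.Strings(orphans)
	if prune {
		toDelete = orphans
	}
	return orphans, toDelete
}

func main() {
	live := []string{"google-workspace", "internal-api", "old-slack-provider"}
	manifest := []string{"github", "google-workspace", "internal-api"}

	for _, prune := range []bool{false, true} {
		orphans, toDelete := planDeletes(live, manifest, prune)
		fmt.Printf("prune=%v orphans=%v delete=%v\n", prune, orphans, toDelete)
	}
}
```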

Why Auto-Apply on Merge Is Not Recommended

Every CI/CD system that touches infrastructure eventually asks: can we auto-apply when the manifest file changes on main? For Nexus providers, our answer is no — and we want to explain that, because it is not the default answer for most declarative systems.

Provider configuration is live operational data. The Broker is not a static resource like a DNS record or an S3 bucket policy. It holds active refresh tokens that agents are using right now. A misconfigured update to a provider's scope list or token URL does not fail gracefully — it breaks every in-flight token refresh for every connection on that provider. A deleted provider is an immediate hard failure for every agent that depends on it.

The recommended pattern is to run nexus-cli plan as an informational step on pull requests, so reviewers can see exactly what would change before approving the merge. Apply runs manually from a trusted environment by someone who has verified the plan looks correct. The review step is the entire point of the workflow, not a friction tax on deployment speed.

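A GitHub Actions snippet for the plan step. Every ${VAR} the manifest references has to be present in env, or the fail-fast check aborts the run before any plan is produced:

- name: Nexus provider plan
  env:
    BROKER_BASE_URL: ${{ secrets.BROKER_BASE_URL }}
    API_KEY: ${{ secrets.BROKER_API_KEY }}
    GOOGLE_CLIENT_ID: ${{ secrets.GOOGLE_CLIENT_ID }}
    GOOGLE_CLIENT_SECRET: ${{ secrets.GOOGLE_CLIENT_SECRET }}
    # The two secret names below are illustrative: GitHub Actions rejects
    # secret names starting with GITHUB_, so the stored secret needs another prefix.
    GITHUB_CLIENT_ID: ${{ secrets.GH_OAUTH_CLIENT_ID }}
    GITHUB_CLIENT_SECRET: ${{ secrets.GH_OAUTH_CLIENT_SECRET }}
  run: nexus-cli plan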

The Audit Trail

When nexus-cli apply runs, it calls the Broker's REST API, the same API you would have called manually before. The difference is that the Broker's audit subsystem now writes a structured event for every mutation those API calls produce.

Every provider.created, provider.updated, and provider.deleted event is written to the audit_events table with the actor IP address, User-Agent, provider ID, and a JSON payload of what changed. For provider.updated, the audit payload redacts client_secret and client_id — it records that those fields changed without storing the new values. Non-credential fields like scopes and endpoint URLs are recorded in full.
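To show what such an event might look like, here is a sketch of a redacted provider.updated payload. The AuditEvent struct, its JSON field names, and the "[REDACTED]" marker are assumptions made for illustration; the actual audit_events columns and payload format are defined by the Broker.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// AuditEvent is a hypothetical shape for one audit record.
type AuditEvent struct {
	EventType  string            `json:"event_type"`
	ProviderID string            `json:"provider_id"`
	ActorIP    string            `json:"actor_ip"`
	UserAgent  string            `json:"user_agent"`
	Changes    map[string]string `json:"changes"`
}

// credentialFields lists the fields whose values must never be stored.
var credentialFields = map[string]bool{"client_id": true, "client_secret": true}

// redactChanges copies changes, replacing credential values with a marker so
// the event records that the field changed without keeping the new value.
func redactChanges(changes map[string]string) map[string]string {
	out := make(map[string]string, len(changes))
	for field, value := range changes {
		if credentialFields[field] {
			out[field] = "[REDACTED]"
		} else {
			out[field] = value
		}
	}
	return out
}

func main() {
	event := AuditEvent{
		EventType:  "provider.updated",
		ProviderID: "github",
		ActorIP:    "203.0.113.10", // documentation-only address range
		UserAgent:  "nexus-cli",
		Changes: redactChanges(map[string]string{
			"client_secret": "new-secret-value",
			"token_url":     "https://github.com/login/oauth/access_token",
		}),
	}
	b, err := json.MarshalIndent(event, "", "  ")
	if err != nil {
		fmt.Println("marshal failed:", err)
		return
	}
	fmt.Println(string(b))
}
```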

That gives you two independent audit trails for every provider change. The git history of your manifest file tells you who proposed the change, when it was reviewed, and who approved the merge. The Broker audit log tells you when the CLI ran, from what IP, and what the Broker accepted. If the two ever disagree — someone ran the CLI outside the normal workflow, or the apply produced unexpected results — you have enough evidence to understand what happened.

The audit log is queryable via GET /audit with filters for event type, resource ID, and time range. You can also pull it directly from the audit_events table in PostgreSQL if you need to run your own queries for compliance reporting.

The Broader Principle

The reason we called this "Security-as-Code" rather than just a CLI tool is that the pattern extends beyond the CLI itself. The Broker holds the most sensitive data in your agent stack. Treating its configuration as code, with version history, review gates, and an audit trail, is not a workflow preference; it is a security control.

An ad-hoc API call that deletes a provider is not just an operational problem. It is an undocumented change to a security boundary. The manifest and the CLI make that boundary visible, reviewable, and recoverable. That is the goal.

If you are running Nexus in production today, the first nexus-cli plan run is worth doing just to see your live state reflected in structured output. Write a manifest that describes the providers you already have, run plan until it reports no changes you did not intend, commit that manifest, and go from there.

The full CLI reference and manifest field documentation are in the Nexus security-as-code guide.