Ride the subscription, not the API bill

A funny thing about your subscription

You pay $20 a month, maybe $200, for Claude or ChatGPT and get a huge amount of the model. Use it all day. But the moment you want a program to use it, everyone points you at the API, where the same model is billed per token. One ambitious coding run can cost more than your whole monthly plan.

Same model, two prices. The cheap one is already paid for and apparently off-limits to automation.

Why? And can we fix it? We can. It’s a small pattern. Instead of pointing at someone’s codebase, let’s build it from scratch.

First, go find the subscription

Log into the Claude CLI, then look at what it wrote to disk:

$ claude              # log in with your Pro/Max account, once
$ cat ~/.claude/.credentials.json

On Linux you get this (on macOS it’s in the Keychain, same shape):

{
  "claudeAiOauth": {
    "accessToken": "sk-ant-oat01-…",
    "refreshToken": "sk-ant-ort01-…",
    "expiresAt": 1765432100000,
    "scopes": ["user:inference"],
    "subscriptionType": "max"
  }
}

This is not an API key. It’s an OAuth session: a short-lived accessToken, plus a refreshToken the CLI uses to mint a new one when it expires. “Expired” is the normal state. The CLI refreshes in the background and you never see it.

So the only thing that can spend your subscription is the program that logged in and keeps the token alive. The documented API authenticates keys, not plans. There’s no header you can paste a subscription into. (Could you grab these tokens and call the private endpoint yourself? You could. Hold that thought; we come back to why it’s a bad trade.)

The constraint falls out on its own: to use the subscription, let the CLI do the talking. The question is no longer “how do I get the subscription into my code” but “how do I put my code around the CLI.”

The plan writes itself

If only the CLI can spend the plan, run it. Treat the claude binary as an engine you feed work to.

The naive version:

import subprocess

def ask_claude(prompt: str) -> str:
    result = subprocess.run(
        ["claude", "-p", prompt],   # -p = print mode, headless, no TUI
        capture_output=True, text=True,
    )
    return result.stdout

It works. The CLI uses its login, does the job, prints the answer. You just ran a model on your subscription from Python.

Ship exactly that, though, and you’ll get a surprising bill.

The one line that decides everything

The machine running your code probably has an ANTHROPIC_API_KEY in its environment already, for some other tool or a CI job. The CLI checks for it, and if it’s there, uses it. Now every token bills to your API account at metered rates, silently, while your Max subscription sits unused.

The fix is a removal. Before spawning the CLI, scrub the environment:

import os

def clean_env() -> dict[str, str]:
    env = os.environ.copy()
    # Remove any API key so the CLI uses its subscription login,
    # not a developer key that bills separately.
    env.pop("ANTHROPIC_API_KEY", None)
    # If we're already inside a Claude session this is set, and a
    # nested `claude` refuses to start. Unset it too.
    env.pop("CLAUDECODE", None)
    return env

Wrapping one program in another means you inherit the whole environment the child inspects, not just the arguments you pass. Each vendor CLI reads a few of these signals: which credential to use, am I nested, who’s my parent. Getting the wrap right is knowing them and clearing them. You don’t prefer the subscription, you enforce it, by deleting the alternative.

Now subprocess.run(["claude", "-p", prompt], env=clean_env()) is honest.

Driving it for real

Dumping a text blob is fine for a toy. For real use you want to stream the agent thinking, calling tools, finishing. Use --output-format stream-json: one JSON event per line. Wrap it in an async generator that yields your own events:

import asyncio, json
from typing import AsyncIterator

async def run_claude_turn(prompt: str) -> AsyncIterator[dict]:
    proc = await asyncio.create_subprocess_exec(
        "claude", "-p", prompt,
        "--output-format", "stream-json", "--verbose",
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
        env=clean_env(),                 # keeps you on the plan
    )
    async for raw in proc.stdout:
        line = raw.decode().strip()
        if not line:
            continue
        event = json.loads(line)
        kind = event.get("type")
        if kind == "assistant":
            yield {"type": "text", "text": _text_of(event)}
        elif kind == "result":
            yield {"type": "done", "usage": event.get("usage")}
        # tool calls, thinking, etc. map the same way: one branch each

    await proc.wait()

Field names shift between CLI versions, so pin yours. The structure is the point: read the vendor’s stream, emit your own events. After this, callers see text and done events, not Claude.

Who holds the credential? (not you)

You don’t hold the token, refresh it, or check expiry. The CLI does, same as when you use it by hand. Your only job is to check whether a login exists, so you can fail with a clear message:

import json, time
from pathlib import Path

def has_claude_login() -> bool:
    try:
        oauth = json.loads(
            Path("~/.claude/.credentials.json").expanduser().read_text()
        )["claudeAiOauth"]
    except (OSError, KeyError, ValueError):
        return False
    if not oauth.get("accessToken"):
        return False
    if oauth.get("refreshToken"):
        return True                       # renewable, fine even if expired
    return oauth.get("expiresAt", 0) > time.time() * 1000

One catch: on macOS the credential is in the Keychain, so that file may not exist and the check falsely reports no login. Fall back to claude auth status, which reads wherever the credential lives. File first for speed, CLI for truth.

What if you don’t want the subscription, but an API key or a gateway? Keep it on the CLI’s terms. Don’t set the key as an env var (we just stripped that). Hand the CLI a command that prints a token:

def settings_for(mode: str, secret: str | None = None) -> dict:
    if mode == "subscription":
        return {}                                     # nothing to inject
    if mode == "api_key":
        # A command that prints the key. The CLI runs it when it
        # needs a bearer token, like its own refresh.
        return {"apiKeyHelper": f"printf %s {secret}"}
    if mode == "gateway":
        return {"apiKeyHelper": secret, "env": {"ANTHROPIC_BASE_URL": "https://…"}}

Write that to a settings file and pass claude --settings …. Subscription, key, or gateway: you tell the CLI how to authenticate, then step back. Auth is something the CLI pulls, never something you push.

But couldn’t I just refresh the tokens myself?

Fair objection. We have the access token, the refresh token, the scopes. OAuth refresh is one HTTP call. So why the CLI? Why not read the tokens, refresh them, and hit the endpoint directly?

You can. The token is a plaintext bearer credential, not a sealed secret, and people have reverse-engineered exactly this. It’s not a wall, it’s a trade. Here’s what you’d sign up for.

The endpoint. The subscription doesn’t use the documented api.anthropic.com path. It uses a client-identified endpoint that only accepts a request with the right shape: a specific anthropic-beta flag, the OAuth client_id from the CLI, a matching user-agent. The token alone isn’t enough; the request has to look like the official client. None of it is documented, all of it can change, and you’d re-reverse-engineer it every time it does.

The refresh. Refresh tokens usually rotate: each refresh returns a new one and voids the old. So you must write the new token back atomically, maybe while a real CLI does the same. Get it wrong once and you don’t fail a request, you break the login and sign in again. The CLI already does this correctly.

The moving target. Endpoint, headers, client id, token format, and increasingly attestation (proof the request came from a genuine client). The CLI passes these checks because it is the client. A hand-rolled one works today, breaks next release.

Legitimacy. Driving the shipped client is using their tool. Lifting the token and hitting private endpoints is circumventing it, which is what gets accounts flagged.

And the punchline: win all of that and you’ve only rebuilt auth. The subscription is exposed through the agent: the tool loop, model routing, retries. Skip the CLI and you’re rebuilding Claude Code, not saving a subprocess.

One shape for every agent

Now we have subscription Claude behind a clean event stream. The payoff is what comes next: give every agent the same shape and they become interchangeable.

One interface, one method:

from abc import ABC, abstractmethod
from typing import AsyncIterator

class Executor(ABC):
    @abstractmethod
    async def run_turn(self, prompt: str) -> AsyncIterator[dict]: ...

class ClaudeCLIExecutor(Executor):
    async def run_turn(self, prompt: str) -> AsyncIterator[dict]:
        async for event in run_claude_turn(prompt):
            yield event

class CodexCLIExecutor(Executor):
    async def run_turn(self, prompt: str) -> AsyncIterator[dict]:
        # Same idea, different binary: spawn `codex`, scrub OPENAI_API_KEY,
        # parse its stream, yield the same event dicts.
        ...

Codex logs in against your ChatGPT plan and stores its session in its own auth.json. Wrap it the same way and it drops into the same slot. Your orchestrator holds a list[Executor] and doesn’t care which vendor or billing mode is behind each one. Run a task on subscription Claude, send the diff to subscription Codex for review, fall back to an API-key model for a third opinion: three logins, three payment methods, one interface. Production harnesses take this all the way, normalizing a dozen agents behind one run_turn contract. We built the seed.

What this is, and isn’t

Be honest about the edges:

Needs a real subscription and a CLI. Claude and Codex/ChatGPT qualify. API-only providers and gateways like OpenRouter have no plan to ride; they stay metered.
The plan has limits. Weekly caps, throughput ceilings, throttling. Fan one login across ten sub-agents and you hit them fast. Generous is not infinite.
Terms of service apply. Automating a personal subscription, or sharing one login across a team, can break the rules. “It works” and “you’re allowed at scale” are different sentences. Read the terms.
You depend on someone’s CLI. Output formats drift, flags change, credential paths move. The normalization layer absorbs some of it, not all.
Respect the token’s rhythm. Short-lived tokens with silent refresh are the CLI’s job. Don’t fight it by re-injecting stale env vars or killing it mid-refresh.

That’s the honest price of turning a consumer plan into infrastructure.

The pattern, in one breath

The vendor’s first-party CLI is the only client that carries your subscription. So make the CLI programmable instead of routing around it. Spawn it, scrub the environment so it can’t fall back to a metered key, read its stream into your own events, let it hold and refresh its credential, and wrap it behind one interface so every agent looks the same.

We reach for the API by reflex because it’s the programmable door and the CLI is the human one. For subscription-gated capability, the CLI is the only door that carries the entitlement. Walk through it and it becomes a real engine: authenticated, refreshed, composable, running the vendor’s best agent on the plan you already pay for.

Next: Fence the agent. You just gave an autonomous agent your shell, your disk, and the network. Before you walk away from it, put it in a box.