Managed Runs let you execute specs in the cloud instead of on your own machine. You request a run, SpecMarket orchestrates it, and you receive the results by polling or, once Phase 2 ships, by webhook.

This is optional — the CLI (specmarket run) remains the primary way to execute specs. Managed Runs are a convenience tier for users who want hands-off execution.


Use Cases

✅ Good Fit for Managed Runs

  • No local development environment — Running on a phone, shared machine, or server with limited resources
  • Scheduled runs — “Run this spec daily at 6 AM” (coming in Phase 2)
  • Batch processing — Submit 50 specs, get results later
  • Hands-off execution — Start a run and don’t think about it; receive results via webhook
  • Integration with other services — Trigger a spec run from a webhook, CI/CD pipeline, or scheduled job

❌ Not a Good Fit

  • Development & iteration — Use the local CLI for debugging and refining
  • Cost-sensitive workloads — Local runs (BYOK) are cheaper than platform-managed runs
  • Specs requiring interactive input — Managed runs are fire-and-forget; no way to provide real-time feedback to the agent
  • Sensitive environments — Your spec code and outputs run on SpecMarket servers; not suitable for highly confidential work

Pricing

Managed Runs use a simple, transparent pricing model.

Two Options

Option 1: BYOK (Bring Your Own Key)

  • You provide your own OpenAI or Anthropic API key
  • SpecMarket charges a flat $2 infrastructure fee per run
  • You pay API token costs directly to your API provider
  • Total cost: Your API tokens + $2 SpecMarket fee
  • Creator earnings: $0 (no margin to share)

Option 2: Platform Key (SpecMarket Handles API Costs)

  • SpecMarket provides the API key
  • SpecMarket charges: max(estimated cost × 1.4, $5 minimum) per run
  • Estimated cost comes from the spec’s estimatedCostUsd field (set by the creator)
  • Total cost: Transparent; you see the price before paying
  • Creator earnings: 30% of the margin (price - estimated cost)

Pricing Example

Spec: “DocuSign Replacement”

  • Estimated cost: $8.50

If you use BYOK:

  • Your cost: ~$8.50 (your API key) + $2 (SpecMarket fee) = ~$10.50
  • Creator gets: $0

If you use Platform Key:

  • Charge: max($8.50 × 1.4, $5) = max($11.90, $5) = $11.90
  • Your cost: $11.90
  • Creator earnings: 30% × ($11.90 - $8.50) = 30% × $3.40 = $1.02
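
The arithmetic above can be reproduced with a small helper. This is only a sketch: the function names, the return shape, and the choice to work in integer cents with Math.round are assumptions made for illustration, not SpecMarket's billing code. The constants come from the pricing rules in this section.

```typescript
// Hypothetical helpers; amounts are integer cents to avoid floating-point drift.
const BYOK_FEE_CENTS = 200;          // flat $2 infrastructure fee per run
const PLATFORM_MARKUP = 1.4;         // estimated cost × 1.4
const PLATFORM_MINIMUM_CENTS = 500;  // $5 minimum charge
const CREATOR_SHARE = 0.3;           // 30% of the margin

interface PlatformQuote {
  chargeCents: number;          // what the user pays
  creatorEarningsCents: number; // creator's share of (charge - estimated cost)
}

// Platform Key: max(estimated cost × 1.4, $5); the creator gets 30% of the margin.
function quotePlatformKeyRun(estimatedCostCents: number): PlatformQuote {
  const chargeCents = Math.max(
    Math.round(estimatedCostCents * PLATFORM_MARKUP),
    PLATFORM_MINIMUM_CENTS,
  );
  const marginCents = chargeCents - estimatedCostCents;
  return {
    chargeCents,
    creatorEarningsCents: Math.round(marginCents * CREATOR_SHARE),
  };
}

// BYOK: your own API spend plus the flat fee; the creator earns nothing.
function estimateByokTotalCents(apiCostCents: number): number {
  return apiCostCents + BYOK_FEE_CENTS;
}

const quote = quotePlatformKeyRun(850); // the $8.50 "DocuSign Replacement" spec
console.log(quote.chargeCents / 100);           // 11.9
console.log(quote.creatorEarningsCents / 100);  // 1.02
console.log(estimateByokTotalCents(850) / 100); // 10.5
```

Working in cents sidesteps a floating-point surprise: in JavaScript, 8.5 * 1.4 does not evaluate to exactly 11.9.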

Pricing Guarantees

  • No hidden charges — Only the quoted price
  • No per-API-call fees — Flat run pricing
  • No storage fees — Outputs are yours to download
  • Refunds for failures — If a run fails before completion, you’re not charged

How It Works

1. Request a Run

const runId = await requestManagedRun({
  specId: spec._id,
  specVersion: "1.0.0",           // Optional; defaults to current
  model: "claude-opus-4-6",       // Required: which model to use
  maxLoops: 100,                  // Optional; defaults to 50
  maxBudgetUsd: 50,               // Optional; defaults to $50
  userApiKeyEncrypted: "sk_enc...",  // Optional: if using BYOK
});

Response:

{
  "managedRunId": "mr_abc123...",
  "status": "pending_payment",  // or "queued" if using BYOK
  "estimatedCost": 11.90,
  "creatorEarnings": 1.02
}

If using the Platform Key option, status starts at pending_payment. Complete the Stripe checkout in the next step.

If using BYOK, status is immediately queued and your run will start when the coordinator service picks it up.

2. Payment (Platform Key Only)

If you’re using the platform key, complete Stripe checkout:

const checkout = await createManagedRunCheckout({
  managedRunId: "mr_abc123...",
  redirectUrl: "https://myapp.com/managed-runs/" + runId,
});
 
// Redirect to Stripe checkout
window.location.href = checkout.checkoutUrl;

After payment confirmation, status transitions to queued.

3. Execution

The coordinator service (running on AWS EC2) polls for queued runs and:

  1. Creates a Docker container with the spec
  2. Injects environment variables and API keys
  3. Runs the Ralph Loop agent
  4. Captures output and metrics
  5. Stores results

Run statuses during execution:

  • queued — Waiting to be picked up
  • provisioning — Creating the container
  • running — Ralph Loop in progress
  • success — Completed successfully
  • failure — All success criteria failed
  • stall — Agent couldn’t make progress
  • budget_exceeded — Hit your maxBudgetUsd limit
  • timed_out — Exceeded SpecMarket’s 2-hour timeout
  • cancelled — You cancelled the run
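
In client code, the statuses above can be modelled as a string union. The sketch below is an assumption about how a client might type them, not the platform's schema. It also includes pending_payment and error, which appear elsewhere on this page, and it treats cancelled and error as terminal, which is a guess.

```typescript
// Statuses named on this page; the type and helper names are illustrative.
type ManagedRunStatus =
  | "pending_payment" | "queued" | "provisioning" | "running"
  | "success" | "failure" | "stall" | "budget_exceeded"
  | "timed_out" | "cancelled" | "error";

// Record<ManagedRunStatus, boolean> makes the compiler flag any status left out.
const IS_TERMINAL: Record<ManagedRunStatus, boolean> = {
  pending_payment: false,
  queued: false,
  provisioning: false,
  running: false,
  success: true,
  failure: true,
  stall: true,
  budget_exceeded: true,
  timed_out: true,
  cancelled: true, // assumption: a cancelled run never resumes
  error: true,     // assumption: a crashed run is not resumed
};

// True once a run can no longer change status.
function isTerminal(status: ManagedRunStatus): boolean {
  return IS_TERMINAL[status];
}

console.log(isTerminal("running"));   // false
console.log(isTerminal("timed_out")); // true
```

A lookup table is used rather than a list so that adding a status to the union without classifying it becomes a compile error.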

4. Results

Poll for results:

// run is undefined until the query has loaded, so guard before reading it
const run = useQuery(api.managedRuns.getById, {
  managedRunId: "mr_abc123...",
});
 
if (run?.status === "success") {
  console.log("Output path:", run.outputPath);  // S3 or file server URL
  console.log("Metrics:", run.metrics);
}

Or receive webhooks (Phase 2):

// When you request the run, provide webhookUrl
const runId = await requestManagedRun({
  specId: spec._id,
  model: "claude-opus-4-6",       // Required, as in step 1
  webhookUrl: "https://myapp.com/webhooks/managed-runs",
});

SpecMarket POSTs to your webhook when the run completes:

{
  "managedRunId": "mr_abc123...",
  "status": "success",
  "outputPath": "https://output.specmarket.dev/mr_abc123/output.zip",
  "metrics": {
    "totalLoops": 42,
    "totalCostUsd": 11.45,
    "totalTimeSeconds": 2400,
    "successCriteria": [
      { "name": "app starts", "passed": true },
      { "name": "database works", "passed": true }
    ]
  }
}
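
A receiving service will usually want to check the body before trusting it. The sketch below infers the field names from the example payload above. The type names and both helpers are assumptions, and it does not cover signature verification, which this page does not describe.

```typescript
// Shape inferred from the example payload above; not an official schema.
interface SuccessCriterion {
  name: string;
  passed: boolean;
}

interface ManagedRunWebhook {
  managedRunId: string;
  status: string;
  outputPath?: string;
  metrics?: {
    totalLoops: number;
    totalCostUsd: number;
    totalTimeSeconds: number;
    successCriteria: SuccessCriterion[];
  };
}

// Shallow structural check of an already-parsed JSON body.
// Individual metric fields are not verified.
function isManagedRunWebhook(body: unknown): body is ManagedRunWebhook {
  if (typeof body !== "object" || body === null) return false;
  const b = body as Record<string, unknown>;
  if (typeof b.managedRunId !== "string" || typeof b.status !== "string") return false;
  if (b.metrics === undefined) return true;
  const m = b.metrics as Record<string, unknown> | null;
  return typeof m === "object" && m !== null && Array.isArray(m.successCriteria);
}

// List the names of success criteria that did not pass.
function failedCriteria(payload: ManagedRunWebhook): string[] {
  const criteria = payload.metrics ? payload.metrics.successCriteria : [];
  return criteria.filter((c) => !c.passed).map((c) => c.name);
}

// Made-up payload for illustration only.
const rawBody: unknown = JSON.parse(`{
  "managedRunId": "mr_example",
  "status": "failure",
  "metrics": {
    "totalLoops": 42, "totalCostUsd": 11.45, "totalTimeSeconds": 2400,
    "successCriteria": [
      { "name": "app starts", "passed": true },
      { "name": "database works", "passed": false }
    ]
  }
}`);

if (isManagedRunWebhook(rawBody)) {
  console.log(rawBody.status, failedCriteria(rawBody));
}
```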

Limits & Constraints

Execution Limits

  • Max loops: 200 (you can set lower via maxLoops)
  • Max runtime: 2 hours
  • Max budget: $500 per run (you can set lower via maxBudgetUsd)
  • Concurrent runs per user: 10 at a time
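
The loop and budget caps above can be checked on the client before a request is sent. In this sketch the constants come from this section and the defaults from the request example. The helper name and the decision to clamp values that are too high, rather than reject them, are assumptions. The server may behave differently.

```typescript
// Limits and defaults as stated on this page; the helper itself is hypothetical.
const MAX_LOOPS = 200;
const DEFAULT_MAX_LOOPS = 50;
const MAX_BUDGET_USD = 500;
const DEFAULT_MAX_BUDGET_USD = 50;

interface RunLimits {
  maxLoops: number;
  maxBudgetUsd: number;
}

// Apply defaults, reject values below the minimum, and cap at the stated maximums.
function resolveRunLimits(requested: Partial<RunLimits> = {}): RunLimits {
  const maxLoops = requested.maxLoops ?? DEFAULT_MAX_LOOPS;
  const maxBudgetUsd = requested.maxBudgetUsd ?? DEFAULT_MAX_BUDGET_USD;
  // The negated comparisons also reject NaN.
  if (!(maxLoops >= 1) || !(maxBudgetUsd > 0)) {
    throw new RangeError("maxLoops must be at least 1 and maxBudgetUsd must be positive");
  }
  return {
    maxLoops: Math.min(Math.floor(maxLoops), MAX_LOOPS),
    maxBudgetUsd: Math.min(maxBudgetUsd, MAX_BUDGET_USD),
  };
}

// No arguments: the defaults of 50 loops and $50 apply.
console.log(resolveRunLimits());
// 1000 loops is capped at 200; the $10 budget is kept as requested.
console.log(resolveRunLimits({ maxLoops: 1000, maxBudgetUsd: 10 }));
```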

Environment

  • File system: Specs can write to /workspace (output directory)
  • Browser: Chromium (headless) and Playwright system deps are pre-installed in every sandbox
  • Screenshots: Screenshots saved to /workspace/screenshots/ are uploaded to your dashboard in real-time
  • Playwright: Specs with Playwright tests will run them automatically. No Xvfb needed — Playwright runs in headless mode natively.
  • Network: Full internet access (required for npm install, git clone, external APIs, documentation). Blocked: host network, Docker API, cloud metadata endpoints.
  • Memory: 6GB per container (increased for Chromium headroom)
  • CPU: 2 vCPU
  • Shared memory: 1GB /dev/shm (required for Chromium)
  • Disk: 50GB temporary storage (output must fit in output directory)

API Key Security

  • BYOK keys are encrypted — Stored encrypted at rest, decrypted only during execution
  • Platform key runs — SpecMarket’s key is used; no key transmission to your side
  • Custom environment variables — Encrypted in transit and at rest

Cancellation

You can cancel a run while it’s in these statuses:

  • pending_payment — Payment not yet confirmed
  • queued — Waiting to start
  • provisioning — Container being created
  • running — Ralph Loop in progress

Once a run reaches a terminal status (success, failure, stall, budget_exceeded, timed_out, or error), it cannot be cancelled.
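
A client can apply the rule above before it calls the cancel mutation. This is a minimal sketch: canCancel and cancelIfPossible are hypothetical names, and the cancel parameter stands in for whichever client function you actually use.

```typescript
// The four statuses this section lists as cancellable.
function canCancel(status: string): boolean {
  switch (status) {
    case "pending_payment":
    case "queued":
    case "provisioning":
    case "running":
      return true;
    default:
      return false; // terminal (or unrecognised) statuses cannot be cancelled
  }
}

// Call `cancel` only when the run is still cancellable; report whether it was called.
function cancelIfPossible(
  run: { managedRunId: string; status: string },
  cancel: (args: { managedRunId: string }) => unknown,
): boolean {
  if (!canCancel(run.status)) return false;
  cancel({ managedRunId: run.managedRunId });
  return true;
}

// Usage with a stand-in cancel function that just records the IDs it receives.
const cancelledIds: string[] = [];
const recordCancel = (args: { managedRunId: string }) => {
  cancelledIds.push(args.managedRunId);
};

console.log(cancelIfPossible({ managedRunId: "mr_1", status: "running" }, recordCancel)); // true
console.log(cancelIfPossible({ managedRunId: "mr_2", status: "success" }, recordCancel)); // false
```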

Cancel a run:

await cancelManagedRun({
  managedRunId: "mr_abc123...",
});

If you cancel after the run has started execution, you’re still charged for the API costs incurred so far (BYOK) or the full platform fee (platform key).


Reliability & Retries

What Happens If the Coordinator Fails?

The coordinator service is stateless and idempotent. If it crashes:

  1. Completed runs are already marked as done (no re-execution)
  2. In-progress runs resume from their checkpoint (if supported by the agent)
  3. Pending runs wait until the service restarts

Retry policy:

  • Failed runs can be retried by requesting a new managed run
  • You pay for each retry separately

What If the Docker Container Crashes?

If the container crashes mid-run:

  1. The run status transitions to error
  2. Error message is logged (e.g., “Out of memory”)
  3. You can request a new run with a higher maxBudgetUsd or a lower maxLoops

Current Status

✅ Implemented (Backend Complete)

  • Request a managed run (mutation)
  • Cancel a run (mutation)
  • Query run status and metrics
  • BYOK and platform key pricing models
  • Encryption for API keys and custom env vars

🟡 Scaffolded (Backend structure exists, not yet operational)

  • Coordinator service orchestration
  • Docker container management
  • Agent execution runtime
  • Webhook callbacks
  • Scheduled runs (cron)
  • Output storage (S3)

🔴 Not Yet Started

  • Web UI for managed runs dashboard
  • Advanced analytics and cost reporting
  • Advanced scheduling (weekly, monthly runs)

Phase 2 Roadmap

Managed Runs will fully launch in Phase 2 after Phase 1 stable deployment. This includes:

  1. Coordinator service — Deploy to Fly.io, handle queueing and orchestration
  2. Docker integration — Build and execute spec containers reliably
  3. Output storage — S3 or compatible file server
  4. Webhooks — Delivery and retry logic
  5. Dashboard — Web UI for managed runs, job history, cost tracking
  6. Scheduled runs — Cron-like scheduling for recurring runs

Timeline: 4-6 weeks post-Phase-1-launch


Estimating Costs

The estimatedCostUsd field on each spec tells you the expected cost.

For BYOK runs:

  • Your API cost + $2 = Total cost
  • Use your API provider’s pricing calculator for token estimates

For platform key runs:

  • max(estimatedCostUsd × 1.4, $5) = Your cost
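  • The margin is 40% of the estimated cost, so costlier specs carry a larger absolute margin; when the estimate is below about $3.57, the $5 minimum applies instead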

Tips to reduce costs:

  1. Use a cheaper model — If the spec’s minModel allows it:

    // Instead of Opus
    model: "claude-sonnet-4"  // Cheaper, sufficient for many specs
  2. Lower maxLoops — Fewer iterations = lower cost. Default is 50:

    maxLoops: 25  // Faster and cheaper, but may not complete
  3. Lower maxBudgetUsd — Set a hard limit:

    maxBudgetUsd: 10  // Stop early if costs exceed $10
  4. Use BYOK — If you have volume discounts from your API provider, BYOK is cheaper


Troubleshooting

“Run stuck in pending_payment”

Your payment failed or timed out in Stripe checkout. Complete checkout again or request a new run.

“Run stuck in provisioning”

The coordinator service is experiencing issues. Wait 5 minutes, then cancel and retry.

“budget_exceeded status”

Your run hit the maxBudgetUsd limit. Request a new run with a higher budget.

“stall status”

The agent couldn’t make progress. Review the error message. A stall usually means the spec’s success criteria are too strict or the agent needs guidance.

“Run output is missing or incomplete”

Output files may still be uploading. Wait a few seconds and refresh. If the output is still missing, the container may have run out of disk space; request a new run with a simpler spec.


Security Considerations

✅ Safe

  • Output files are private to your user account
  • API keys are encrypted at rest and in transit
  • Specs run in isolated Docker containers
  • Network traffic is HTTPS only

⚠️ Caution

  • Specs can access all environment variables you provide
  • Specs can write files to /workspace (your output directory)
  • Specs run with internet access (can make outbound HTTPS calls)
  • Don’t run untrusted specs from unknown authors — Review the spec’s source files before running

API Reference

See API Reference — Managed Runs Module for:

  • request(...) — Initiate a managed run
  • cancel(...) — Cancel a pending or running run
  • getById(...) — Fetch run details and status
  • getForUser(...) — List your runs
  • createCheckoutSession(...) — Create Stripe checkout for platform key runs

FAQ

Q: Can I reuse my API key across multiple managed runs? A: Yes. Provide userApiKeyEncrypted on each request; it’s securely stored and only decrypted during execution.

Q: What happens if my API key is invalid? A: The run fails during provisioning (status: error) with a message like “Failed to authenticate with API provider.” No charge if using BYOK; full charge if using platform key.

Q: Can I cancel a run and get a refund? A: Yes, if cancelled before the run starts (status: pending_payment or queued). Refunds for runs that started are prorated based on API tokens used.

Q: How long are outputs stored? A: 30 days. Download them to your own storage for long-term keeping.

Q: Can I schedule a spec to run daily? A: Not yet. Phase 2 will include cron-like scheduling.

Q: Is my source code safe? A: Yes. Specs run in isolated containers. SpecMarket never sees your output files’ source code unless you explicitly export it.


Getting Help

Managed Runs will be a first-class feature of SpecMarket. During Phase 2, expect rapid iteration and improvement based on user feedback.