Managed Runs — Cloud-Hosted Spec Execution

Managed Runs let you execute specs in the cloud instead of running them locally. You request a run, SpecMarket orchestrates the execution, and you retrieve the results by polling (webhook delivery is planned for Phase 2).

This is optional — the CLI (specmarket run) remains the primary way to execute specs. Managed Runs are a convenience tier for users who want hands-off execution.

Use Cases

✅ Good Fit for Managed Runs

No local development environment — Running on a phone, shared machine, or server with limited resources
Scheduled runs — “Run this spec daily at 6 AM” (coming in Phase 2)
Batch processing — Submit 50 specs, get results later
Hands-off execution — Start a run and don’t think about it; receive results via webhook
Integration with other services — Trigger a spec run from a webhook, CI/CD pipeline, or scheduled job

❌ Not a Good Fit

Development & iteration — Use the local CLI for debugging and refining
Cost-sensitive workloads — Local runs (BYOK) are cheaper than platform-managed runs
Specs requiring interactive input — Managed runs are fire-and-forget; no way to provide real-time feedback to the agent
Sensitive environments — Your spec code and outputs run on SpecMarket servers; not suitable for highly confidential work

Pricing

Managed Runs use a simple, transparent pricing model.

Two Options

Option 1: BYOK (Bring Your Own Key)

You provide your own OpenAI or Anthropic API key
SpecMarket charges a flat $2 infrastructure fee per run
You pay API token costs directly to your API provider
Total cost: Your API tokens + $2 SpecMarket fee
Creator earnings: $0 (no margin to share)

Option 2: Platform Key (SpecMarket Handles API Costs)

SpecMarket provides the API key
SpecMarket charges: max(estimated cost × 1.4, $5 minimum) per run
Estimated cost comes from the spec’s estimatedCostUsd field (set by the creator)
Total cost: The charge above, quoted to you before you pay
Creator earnings: 30% of the margin (price - estimated cost)

Pricing Example

Spec: “DocuSign Replacement”

Estimated cost: $8.50

If you use BYOK:

Your cost: ~$8.50 (your API key) + $2 (SpecMarket fee) = ~$10.50
Creator gets: $0

If you use Platform Key:

Charge: max($8.50 × 1.4, $5) = max($11.90, $5) = $11.90
Your cost: $11.90
Creator earnings: 30% × ($11.90 - $8.50) = 30% × $3.40 = $1.02
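
The arithmetic above can be sketched in TypeScript. This is only an illustration of the published formulas, not SpecMarket's billing code: the function and constant names are hypothetical, and rounding to the nearest cent is an assumption.

```typescript
// Constants are copied from the pricing rules above; all names are hypothetical.
const BYOK_FEE_USD = 2;
const PLATFORM_MARKUP = 1.4;
const PLATFORM_MINIMUM_USD = 5;
const CREATOR_MARGIN_SHARE = 0.3;

// Assumption: amounts are rounded to the nearest cent.
const roundToCents = (usd: number): number => Math.round(usd * 100) / 100;

interface PlatformKeyQuote {
  chargeUsd: number;          // what the user pays
  creatorEarningsUsd: number; // 30% of (charge - estimated cost)
}

// Platform key: max(estimated cost x 1.4, $5 minimum).
function quotePlatformKeyRun(estimatedCostUsd: number): PlatformKeyQuote {
  const chargeUsd = roundToCents(
    Math.max(estimatedCostUsd * PLATFORM_MARKUP, PLATFORM_MINIMUM_USD),
  );
  const creatorEarningsUsd = roundToCents(
    CREATOR_MARGIN_SHARE * (chargeUsd - estimatedCostUsd),
  );
  return { chargeUsd, creatorEarningsUsd };
}

// BYOK: the user's own API spend plus the flat infrastructure fee.
function totalByokCostUsd(apiCostUsd: number): number {
  return roundToCents(apiCostUsd + BYOK_FEE_USD);
}

const quote = quotePlatformKeyRun(8.5);
console.log(quote.chargeUsd, quote.creatorEarningsUsd); // 11.9 1.02
console.log(totalByokCostUsd(8.5));                      // 10.5
```

For a spec with a low estimated cost, such as $2.00, the $5 minimum applies instead of the 1.4 multiplier, so `quotePlatformKeyRun(2)` yields a $5.00 charge.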

Pricing Guarantees

No hidden charges — Only the quoted price
No per-API-call fees — Flat run pricing
No storage fees — Outputs are yours to download
Refunds for failures — If a run fails before completion, you’re not charged

How It Works

1. Request a Run

const runId = await requestManagedRun({
  specId: spec._id,
  specVersion: "1.0.0",           // Optional; defaults to current
  model: "claude-opus-4-6",       // Required: which model to use
  maxLoops: 100,                  // Optional; defaults to 50
  maxBudgetUsd: 50,               // Optional; defaults to $50
  userApiKeyEncrypted: "sk_enc...",  // Optional: if using BYOK
});

Response:

{
  "managedRunId": "mr_abc123...",
  "status": "pending_payment",
  "estimatedCost": 11.90,
  "creatorEarnings": 1.02
}

If using the Platform Key option, status starts at pending_payment. Complete the Stripe checkout in the next step.

If using BYOK, status is immediately queued and your run will start when the coordinator service picks it up.

2. Payment (Platform Key Only)

If you’re using the platform key, complete Stripe checkout:

const checkout = await createManagedRunCheckout({
  managedRunId: "mr_abc123...",
  redirectUrl: "https://myapp.com/managed-runs/" + runId,
});
 
// Redirect to Stripe checkout
window.location.href = checkout.checkoutUrl;

After payment confirmation, status transitions to queued.

3. Execution

The coordinator service (running on AWS EC2) polls for queued runs and:

Creates a Docker container with the spec
Injects environment variables and API keys
Runs the Ralph Loop agent
Captures output and metrics
Stores results

Run statuses during execution:

queued — Waiting to be picked up
provisioning — Creating the container
running — Ralph Loop in progress
success — Completed successfully
failure — All success criteria failed
stall — Agent couldn’t make progress
budget_exceeded — Hit your maxBudgetUsd limit
timed_out — Exceeded SpecMarket’s 2-hour timeout
cancelled — You cancelled the run
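
Client code that reacts to these statuses may find it convenient to model them as a TypeScript union. The sketch below is an illustration, not part of the SpecMarket API: the type and helper names are hypothetical. It also includes pending_payment and error, which this guide mentions outside the list above, and it assumes that cancelled and error are final states.

```typescript
// Status names come from this guide; the type and helper names are hypothetical.
type ManagedRunStatus =
  | "pending_payment"
  | "queued"
  | "provisioning"
  | "running"
  | "success"
  | "failure"
  | "stall"
  | "budget_exceeded"
  | "timed_out"
  | "cancelled"
  | "error";

// Assumption: "cancelled" and "error" are treated as final, alongside the
// outcome statuses (success, failure, stall, budget_exceeded, timed_out).
const TERMINAL_STATUSES: ReadonlySet<ManagedRunStatus> = new Set<ManagedRunStatus>([
  "success",
  "failure",
  "stall",
  "budget_exceeded",
  "timed_out",
  "cancelled",
  "error",
]);

// True once the run can no longer change state, so polling can stop.
function isTerminal(status: ManagedRunStatus): boolean {
  return TERMINAL_STATUSES.has(status);
}

console.log(isTerminal("running")); // false
console.log(isTerminal("stall"));   // true
```

Using a union rather than a plain string lets the compiler catch misspelled status names at the call site.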

4. Results

Poll for results:

// run is undefined while the query is still loading
const run = useQuery(api.managedRuns.getById, {
  managedRunId: "mr_abc123...",
});

if (run?.status === "success") {
  console.log("Output path:", run.outputPath);  // S3 or file server URL
  console.log("Metrics:", run.metrics);
}

Or receive webhooks (Phase 2):

// When you request the run, provide webhookUrl
const runId = await requestManagedRun({
  specId: spec._id,
  model: "claude-opus-4-6",       // Required: which model to use
  webhookUrl: "https://myapp.com/webhooks/managed-runs",
});

SpecMarket POSTs to your webhook when the run completes:

{
  "managedRunId": "mr_abc123...",
  "status": "success",
  "outputPath": "https://output.specmarket.dev/mr_abc123/output.zip",
  "metrics": {
    "totalLoops": 42,
    "totalCostUsd": 11.45,
    "totalTimeSeconds": 2400,
    "successCriteria": [
      { "name": "app starts", "passed": true },
      { "name": "database works", "passed": true }
    ]
  }
}
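
A receiving service might parse this payload along the following lines. This is a minimal sketch based only on the sample above: the interface and function names are hypothetical, and it assumes that outputPath and metrics may be absent on unsuccessful runs. This guide does not describe a signing scheme, so the sketch does not verify who sent the request.

```typescript
// Shapes inferred from the sample payload above; all names are hypothetical.
interface SuccessCriterion {
  name: string;
  passed: boolean;
}

interface ManagedRunMetrics {
  totalLoops: number;
  totalCostUsd: number;
  totalTimeSeconds: number;
  successCriteria: SuccessCriterion[];
}

interface ManagedRunWebhook {
  managedRunId: string;
  status: string;
  outputPath?: string;          // assumption: may be missing on failed runs
  metrics?: ManagedRunMetrics;  // assumption: may be missing on failed runs
}

// Parses the raw request body and checks the two fields the sketch relies on.
// The metrics object is trusted as-is and not validated field by field.
function parseManagedRunWebhook(body: string): ManagedRunWebhook {
  const data: unknown = JSON.parse(body);
  if (typeof data !== "object" || data === null) {
    throw new Error("Webhook body must be a JSON object");
  }
  const record = data as Record<string, unknown>;
  if (typeof record.managedRunId !== "string" || typeof record.status !== "string") {
    throw new Error("Webhook body is missing managedRunId or status");
  }
  return record as unknown as ManagedRunWebhook;
}

// Names of the success criteria that did not pass (empty if none or no metrics).
function failedCriteria(event: ManagedRunWebhook): string[] {
  return (event.metrics?.successCriteria ?? [])
    .filter((criterion) => !criterion.passed)
    .map((criterion) => criterion.name);
}
```

A handler would typically call `parseManagedRunWebhook` on the request body, respond quickly, and then fetch the output from `outputPath` in the background.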

Limits & Constraints

Execution Limits

Max loops: 200 (you can set lower via maxLoops)
Max runtime: 2 hours
Max budget: $500 per run (you can set lower via maxBudgetUsd)
Concurrent runs per user: 10 at a time
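
To catch an out-of-range value before submitting a request, a client could check its options against these limits locally. The sketch below is an assumption-laden illustration: the helper name is hypothetical, the defaults (50 loops, $50) are taken from the request example earlier in this guide, and whether the server rejects or clamps out-of-range values is not documented here, so the sketch simply throws.

```typescript
// Ceilings from the list above and defaults from the request example.
// All names are hypothetical.
const RUN_LIMITS = { maxLoops: 200, maxBudgetUsd: 500 } as const;
const RUN_DEFAULTS = { maxLoops: 50, maxBudgetUsd: 50 } as const;

interface RunLimitOptions {
  maxLoops: number;
  maxBudgetUsd: number;
}

// Fills in defaults and rejects values outside the documented ceilings.
function resolveRunLimits(requested: Partial<RunLimitOptions> = {}): RunLimitOptions {
  const maxLoops = requested.maxLoops ?? RUN_DEFAULTS.maxLoops;
  const maxBudgetUsd = requested.maxBudgetUsd ?? RUN_DEFAULTS.maxBudgetUsd;

  if (!Number.isInteger(maxLoops) || maxLoops < 1 || maxLoops > RUN_LIMITS.maxLoops) {
    throw new RangeError(`maxLoops must be an integer from 1 to ${RUN_LIMITS.maxLoops}`);
  }
  if (!(maxBudgetUsd > 0) || maxBudgetUsd > RUN_LIMITS.maxBudgetUsd) {
    throw new RangeError(`maxBudgetUsd must be above 0 and at most ${RUN_LIMITS.maxBudgetUsd}`);
  }
  return { maxLoops, maxBudgetUsd };
}

console.log(resolveRunLimits({ maxLoops: 25 })); // { maxLoops: 25, maxBudgetUsd: 50 }
```

The resolved values can then be spread into the options passed to `requestManagedRun`.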

Environment

File system: Specs can write to /workspace (output directory)
Browser: Chromium (headless) and Playwright system deps are pre-installed in every sandbox
Screenshots: Screenshots saved to /workspace/screenshots/ are uploaded to your dashboard in real-time
Playwright: Specs with Playwright tests will run them automatically. No Xvfb needed — Playwright runs in headless mode natively.
Network: Full internet access (required for npm install, git clone, external APIs, documentation). Blocked: host network, Docker API, cloud metadata endpoints.
Memory: 6GB per container (increased for Chromium headroom)
CPU: 2 vCPU
Shared memory: 1GB /dev/shm (required for Chromium)
Disk: 50GB temporary storage (output must fit in output directory)

API Key Security

BYOK keys are encrypted — Stored encrypted at rest, decrypted only during execution
Platform key runs — SpecMarket’s key is used; no key transmission to your side
Custom environment variables — Encrypted in transit and at rest

Cancellation

You can cancel a run while it’s in these statuses:

pending_payment — Payment not yet confirmed
queued — Waiting to start
provisioning — Container being created
running — Ralph Loop in progress

Once a run reaches a terminal status (success, failure, stall, budget_exceeded, timed_out, or error), it cannot be cancelled.
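
A client can check this rule locally before offering a cancel action. The helper below is a hypothetical sketch, not part of the SpecMarket API; it only encodes the four cancellable statuses listed above.

```typescript
// Statuses in which this guide says a run can still be cancelled.
const CANCELLABLE_STATUSES = ["pending_payment", "queued", "provisioning", "running"] as const;
type CancellableStatus = (typeof CANCELLABLE_STATUSES)[number];

// Type guard: narrows an arbitrary status string to a cancellable one.
function canCancel(status: string): status is CancellableStatus {
  return (CANCELLABLE_STATUSES as readonly string[]).includes(status);
}

console.log(canCancel("provisioning")); // true
console.log(canCancel("success"));      // false
```

A run's status can change between this check and the cancel call, so the caller should still be prepared for the cancellation itself to be refused.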

Cancel a run:

await cancelManagedRun({
  managedRunId: "mr_abc123...",
});

If you cancel after the run has started execution, you’re still charged for the API costs incurred so far (BYOK) or the full platform fee (platform key).

Reliability & Retries

What Happens If the Coordinator Fails?

The coordinator service is stateless and idempotent. If it crashes:

Completed runs are already marked as done (no re-execution)
In-progress runs resume from their checkpoint (if supported by the agent)
Pending runs wait until the service restarts

Retry policy:

Failed runs can be retried by requesting a new managed run
You pay for each retry separately

What If the Docker Container Crashes?

If the container crashes mid-run:

The run status transitions to error
Error message is logged (e.g., “Out of memory”)
You can request a new run with a higher maxBudgetUsd or a lower maxLoops

Current Status

✅ Implemented (Backend Complete)

Request a managed run (mutation)
Cancel a run (mutation)
Query run status and metrics
BYOK and platform key pricing models
Encryption for API keys and custom env vars

🟡 Scaffolded (Backend structure exists, not yet operational)

Coordinator service orchestration
Docker container management
Agent execution runtime
Webhook callbacks
Scheduled runs (cron)
Output storage (S3)

🔴 Not Yet Started

Web UI for managed runs dashboard
Advanced analytics and cost reporting
Advanced scheduling (weekly, monthly runs)

Phase 2 Roadmap

Managed Runs will fully launch in Phase 2, once the Phase 1 deployment is stable. Phase 2 includes:

Coordinator service — Deploy to Fly.io, handle queueing and orchestration
Docker integration — Build and execute spec containers reliably
Output storage — S3 or compatible file server
Webhooks — Delivery and retry logic
Dashboard — Web UI for managed runs, job history, cost tracking
Scheduled runs — Cron-like scheduling for recurring runs

Timeline: 4-6 weeks post-Phase-1-launch

Estimating Costs

The estimatedCostUsd field on each spec tells you the expected cost.

For BYOK runs:

Your API cost + $2 = Total cost
Use your API provider’s pricing calculator for token estimates

For platform key runs:

max(estimatedCostUsd × 1.4, $5) = Your cost
The margin is 40% of estimatedCostUsd (more when the $5 minimum applies), so specs with higher estimated costs carry larger dollar margins

Tips to reduce costs:

Use a cheaper model — If the spec’s minModel allows it:

// Instead of Opus
model: "claude-sonnet-4"  // Cheaper, sufficient for many specs

Lower maxLoops — Fewer iterations = lower cost. Default is 50:
```
maxLoops: 25  // Faster and cheaper, but may not complete
```

Lower maxBudgetUsd — Set a hard limit:

maxBudgetUsd: 10  // Stop early if costs exceed $10

Use BYOK — If you have volume discounts from your API provider, BYOK is cheaper

Troubleshooting

“Run stuck in pending_payment”

Your payment failed or timed out in Stripe checkout. Complete checkout again or request a new run.

“Run stuck in provisioning”

The coordinator service is experiencing issues. Wait 5 minutes, then cancel and retry.

“budget_exceeded status”

Your run hit the maxBudgetUsd limit. Request a new run with a higher budget.

“stall status”

The agent couldn’t make progress. Review the error message; a stall usually means the spec’s success criteria are too strict or the agent needs more guidance.

“Run output is missing or incomplete”

Output files may still be uploading. Wait a few seconds and refresh. If still missing, the container may have run out of disk space — request a new run with a simpler spec.

Security Considerations

✅ Safe

Output files are private to your user account
API keys are encrypted at rest and in transit
Specs run in isolated Docker containers
Network traffic is HTTPS only

⚠️ Caution

Specs can access all environment variables you provide
Specs can write files to /workspace (your output directory)
Specs run with internet access (can make outbound HTTPS calls)
Don’t run untrusted specs from unknown authors — Review the spec’s source files before running

API Reference

See API Reference — Managed Runs Module for:

request(...) — Initiate a managed run
cancel(...) — Cancel a pending or running run
getById(...) — Fetch run details and status
getForUser(...) — List your runs
createCheckoutSession(...) — Create Stripe checkout for platform key runs

FAQ

Q: Can I reuse my API key across multiple managed runs? A: Yes. Provide userApiKeyEncrypted on each request; it’s securely stored and only decrypted during execution.

Q: What happens if my API key is invalid? A: The run fails during provisioning (status: error) with a message like “Failed to authenticate with API provider.” No charge if using BYOK; full charge if using platform key.

Q: Can I cancel a run and get a refund? A: Yes, if cancelled before the run starts (status: pending_payment or queued). Refunds for runs that started are prorated based on API tokens used.

Q: How long are outputs stored? A: 30 days. Download them to your own storage for long-term keeping.

Q: Can I schedule a spec to run daily? A: Not yet. Phase 2 will include cron-like scheduling.

Q: Is my source code safe? A: Yes. Specs run in isolated containers. SpecMarket never sees your output files’ source code unless you explicitly export it.

Getting Help

API issues: See API Reference
Pricing questions: Email support@specmarket.dev
Bug reports: GitHub Issues

Managed Runs will be a first-class feature of SpecMarket. During Phase 2, expect rapid iteration and improvement based on user feedback.