Managed Runs allow you to execute specs in the cloud without running them locally. Request a run, SpecMarket orchestrates everything, and you receive results via webhook or polling.
This is optional — the CLI (specmarket run) remains the primary way to execute specs. Managed Runs are a convenience tier for users who want hands-off execution.
Use Cases
✅ Good Fit for Managed Runs
- No local development environment — Running on a phone, shared machine, or server with limited resources
- Scheduled runs — “Run this spec daily at 6 AM” (coming in Phase 2)
- Batch processing — Submit 50 specs, get results later
- Hands-off execution — Start a run and don’t think about it; receive results via webhook
- Integration with other services — Trigger a spec run from a webhook, CI/CD pipeline, or scheduled job
❌ Not a Good Fit
- Development & iteration — Use the local CLI for debugging and refining
- Cost-sensitive workloads — Local runs (BYOK) are cheaper than platform-managed runs
- Specs requiring interactive input — Managed runs are fire-and-forget; no way to provide real-time feedback to the agent
- Sensitive environments — Your spec code and outputs run on SpecMarket servers; not suitable for highly confidential work
Pricing
Managed Runs use a simple, transparent pricing model.
Two Options
Option 1: BYOK (Bring Your Own Key)
- You provide your own OpenAI or Anthropic API key
- SpecMarket charges a flat $2 infrastructure fee per run
- You pay API token costs directly to your API provider
- Total cost: Your API tokens + $2 SpecMarket fee
- Creator earnings: $0 (no margin to share)
Option 2: Platform Key (SpecMarket Handles API Costs)
- SpecMarket provides the API key
- SpecMarket charges: max(estimated cost × 1.4, $5 minimum) per run
- Estimated cost comes from the spec’s estimatedCostUsd field (set by the creator)
- Total cost: Transparent; you see the price before paying
- Creator earnings: 30% of the margin (price - estimated cost)
Pricing Example
Spec: “DocuSign Replacement”
- Estimated cost: $8.50
If you use BYOK:
- Your cost: ~$8.50 (your API key) + $2 (SpecMarket fee) = ~$10.50
- Creator gets: $0
If you use Platform Key:
- Charge: max($8.50 × 1.4, $5) = max($11.90, $5) = $11.90
- Your cost: $11.90
- Creator earnings: 30% × ($11.90 - $8.50) = 30% × $3.40 = $1.02
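The two pricing rules above can be sketched as a small calculator. This is only an illustration derived from the stated numbers (the $2 BYOK fee, the 1.4 multiplier, the $5 minimum, and the 30% creator share). The function names and the rounding to whole cents are assumptions, not part of the SpecMarket API.

```typescript
// Illustrative pricing helpers; names and cent rounding are assumptions.
const BYOK_FEE_USD = 2;
const PLATFORM_MARKUP = 1.4;
const PLATFORM_MINIMUM_USD = 5;
const CREATOR_SHARE = 0.3;

// Round to whole cents so floating-point noise (e.g. 8.5 * 1.4) does not leak into prices.
function roundToCents(usd: number): number {
  return Math.round(usd * 100) / 100;
}

// BYOK: your own API spend plus the flat infrastructure fee.
function byokTotalUsd(apiCostUsd: number): number {
  return roundToCents(apiCostUsd + BYOK_FEE_USD);
}

// Platform key: max(estimated cost x 1.4, $5); the creator receives 30% of the margin.
function platformKeyQuote(estimatedCostUsd: number): { priceUsd: number; creatorEarningsUsd: number } {
  const priceUsd = roundToCents(Math.max(estimatedCostUsd * PLATFORM_MARKUP, PLATFORM_MINIMUM_USD));
  const creatorEarningsUsd = roundToCents(CREATOR_SHARE * (priceUsd - estimatedCostUsd));
  return { priceUsd, creatorEarningsUsd };
}

const quote = platformKeyQuote(8.5);
console.log(quote.priceUsd.toFixed(2), quote.creatorEarningsUsd.toFixed(2), byokTotalUsd(8.5).toFixed(2));
```

For the $8.50 spec in the example, this reproduces the $11.90 platform price, the $1.02 creator earnings, and the roughly $10.50 BYOK total.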
Pricing Guarantees
- No hidden charges — Only the quoted price
- No per-API-call fees — Flat run pricing
- No storage fees — Outputs are yours to download
- Refunds for failures — If a run fails before completion, you’re not charged
How It Works
1. Request a Run
const runId = await requestManagedRun({
  specId: spec._id,
  specVersion: "1.0.0", // Optional; defaults to current
  model: "claude-opus-4-6", // Required: which model to use
  maxLoops: 100, // Optional; defaults to 50
  maxBudgetUsd: 50, // Optional; defaults to $50
  userApiKeyEncrypted: "sk_enc...", // Optional: if using BYOK
});
Response:
{
  "managedRunId": "mr_abc123...",
  "status": "pending_payment", // or "queued" if using BYOK
  "estimatedCost": 11.90,
  "creatorEarnings": 1.02
}
If using the Platform Key option, status starts at pending_payment. Complete the Stripe checkout in the next step.
If using BYOK, status is immediately queued and your run will start when the coordinator service picks it up.
2. Payment (Platform Key Only)
If you’re using the platform key, complete Stripe checkout:
const checkout = await createManagedRunCheckout({
  managedRunId: "mr_abc123...",
  redirectUrl: "https://myapp.com/managed-runs/" + runId,
});
// Redirect to Stripe checkout
window.location.href = checkout.checkoutUrl;
After payment confirmation, status transitions to queued.
3. Execution
The coordinator service (running on AWS EC2) polls for queued runs and:
- Creates a Docker container with the spec
- Injects environment variables and API keys
- Runs the Ralph Loop agent
- Captures output and metrics
- Stores results
Run statuses during execution:
- queued: Waiting to be picked up
- provisioning: Creating the container
- running: Ralph Loop in progress
- success: Completed successfully
- failure: All success criteria failed
- stall: Agent couldn’t make progress
- budget_exceeded: Hit your maxBudgetUsd limit
- timed_out: Exceeded SpecMarket’s 2-hour timeout
- cancelled: You cancelled the run
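Given the statuses above, a polling client can stop once a run reaches a status that will not change again. The sketch below is one way to do that. The `fetchStatus` callback is hypothetical and stands in for whatever call returns the run’s current status. Treating `cancelled` as terminal, and the interval and attempt defaults, are assumptions.

```typescript
// Status names are taken from the list above; the helper names are illustrative.
type RunStatus =
  | "queued" | "provisioning" | "running"
  | "success" | "failure" | "stall"
  | "budget_exceeded" | "timed_out" | "cancelled";

// Assumption: every status after "running" is final, including "cancelled".
const TERMINAL_STATUSES: ReadonlySet<RunStatus> = new Set<RunStatus>([
  "success", "failure", "stall", "budget_exceeded", "timed_out", "cancelled",
]);

function isTerminal(status: RunStatus): boolean {
  return TERMINAL_STATUSES.has(status);
}

// Hypothetical polling loop: calls `fetchStatus` until a terminal status appears.
async function waitForTerminalStatus(
  fetchStatus: () => Promise<RunStatus>,
  intervalMs = 5000,
  maxAttempts = 1500,
): Promise<RunStatus> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const status = await fetchStatus();
    if (isTerminal(status)) return status;
    // Sleep before the next poll.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error("Run did not reach a terminal status within the polling window");
}
```

The defaults poll every 5 seconds for a little over two hours, which is slightly longer than the 2-hour run timeout, so a run should report `timed_out` before the loop gives up.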
4. Results
Poll for results:
const run = useQuery(api.managedRuns.getById, {
  managedRunId: "mr_abc123...",
});
// `run` may be undefined while the query is still loading, so guard the access
if (run?.status === "success") {
  console.log("Output path:", run.outputPath); // S3 or file server URL
  console.log("Metrics:", run.metrics);
}
Or receive webhooks (Phase 2):
// When you request the run, provide webhookUrl
const runId = await requestManagedRun({
  specId: spec._id,
  model: "claude-opus-4-6", // Required: which model to use
  webhookUrl: "https://myapp.com/webhooks/managed-runs",
});
SpecMarket POSTs to your webhook when the run completes:
{
  "managedRunId": "mr_abc123...",
  "status": "success",
  "outputPath": "https://output.specmarket.dev/mr_abc123/output.zip",
  "metrics": {
    "totalLoops": 42,
    "totalCostUsd": 11.45,
    "totalTimeSeconds": 2400,
    "successCriteria": [
      { "name": "app starts", "passed": true },
      { "name": "database works", "passed": true }
    ]
  }
}
Limits & Constraints
Execution Limits
- Max loops: 200 (you can set lower via maxLoops)
- Max runtime: 2 hours
- Max budget: $500 per run (you can set lower via maxBudgetUsd)
- Concurrent runs per user: 10 at a time
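The limits above can be checked client-side before a run is requested. The following is a minimal sketch, not SpecMarket’s own validation. The ceilings (200 loops, $500) and the defaults (50 loops, $50) come from this page. The function and type names, and the lower bounds of 1 loop and a budget above $0, are assumptions.

```typescript
// Ceilings and defaults are copied from this page; everything else is illustrative.
const MAX_LOOPS_CEILING = 200;
const MAX_BUDGET_CEILING_USD = 500;
const DEFAULT_MAX_LOOPS = 50;
const DEFAULT_MAX_BUDGET_USD = 50;

interface RunLimits {
  maxLoops?: number;
  maxBudgetUsd?: number;
}

// Returns a list of problems; an empty list means the limits are within the documented bounds.
function validateRunLimits(limits: RunLimits): string[] {
  const maxLoops = limits.maxLoops ?? DEFAULT_MAX_LOOPS;
  const maxBudgetUsd = limits.maxBudgetUsd ?? DEFAULT_MAX_BUDGET_USD;
  const problems: string[] = [];
  if (!Number.isInteger(maxLoops) || maxLoops < 1 || maxLoops > MAX_LOOPS_CEILING) {
    problems.push(`maxLoops must be an integer between 1 and ${MAX_LOOPS_CEILING}`);
  }
  // The negated comparison also rejects NaN.
  if (!(maxBudgetUsd > 0) || maxBudgetUsd > MAX_BUDGET_CEILING_USD) {
    problems.push(`maxBudgetUsd must be greater than 0 and at most ${MAX_BUDGET_CEILING_USD}`);
  }
  return problems;
}

console.log(validateRunLimits({ maxLoops: 250, maxBudgetUsd: 50 }));
```

Returning a list rather than throwing lets a caller show every problem at once, for example in a form.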
Environment
- File system: Specs can write to /workspace (output directory)
- Browser: Chromium (headless) and Playwright system deps are pre-installed in every sandbox
- Screenshots: Screenshots saved to /workspace/screenshots/ are uploaded to your dashboard in real-time
- Playwright: Specs with Playwright tests will run them automatically. No Xvfb is needed because Playwright runs in headless mode natively.
- Network: Full internet access (required for npm install, git clone, external APIs, documentation). Blocked: host network, Docker API, cloud metadata endpoints.
- Memory: 6GB per container (increased for Chromium headroom)
- CPU: 2 vCPU
- Shared memory: 1GB /dev/shm (required for Chromium)
- Disk: 50GB temporary storage (output must fit in output directory)
API Key Security
- BYOK keys are encrypted — Stored encrypted at rest, decrypted only during execution
- Platform key runs — SpecMarket’s key is used; no key transmission to your side
- Custom environment variables — Encrypted in transit and at rest
Cancellation
You can cancel a run while it’s in these statuses:
- pending_payment: Payment not yet confirmed
- queued: Waiting to start
- provisioning: Container being created
- running: Ralph Loop in progress
Once a run reaches a terminal status (success, failure, stall, budget_exceeded, timed_out), it cannot be cancelled.
Cancel a run:
await cancelManagedRun({
  managedRunId: "mr_abc123...",
});
If you cancel after the run has started execution, you’re still charged for the API costs incurred so far (BYOK) or the full platform fee (platform key).
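A client can check the cancellable statuses listed in this section before calling cancel, which avoids a request that cannot succeed. The sketch below assumes only the four statuses named above. The helper name `canCancel` is illustrative and not part of the API.

```typescript
// Statuses in which this section says a run can still be cancelled.
const CANCELLABLE_STATUSES = ["pending_payment", "queued", "provisioning", "running"] as const;
type CancellableStatus = (typeof CANCELLABLE_STATUSES)[number];

// Type guard: narrows an arbitrary status string to one of the cancellable statuses.
function canCancel(status: string): status is CancellableStatus {
  return (CANCELLABLE_STATUSES as readonly string[]).indexOf(status) !== -1;
}

console.log(canCancel("running"), canCancel("success"));
```

A caller might hide or disable a cancel button whenever `canCancel(run.status)` is false.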
Reliability & Retries
What Happens If Coordinator Fails?
The coordinator service is stateless and idempotent. If it crashes:
- Completed runs are already marked as done (no re-execution)
- In-progress runs resume from their checkpoint (if supported by the agent)
- Pending runs wait until the service restarts
Retry policy:
- Failed runs can be retried by requesting a new managed run
- You pay for each retry separately
What If Docker Container Crashes?
If the container crashes mid-run:
- The run status transitions to error
- Error message is logged (e.g., “Out of memory”)
- You can request a new run with higher maxBudgetUsd or fewer maxLoops
Limits & Current Status
✅ Implemented (Backend Complete)
- Request a managed run (mutation)
- Cancel a run (mutation)
- Query run status and metrics
- BYOK and platform key pricing models
- Encryption for API keys and custom env vars
🟡 Scaffolded (Backend structure exists, not yet operational)
- Coordinator service orchestration
- Docker container management
- Agent execution runtime
- Webhook callbacks
- Scheduled runs (cron)
- Output storage (S3)
🔴 Not Yet Started
- Web UI for managed runs dashboard
- Advanced analytics and cost reporting
- Advanced scheduling (weekly, monthly runs)
Phase 2 Roadmap
Managed Runs will fully launch in Phase 2 after Phase 1 stable deployment. This includes:
- Coordinator service — Deploy to Fly.io, handle queueing and orchestration
- Docker integration — Build and execute spec containers reliably
- Output storage — S3 or compatible file server
- Webhooks — Delivery and retry logic
- Dashboard — Web UI for managed runs, job history, cost tracking
- Scheduled runs — Cron-like scheduling for recurring runs
Timeline: 4-6 weeks post-Phase-1-launch
Estimating Costs
The estimatedCostUsd field on each spec tells you the expected cost.
For BYOK runs:
- Your API cost + $2 = Total cost
- Use your API provider’s pricing calculator for token estimates
For platform key runs:
- max(estimatedCostUsd × 1.4, $5) = Your cost
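- The margin is 40% of the estimated cost, so costlier specs carry a larger absolute margin; when the $5 minimum applies, the margin is $5 minus the estimated cost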
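To compare the two options for a given spec, the formulas above can be evaluated side by side. The sketch below rests on one loud assumption: that a BYOK run’s actual API spend equals the spec’s estimatedCostUsd. Real spend can differ in either direction, and any volume discount from your provider is not modeled. The function name is illustrative.

```typescript
// ASSUMPTION: actual BYOK API spend equals the spec's estimated cost.
function compareOptions(estimatedCostUsd: number): {
  byokUsd: number;
  platformUsd: number;
  cheaper: "byok" | "platform" | "tie";
} {
  // Compare in whole cents to avoid floating-point noise.
  const toCents = (usd: number): number => Math.round(usd * 100);
  const byokCents = toCents(estimatedCostUsd + 2); // API cost + $2 fee
  const platformCents = toCents(Math.max(estimatedCostUsd * 1.4, 5)); // max(cost x 1.4, $5)
  const cheaper = byokCents < platformCents ? "byok" : byokCents > platformCents ? "platform" : "tie";
  return { byokUsd: byokCents / 100, platformUsd: platformCents / 100, cheaper };
}

console.log(compareOptions(8.5));
```

Under that assumption, the $8.50 example comes out at $10.50 for BYOK against $11.90 for the platform key. For estimates between $3 and $5 the platform key works out lower, because there the $2 fee outweighs the markup.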
Tips to reduce costs:
- Use a cheaper model: If the spec’s minModel allows it:
  // Instead of Opus
  model: "claude-sonnet-4" // Cheaper, sufficient for many specs
- Lower maxLoops: Fewer iterations = lower cost. Default is 50:
  maxLoops: 25 // Faster and cheaper, but may not complete
- Lower maxBudgetUsd: Set a hard limit:
  maxBudgetUsd: 10 // Stop early if costs exceed $10
- Use BYOK: If you have volume discounts from your API provider, BYOK is cheaper
Troubleshooting
“Run stuck in pending_payment”
Your payment failed or timed out in Stripe checkout. Complete checkout again or request a new run.
“Run stuck in provisioning”
The coordinator service is experiencing issues. Wait 5 minutes, then cancel and retry.
“budget_exceeded status”
Your run hit the maxBudgetUsd limit. Request a new run with a higher budget.
“stall status”
The agent couldn’t make progress. Review the error message; this usually means the spec’s success criteria are too strict or the agent needs more guidance.
“Run output is missing or incomplete”
Output files may still be uploading. Wait a few seconds and refresh. If the output is still missing, the container may have run out of disk space; request a new run with a simpler spec.
Security Considerations
✅ Safe
- Output files are private to your user account
- API keys are encrypted at rest and in transit
- Specs run in isolated Docker containers
- Network traffic is HTTPS only
⚠️ Caution
- Specs can access all environment variables you provide
- Specs can write files to /workspace (your output directory)
- Specs run with internet access (can make outbound HTTPS calls)
- Don’t run untrusted specs from unknown authors — Review the spec’s source files before running
API Reference
See API Reference — Managed Runs Module for:
- request(...): Initiate a managed run
- cancel(...): Cancel a pending or running run
- getById(...): Fetch run details and status
- getForUser(...): List your runs
- createCheckoutSession(...): Create Stripe checkout for platform key runs
FAQ
Q: Can I reuse my API key across multiple managed runs?
A: Yes. Provide userApiKeyEncrypted on each request; it’s securely stored and only decrypted during execution.
Q: What happens if my API key is invalid?
A: The run fails during provisioning (status: error) with a message like “Failed to authenticate with API provider.” No charge if using BYOK; full charge if using platform key.
Q: Can I cancel a run and get a refund?
A: Yes, if cancelled before the run starts (status: pending_payment or queued). Refunds for runs that started are prorated based on API tokens used.
Q: How long are outputs stored?
A: 30 days. Download them to your own storage for long-term keeping.
Q: Can I schedule a spec to run daily?
A: Not yet. Phase 2 will include cron-like scheduling.
Q: Is my source code safe?
A: Yes. Specs run in isolated containers. SpecMarket never sees your output files’ source code unless you explicitly export it.
Getting Help
- API issues: See API Reference
- Pricing questions: Email support@specmarket.dev
- Bug reports: GitHub Issues
Managed Runs will be a first-class feature of SpecMarket. During Phase 2, expect rapid iteration and improvement based on user feedback.