When to reach for a Distributed Run
parallelism and the platform owns the fan-out: sharding, gang or opportunistic admission, per-shard status, and the concurrency budget. You no longer hand-roll Promise.all orchestration in your own app.Indexed Shards
Each shard gets a stable SYLPHX_SHARD_INDEX (0..N-1) + SYLPHX_SHARD_COUNT — self-select your partition, no coordinator needed
Gang Scheduling
All shards start together or none do — a shard never waits on absent peers (all-or-nothing admission)
Aggregate + Drill-down
One rolled-up status (succeeded / partial / failed) plus per-shard exit codes and logs
Shared Result Volume
Mount one ReadWriteMany volume into every shard at a well-known path — the merge convention for per-shard outputs
Same Security Envelope
Distribution changes shard count, not isolation — sandboxing, image scanning, and scoped secrets are per-shard
runDistributed (replaces client-side Promise.all)
The old pattern fanned out N independent RunsClient.run() calls with Promise.all and merged the results in your app. runDistributed moves all of that into the platform: one call, one handle, one aggregate result. Each shard reads SYLPHX_SHARD_INDEX to pick its slice of the work.
import { RunsClient, createClient } from '@sylphx/sdk'
const config = createClient(process.env.SYLPHX_URL!)
const folds = [0, 1, 2, 3, 4, 5, 6, 7, 8]
// One distributed job — the platform fans it out to 9 indexed shards.
const run = await RunsClient.runDistributed(config, {
image: 'ghcr.io/acme/trainer:sha-abc123',
// The container reads SYLPHX_SHARD_INDEX / SYLPHX_SHARD_COUNT to pick its fold.
command: ['python', 'train.py'],
parallelism: folds.length,
machine: 'large',
// Optional: use low-priority spare capacity and tolerate preemptions.
gangScheduling: false,
backoffLimitPerIndex: 12,
// A ReadWriteMany volume mounted into every shard is the result-merge convention.
volumeMounts: [{ volumeId: process.env.CACHE_VOLUME_ID!, mountPath: '/cache' }],
timeoutSeconds: 7200,
})
// .wait() resolves once the aggregate reaches a terminal state
// (succeeded | failed | partial | cancelled | timed_out).
const status = await run.wait()
const shards = await run.shards()
const failures = shards.filter((s) => s.status === 'failed')
if (failures.length > 0) {
console.error(`${failures.length}/${shards.length} shards failed`)
// Drill into one shard's logs:
const { stdout, stderr } = await run.logs({ shard: failures[0].index })
console.error(stderr)
}
console.log('Aggregate status:', status.status, '— succeeded:', status.succeeded)Shard self-selection
import os
shard = int(os.environ["SYLPHX_SHARD_INDEX"]) # 0 .. N-1
count = int(os.environ["SYLPHX_SHARD_COUNT"]) # N
symbols = load_universe()
my_symbols = symbols[shard::count] # every Nth symbol, offset by shard
train_and_write(my_symbols, out=f"/cache/scores-{shard}.parquet")CLI
The same primitive from the terminal. --parallelism turns a Run into a Distributed Run; --follow streams the aggregate + per-shard status table to completion.
# Fan out to 9 shards on large machines, follow the per-shard status table.
sylphx run \
--image ghcr.io/acme/trainer:sha-abc123 \
--parallelism 9 \
--opportunistic \
--backoff-limit-per-index 12 \
--machine large \
--timeout 7200 \
--follow \
-- python train.py
# Inspect a previously started run (aggregate + per-shard status).
sylphx runs get wrk_abc123
# Stream one shard's logs.
sylphx runs logs wrk_abc123 --shard 3Configuration Reference
| Property | Type | Description |
|---|---|---|
parallelism | number | Shard count, 1..256. Defaults to 1 (a single-shard Run). > 1 fans out to N indexed shards. |
completionMode | 'Indexed' | 'NonIndexed' | Defaults to Indexed when parallelism > 1, giving each shard a stable SYLPHX_SHARD_INDEX. |
gangScheduling | boolean | Defaults to true when parallelism > 1 — all shards are admitted together or none, so a shard never waits on absent peers. |
backoffLimitPerIndex | number | Optional 0..50 retry budget for opportunistic distributed runs only. Requires parallelism > 1 and gangScheduling false; omitted uses the platform default of 3. |
machine | 'nano' | 'micro' | 'small' | 'standard' | 'large' | 'xlarge' | '2xlarge' | 'mem-2xlarge' | Per-shard machine size — every shard runs on the same tier. Defaults to standard. |
volumeMounts | RunVolumeMount[] | Mounted into every shard. A ReadWriteMany volume is the cross-shard result-merge convention. |
timeoutSeconds | number | Per-shard hard deadline. Default 3600 (1h), max 86400 (24h). |
Machine Tiers
Every shard runs on the machine size you pick. Larger tiers cost proportionally more per compute-second (see Billing). One shard OOMs above its RAM ceiling — shard wide rather than reaching for the largest single machine.
| Property | Type | Description |
|---|---|---|
nano | 0.1 vCPU · 256 MiB | 0.125× standard rate |
micro | 0.25 vCPU · 512 MiB | 0.25× standard rate |
small | 0.5 vCPU · 1 GiB | 0.5× standard rate |
standard | 1 vCPU · 2 GiB | 1× — the reference tier |
large | 2 vCPU · 4 GiB | 2× standard rate |
xlarge | 4 vCPU · 8 GiB | 4× standard rate |
2xlarge | 8 vCPU · 16 GiB | 8× standard rate |
mem-2xlarge | 4 vCPU · 32 GiB | Memory-optimized — 4× standard rate |
GPU
Quotas
Distributed Runs are governed by a two-dimensional, plan-tier budget — a single big job is never throttled by a global run count, and no tenant can monopolize the pool:
| Property | Type | Description |
|---|---|---|
Per-job parallelism | Plan ceiling | The maximum shards one job may request. Free is single-shard only (1); Pro / Team / Enterprise unlock the full 256-shard contract maximum. |
Org concurrent shards | Plan budget | The maximum shards running at once across ALL your Runs. A 100-shard job consumes 100 of this budget; a single-shard run consumes 1. |
429 when a budget is exceeded
429 Too Many Requests with an actionable message. Reduce parallelism, wait for running shards to finish, or upgrade your plan.Billing Model
Distributed Runs meter under the same compute-second model as single-shard Runs — there is no unmetered capability. Usage is the sum across shards, weighted by machine size:
compute_seconds = wall_clock_seconds × shard_count × machine_units
# machine_units normalizes a tier's vCPU to the standard tier (1 vCPU = 1.0):
# standard = 1.0 large = 2.0 xlarge = 4.0 small = 0.5 nano = 0.125