@camunda8/orchestration-cluster-api

    Camunda 8 Orchestration Cluster TypeScript SDK

    Type‑safe, promise‑based client for the Camunda 8 Orchestration Cluster REST API.

    • Strong TypeScript models (requests, responses, discriminated unions)
    • Branded key types to prevent mixing IDs at compile time
    • Optional request/response schema validation (Zod) via a single env variable
    • OAuth2 client‑credentials & Basic auth (token cache, early refresh, jittered retry, singleflight)
    • Optional mTLS (Node) with inline or *_PATH environment variables
    • Cancelable promises for all operations
    • Eventual consistency helper for polling endpoints
    • Immutable, deep‑frozen configuration accessible through a factory‑created client instance
    • Automatic body-level tenantId defaulting: if a request body supports an optional tenantId and you omit it, the SDK fills it from CAMUNDA_DEFAULT_TENANT_ID (path params are never auto-filled)
    • Automatic transient HTTP retry (429, 503, network) with exponential backoff + full jitter (configurable via CAMUNDA_SDK_HTTP_RETRY*). Non-retryable 500s fail fast. Pluggable strategy surface (default uses p-retry when available, internal fallback otherwise).
    npm install @camunda8/orchestration-cluster-api
    

    Runtime support:

    • Node 20+ (native fetch & File; Node 18 needs global File polyfill)
    • Modern browsers (Chromium, Firefox, Safari) – global fetch & File available

    For older Node versions supply a fetch ponyfill AND a File shim (or upgrade). For legacy browsers, add a fetch polyfill (e.g. whatwg-fetch).
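
    For Node 18 specifically, a minimal sketch of a File shim (not an official SDK requirement; it reuses the File class from node:buffer, as in the deployment examples later in this document):

    import { File as NodeFile } from 'node:buffer';

    // Node 18 ships a global fetch but not a global File; expose one before constructing the client.
    (globalThis as any).File ??= NodeFile;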

    Keep configuration out of application code. Let the factory read CAMUNDA_* variables from the environment (12‑factor style). This makes rotation, secret management, and environment promotion safer & simpler.

    import createCamundaClient from '@camunda8/orchestration-cluster-api';

    // Zero‑config construction: reads CAMUNDA_* from process.env once.
    const camunda = createCamundaClient();

    const topology = await camunda.getTopology();
    console.log('Brokers:', topology.brokers?.length ?? 0);

    Typical .env (example):

    CAMUNDA_REST_ADDRESS=https://cluster.example   # SDK will use https://cluster.example/v2/... unless /v2 already present
    CAMUNDA_AUTH_STRATEGY=OAUTH
    CAMUNDA_CLIENT_ID=***
    CAMUNDA_CLIENT_SECRET=***
    CAMUNDA_DEFAULT_TENANT_ID=<default> # optional: override default tenant resolution
    CAMUNDA_SDK_HTTP_RETRY_MAX_ATTEMPTS=4 # optional: total attempts (initial + 3 retries)
    CAMUNDA_SDK_HTTP_RETRY_BASE_DELAY_MS=100 # optional: base backoff (ms)
    CAMUNDA_SDK_HTTP_RETRY_MAX_DELAY_MS=2000 # optional: cap (ms)

    Prefer environment / secret manager injection over hard‑coding values in source. Treat the SDK like a leaf dependency: construct once near process start, pass the instance where needed.

    Why zero‑config?

    • Separation of concerns: business code depends on an interface, not on secret/constants wiring.
    • 12‑Factor alignment: config lives in the environment → simpler promotion (dev → staging → prod).
    • Secret rotation & incident response: rotate credentials without a code change or redeploy of application containers built with baked‑in values.
    • Immutable start: single hydration pass prevents drift / mid‑request mutations.
    • Test ergonomics: swap an .env.test (or injected vars) without touching source; create multiple clients for multi‑tenant tests.
    • Security review: fewer code paths handling secrets; scanners & vault tooling work at the boundary.
    • Deploy portability: same artifact runs everywhere; only the environment differs.
    • Observability clarity: configuration diffing is an ops concern, not an application code diff.

    Use inline configuration only when you must supply or mutate configuration dynamically (e.g. multi‑tenant routing, tests, ephemeral preview environments) or when running in the browser. Keys mirror their CAMUNDA_* env names.

    const camunda = createCamundaClient({
      config: {
        CAMUNDA_REST_ADDRESS: 'https://cluster.example',
        CAMUNDA_AUTH_STRATEGY: 'BASIC',
        CAMUNDA_BASIC_AUTH_USERNAME: 'alice',
        CAMUNDA_BASIC_AUTH_PASSWORD: 'secret',
      },
    });

    Inject a custom fetch to add tracing, mock responses, instrumentation, circuit breakers, etc.

    const camunda = createCamundaClient({
      fetch: (input, init) => {
        // inspect / modify request here
        return fetch(input, init);
      },
    });

    You can call client.configure({ config: { ... } }) to re‑hydrate. The exposed client.getConfig() stays Readonly and deep‑frozen. Prefer creating a new client instead of mutating a shared one in long‑lived services.
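
    For example (a sketch; the staging address is hypothetical, and whether partial config maps are merged with the existing configuration is not specified here):

    camunda.configure({
      config: {
        CAMUNDA_REST_ADDRESS: 'https://staging-cluster.example',
        CAMUNDA_AUTH_STRATEGY: 'NONE',
      },
    });

    const cfg = camunda.getConfig(); // Readonly, deep-frozen snapshot
    console.log(cfg.CAMUNDA_REST_ADDRESS);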

    Request/response schema validation is controlled by CAMUNDA_SDK_VALIDATION (or a config override). Grammar:

    none | warn | strict | req:<mode>[,res:<mode>] | res:<mode>[,req:<mode>]
    <mode> = none|warn|strict

    Examples:

    CAMUNDA_SDK_VALIDATION=warn           # warn on both
    CAMUNDA_SDK_VALIDATION=req:strict,res:warn
    CAMUNDA_SDK_VALIDATION=none
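
    The same setting can be supplied as a config override when constructing a client; for example (a sketch mirroring the earlier config examples):

    const camunda = createCamundaClient({
      config: {
        CAMUNDA_REST_ADDRESS: 'https://cluster.example',
        CAMUNDA_AUTH_STRATEGY: 'NONE',
        CAMUNDA_SDK_VALIDATION: 'req:strict,res:warn', // strict on requests, warn on responses
      } as any,
    });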

    The SDK includes built‑in transient HTTP retry (429, 503, network errors) using a p‑retry based engine plus a fallback implementation. For advanced resilience patterns (circuit breakers, timeouts, custom classification, combining policies) you can integrate cockatiel. Consider doing so when:

    • You need different retry policies per operation (e.g. idempotent GET vs mutating POST)
    • You want circuit breaking, hedging, timeout, or bulkhead controls
    • You want to add custom classification (e.g. retry certain 5xx only on safe verbs)

    Set CAMUNDA_SDK_HTTP_RETRY_MAX_ATTEMPTS=1 so the SDK does only the initial attempt; then wrap operations with cockatiel.

    import { createCamundaClient } from '@camunda8/orchestration-cluster-api';
    import { retry, ExponentialBackoff, handleAll } from 'cockatiel';

    const client = createCamundaClient({
      config: {
        CAMUNDA_REST_ADDRESS: 'https://cluster.example',
        CAMUNDA_AUTH_STRATEGY: 'NONE',
        CAMUNDA_SDK_HTTP_RETRY_MAX_ATTEMPTS: 1, // disable SDK automatic retries
      } as any,
    });

    // Policy: up to 5 attempts total (1 + 4 retries) with exponential backoff & jitter
    const policy = retry(handleAll, {
      maxAttempts: 5,
      backoff: new ExponentialBackoff({ initialDelay: 100, maxDelay: 2000, jitter: true }),
    });

    // Wrap getTopology
    const origGetTopology = client.getTopology.bind(client);
    client.getTopology = (() => policy.execute(() => origGetTopology())) as any;

    const topo = await client.getTopology();
    console.log(topo.brokers?.length);

    import { createCamundaClient } from '@camunda8/orchestration-cluster-api';
    import { retry, ExponentialBackoff, handleAll } from 'cockatiel';

    const client = createCamundaClient({
      config: {
        CAMUNDA_REST_ADDRESS: 'https://cluster.example',
        CAMUNDA_AUTH_STRATEGY: 'OAUTH',
        CAMUNDA_CLIENT_ID: process.env.CAMUNDA_CLIENT_ID,
        CAMUNDA_CLIENT_SECRET: process.env.CAMUNDA_CLIENT_SECRET,
        CAMUNDA_OAUTH_URL: process.env.CAMUNDA_OAUTH_URL,
        CAMUNDA_TOKEN_AUDIENCE: 'zeebe.camunda.io',
        CAMUNDA_SDK_HTTP_RETRY_MAX_ATTEMPTS: 1,
      } as any,
    });

    const retryPolicy = retry(handleAll, {
      maxAttempts: 4,
      backoff: new ExponentialBackoff({ initialDelay: 150, maxDelay: 2500, jitter: true }),
    });

    const skip = new Set([
      'logger',
      'configure',
      'getConfig',
      'withCorrelation',
      'deployResourcesFromFiles',
    ]);
    for (const key of Object.keys(client)) {
      const val: any = (client as any)[key];
      if (typeof val === 'function' && !key.startsWith('_') && !skip.has(key)) {
        const original = val.bind(client);
        (client as any)[key] = (...a: any[]) => retryPolicy.execute(() => original(...a));
      }
    }

    // Now every public operation is wrapped.

    For diagnostics during support interactions you can enable an auxiliary file logger that captures a sanitized snapshot of environment & configuration plus selected runtime events.

    Enable by setting one of:

    CAMUNDA_SUPPORT_LOG_ENABLED=true        # canonical
    

    Optional override for output path (default is ./camunda-support.log in the current working directory):

    CAMUNDA_SUPPORT_LOG_FILE_PATH=/var/log/camunda-support.log
    

    Behavior:

    • File is created eagerly on first client construction (one per process; if the path exists a numeric suffix is appended to avoid clobbering).
    • Initial preamble includes SDK package version, timestamp, and redacted environment snapshot.
    • Secrets (client secret, passwords, mTLS private key, etc.) are automatically masked or truncated.
    • Designed to be low‑impact: append‑only, newline‑delimited JSON records may be added in future releases for deeper inspection (current version writes the preamble only unless additional events are wired).

    Recommended usage:

    CAMUNDA_SUPPORT_LOG_ENABLED=1 CAMUNDA_SDK_LOG_LEVEL=debug node app.js
    

    Keep the file only as long as needed for troubleshooting; it may contain sensitive non‑secret operational metadata. Do not commit it to version control.

    To disable, unset the env variable or set CAMUNDA_SUPPORT_LOG_ENABLED=false.

    Refer to ./docs/CONFIG_REFERENCE.md for the full list of related environment variables.

    We welcome issues and pull requests. Please read the CONTRIBUTING.md guide before opening a PR to understand:

    • Deterministic builds policy (no committed timestamps) – see CONTRIBUTING
    • Commit message conventions (Conventional Commits with enforced subject length)
    • Release workflow & how to dry‑run semantic‑release locally
    • Testing strategy (unit vs integration)
    • Performance and security considerations

    If you plan to help migrate to npm Trusted Publishing (OIDC), open an issue so we can coordinate workflow permission changes (id-token: write) and removal of the legacy NPM_TOKEN secret.

    Retry only network errors + 429/503, plus optionally 500 on safe GET endpoints you mark:

    import { retry, ExponentialBackoff, handleWhen } from 'cockatiel';

    const classify = handleWhen((err) => {
      const status = (err as any)?.status;
      if (status === 429 || status === 503) return true;
      if (status === 500 && (err as any).__opVerb === 'GET') return true; // custom tagging optional
      return err?.name === 'TypeError'; // network errors from fetch
    });

    const policy = retry(classify, {
      maxAttempts: 5,
      backoff: new ExponentialBackoff({ initialDelay: 100, maxDelay: 2000, jitter: true }),
    });

    • Keep SDK retries disabled to prevent duplicate layers.
    • SDK synthesizes Error objects with a status for retry-significant HTTP responses (429, 503, 500), enabling classification.
    • You can tag errors (e.g. assign err.__opVerb) in a wrapper if verb-level logic is needed (see the sketch after this list).
    • Future improvement: an official retryStrategy injection hook—current approach is non-invasive.
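
    A minimal sketch of that tagging approach, reusing the policy and client from the snippets above (the wrapper name and the verb string are illustrative, not part of the SDK):

    // Tag rejections with the HTTP verb of the wrapped operation so the
    // classifier above can apply verb-aware retry decisions.
    async function withVerbTag<T>(verb: string, op: () => Promise<T>): Promise<T> {
      try {
        return await op();
      } catch (err) {
        (err as any).__opVerb = verb;
        throw err;
      }
    }

    const topology = await policy.execute(() => withVerbTag('GET', () => client.getTopology()));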

    Combine cockatiel retry with a circuit breaker, timeout, or bulkhead policy for more robust behavior in partial outages.

    The client now includes an internal global backpressure manager that adaptively throttles the number of initiating in‑flight operations when the cluster signals resource exhaustion. It complements (not replaces) per‑request HTTP retry.

    An HTTP response is treated as a backpressure signal when it is classified retryable and matches one of:

    • 429 (Too Many Requests) – always
    • 503 with title === "RESOURCE_EXHAUSTED"
    • 500 whose RFC 9457 / 7807 detail text contains RESOURCE_EXHAUSTED

    All other 5xx / 503 variants are treated as non‑retryable (fail fast) and do not influence the adaptive gate.

    1. Normal state starts with effectively unlimited concurrency (no global semaphore enforced) until the first backpressure event.
    2. On the first signal the manager boots with a provisional concurrency cap (e.g. 16) and immediately reduces it (soft state).
    3. Repeated consecutive signals escalate severity to severe, applying a stronger reduction factor.
    4. Successful (non‑backpressure) completions trigger passive recovery checks that gradually restore permits over time if the system stays quiet.
    5. Quiet periods (no signals for a configurable decay interval) downgrade severity (severe → soft → healthy) and reset the consecutive counter when fully healthy.

    The policy is intentionally conservative: it only engages after genuine pressure signals and recovers gradually to avoid oscillation.

    Certain operations that help drain work or complete execution are exempt from gating so they are never queued behind initiating calls:

    • completeJob
    • failJob
    • throwJobError
    • completeUserTask

    These continue immediately even during severe backpressure to promote system recovery.

    Per‑request retry still performs exponential backoff + jitter for classified transient errors. The adaptive concurrency layer sits outside retry:

    1. A call acquires a permit (unless exempt) before its first attempt.
    2. Internal retry re‑attempts happen within the held permit.
    3. On final success the permit is released and a healthy hint is recorded (possible gradual recovery).
    4. On final failure (non‑retryable or attempts exhausted) the permit is released; a 429 on the terminal attempt still records backpressure.

    This design prevents noisy churn (permits would not shrink/expand per retry attempt) and focuses on admission control of distinct logical operations.

    Enable debug logging (CAMUNDA_SDK_LOG_LEVEL=debug or trace) to see events emitted under the scoped logger bp (e.g. backpressure.permits.scale, backpressure.permits.recover, backpressure.severity). These are trace‑level; use trace for the most granular insight.

    Current release ships with defaults tuned for conservative behavior. Adaptive gating is controlled by a profile (no separate boolean toggle). Use the LEGACY profile for observe‑only mode (no global gating, still records severity). Otherwise choose a tuning profile and optionally override individual knobs.

    Tuning environment variables (all optional; defaults shown):

    | Variable | Default | Description |
    | --- | --- | --- |
    | CAMUNDA_SDK_BACKPRESSURE_INITIAL_MAX | 16 | Bootstrap concurrency cap once the first signal is observed (null/unlimited before any signal). |
    | CAMUNDA_SDK_BACKPRESSURE_SOFT_FACTOR | 70 | Percentage multiplier applied on each soft backpressure event (70 => 0.70x permits). |
    | CAMUNDA_SDK_BACKPRESSURE_SEVERE_FACTOR | 50 | Percentage multiplier when entering or re-triggering in severe state. |
    | CAMUNDA_SDK_BACKPRESSURE_RECOVERY_INTERVAL_MS | 1000 | Interval between passive recovery checks. |
    | CAMUNDA_SDK_BACKPRESSURE_RECOVERY_STEP | 1 | Permits regained per recovery interval until reaching the bootstrap cap. |
    | CAMUNDA_SDK_BACKPRESSURE_DECAY_QUIET_MS | 2000 | Quiet period to downgrade severity (severe→soft→healthy). |
    | CAMUNDA_SDK_BACKPRESSURE_FLOOR | 1 | Minimum concurrency floor while degraded. |
    | CAMUNDA_SDK_BACKPRESSURE_SEVERE_THRESHOLD | 3 | Consecutive signals required to enter severe state. |
    | CAMUNDA_SDK_BACKPRESSURE_PROFILE | BALANCED | Preset profile: BALANCED, CONSERVATIVE, AGGRESSIVE, LEGACY (LEGACY = observe-only, no gating). |

    Profiles supply coordinated defaults when you don't want to reason about individual knobs. Any explicitly set knob env var overrides the profile value.

    | Profile | initialMax | softFactor% | severeFactor% | recoveryIntervalMs | recoveryStep | quietDecayMs | floor | severeThreshold | Intended Use |
    | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
    | BALANCED | 16 | 70 | 50 | 1000 | 1 | 2000 | 1 | 3 | General workloads with moderate spikes |
    | CONSERVATIVE | 12 | 60 | 40 | 1200 | 1 | 2500 | 1 | 2 | Protect cluster under tighter capacity / cost constraints |
    | AGGRESSIVE | 24 | 80 | 60 | 800 | 2 | 1500 | 2 | 4 | High throughput scenarios aiming to utilize headroom quickly |
    | LEGACY | n/a | 70 | 50 | 1000 | 1 | 2000 | 1 | 3 | Observe signals only (severity metrics) without adaptive gating |

    Select via:

    CAMUNDA_SDK_BACKPRESSURE_PROFILE=AGGRESSIVE
    

    Then optionally override a single parameter, e.g.:

    CAMUNDA_SDK_BACKPRESSURE_PROFILE=AGGRESSIVE
    CAMUNDA_SDK_BACKPRESSURE_INITIAL_MAX=32

    If the profile name is unrecognized the SDK falls back to BALANCED silently (future versions may emit a warning).

    Factors use integer percentages to avoid floating point drift in env parsing; the SDK converts them to multipliers internally (e.g. 70 -> 0.7).

    If you have concrete tuning needs, open an issue describing workload patterns (operation mix, baseline concurrency, observed broker limits) to help prioritize which knobs to surface.

    The SDK provides a lightweight polling job worker for service task job types using createJobWorker. It activates jobs in batches (respecting a concurrency limit), validates variables (optional), and offers action helpers on each job.

    import createCamundaClient from '@camunda8/orchestration-cluster-api';
    import { z } from 'zod';

    const client = createCamundaClient();

    // Define schemas (optional)
    const Input = z.object({ orderId: z.string() });
    const Output = z.object({ processed: z.boolean() });

    const worker = client.createJobWorker({
      jobType: 'process-order',
      maxParallelJobs: 10,
      timeoutMs: 15_000, // long‑poll timeout (server side requestTimeout)
      pollIntervalMs: 100, // delay between polls when no jobs / at capacity
      inputSchema: Input, // validates incoming variables if validateSchemas true
      outputSchema: Output, // validates variables passed to complete(...)
      validateSchemas: true, // set false for max throughput (skip Zod)
      autoStart: true, // default true; start polling immediately
      jobHandler: (job) => {
        // Access typed variables
        const vars = job.variables; // inferred from Input schema
        // Do work...
        return job.complete({ variables: { processed: true } });
      },
    });

    // Later, on shutdown:
    process.on('SIGINT', () => {
      worker.stop();
    });

    Your jobHandler must ultimately invoke exactly one of:

    • job.complete({ variables? }) OR job.complete()
    • job.fail({ errorMessage, retries?, retryBackoff? })
    • job.cancelWorkflow({}) (cancels the process instance)
    • job.ignore() (marks as done locally without reporting to broker – can be used for decoupled flows)

    Each action returns an opaque unique symbol receipt (JobActionReceipt). The handler's declared return type (Promise<JobActionReceipt>) is intentional:

    Why this design:

    • Enforces a single terminal code path: every successful handler path should end by returning the sentinel obtained by invoking an action.
    • Enables static reasoning: TypeScript can identify if your handler has a code path that does not acknowledge the job (catch unintended behavior early).
    • Makes test assertions simple: e.g. expect(await job.complete()).toBe(JobActionReceipt).

    Acknowledgement lifecycle:

    • Calling any action (complete, fail, cancelWorkflow, ignore) sets job.acknowledged = true internally, which lets the worker detect at runtime when a job is resolved through more than one code path.
    • If the handler resolves without any action having been invoked, the worker logs job.handler.noAction and locally marks the job finished WITHOUT informing the broker (this avoids leaking the in‑memory slot, but the broker will eventually time out and re‑dispatch the job).

    Recommended usage:

    • Always invoke an action; if you truly mean to skip broker acknowledgement (for example: forwarding a job to another system which will complete it) use job.ignore().

    Example patterns:

    // GOOD: explicit completion
    return job.complete({ variables: { processed: true } });

    // GOOD: no‑arg completion, receipt stored and returned at the end
    const ack = await job.complete();
    // ...
    return ack;

    // GOOD: explicit ignore (skip broker acknowledgement intentionally)
    const ack = await job.ignore();
    return ack;

    Set maxParallelJobs to the maximum number of jobs you want actively processing concurrently. The worker will long‑poll for up to the remaining capacity each cycle. Global backpressure (adaptive concurrency) still applies to the underlying REST calls; activation itself is a normal operation.

    If validateSchemas is true:

    • Incoming variables are parsed with inputSchema (fail => job is failed with a validation error message).
    • Incoming customHeaders parsed with customHeadersSchema if provided.
    • Completion payload variables parsed with outputSchema (warns & proceeds on failure).

    Use await worker.stopGracefully({ waitUpToMs?, checkIntervalMs? }) to drain without force‑cancelling the current activation request.

    // Attempt graceful drain for up to 8 seconds
    const { remainingJobs, timedOut } = await worker.stopGracefully({ waitUpToMs: 8000 });
    if (timedOut) {
      console.warn('Graceful stop timed out; remaining jobs:', remainingJobs);
    }

    Behavior:

    • Stops scheduling new polls immediately.
    • Lets any in‑flight activation finish (not cancelled proactively).
    • Waits for active jobs to acknowledge (complete/fail/cancelWorkflow/ignore).
    • On timeout: falls back to hard stop semantics (cancels activation) and logs worker.gracefulStop.timeout at debug.

    For immediate termination call worker.stop() (or client.stopAllWorkers()) which cancels the in‑flight activation if present.

    Activation cancellations during stop are logged at debug (activation.cancelled) rather than surfaced as error‑level noise.

    You can register multiple workers on a single client instance—one per job type is typical. The client exposes client.getWorkers() for inspection and client.stopAllWorkers() for coordinated shutdown.
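
    For example, a coordinated shutdown hook might look like this (a sketch; it assumes getWorkers() returns an array-like collection):

    process.on('SIGTERM', () => {
      console.log('Stopping', client.getWorkers().length, 'worker(s)');
      client.stopAllWorkers(); // immediate stop; use worker.stopGracefully() per worker to drain instead
      process.exit(0);
    });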

    Action methods return a unique symbol (not a string) to avoid accidental misuse and allow internal metrics. If you store the receipt, annotate its type as JobActionReceipt to preserve uniqueness:

    import { JobActionReceipt } from '@camunda8/orchestration-cluster-api';
    const receipt: JobActionReceipt = await job.complete({ variables: { processed: true } });

    If you ignore the return value you don’t need to import the symbol.

    The polling worker is not a good fit for:

    • Extremely latency‑sensitive tasks where a push mechanism or streaming protocol is required.
    • Massive fan‑out requiring custom partitioning strategies (implement a custom activator loop instead).
    • Browser environments (long‑lived polling + secret handling often unsuitable).

    For custom strategies you can still call client.activateJobs(...), manage concurrency yourself, and use completeJob / failJob directly.

    Properties and limitations of the adaptive backpressure gate:

    • Never increases latency for healthy clusters (no cap until first signal).
    • Cannot create fairness across multiple processes; it is per client instance in a single process. Scale your worker pool with that in mind.
    • Not a replacement for server‑side quotas or external rate limiters—it's a cooperative adaptive limiter.

    To bypass adaptive concurrency while still collecting severity metrics use:

    CAMUNDA_SDK_BACKPRESSURE_PROFILE=LEGACY
    

    This reverts to only per‑request retry for transient errors (no global gating) while keeping observability.

    Call client.getBackpressureState() to obtain:

    {
      severity: 'healthy' | 'soft' | 'severe';
      consecutive: number; // consecutive backpressure signals observed
      permitsMax: number | null; // current concurrency cap (null => unlimited/not engaged)
      permitsCurrent: number; // currently acquired permits
      waiters: number; // queued operations waiting for a permit
    }
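
    For example, you can sample this periodically and forward it to your metrics or logging system (a sketch):

    // Log the adaptive gate whenever it is not fully healthy.
    setInterval(() => {
      const bp = client.getBackpressureState();
      if (bp.severity !== 'healthy') {
        console.warn('backpressure', bp.severity, `${bp.permitsCurrent}/${bp.permitsMax ?? 'unlimited'}`, 'waiters', bp.waiters);
      }
    }, 5_000).unref();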

    Set CAMUNDA_AUTH_STRATEGY to NONE (default), BASIC, or OAUTH.

    Basic:

    CAMUNDA_AUTH_STRATEGY=BASIC
    CAMUNDA_BASIC_AUTH_USERNAME=alice
    CAMUNDA_BASIC_AUTH_PASSWORD=supersecret

    OAuth (client credentials):

    CAMUNDA_AUTH_STRATEGY=OAUTH
    CAMUNDA_CLIENT_ID=yourClientId
    CAMUNDA_CLIENT_SECRET=yourSecret
    CAMUNDA_OAUTH_URL=https://idp.example/oauth/token # if required by your deployment

    Optional audience / retry / timeout vars are also read if present (see generated config reference).

    Auth helper features (automatic inside the client):

    • Disk + memory token cache
    • Early refresh with skew handling
    • Exponential backoff & jitter
    • Singleflight suppression of concurrent refreshes
    • Hook: client.onAuthHeaders(h => ({ ...h, 'X-Trace': 'abc' }))
    • Force refresh: await client.forceAuthRefresh()
    • Clear caches: client.clearAuthCache({ disk: true, memory: true })
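
    For example (a sketch combining the hook and refresh helpers above; the header name and value are illustrative):

    // Decorate every authenticated request and force a token refresh after a credential rotation.
    client.onAuthHeaders((h) => ({ ...h, 'X-Request-Source': 'billing-worker' }));
    await client.forceAuthRefresh();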

    The SDK always keeps the active OAuth access token in memory. Optional disk persistence (Node only) is enabled by setting:

    CAMUNDA_OAUTH_CACHE_DIR=/path/to/cache
    

    When present and running under Node, each distinct credential context (combination of oauthUrl | clientId | audience | scope) is hashed to a filename:

    <CAMUNDA_OAUTH_CACHE_DIR>/camunda_oauth_token_cache_<hash>.json
    

    Writes are atomic (.tmp + rename) and use file mode 0600 (owner read/write). On process start the SDK attempts to load the persisted file to avoid an unnecessary token fetch; if the token is near expiry it will still perform an early refresh (5s skew window plus additional safety buffer based on 5% or 30s minimum).
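
    As a worked example (a sketch of the arithmetic described above, not the SDK's exact internal formula):

    // Approximate refresh lead time: 5% of the token lifetime (minimum 30s) plus the 5s skew window.
    const refreshBufferSeconds = (lifetimeSeconds: number) =>
      Math.max(lifetimeSeconds * 0.05, 30) + 5;

    console.log(refreshBufferSeconds(3600)); // 185 -> refresh roughly 185s before expiry
    console.log(refreshBufferSeconds(300)); // 35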

    Clearing / refreshing:

    • Programmatic clear: client.clearAuthCache({ disk: true, memory: true })
    • Memory only: client.clearAuthCache({ memory: true, disk: false })
    • Force new token (ignores freshness): await client.forceAuthRefresh()

    Disable disk persistence by simply omitting CAMUNDA_OAUTH_CACHE_DIR (memory cache still applies). For short‑lived or serverless functions you may prefer no disk cache to minimize I/O; for long‑running workers disk caching reduces cold‑start latency and load on the identity provider across restarts / rolling deploys.

    Security considerations:

    • Ensure the directory has restrictive ownership/permissions; the SDK creates files with 0600 but will not alter parent directory permissions.
    • Tokens are bearer credentials; treat the directory like a secrets store and avoid including it in container image layers or backups.
    • If you rotate credentials (client secret) the filename hash changes; old cache files become unused and can be pruned safely.

    Browser usage: There is no disk concept—if executed in a browser the SDK (when strategy OAUTH) attempts to store the token in sessionStorage (tab‑scoped). Closing the tab clears the cache; a new tab will fetch a fresh token.

    If you need a custom persistence strategy (e.g. Redis / encrypted keychain), wrap the client and periodically call client.forceAuthRefresh() while storing and re‑injecting the token via a headers hook; first measure whether the built‑in disk cache already meets your needs.

    To enable mTLS (Node only), provide inline or path variables (inline wins):

    CAMUNDA_MTLS_CERT / CAMUNDA_MTLS_CERT_PATH
    CAMUNDA_MTLS_KEY / CAMUNDA_MTLS_KEY_PATH
    CAMUNDA_MTLS_CA / CAMUNDA_MTLS_CA_PATH (optional)
    CAMUNDA_MTLS_KEY_PASSPHRASE (optional)

    If both cert & key are available an https.Agent is attached to all outbound calls (including token fetches).
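
    For example, a typical path-based setup might look like this (the file paths are illustrative):

    CAMUNDA_MTLS_CERT_PATH=/etc/camunda/tls/client.crt
    CAMUNDA_MTLS_KEY_PATH=/etc/camunda/tls/client.key
    CAMUNDA_MTLS_CA_PATH=/etc/camunda/tls/ca.crt   # optional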

    Import branded key helpers directly:

    import { ProcessDefinitionKey, ProcessInstanceKey } from '@camunda8/orchestration-cluster-api';

    const defKey = ProcessDefinitionKey.assumeExists('2251799813686749');
    // @ts-expect-error – cannot assign def key to instance key
    const bad: ProcessInstanceKey = defKey;

    They are zero‑cost runtime strings with compile‑time separation.

    All methods return a CancelablePromise<T>:

    const p = camunda.searchProcessInstances({ filter: { processDefinitionKey: defKey } });
    setTimeout(() => p.cancel(), 100); // best‑effort cancel
    try {
      await p; // resolves if not cancelled
    } catch (e) {
      if (isSdkError(e) && e.name === 'CancelSdkError') {
        console.log('Operation cancelled');
      } else throw e;
    }

    Notes:

    • Rejects with CancelSdkError.
    • Cancellation classification runs first so aborted fetches are never downgraded to generic network errors.
    • Abort is immediate and idempotent; underlying fetch is signalled.

    @experimental - this feature is not guaranteed to be tested or stable.

    The main entry stays minimal. To opt in to a TaskEither-style facade & helper combinators import from the dedicated subpath:

    import {
      createCamundaFpClient,
      retryTE,
      withTimeoutTE,
      eventuallyTE,
      isLeft,
    } from '@camunda8/orchestration-cluster-api/fp';

    const fp = createCamundaFpClient();
    const deployTE = fp.deployResourcesFromFiles(['./bpmn/process.bpmn']);
    const deployed = await deployTE();
    if (isLeft(deployed)) throw deployed.left; // DomainError union

    // Chain with fp-ts (optional) – the returned thunks are structurally compatible with TaskEither
    // import { pipe } from 'fp-ts/function'; import * as TE from 'fp-ts/TaskEither';

    Why a subpath?

    • Keeps base bundle lean for the 80% use case.
    • No hard dependency on fp-ts at runtime; only structural types.
    • Advanced users can compose with real fp-ts without pulling the effect model into the default import path.

    Exports available from .../fp:

    • createCamundaFpClient – typed facade (methods return () => Promise<Either<DomainError,T>>).
    • Type guards: isLeft, isRight.
    • Error / type aliases: DomainError, TaskEither, Either, Left, Right, Fpify.
    • Combinators: retryTE, withTimeoutTE, eventuallyTE.

    DomainError union currently includes:

    • CamundaValidationError
    • EventualConsistencyTimeoutError
    • HTTP-like error objects (status/body/message) produced by transport
    • Generic Error

    You can refine left-channel typing later by mapping HTTP status codes or discriminator fields.

    Some endpoints accept consistency management options. Pass a consistency block (where supported) with waitUpToMs and optional pollIntervalMs (default 500). If the condition is not met within timeout an EventualConsistencyTimeoutError is thrown.

    At present the canonical client operates in throwing mode; there is no error‑mode switch to set before invoking an eventually consistent method. To consume eventual polling in a non‑throwing fashion, use the functional wrappers (Result / fp-ts) rather than mutating the base client.

    consistency object fields (all optional except waitUpToMs):

    | Field | Type | Description |
    | --- | --- | --- |
    | waitUpToMs | number | Maximum total time to wait before failing. 0 disables polling and returns the first response immediately. |
    | pollIntervalMs | number | Base delay between attempts (minimum enforced at 10ms). Defaults to 500 or the value of CAMUNDA_SDK_EVENTUAL_POLL_DEFAULT_MS if provided. |
    | predicate | (result) => boolean \| Promise<boolean> | Custom success condition. If omitted, non-GET endpoints default to: first 2xx body whose items array (if present) is non-empty. |
    | trace | boolean | When true, logs each 200 response body (truncated ~1KB) before predicate evaluation and emits a success line with elapsed time when the predicate passes. Requires log level debug (or trace) to see output. |
    | onAttempt | (info) => void | Callback after each attempt: { attempt, elapsedMs, remainingMs, status, predicateResult, nextDelayMs }. |
    | onComplete | (info) => void | Callback when predicate succeeds: { attempts, elapsedMs }. Not called on timeout. |

    Enable by setting trace: true inside consistency. Output appears under the eventual log scope at level debug so you must raise the SDK log level (e.g. CAMUNDA_SDK_LOG_LEVEL=debug).

    Emitted lines (examples):

    [camunda-sdk][debug][eventual] op=searchJobs attempt=3 trace body={"items":[]}
    [camunda-sdk][debug][eventual] op=searchJobs attempt=5 status=200 predicate=true elapsed=742ms totalAttempts=5

    Use this to understand convergence speed and data shape evolution during tests or to diagnose slow propagation.

    const jobs = await camunda.searchJobs({
      filter: { type: 'payment' },
      consistency: {
        waitUpToMs: 5000,
        pollIntervalMs: 200,
        trace: true,
        predicate: (r) => Array.isArray(r.items) && r.items.some((j) => j.state === 'CREATED'),
      },
    });

    On timeout an EventualConsistencyTimeoutError includes diagnostic fields: { attempts, elapsedMs, lastStatus, lastResponse, operationId }.
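
    One way to surface those fields (a sketch; identifying the error by its name at runtime is an assumption here):

    try {
      await camunda.searchJobs({ filter: { type: 'payment' }, consistency: { waitUpToMs: 2000 } });
    } catch (e) {
      if (e instanceof Error && e.name === 'EventualConsistencyTimeoutError') {
        const t = e as any; // diagnostic fields per the list above
        console.warn(`gave up after ${t.attempts} attempts in ${t.elapsedMs}ms (last status ${t.lastStatus})`);
      } else {
        throw e;
      }
    }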

    Per‑client logger; no global singleton. The level defaults from CAMUNDA_SDK_LOG_LEVEL (default error).

    const client = createCamundaClient({
      log: {
        level: 'info',
        transport: (evt) => {
          // evt: { level, scope, ts, args, code?, data? }
          console.log(JSON.stringify(evt));
        },
      },
    });

    const log = client.logger('worker');
    log.debug(() => ['expensive detail only if enabled', { meta: 1 }]);
    log.code('info', 'WORK_START', 'Starting work loop', { pid: process.pid });

    Lazy args (functions with zero arity) are only invoked if the level is enabled.

    Update log level / transport at runtime via client.configure({ log: { level: 'debug' } }).

    Without any explicit log option:

    • Level = error (unless CAMUNDA_SDK_LOG_LEVEL is set)
    • Transport = console (console.error / console.warn / console.log)
    • Only error level internal events are emitted (e.g. strict validation failure summaries, fatal auth issues)
    • No info/debug/trace noise by default

    To silence everything set level to silent:

    CAMUNDA_SDK_LOG_LEVEL=silent
    

    To enable debug logs via env:

    CAMUNDA_SDK_LOG_LEVEL=debug
    

    Setting CAMUNDA_SDK_LOG_LEVEL=silly enables the deepest diagnostics. In addition to everything at trace, the SDK will emit HTTP request and response body previews for all HTTP methods under the telemetry scope (log line contains http.body). This can leak sensitive information (secrets, PII). A warning (log.level.silly.enabled) is emitted on client construction. Use only for short‑lived local debugging; never enable in production or share captured logs externally. Body output is truncated (max ~4KB) and form-data parts identify uploaded files as [File].

    Provide a transport function to forward structured LogEvent objects into any logging library.

    import pino from 'pino';
    import createCamundaClient from '@camunda8/orchestration-cluster-api';

    const p = pino();
    const client = createCamundaClient({
      log: {
        level: 'info',
        transport: (e) => {
          const lvl = e.level === 'trace' ? 'debug' : e.level; // map trace
          p.child({ scope: e.scope, code: e.code })[lvl](
            { ts: e.ts, data: e.data, args: e.args },
            e.args.filter((a) => typeof a === 'string').join(' ')
          );
        },
      },
    });

    import winston from 'winston';
    import createCamundaClient from '@camunda8/orchestration-cluster-api';

    const w = winston.createLogger({ transports: [new winston.transports.Console()] });
    const client = createCamundaClient({
      log: {
        level: 'debug',
        transport: (e) => {
          const lvl = e.level === 'trace' ? 'silly' : e.level; // winston has 'silly'
          w.log({
            level: lvl,
            message: e.args.filter((a) => typeof a === 'string').join(' '),
            scope: e.scope,
            code: e.code,
            data: e.data,
            ts: e.ts,
          });
        },
      },
    });

    import log from 'loglevel';
    import createCamundaClient from '@camunda8/orchestration-cluster-api';

    log.setLevel('info'); // host app level
    const client = createCamundaClient({
      log: {
        level: 'info',
        transport: (e) => {
          if (e.level === 'silent') return;
          const method = (['error', 'warn', 'info', 'debug'].includes(e.level) ? e.level : 'debug') as
            | 'error'
            | 'warn'
            | 'info'
            | 'debug';
          (log as any)[method](`[${e.scope}]`, e.code ? `${e.code}:` : '', ...e.args);
        },
      },
    });

    • Map trace to the nearest available level if your logger lacks it.
    • Use log.code(level, code, msg, data) for machine-parsable events.
    • Redact secrets before logging if you add token contents to custom messages.
    • Reconfigure later: client.configure({ log: { level: 'warn' } }) updates only that client.
    • When the effective level is debug (or trace), the client emits a lazy config.hydrated event on construction and config.reconfigured on configure(), each containing the redacted effective configuration { config: { CAMUNDA_... } }. Secrets are already masked using the SDK's redaction rules.

    May throw:

    • Network / fetch failures
    • Non‑2xx HTTP responses
    • Validation errors (strict mode)
    • EventualConsistencyTimeoutError
    • CancelSdkError on cancellation

    All SDK-thrown operational errors normalize to a discriminated union (SdkError) when they originate from HTTP, network, auth, or validation layers. Use the guard isSdkError to narrow inside a catch:

    import { createCamundaClient } from '@camunda8/orchestration-cluster-api';
    import { isSdkError } from '@camunda8/orchestration-cluster-api/dist/runtime/errors';

    const client = createCamundaClient();

    try {
      await client.getTopology();
    } catch (e) {
      if (isSdkError(e)) {
        switch (e.name) {
          case 'HttpSdkError':
            console.error('HTTP failure', e.status, e.operationId);
            break;
          case 'ValidationSdkError':
            console.error('Validation issue on', e.operationId, e.side, e.issues);
            break;
          case 'AuthSdkError':
            console.error('Auth problem', e.message, e.status);
            break;
          case 'CancelSdkError':
            console.error('Operation cancelled', e.operationId);
            break;
          case 'NetworkSdkError':
            console.error('Network layer error', e.message);
            break;
        }
        return;
      }
      // Non-SDK (programmer) error; rethrow or wrap
      throw e;
    }

    Guarantees:

    • HTTP errors expose status and optional operationId.
    • If the server returns RFC 9457 / RFC 7807 Problem Details JSON (type, title, status, detail, instance) these fields are passed through on the HttpSdkError when present.
    • Validation errors expose side and operationId.
    • Classification is best-effort; unknown shapes fall back to NetworkSdkError.

    Advanced: You can still layer your own domain errors on top (e.g. translate certain status codes) by mapping SdkError into custom discriminants.
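
    A minimal sketch of such a mapping (the AppError shape and the chosen status codes are illustrative; isSdkError is reused from the import above):

    type AppError =
      | { kind: 'not-found'; operationId?: string }
      | { kind: 'rate-limited' }
      | { kind: 'unexpected'; cause: unknown };

    function toAppError(e: unknown): AppError {
      if (isSdkError(e) && e.name === 'HttpSdkError') {
        if (e.status === 404) return { kind: 'not-found', operationId: e.operationId };
        if (e.status === 429) return { kind: 'rate-limited' };
      }
      return { kind: 'unexpected', cause: e };
    }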

    Note that this feature is experimental and subject to change.

    If you prefer FP‑style explicit error handling instead of exceptions, use the result client wrapper:

    import { createCamundaResultClient, isOk } from '@camunda8/orchestration-cluster-api';

    const camundaR = createCamundaResultClient();
    const res = await camundaR.createDeployment({ resources: [file] });
    if (isOk(res)) {
      console.log('Deployment key', res.value.deployments[0].deploymentKey);
    } else {
      console.error('Deployment failed', res.error);
    }

    API surface differences:

    • All async operation methods return Promise<Result<T>> where Result<T> = { ok: true; value: T } | { ok: false; error: unknown }.
    • No exceptions are thrown for HTTP / validation errors (cancellation and programmer errors like invalid argument sync throws are still converted to { ok:false }).
    • The original throwing client is available via client.inner if you need to mix styles.

    Helpers:

    import { isOk, isErr } from '@camunda8/orchestration-cluster-api';
    

    When to use:

    • Integrating with algebraic effects / functional pipelines.
    • Avoiding try/catch nesting in larger orchestration flows.
    • Converting to libraries expecting an Either/Result pattern.

    Note that this feature is experimental and subject to change.

    For projects using fp-ts, wrap the throwing client in a lazy TaskEither facade:

    import { createCamundaFpClient } from '@camunda8/orchestration-cluster-api';
    import { pipe } from 'fp-ts/function';
    import * as TE from 'fp-ts/TaskEither';

    const fp = createCamundaFpClient();

    const deployTE = fp.createDeployment({ resources: [file] }); // TaskEither<unknown, ExtendedDeploymentResult>

    pipe(
      deployTE(), // invoke the task (returns Promise<Either>)
      (then) => then // typical usage would use TE.match / TE.fold; shown expanded for clarity
    );

    // With helpers
    const task = fp.createDeployment({ resources: [file] });
    const either = await task();
    if (either._tag === 'Right') {
      console.log(either.right.deployments.length);
    } else {
      console.error('Error', either.left);
    }

    Notes:

    • No runtime dependency on fp-ts; adapter implements a minimal Either shape. Structural typing lets you lift into real fp-ts functions (fromEither, etc.).
    • Each method becomes a function returning () => Promise<Either<E,A>> (a TaskEither shape). Invoke it later to execute.
    • Cancellation: calling .cancel() on the original promise isn’t surfaced; if you need cancellation use the base client directly.
    • For richer interop, you can map the returned factory to TE.tryCatch in userland (one interop sketch follows this list).
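
    One userland interop sketch, lifting a call on the throwing base client into a real fp-ts TaskEither (TE as imported above; camunda is assumed to be a standard throwing client from createCamundaClient()):

    const topologyTE = TE.tryCatch(
      () => camunda.getTopology(), // lazily invoked when the TaskEither runs
      (reason) => reason, // map the unknown rejection into your own error type here
    );

    const outcome = await topologyTE();
    if (outcome._tag === 'Right') console.log(outcome.right.brokers?.length ?? 0);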

    Search endpoints expose typed request bodies that include pagination fields. Provide the desired page object; auto‑pagination is not (yet) bundled.

    Generated doc enumerating all supported environment variables (types, defaults, conditional requirements, redaction rules) is produced at build time:

    ./docs/CONFIG_REFERENCE.md
    

    The deployment endpoint requires each resource to have a filename (extension used to infer type: .bpmn, .dmn, .form / .json). Extensions influence server classification; incorrect or missing extensions may yield unexpected results. Pass an array of File objects (NOT plain Blob).

    const bpmnXml = `<definitions id="process" xmlns="http://www.omg.org/spec/BPMN/20100524/MODEL">...</definitions>`;
    const file = new File([bpmnXml], 'order-process.bpmn', { type: 'application/xml' });
    const result = await camunda.createDeployment({ resources: [file] });
    console.log(result.deployments.length);

    From an existing Blob:

    const blob: Blob = getBlob();
    const file = new File([blob], 'model.bpmn');
    await camunda.createDeployment({ resources: [file] });

    Use the built-in helper deployResourcesFromFiles(...) to read local files and create File objects automatically. It returns the enriched ExtendedDeploymentResult (adds typed arrays: processes, decisions, decisionRequirements, forms, resources).

    const result = await camunda.deployResourcesFromFiles([
      './bpmn/order-process.bpmn',
      './dmn/discount.dmn',
      './forms/order.form',
    ]);

    console.log(result.processes.map((p) => p.processDefinitionId));
    console.log(result.decisions.length);

    With explicit tenant (overriding tenant from configuration):

    await camunda.deployResourcesFromFiles(['./bpmn/order-process.bpmn'], { tenantId: 'tenant-a' });
    

    Error handling:

    try {
      await camunda.deployResourcesFromFiles([]); // throws (empty array)
    } catch (e) {
      console.error('Deployment failed:', e);
    }

    Manual construction alternative (if you need custom logic):

    import { File } from 'node:buffer';
    const bpmnXml =
    '<definitions id="process" xmlns="http://www.omg.org/spec/BPMN/20100524/MODEL"></definitions>';
    const file = new File([Buffer.from(bpmnXml)], 'order-process.bpmn', { type: 'application/xml' });
    await camunda.createDeployment({ resources: [file] });

    Helper behavior:

    • Dynamically imports node:fs/promises & node:path (tree-shaken from browser bundles)
    • Validates Node environment (throws in browsers)
    • Lightweight MIME inference: .bpmn|.dmn|.xml -> application/xml, .json|.form -> application/json, fallback application/octet-stream
    • Rejects empty path list

    Empty arrays are rejected. Always use correct extensions so the server can classify each resource.

    Create isolated clients per test file:

    const client = createCamundaClient({
      config: { CAMUNDA_REST_ADDRESS: 'http://localhost:8080', CAMUNDA_AUTH_STRATEGY: 'NONE' },
    });

    Inject a mock fetch:

    const client = createCamundaClient({
      fetch: async (input, init) => new Response(JSON.stringify({ ok: true }), { status: 200 }),
    });

    Apache 2.0

    Generate an HTML API reference site with TypeDoc (public entry points only):

    npm run docs:api
    

    Output: static site in docs/api (open docs/api/index.html in a browser or serve the folder, e.g. npx http-server docs/api). Entry points: src/index.ts, src/logger.ts, src/fp/index.ts. Internal generated code, scripts, tests are excluded and private / protected members are filtered. Regenerate after changing public exports.