
Benchmark Report

Benchmark v3.9.0 · Generated 2026-03-31 15:57:42 UTC

Performance Metrics

Event Throughput (events/s)

Max     Avg     Min
60.8    46.67   38.8

TTFE - Time To First Event (ms)

Avg     Min     Max     P50     P90     P95
94.55   41.02   228.0   51.0    98.3    169.45

Connections

Max Concurrent   Avg Active
22.0             13.6

Empty Workflow QPS

Max QPS   Avg QPS   Avg Duration (ms)
71.0      36.37     215.61

Metric Definitions

Event Throughput (events/s)

The rate of Server-Sent Events (SSE) received per second. This metric indicates the system's capacity to handle streaming data.

  • Max: Peak throughput during the test
  • Avg: Average throughput across the entire test duration
  • Min: Lowest throughput observed
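The three statistics above can be derived from raw event arrival timestamps. The sketch below (an illustrative helper, not part of the benchmark harness) buckets events into one-second windows and reports the peak, mean, and lowest per-window count; empty windows between the first and last event count as zero-throughput seconds.

```python
from collections import Counter

def throughput_stats(event_times):
    """Compute max/avg/min events-per-second from SSE arrival timestamps.

    event_times: arrival times in seconds (floats). Events are bucketed
    into 1-second windows; throughput per window is the event count.
    """
    if not event_times:
        return {"max": 0.0, "avg": 0.0, "min": 0.0}
    buckets = Counter(int(t) for t in event_times)
    # Include empty windows between first and last event so idle
    # seconds pull the average and minimum down.
    lo, hi = min(buckets), max(buckets)
    counts = [buckets.get(s, 0) for s in range(lo, hi + 1)]
    return {
        "max": float(max(counts)),
        "avg": sum(counts) / len(counts),
        "min": float(min(counts)),
    }
```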

TTFE - Time To First Event (ms)

The latency from sending a request to receiving the first SSE event. This is a critical user experience metric for streaming applications.

  • Average: Mean latency across all requests
  • Minimum: Best-case latency observed
  • Maximum: Worst-case latency observed
  • P50 (Median): 50% of requests completed faster than this value
  • P90: 90% of requests completed faster than this value
  • P95: 95% of requests completed faster than this value
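A minimal sketch of how these summary statistics can be computed from a list of per-request TTFE samples, using the nearest-rank percentile definition (the function names here are illustrative, not taken from the benchmark tooling):

```python
def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at
    least p% of all samples are <= it."""
    ordered = sorted(samples)
    # ceil(n * p / 100) - 1, clamped to a valid index
    k = max(0, -(-len(ordered) * p // 100) - 1)
    return ordered[k]

def ttfe_summary(ttfe_ms):
    """Summarize time-to-first-event latencies (milliseconds)."""
    return {
        "avg": sum(ttfe_ms) / len(ttfe_ms),
        "min": min(ttfe_ms),
        "max": max(ttfe_ms),
        "p50": percentile(ttfe_ms, 50),
        "p90": percentile(ttfe_ms, 90),
        "p95": percentile(ttfe_ms, 95),
    }
```

Note that the average can sit well above the median when a tail of slow requests (P95 and up) drags the mean, as in the figures reported above.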

Connections

Measures the concurrent SSE connection capacity of the system.

  • Max Concurrent Connections: Maximum number of simultaneous SSE connections
  • Avg Active Connections: Average number of active connections during the test
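Both connection figures can be recovered from per-connection open/close times with an event sweep: peak overlap gives the max, and integrating the active count over time gives a time-weighted average. The helper below is a sketch under that assumption, not the harness's actual code:

```python
def concurrency_stats(intervals):
    """Max and time-weighted average concurrent connections.

    intervals: list of (open_time, close_time) pairs in seconds.
    Sweeps over open (+1) and close (-1) events in time order.
    """
    if not intervals:
        return {"max_concurrent": 0, "avg_active": 0.0}
    events = sorted(
        [(s, 1) for s, e in intervals] + [(e, -1) for s, e in intervals]
    )
    active = peak = 0
    area = 0.0          # integral of active-connection count over time
    prev = events[0][0]
    for t, delta in events:
        area += active * (t - prev)
        active += delta
        peak = max(peak, active)
        prev = t
    span = events[-1][0] - events[0][0]
    return {"max_concurrent": peak,
            "avg_active": area / span if span else 0.0}
```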

Empty Workflow QPS

Performance of the minimal API path without external dependencies (e.g., LLM calls). This measures pure system capacity.

  • Max QPS: Peak requests per second achieved
  • Avg QPS: Average requests per second
  • Avg Duration: Average request duration in milliseconds
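As a rough sketch of how these three numbers relate (hypothetical helper, assuming each request is recorded as a start time plus a duration): max QPS is the peak count of completions in any one-second window, avg QPS is total requests over the test span, and avg duration is the mean of the per-request durations.

```python
from collections import Counter

def qps_stats(requests):
    """requests: list of (start_s, duration_ms) pairs for completed calls."""
    completions = [start + dur / 1000 for start, dur in requests]
    per_second = Counter(int(t) for t in completions)
    span = max(completions) - min(start for start, _ in requests)
    return {
        "max_qps": max(per_second.values()),
        "avg_qps": len(requests) / span,
        "avg_duration_ms": sum(d for _, d in requests) / len(requests),
    }
```

With an average duration of ~216 ms, a single worker can sustain only ~4.6 sequential requests per second, so the reported QPS figures imply substantial request concurrency across the 3 replicas.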

Test Environment

The benchmark was executed in a Kubernetes cluster environment. The dify-api service ran with 3 replicas, each configured with 1 CPU core (1000m) and 2 GB memory. Both resource limits and requests were set equally to ensure stable CPU scheduling and avoid throttling.

Configuration

api:
  replicas: 3
  resources:
    limits:
      cpu: 1000m
      memory: 2048Mi
    requests:
      cpu: 1000m
      memory: 2048Mi

Test Scenarios

Empty Workflow QPS

This scenario uses a minimal workflow containing only a Start node and an End node, with no processing logic in between. It measures the pure API throughput capacity of the system without any external dependencies or computational overhead.

Workflow Structure:

Start → End

This test helps establish the baseline performance ceiling of the Dify API infrastructure.

TTFE, Connections, and Event Throughput

These metrics are measured using a workflow that includes an LLM node: Start → LLM → End.

To eliminate external dependencies and ensure consistent, reproducible results, we mock the OpenAI API server rather than calling real LLM services. This approach:

  • Removes variability from actual LLM response times
  • Ensures stable and predictable test conditions
  • Allows us to isolate Dify's streaming performance characteristics

Workflow Structure:

Start → LLM → End

The mocked LLM server returns simulated streaming responses to test SSE (Server-Sent Events) handling, connection management, and event throughput under controlled conditions.
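A mock of this kind typically emits OpenAI-style chat-completion chunks in SSE framing. The generator below sketches what such a mock might stream back; the payload shape follows the OpenAI streaming format, but the model name and token-per-word splitting are illustrative assumptions, not Dify's actual mock implementation.

```python
import json

def sse_chunks(text, model="gpt-mock"):
    """Yield OpenAI-style chat-completion SSE chunks for a fixed reply.

    A mock HTTP handler can write each yielded string straight to the
    response body. One chunk is emitted per whitespace-separated token,
    followed by the [DONE] sentinel that ends the stream.
    """
    for token in text.split():
        payload = {
            "object": "chat.completion.chunk",
            "model": model,
            "choices": [{"delta": {"content": token + " "}, "index": 0}],
        }
        yield f"data: {json.dumps(payload)}\n\n"
    yield "data: [DONE]\n\n"
```

Because every response is generated locally and deterministically, TTFE and throughput measurements against this mock reflect Dify's own request handling and SSE relay path rather than upstream model latency.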

© 2026 Dify. All rights reserved. Enterprise release information is confidential. Do not distribute externally.