## Performance Metrics

### Event Throughput (events/s)
| Max | Avg | Min |
|---|---|---|
| 60.8 | 46.67 | 38.8 |
### TTFE - Time To First Event (ms)
| Avg | Min | Max | P50 | P90 | P95 |
|---|---|---|---|---|---|
| 94.55 | 41.0 | 2228.0 | 51.0 | 98.3 | 169.45 |
### Connections
| Max Concurrent | Avg Active |
|---|---|
| 22.0 | 13.6 |
### Empty Workflow QPS
| Max QPS | Avg QPS | Avg Duration (ms) |
|---|---|---|
| 71.0 | 36.37 | 215.61 |
## Metrics Definition

### Event Throughput (events/s)

The rate at which SSE (Server-Sent Events) events are received, measured per second. This metric indicates the system's capacity to handle streaming data.
- Max: Peak throughput during the test
- Avg: Average throughput across the entire test duration
- Min: Lowest throughput observed
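As an illustration of how these three numbers can be derived, the sketch below buckets SSE arrival timestamps into one-second windows and reports the peak, mean, and floor rates. The helper name and the use of `time.monotonic()` timestamps are assumptions for illustration, not the benchmark's actual tooling.

```python
from collections import Counter

def throughput_stats(event_timestamps):
    """Max/avg/min events-per-second from SSE arrival times.

    event_timestamps: one float per received SSE event, e.g. values
    captured with time.monotonic() across all open connections.
    """
    # Bucket events into whole-second windows, then summarize the counts.
    per_second = Counter(int(t) for t in event_timestamps)
    counts = list(per_second.values())
    return max(counts), sum(counts) / len(counts), min(counts)
```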
### TTFE - Time To First Event (ms)
The latency from sending a request to receiving the first SSE event. This is a critical user experience metric for streaming applications.
- Average: Mean latency across all requests
- Minimum: Best-case latency observed
- Maximum: Worst-case latency observed
- P50 (Median): 50% of requests completed faster than this value
- P90: 90% of requests completed faster than this value
- P95: 95% of requests completed faster than this value
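A minimal way to sample and aggregate this metric, assuming an SSE endpoint reachable over HTTP and the `httpx` client library (the URL, payload, and function names below are placeholders):

```python
import statistics
import time

import httpx

def measure_ttfe_ms(url: str, payload: dict) -> float | None:
    """Milliseconds from sending the request to the first SSE 'data:' line."""
    start = time.monotonic()
    with httpx.stream("POST", url, json=payload, timeout=30.0) as resp:
        for line in resp.iter_lines():
            if line.startswith("data:"):  # first event on the stream
                return (time.monotonic() - start) * 1000.0
    return None  # stream closed without emitting any event

def summarize(samples_ms: list[float]) -> dict:
    """Aggregate raw TTFE samples into the statistics reported above."""
    qs = statistics.quantiles(samples_ms, n=100)  # 99 percentile cut points
    return {
        "avg": statistics.fmean(samples_ms),
        "min": min(samples_ms),
        "max": max(samples_ms),
        "p50": qs[49],
        "p90": qs[89],
        "p95": qs[94],
    }
```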
### Connections
Measures the concurrent SSE connection capacity of the system.
- Max Concurrent Connections: Maximum number of simultaneous SSE connections
- Avg Active Connections: Average number of active connections during the test
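One way to exercise concurrent capacity is to hold many SSE streams open at once with an async client. The sketch below assumes `httpx`; the counters and worker count are illustrative, not the benchmark harness itself.

```python
import asyncio

import httpx

active = 0  # connections currently holding an open stream
peak = 0    # maximum concurrent connections observed

async def hold_stream(client: httpx.AsyncClient, url: str, payload: dict):
    global active, peak
    async with client.stream("POST", url, json=payload) as resp:
        active += 1
        peak = max(peak, active)
        try:
            async for _ in resp.aiter_lines():
                pass  # consume events until the server closes the stream
        finally:
            active -= 1

async def main(url: str, payload: dict, n: int = 25):
    async with httpx.AsyncClient(timeout=None) as client:
        await asyncio.gather(*(hold_stream(client, url, payload) for _ in range(n)))
    print("peak concurrent connections:", peak)
```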
### Empty Workflow QPS
Performance of the minimal API path without external dependencies (e.g., LLM calls). This measures pure system capacity.
- Max QPS: Peak requests per second achieved
- Avg QPS: Average requests per second
- Avg Duration: Average request duration in milliseconds
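A thread-per-worker load loop is one simple way to produce these numbers. The sketch below (worker count, timeout, and endpoint are placeholder assumptions) records completion times and per-request durations, then derives the three reported columns.

```python
import threading
import time
from collections import Counter

import httpx

def worker(url, payload, stop, completions, durations):
    # Each worker fires requests back-to-back until told to stop.
    with httpx.Client(timeout=30.0) as client:
        while not stop.is_set():
            t0 = time.monotonic()
            client.post(url, json=payload)
            t1 = time.monotonic()
            completions.append(t1)             # when the request finished
            durations.append((t1 - t0) * 1e3)  # per-request duration in ms

def run_load(url, payload, workers=20, duration_s=60):
    stop, completions, durations = threading.Event(), [], []
    threads = [
        threading.Thread(target=worker,
                         args=(url, payload, stop, completions, durations))
        for _ in range(workers)
    ]
    for t in threads:
        t.start()
    time.sleep(duration_s)
    stop.set()
    for t in threads:
        t.join()
    # Bucket completions per second for max/avg QPS; average the durations.
    per_sec = Counter(int(t) for t in completions).values()
    return max(per_sec), sum(per_sec) / len(per_sec), sum(durations) / len(durations)
```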
## Test Environment

The benchmark was executed in a Kubernetes cluster. The dify-api service ran with 3 replicas, each configured with 1 CPU core (1000m) and 2 GB of memory. Resource limits and requests were set to identical values, placing the pods in the Guaranteed QoS class to ensure stable CPU scheduling and avoid throttling.
### Configuration

```yaml
api:
  replicas: 3
  resources:
    limits:
      cpu: 1000m
      memory: 2048Mi
    requests:
      cpu: 1000m
      memory: 2048Mi
```
## Test Scenarios

### Empty Workflow QPS
This scenario uses a minimal workflow containing only a Start node and an End node, with no processing logic in between. It measures the pure API throughput capacity of the system without any external dependencies or computational overhead.
Workflow Structure:
Start → End
This test helps establish the baseline performance ceiling of the Dify API infrastructure.
### TTFE, Connections, and Event Throughput

These metrics are measured using a workflow that includes an LLM node.
To eliminate external dependencies and ensure consistent, reproducible results, we mock the OpenAI API server rather than calling real LLM services. This approach:
- Removes variability from actual LLM response times
- Ensures stable and predictable test conditions
- Allows us to isolate Dify's streaming performance characteristics
Workflow Structure:
Start → LLM → End
The mocked LLM server returns simulated streaming responses to test SSE (Server-Sent Events) handling, connection management, and event throughput under controlled conditions.
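For reference, a mock of this kind can be as small as the FastAPI sketch below, which streams OpenAI-style `chat.completion.chunk` events over SSE. The route, chunk count, and inter-chunk delay are illustrative assumptions, not the exact mock server used in this benchmark.

```python
import asyncio
import json
import time

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

async def fake_chunks(n: int = 20, delay: float = 0.02):
    # Emit OpenAI-style streaming chunks, one SSE event per token.
    for i in range(n):
        chunk = {
            "id": "chatcmpl-mock",
            "object": "chat.completion.chunk",
            "created": int(time.time()),
            "model": "gpt-mock",
            "choices": [
                {"index": 0, "delta": {"content": f"token{i} "}, "finish_reason": None}
            ],
        }
        yield f"data: {json.dumps(chunk)}\n\n"
        await asyncio.sleep(delay)  # simulate inter-token latency
    yield "data: [DONE]\n\n"

@app.post("/v1/chat/completions")
async def chat_completions():
    return StreamingResponse(fake_chunks(), media_type="text/event-stream")
```

Pointing the LLM node's API base at a server like this keeps response timing fully under the test's control.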