
Benchmark Report

Benchmark v3.9.1 · langgenius/dify-benchmark:13f8568d · Generated 2026-04-15 09:14:22 UTC

Performance Metrics

Event Throughput (events/s)

Metric | api=1 worker=1 | api=1 worker=2 | api=2 worker=2 | api=2 worker=3 | api=3 worker=3
Max    | 10.4           | 26.6           | 31.4           | 47.6           | 48.8
Avg    | 9.5            | 22.26          | 28.41          | 46.06          | 45.44
Min    | 8.6            | 19             | 25             | 44.2           | 41

TTFE - Time To First Event (ms)

Metric | api=1 worker=1 | api=1 worker=2 | api=2 worker=2 | api=2 worker=3 | api=3 worker=3
Avg    | 1164.17        | 520.04         | 488.62         | 309.86         | 315.09
Min    | 318            | 351            | 265            | 270            | 264
Max    | 1521           | 1399           | 914            | 659            | 738
P50    | 1414.5         | 513            | 315            | 289            | 306
P90    | 1460.6         | 582.8          | 774.6          | 330            | 340
P95    | 1482.75        | 976.6          | 859            | 398.2          | 366.7

Connections

Metric         | api=1 worker=1 | api=1 worker=2 | api=2 worker=2 | api=2 worker=3 | api=3 worker=3
Max Concurrent | 9              | 8              | 17             | 16             | 18
Avg Active     | 1.1            | 7.7            | 16.3           | 15.3           | 17.3

Empty Workflow QPS

Metric            | api=1 worker=1 | api=1 worker=2 | api=2 worker=2 | api=2 worker=3 | api=3 worker=3
Max QPS           | 25.6           | 23.8           | 41.4           | 40.6           | 40.6
Avg QPS           | 23.96          | 21.95          | 39.93          | 39.96          | 39.97
Avg Duration (ms) | 176.54         | 154.15         | 127.14         | 119.61         | 100.83

Metrics Definition

Event Throughput (events/s)

The rate of SSE (Server-Sent Events) messages received per second. This metric indicates the system's capacity to handle streaming data (a minimal aggregation sketch follows the list below).

  • Max: Peak throughput during the test
  • Avg: Average throughput across the entire test duration
  • Min: Lowest throughput observed
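
As an illustration of how these aggregates can be derived, the sketch below buckets recorded SSE arrival timestamps into one-second windows and reports the max/avg/min across windows. Function and variable names are illustrative, not the benchmark's actual code.

```python
# Sketch: derive events/s stats from recorded SSE arrival timestamps by
# bucketing arrivals into 1-second windows, then summarizing across windows.
from collections import Counter

def throughput_stats(event_timestamps: list[float]) -> dict[str, float]:
    """event_timestamps: arrival times in seconds (e.g. from time.monotonic())."""
    buckets = Counter(int(ts) for ts in event_timestamps)  # events per 1 s window
    counts = list(buckets.values())
    return {
        "max": max(counts),
        "avg": sum(counts) / len(counts),
        "min": min(counts),
    }
```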

TTFE - Time To First Event (ms)

The latency from sending a request to receiving the first SSE event. This is a critical user-experience metric for streaming applications (a measurement sketch follows the list below).

  • Average: Mean latency across all requests
  • Minimum: Best-case latency observed
  • Maximum: Worst-case latency observed
  • P50 (Median): 50% of requests completed faster than this value
  • P90: 90% of requests completed faster than this value
  • P95: 95% of requests completed faster than this value
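
A minimal sketch of how TTFE could be measured and summarized, assuming one latency sample per request. `requests` stands in for whichever HTTP client the harness actually uses, and percentile conventions differ slightly between tools.

```python
# Sketch: time from request send to first SSE line, plus percentile summary.
import time
import statistics
import requests  # assumed HTTP client; the real harness may differ

def measure_ttfe(url: str, headers: dict, payload: dict) -> float:
    """Milliseconds from sending the request to the first SSE line."""
    start = time.monotonic()
    with requests.post(url, json=payload, headers=headers, stream=True) as r:
        for line in r.iter_lines():
            if line:  # first non-empty line = first event
                break
    return (time.monotonic() - start) * 1000.0

def ttfe_summary(samples_ms: list[float]) -> dict[str, float]:
    q = statistics.quantiles(samples_ms, n=100)  # q[k-1] is the k-th percentile
    return {
        "avg": statistics.fmean(samples_ms),
        "min": min(samples_ms),
        "max": max(samples_ms),
        "p50": q[49], "p90": q[89], "p95": q[94],
    }
```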

Connections

Measures the concurrent SSE connection capacity of the system (a tracking sketch follows the list below).

  • Max Concurrent Connections: Maximum number of simultaneous SSE connections
  • Avg Active Connections: Average number of active connections during the test
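
One way these numbers can be tracked, as a sketch: a shared gauge incremented when an SSE connection opens and decremented when it closes, sampled once per second. Names and structure are illustrative, not the harness's actual code.

```python
# Sketch: track concurrent SSE connections with a gauge plus a 1 Hz sampler.
import asyncio
from statistics import fmean

active = 0
samples: list[int] = []

async def sampler(interval: float = 1.0) -> None:
    """Record the gauge once per interval while the load test runs."""
    while True:
        samples.append(active)
        await asyncio.sleep(interval)

async def with_connection(coro):
    """Wrap a single SSE session so the gauge reflects open connections."""
    global active
    active += 1
    try:
        return await coro
    finally:
        active -= 1

# After the run: max concurrent = max(samples); avg active = fmean(samples)
```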

Empty Workflow QPS

Performance of the minimal API path without external dependencies (e.g., LLM calls). This measures pure system capacity (an aggregation sketch follows the list below).

  • Max QPS: Peak requests per second achieved
  • Avg QPS: Average requests per second
  • Avg Duration: Average request duration in milliseconds
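
A sketch of how these aggregates can be computed from a per-request log of (completion time, duration); the log format is an assumption for illustration, not the harness's actual one.

```python
# Sketch: per-second request counts give max/avg QPS; durations give latency.
from collections import Counter

def qps_stats(requests_log: list[tuple[float, float]]) -> dict[str, float]:
    """requests_log: one (completion_time_s, duration_ms) entry per request."""
    per_second = Counter(int(t) for t, _ in requests_log)
    qps = list(per_second.values())
    durations = [d for _, d in requests_log]
    return {
        "max_qps": max(qps),
        "avg_qps": sum(qps) / len(qps),
        "avg_duration_ms": sum(durations) / len(durations),
    }
```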

Test Environment

The benchmark was executed in a Kubernetes cluster. Each pod is configured with 1 CPU core (1000m) and 2 GB of memory, with resource requests and limits set to the same values to ensure stable CPU scheduling and avoid throttling.
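
For reference, the resource stanza described above can be expressed with the official Kubernetes Python client; this is an illustration of the stated configuration, not the cluster's actual manifests. Setting requests equal to limits places the pod in the Guaranteed QoS class, which is what stabilizes CPU scheduling.

```python
# Illustrative pod resource settings matching the description above.
from kubernetes import client

resources = client.V1ResourceRequirements(
    requests={"cpu": "1000m", "memory": "2Gi"},  # requests == limits ->
    limits={"cpu": "1000m", "memory": "2Gi"},    # Guaranteed QoS class
)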


Test Scenarios

Empty Workflow QPS

This scenario uses a minimal workflow containing only a Start node and an End node, with no processing logic in between. It measures the pure API throughput capacity of the system without any external dependencies or computational overhead.

Workflow Structure:

Start → End

This test helps establish the baseline performance ceiling of the Dify API infrastructure.
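
As a rough illustration, a driver for this scenario could call Dify's workflow-run endpoint (POST /v1/workflows/run) in a loop. The URL, token, and loop shape below are assumptions; the report does not include the benchmark's actual client.

```python
# Sketch of a load driver for the Start -> End workflow.
import time
import requests

API = "http://dify-api/v1/workflows/run"        # hypothetical in-cluster address
HEADERS = {"Authorization": "Bearer app-XXXX"}  # placeholder app API key

def run_once() -> float:
    """Execute the empty workflow once; return the duration in ms."""
    start = time.monotonic()
    r = requests.post(
        API,
        headers=HEADERS,
        json={"inputs": {}, "response_mode": "blocking", "user": "bench"},
        timeout=30,
    )
    r.raise_for_status()
    return (time.monotonic() - start) * 1000.0
```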

TTFE, Connections, and Event Throughput

These metrics are measured using a workflow that includes an LLM node: Start → LLM → End.

To eliminate external dependencies and ensure consistent, reproducible results, we mock the OpenAI API server rather than calling real LLM services. This approach:

  • Removes variability from actual LLM response times
  • Ensures stable and predictable test conditions
  • Allows us to isolate Dify's streaming performance characteristics

Workflow Structure:

Start → LLM → End

The mocked LLM server returns simulated streaming responses to test SSE (Server-Sent Events) handling, connection management, and event throughput under controlled conditions.
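
One way such a mock can be built is a small HTTP app that answers OpenAI-style chat-completion requests with a fixed sequence of SSE chunks, ending with the standard [DONE] sentinel. The sketch below (FastAPI, model name, token count) is illustrative, not the harness's actual mock server.

```python
# Sketch: a mock OpenAI server that streams fixed chat-completion chunks.
import json
import time
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

@app.post("/v1/chat/completions")
def chat_completions():
    def stream():
        for i in range(20):  # 20 simulated tokens
            chunk = {
                "id": "chatcmpl-mock",
                "object": "chat.completion.chunk",
                "created": int(time.time()),
                "model": "gpt-mock",
                "choices": [{"index": 0,
                             "delta": {"content": f"tok{i} "},
                             "finish_reason": None}],
            }
            yield f"data: {json.dumps(chunk)}\n\n"
        yield "data: [DONE]\n\n"
    return StreamingResponse(stream(), media_type="text/event-stream")
```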

© 2026 Dify. All rights reserved. Enterprise release information is confidential. Do not distribute externally.