## Performance Metrics

### Event Throughput (events/s)
| Max | Avg | Min |
|---|---|---|
| 60.8 | 46.67 | 38.8 |
### TTFE - Time To First Event (ms)
| Avg | Min | Max | P50 | P90 | P95 |
|---|---|---|---|---|---|
| 94.55 | 41.0 | 2228.0 | 51.0 | 98.3 | 169.45 |
### Connections
| Max Concurrent | Avg Active |
|---|---|
| 22.0 | 13.6 |
### Empty Workflow QPS
| Max QPS | Avg QPS | Avg Duration (ms) |
|---|---|---|
| 71.0 | 36.37 | 215.61 |
## Metrics Definition

### Event Throughput (events/s)

The rate at which SSE (Server-Sent Events) events are received, measured per second. This metric indicates the system's capacity to handle streaming data.
- Max: Peak throughput during the test
- Avg: Average throughput across the entire test duration
- Min: Lowest throughput observed
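As an illustration of how these three numbers can be derived, the sketch below buckets SSE arrival timestamps into one-second windows and reports the peak, mean, and floor rates. The helper name and the use of `time.monotonic()` timestamps are assumptions for illustration, not the benchmark's actual tooling.

```python
from collections import Counter

def throughput_stats(event_timestamps):
    """Max/avg/min events-per-second from SSE arrival times.

    event_timestamps: one float per received SSE event, e.g. values
    captured with time.monotonic() across all open connections.
    """
    # Bucket events into whole-second windows, then summarize the counts.
    per_second = Counter(int(t) for t in event_timestamps)
    counts = list(per_second.values())
    return max(counts), sum(counts) / len(counts), min(counts)
```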
### TTFE - Time To First Event (ms)
The latency from sending a request to receiving the first SSE event. This is a critical user experience metric for streaming applications.
- Average: Mean latency across all requests
- Minimum: Best-case latency observed
- Maximum: Worst-case latency observed
- P50 (Median): 50% of requests completed faster than this value
- P90: 90% of requests completed faster than this value
- P95: 95% of requests completed faster than this value
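A minimal way to sample and aggregate this metric, assuming an SSE endpoint reachable over HTTP and the `httpx` client library (the URL, payload, and function names below are placeholders):

```python
import statistics
import time

import httpx

def measure_ttfe_ms(url: str, payload: dict) -> float | None:
    """Milliseconds from sending the request to the first SSE 'data:' line."""
    start = time.monotonic()
    with httpx.stream("POST", url, json=payload, timeout=30.0) as resp:
        for line in resp.iter_lines():
            if line.startswith("data:"):  # first event on the stream
                return (time.monotonic() - start) * 1000.0
    return None  # stream closed without emitting any event

def summarize(samples_ms: list[float]) -> dict:
    """Aggregate raw TTFE samples into the statistics reported above."""
    qs = statistics.quantiles(samples_ms, n=100)  # 99 percentile cut points
    return {
        "avg": statistics.fmean(samples_ms),
        "min": min(samples_ms),
        "max": max(samples_ms),
        "p50": qs[49],
        "p90": qs[89],
        "p95": qs[94],
    }
```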
### Connections
Measures the concurrent SSE connection capacity of the system.
- Max Concurrent Connections: Maximum number of simultaneous SSE connections
- Avg Active Connections: Average number of active connections during the test
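One way to exercise concurrent capacity is to hold many SSE streams open at once with an async client. The sketch below assumes `httpx`; the counters and worker count are illustrative, not the benchmark harness itself.

```python
import asyncio

import httpx

active = 0  # connections currently holding an open stream
peak = 0    # maximum concurrent connections observed

async def hold_stream(client: httpx.AsyncClient, url: str, payload: dict):
    global active, peak
    async with client.stream("POST", url, json=payload) as resp:
        active += 1
        peak = max(peak, active)
        try:
            async for _ in resp.aiter_lines():
                pass  # consume events until the server closes the stream
        finally:
            active -= 1

async def main(url: str, payload: dict, n: int = 25):
    async with httpx.AsyncClient(timeout=None) as client:
        await asyncio.gather(*(hold_stream(client, url, payload) for _ in range(n)))
    print("peak concurrent connections:", peak)
```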
### Empty Workflow QPS
Performance of the minimal API path without external dependencies (e.g., LLM calls). This measures pure system capacity.
- Max QPS: Peak requests per second achieved
- Avg QPS: Average requests per second
- Avg Duration: Average request duration in milliseconds
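A thread-per-worker load loop is one simple way to produce these numbers. The sketch below (worker count, timeout, and endpoint are placeholder assumptions) records completion times and per-request durations, then derives the three reported columns.

```python
import threading
import time
from collections import Counter

import httpx

def worker(url, payload, stop, completions, durations):
    # Each worker fires requests back-to-back until told to stop.
    with httpx.Client(timeout=30.0) as client:
        while not stop.is_set():
            t0 = time.monotonic()
            client.post(url, json=payload)
            t1 = time.monotonic()
            completions.append(t1)             # when the request finished
            durations.append((t1 - t0) * 1e3)  # per-request duration in ms

def run_load(url, payload, workers=20, duration_s=60):
    stop, completions, durations = threading.Event(), [], []
    threads = [
        threading.Thread(target=worker,
                         args=(url, payload, stop, completions, durations))
        for _ in range(workers)
    ]
    for t in threads:
        t.start()
    time.sleep(duration_s)
    stop.set()
    for t in threads:
        t.join()
    # Bucket completions per second for max/avg QPS; average the durations.
    per_sec = Counter(int(t) for t in completions).values()
    return max(per_sec), sum(per_sec) / len(per_sec), sum(durations) / len(durations)
```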
## Test Environment

The benchmark was executed in a Kubernetes cluster. The dify-api service ran with 3 replicas, each configured with 1 CPU core (1000m) and 2 GB of memory. Resource limits and requests were set to identical values, placing the pods in the Guaranteed QoS class to ensure stable CPU scheduling and avoid throttling.
### Configuration

```yaml
api:
  replicas: 3
  resources:
    limits:
      cpu: 1000m
      memory: 2048Mi
    requests:
      cpu: 1000m
      memory: 2048Mi
```
## Test Scenarios

### Empty Workflow QPS
This scenario uses a minimal workflow containing only a Start node and an End node, with no processing logic in between. It measures the pure API throughput capacity of the system without any external dependencies or computational overhead.
Workflow Structure:
Start → End
This test helps establish the baseline performance ceiling of the Dify API infrastructure.
### TTFE, Connections, and Event Throughput

These metrics are measured using a workflow that includes an LLM node.
To eliminate external dependencies and ensure consistent, reproducible results, we mock the OpenAI API server rather than calling real LLM services. This approach:
- Removes variability from actual LLM response times
- Ensures stable and predictable test conditions
- Allows us to isolate Dify's streaming performance characteristics
Workflow Structure:
Start → LLM → End
The mocked LLM server returns simulated streaming responses to test SSE (Server-Sent Events) handling, connection management, and event throughput under controlled conditions.
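For reference, a mock of this kind can be as small as the FastAPI sketch below, which streams OpenAI-style `chat.completion.chunk` events over SSE. The route, chunk count, and inter-chunk delay are illustrative assumptions, not the exact mock server used in this benchmark.

```python
import asyncio
import json
import time

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

async def fake_chunks(n: int = 20, delay: float = 0.02):
    # Emit OpenAI-style streaming chunks, one SSE event per token.
    for i in range(n):
        chunk = {
            "id": "chatcmpl-mock",
            "object": "chat.completion.chunk",
            "created": int(time.time()),
            "model": "gpt-mock",
            "choices": [
                {"index": 0, "delta": {"content": f"token{i} "}, "finish_reason": None}
            ],
        }
        yield f"data: {json.dumps(chunk)}\n\n"
        await asyncio.sleep(delay)  # simulate inter-token latency
    yield "data: [DONE]\n\n"

@app.post("/v1/chat/completions")
async def chat_completions():
    return StreamingResponse(fake_chunks(), media_type="text/event-stream")
```

Pointing the LLM node's API base at a server like this keeps response timing fully under the test's control.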