
Benchmark Report

Benchmark v3.9.1 · langgenius/dify-benchmark:13f8568d · Generated 2026-04-15 09:14:22 UTC

Performance Metrics

Event Throughput (events/s)

Metric | api=1 worker=1 | api=1 worker=2 | api=2 worker=2 | api=2 worker=3 | api=3 worker=3
Max    | 10.4           | 26.6           | 31.4           | 47.6           | 48.8
Avg    | 9.5            | 22.26          | 28.41          | 46.06          | 45.44
Min    | 8.6            | 19             | 25             | 44.2           | 41

TTFE - Time To First Event (ms)

Metric | api=1 worker=1 | api=1 worker=2 | api=2 worker=2 | api=2 worker=3 | api=3 worker=3
Avg    | 1164.17        | 520.04         | 488.62         | 309.86         | 315.09
Min    | 318            | 351            | 265            | 270            | 264
Max    | 1521           | 1399           | 914            | 659            | 738
P50    | 1414.5         | 513            | 315            | 289            | 306
P90    | 1460.6         | 582.8          | 774.6          | 330            | 340
P95    | 1482.75        | 976.6          | 859            | 398.2          | 366.7

Connections

Metric         | api=1 worker=1 | api=1 worker=2 | api=2 worker=2 | api=2 worker=3 | api=3 worker=3
Max Concurrent | 9              | 8              | 17             | 16             | 18
Avg Active     | 1.1            | 7.7            | 16.3           | 15.3           | 17.3

Empty Workflow QPS

Metric            | api=1 worker=1 | api=1 worker=2 | api=2 worker=2 | api=2 worker=3 | api=3 worker=3
Max QPS           | 25.6           | 23.8           | 41.4           | 40.6           | 40.6
Avg QPS           | 23.96          | 21.95          | 39.93          | 39.96          | 39.97
Avg Duration (ms) | 176.54         | 154.15         | 127.14         | 119.61         | 100.83

Metrics Definition

Event Throughput (events/s)

The rate of SSE (Server-Sent Events) messages received per second. This metric indicates the system's capacity to handle streaming data (a minimal aggregation sketch follows the list below).

  • Max: Peak throughput during the test
  • Avg: Average throughput across the entire test duration
  • Min: Lowest throughput observed
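
As an illustration of how these aggregates can be derived, the sketch below buckets recorded SSE arrival timestamps into one-second windows and reports the max/avg/min across windows. Function and variable names are illustrative, not the benchmark's actual code.

```python
# Sketch: derive events/s stats from recorded SSE arrival timestamps by
# bucketing arrivals into 1-second windows, then summarizing across windows.
from collections import Counter

def throughput_stats(event_timestamps: list[float]) -> dict[str, float]:
    """event_timestamps: arrival times in seconds (e.g. from time.monotonic())."""
    buckets = Counter(int(ts) for ts in event_timestamps)  # events per 1 s window
    counts = list(buckets.values())
    return {
        "max": max(counts),
        "avg": sum(counts) / len(counts),
        "min": min(counts),
    }
```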

TTFE - Time To First Event (ms)

The latency from sending a request to receiving the first SSE event. This is a critical user-experience metric for streaming applications (a measurement sketch follows the list below).

  • Average: Mean latency across all requests
  • Minimum: Best-case latency observed
  • Maximum: Worst-case latency observed
  • P50 (Median): 50% of requests completed faster than this value
  • P90: 90% of requests completed faster than this value
  • P95: 95% of requests completed faster than this value
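
A minimal sketch of how TTFE could be measured and summarized, assuming one latency sample per request. `requests` stands in for whichever HTTP client the harness actually uses, and percentile conventions differ slightly between tools.

```python
# Sketch: time from request send to first SSE line, plus percentile summary.
import time
import statistics
import requests  # assumed HTTP client; the real harness may differ

def measure_ttfe(url: str, headers: dict, payload: dict) -> float:
    """Milliseconds from sending the request to the first SSE line."""
    start = time.monotonic()
    with requests.post(url, json=payload, headers=headers, stream=True) as r:
        for line in r.iter_lines():
            if line:  # first non-empty line = first event
                break
    return (time.monotonic() - start) * 1000.0

def ttfe_summary(samples_ms: list[float]) -> dict[str, float]:
    q = statistics.quantiles(samples_ms, n=100)  # q[k-1] is the k-th percentile
    return {
        "avg": statistics.fmean(samples_ms),
        "min": min(samples_ms),
        "max": max(samples_ms),
        "p50": q[49], "p90": q[89], "p95": q[94],
    }
```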

Connections

Measures the concurrent SSE connection capacity of the system (a tracking sketch follows the list below).

  • Max Concurrent Connections: Maximum number of simultaneous SSE connections
  • Avg Active Connections: Average number of active connections during the test
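
One way these numbers can be tracked, as a sketch: a shared gauge incremented when an SSE connection opens and decremented when it closes, sampled once per second. Names and structure are illustrative, not the harness's actual code.

```python
# Sketch: track concurrent SSE connections with a gauge plus a 1 Hz sampler.
import asyncio
from statistics import fmean

active = 0
samples: list[int] = []

async def sampler(interval: float = 1.0) -> None:
    """Record the gauge once per interval while the load test runs."""
    while True:
        samples.append(active)
        await asyncio.sleep(interval)

async def with_connection(coro):
    """Wrap a single SSE session so the gauge reflects open connections."""
    global active
    active += 1
    try:
        return await coro
    finally:
        active -= 1

# After the run: max concurrent = max(samples); avg active = fmean(samples)
```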

Empty Workflow QPS

Performance of the minimal API path without external dependencies (e.g., LLM calls). This measures pure system capacity (an aggregation sketch follows the list below).

  • Max QPS: Peak requests per second achieved
  • Avg QPS: Average requests per second
  • Avg Duration: Average request duration in milliseconds
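
A sketch of how these aggregates can be computed from a per-request log of (completion time, duration); the log format is an assumption for illustration, not the harness's actual one.

```python
# Sketch: per-second request counts give max/avg QPS; durations give latency.
from collections import Counter

def qps_stats(requests_log: list[tuple[float, float]]) -> dict[str, float]:
    """requests_log: one (completion_time_s, duration_ms) entry per request."""
    per_second = Counter(int(t) for t, _ in requests_log)
    qps = list(per_second.values())
    durations = [d for _, d in requests_log]
    return {
        "max_qps": max(qps),
        "avg_qps": sum(qps) / len(qps),
        "avg_duration_ms": sum(durations) / len(durations),
    }
```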

Test Environment

The benchmark was executed in a Kubernetes cluster. Each pod is configured with 1 CPU core (1000m) and 2 GB of memory, with resource requests and limits set to the same values to ensure stable CPU scheduling and avoid throttling.
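
For reference, the resource stanza described above can be expressed with the official Kubernetes Python client; this is an illustration of the stated configuration, not the cluster's actual manifests. Setting requests equal to limits places the pod in the Guaranteed QoS class, which is what stabilizes CPU scheduling.

```python
# Illustrative pod resource settings matching the description above.
from kubernetes import client

resources = client.V1ResourceRequirements(
    requests={"cpu": "1000m", "memory": "2Gi"},  # requests == limits ->
    limits={"cpu": "1000m", "memory": "2Gi"},    # Guaranteed QoS class
)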


Test Scenarios

Empty Workflow QPS

This scenario uses a minimal workflow containing only a Start node and an End node, with no processing logic in between. It measures the pure API throughput capacity of the system without any external dependencies or computational overhead.

Workflow Structure:

Start → End

This test helps establish the baseline performance ceiling of the Dify API infrastructure.
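
As a rough illustration, a driver for this scenario could call Dify's workflow-run endpoint (POST /v1/workflows/run) in a loop. The URL, token, and loop shape below are assumptions; the report does not include the benchmark's actual client.

```python
# Sketch of a load driver for the Start -> End workflow.
import time
import requests

API = "http://dify-api/v1/workflows/run"        # hypothetical in-cluster address
HEADERS = {"Authorization": "Bearer app-XXXX"}  # placeholder app API key

def run_once() -> float:
    """Execute the empty workflow once; return the duration in ms."""
    start = time.monotonic()
    r = requests.post(
        API,
        headers=HEADERS,
        json={"inputs": {}, "response_mode": "blocking", "user": "bench"},
        timeout=30,
    )
    r.raise_for_status()
    return (time.monotonic() - start) * 1000.0
```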

TTFE, Connections, and Event Throughput

These metrics are measured using a workflow that includes an LLM node: Start → LLM → End.

To eliminate external dependencies and ensure consistent, reproducible results, we mock the OpenAI API server rather than calling real LLM services. This approach:

  • Removes variability from actual LLM response times
  • Ensures stable and predictable test conditions
  • Allows us to isolate Dify's streaming performance characteristics

Workflow Structure:

Start → LLM → End

The mocked LLM server returns simulated streaming responses to test SSE (Server-Sent Events) handling, connection management, and event throughput under controlled conditions.
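
One way such a mock can be built is a small HTTP app that answers OpenAI-style chat-completion requests with a fixed sequence of SSE chunks, ending with the standard [DONE] sentinel. The sketch below (FastAPI, model name, token count) is illustrative, not the harness's actual mock server.

```python
# Sketch: a mock OpenAI server that streams fixed chat-completion chunks.
import json
import time
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

@app.post("/v1/chat/completions")
def chat_completions():
    def stream():
        for i in range(20):  # 20 simulated tokens
            chunk = {
                "id": "chatcmpl-mock",
                "object": "chat.completion.chunk",
                "created": int(time.time()),
                "model": "gpt-mock",
                "choices": [{"index": 0,
                             "delta": {"content": f"tok{i} "},
                             "finish_reason": None}],
            }
            yield f"data: {json.dumps(chunk)}\n\n"
        yield "data: [DONE]\n\n"
    return StreamingResponse(stream(), media_type="text/event-stream")
```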

© 2026 Dify. All rights reserved. Enterprise release information is confidential. Do not distribute externally.