Table of Contents

As a machine learning engineer/data scientist, after the model development process is finished, we need to deploy the model as a web service using different web frameworks. To achieve maximum performance and lower the hardware cost, we often need to optimize the speed our service, including TensorRT acceleration, config tuning, etc.

In order to reliably and objectively evaluate the performance of the service under different configs, we need to load-test the service. In this post, I want to share how to load test your HTTP service with wrk.

Install
#

Wrk is a lightweight and easy-to-use load testing tool. To install it, run the following command:

git clone --depth=1 https://github.com/wg/wrk.git
cd wrk
make -j

General options
#

The generated wrk executable is under this folder. This is how you use wrk for GET request benchmark:

wrk -t 6 -c 200 -d 30s --latency https://google.com

Some of the command flags for wrk:

-c: the number of connections to use
-t: the number of threads to use
-d: the test duration, e.g., 60s
-s: the lua script to use for load testing our service (will cover in later section)
--timeout how many seconds to timeout a request
--latency: show the latency distribution for all the requests

For connections and threads, the author suggest using thread number less than the core in CPU. The connections are shared in different threads, i.e., each threads get N = connections/threads connections.

Refs
#

wrk threads and connections:
- https://github.com/wg/wrk/issues/90
- https://github.com/wg/wrk/issues/205

Wrk in action
#

Making GET request in wrk is straightforward and easy, so I am not going to show it here. In the following, I will show how to make POST request with wrk.

Suppose we have the following server code:

from flask import Flask, jsonify, request


app = Flask(__name__)


@app.route("/demo", methods=["POST"])
def server():
    if request.content_type == 'application/x-www-form-urlencoded':
        req = request.form.to_dict()
    elif request.content_type == 'application/json':
        req = request.get_json()
    else:
        return jsonify({'status': 1, 'msg': 'unsupported content type'})

    print(f"user req: {req}")
    w = int(req.get("width", 0))
    h = int(req.get("height", 0))

    return jsonify({'status': 0, 'msg': "ok", "area": w*h})


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=1234)

To test the server’s performance, we run the following wrk command:

wrk -t 4 -c 100 -d 180s -s test.lua --latency "http://server_ip:1234/demo"

The content of test.lua is like:

wrk.method = "POST"

-- post form urlencoded data
wrk.body = "width=2&height=2"
wrk.headers['Content-Type'] = "application/x-www-form-urlencoded"

The above script assumes you are making request in application/x-www-form-urlencoded format. If you content type is application/json, use the following test.lua:

wrk.method = "POST"

-- post json data
wrk.body = '{"width": 2, "height": 2}'
wrk.headers['Content-Type'] = "application/json"

Advanced scripting
#

wrk can also support advanced control for benchmarking, the official guide is here.

References
#

Faster Directory Navigation with z.lua

14 February 2019·Updated: 13 June 2019·387 words·2 mins

Tools Cmder Shell Bash Zsh Windows

Nerdfont Icon Missing after Wezterm Upgrade

25 July 2023·172 words·1 min

Tools Terminal Font Unicode

How to Extract Key Frames from A Video with FFmpeg

25 December 2021·Updated: 25 January 2022·426 words·2 mins

Tools Ffmpeg Video

Benchmarking Your HTTP Service Using wrk

Install
#

General options
#

Refs
#

Wrk in action
#

Advanced scripting
#

References
#

Related

Install#

General options#

Refs#

Wrk in action#

Advanced scripting#

References#

Related

Install
#

General options
#

Refs
#

Wrk in action
#

Advanced scripting
#

References
#