Matrix is, fundamentally, a combination of the best parts of IRC, XMPP, and Slack-like communication platforms (Discord, Mattermost, Rocketchat, etc.) built to modern standards. In the Matrix ecosystem, users can run their own server instances, called "homeservers", which then federate amongst themselves to create a "fediverse". It is thus fully distributed, allowing users to communicate with each other on their own terms, while providing all the features one would expect of a global chat system, such as large public rooms, as well as standard features of more modern platforms, like small private groups, direct messages, file uploads, and advanced integration and moderation features, such as bots. The reference homeserver application is called "Synapse", written in Python 3, and released under an Apache 2.0 license.
In this guide, I seek to provide a document detailing the full steps to deploy a highly-available, redundant, multi-worker Matrix instance, with a fully redundant PostgreSQL database and LDAP authentication and 3PID backend. For those of you who just want to run a quick-and-easy Matrix instance with few advanced features, this guide is probably not for you, and there are numerous guides out there for setting up basic Matrix Synapse instances instead.
Most of the concepts in this guide, as well as most of the configuration files given, can be adapted to a single-host but still split-worker instance instead, should the configuration below be deemed too complicated or excessive for your use case. Be sure to carefully read this document and the Matrix documentation if you wish to do so, though most sections can be adapted verbatim.
## The problem with Synapse
The main issue with Synapse in its default configuration, as documented by the Matrix project themselves, is that it is single-threaded and non-redundant. Since a lot of actions inside Synapse require significant CPU resources, especially those related to federation, this can be a significant bottleneck. This is especially true in very large rooms, where there are potentially hundreds of joined users on multiple homeservers that all must be communicated to. Without tweaking, this can manifest as posts to large rooms taking an extraordinarily long time, upwards of 10 seconds, to send, as well as problems joining very large rooms for the first time (significant delays, timeouts, join failures, etc.).
Unfortunately, most homeserver users aren't running their instance on the fastest possible CPU, so the only way to improve performance in this area is to somehow allow Synapse to use more than one thread. Luckily for us, Matrix Synapse, since about version 1.10, supports this via workers. Workers allow one to split various functions out of the main Synapse process into separate processes, which allows multi-threaded operation and thus increased performance.
The configuration of workers [is discussed in the Synapse documentation](https://github.com/matrix-org/synapse/blob/master/docs/workers.md), but a number of details are glossed over or not mentioned at all. Thus, this blog post will outline some of the specific details involved in tuning workers for maximum performance.
## Step 1 - Prerequisites and planning
The system outlined in this guide is designed to provide a very scalable and redundant Matrix experience. To this end, the entire system is split up into multiple hosts. In most cases, these should be Virtual Machines running on at least 2 hypervisors for redundancy at the lower layers, though this is outside of the scope of this guide. For our purposes, we will assume that the VMs discussed below are already installed, configured, and operating.
The configuration outlined here makes use of a total of 14 VMs, with 6 distinct roles. Within each role, either 2 or 3 individual VMs are configured to provide redundancy. The roles can be roughly divided into two categories: frontends that expose services to users, and backends that expose databases to the frontend instances.
The full VM list, with an example naming convention where X is the host "ID" (e.g. 1, 2, etc.), is as follows:
| Quantity | Name | Description |
| --- | --- | --- |
| 2 | `flbX` | Frontend load balancers running HAProxy, handling incoming requests from clients and federated servers. |
| 2 | `rwX` | Riot Web instances under Nginx. |
| 3 | `hsX` | Matrix Synapse homeserver instances running the various workers. |
| 2 | `blbX` | Backend load balancers running HAProxy, handling database requests from the homeserver instances. |
| 3 | `mpgX` | PostgreSQL database instances running Patroni with Zookeeper. |
| 2 | `mldX` | OpenLDAP instances. |
While this setup may seem like overkill, it is, aside from the homeserver instances, the minimum configuration possible while still providing full redundancy. If redundancy is not desired, a smaller configuration, down to as little as one host, is possible, though this is not detailed below.
In addition to these 14 VMs, some sort of shared storage must be provided for the sharing of media files (e.g. uploaded files) between the homeservers. For the purpose of this guide, we assume that this is an NFS export at `/srv/matrix` from a system called `nas`. The configuration of redundant, shared storage is outside of the scope of this guide, and thus we will not discuss this beyond this paragraph, though prospective administrators of highly-available Matrix instances should consider this as well.
All the VMs mentioned above should be running the same operating system. I recommend Debian 10.X (Buster) here, both because it is the distribution I run myself, and also because it provides nearly all the required packages with minimal fuss. If you wish to use another distribution, you must adapt the commands and examples below to fit. Additionally, this guide expects that you are running the Systemd init system. This is not the place for continuing the seemingly-endless initsystem debate, but some advanced features of Systemd (such as template units) are used below and in the official Matrix documentation, so we expect this is the initsystem you are running, and you are on your own if you choose to use an alternative.
For networking purposes, it is sufficient to place all the above servers in a single RFC1918 network. Outbound NAT should be configured to allow all hosts to reach the internet, and a small number of ports should be permitted through a firewall towards the external load balancer VIP (virtual IP address). The following is an example IP configuration in the network `10.0.0.0/24` that can be used for this guide, though you may of course choose a different subnet and host IP allocation scheme if you wish. All these names should resolve in DNS, or be configured in `/etc/hosts` on all machines.
| IP address | Hostname | Description |
| --- | --- | --- |
| 10.0.0.1 | `gw` | NAT gateway and firewall, upstream router. |
| 10.0.0.2 | `blbvip` | Floating VIP for blbX instances. |
| 10.0.0.3 | `blb1` | blbX host 1. |
| 10.0.0.4 | `blb2` | blbX host 2. |
| 10.0.0.5 | `mpg1` | mpgX host 1. |
| 10.0.0.6 | `mpg2` | mpgX host 2. |
| 10.0.0.7 | `mpg3` | mpgX host 3. |
| 10.0.0.8 | `mld1` | mldX host 1. |
| 10.0.0.9 | `mld2` | mldX host 2. |
| 10.0.0.10 | `flbvip` | Floating VIP for flbX instances. |
| 10.0.0.11 | `flb1` | flbX host 1. |
| 10.0.0.12 | `flb2` | flbX host 2. |
| 10.0.0.13 | `rw1` | rwX host 1. |
| 10.0.0.14 | `rw2` | rwX host 2. |
| 10.0.0.15 | `hs1` | hsX host 1. |
| 10.0.0.16 | `hs2` | hsX host 2. |
| 10.0.0.17 | `hs3` | hsX host 3. |
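
If you go the `/etc/hosts` route rather than DNS, the allocation above can be turned into host entries mechanically. The following is a minimal sketch, assuming the example `10.0.0.0/24` allocation from the table; the `render_hosts` helper and the `HOSTS` mapping are illustrative names of my own, not part of any tool used in this guide.

```python
import ipaddress

# Example allocation copied from the table above; adjust to your own network.
NETWORK = ipaddress.ip_network("10.0.0.0/24")
HOSTS = {
    "gw": "10.0.0.1",
    "blbvip": "10.0.0.2", "blb1": "10.0.0.3", "blb2": "10.0.0.4",
    "mpg1": "10.0.0.5", "mpg2": "10.0.0.6", "mpg3": "10.0.0.7",
    "mld1": "10.0.0.8", "mld2": "10.0.0.9",
    "flbvip": "10.0.0.10", "flb1": "10.0.0.11", "flb2": "10.0.0.12",
    "rw1": "10.0.0.13", "rw2": "10.0.0.14",
    "hs1": "10.0.0.15", "hs2": "10.0.0.16", "hs3": "10.0.0.17",
}


def render_hosts(hosts, network):
    """Return /etc/hosts lines sorted by address, rejecting hosts outside the network."""
    lines = []
    by_address = sorted(hosts.items(), key=lambda item: ipaddress.ip_address(item[1]))
    for name, address in by_address:
        ip = ipaddress.ip_address(address)
        if ip not in network:
            raise ValueError(f"{name} ({address}) is outside {network}")
        lines.append(f"{ip}\t{name}")
    return lines


if __name__ == "__main__":
    print("\n".join(render_hosts(HOSTS, NETWORK)))
```

The output can be appended to `/etc/hosts` on each machine; the subnet check is only there to catch typos in the allocation.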
## Step 2 - Installing and configuring OpenLDAP instances
[OpenLDAP]() is a common LDAP server, which provides centralized user administration as well as the configuration of additional details in a user directory. Installing and configuring OpenLDAP is beyond the scope of this guide, though the Matrix Homeserver configurations below assume that this is operating and that all Matrix users are stored in the LDAP database. In our example configuration, there are 2 OpenLDAP instances running with replication (`syncrepl`) between them, which are then load-balanced in a multi-master fashion. Since no services below here will be performing writes to this database, this is fine. The administrator is expected to configure some sort of user management layer of their choosing (e.g. scripts, or a web-based frontend) for managing users, resetting passwords, etc.
While this short section may seem like a cop-out, this is an extensive topic with many potential caveats, and should thus have its own (future) post on this blog. Until then, I trust that the administrator is able to look up and configure this themselves. I include these references only to help guide the administrator towards full-stack redundancy and to explain why there are LDAP sections in the backend load balancer configurations.
## Step 3 - Installing and configuring Patroni instances
[Patroni](https://github.com/zalando/patroni) is a service manager for PostgreSQL which provides automated failover and replication support for a PostgreSQL database. Like OpenLDAP above, the configuration of Patroni is beyond the scope of this guide, and the configurations below assume that this is operating and already configured. In our example configuration, there are 3 Patroni instances, which is the minimum required for quorum among the members. As above, I do plan to document this in a future post, but until then, I recommend the administrator reference the Patroni documentation as well as [this other post on my blog](https://www.boniface.me/post/patroni-and-haproxy-agent-checks/) for details on setting up the Patroni instances.
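The reason 3 is the minimum comes down to majority arithmetic: a cluster can only keep operating while more than half of its members agree. A minimal sketch of that arithmetic follows; the function names are my own and this is not Patroni code.

```python
def quorum_size(members: int) -> int:
    """Smallest number of members that forms a strict majority."""
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can be lost while a majority still remains."""
    return members - quorum_size(members)


if __name__ == "__main__":
    for members in (1, 2, 3, 5):
        print(members, quorum_size(members), tolerated_failures(members))
```

With 2 members the quorum is also 2, so losing either one stalls the cluster; 3 is the smallest size that survives a single failure.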
## Step 4 - Installing and configuring backend load balancers
While I do not go into details in the previous two steps, this section details how to make use of a redundant pair of HAProxy instances to expose the redundant databases mentioned above to the Homeserver instances.
In order to provide a single entrypoint to the load balancers, the administrator should first install and configure Keepalived. The following `/etc/keepalived/keepalived.conf` configuration will set up the `blbvip` floating IP address between the two instances, while providing checking of the HAProxy instance health. This configuration below can be used on both proxy hosts, and inline comments provide additional clarification and information as well as indicating any changes required between the hosts.
Once the above configuration is installed at `/etc/keepalived/keepalived.conf`, restart the Keepalived service with `sudo systemctl restart keepalived` on each host. You should see the VIP become active on the first host.
The HAProxy configuration below can be used verbatim on both proxy hosts, and inline comments provide additional clarification and information to avoid breaking up the configuration snippet. This configuration makes use of an advanced feature for the Patroni hosts [which is detailed in another post on this blog](https://www.boniface.me/post/patroni-and-haproxy-agent-checks/), to ensure that only the active Patroni node is sent traffic and to prevent the other two database hosts from reporting `DOWN` state all the time.
```
# Global settings - tune HAProxy for optimal performance, administration, and security.
global
    # Send logs to the "local6" service on the local host, via an rsyslog UDP listener. Enable debug logging to log individual connections.
    log ip6-localhost:514 local6 debug
    log-send-hostname
    chroot /var/lib/haproxy
    pidfile /run/haproxy/haproxy.pid

    # Use multi-threaded support (available with HAProxy 1.8+) for optimal performance in high-load situations. Adjust `nbthread` as needed for your host's core count (1/2 is optimal).
    nbproc 1
    nbthread 2

    # Provide a stats socket for `hatop`
    stats socket /var/lib/haproxy/admin.sock mode 660 level admin process 1
    stats timeout 30s

    # Run in daemon mode as the `haproxy` user
    daemon
    user haproxy
    group haproxy

    # Set the global connection limit to 10000; this is certainly overkill but avoids needing to tweak this for larger instances.
    maxconn 10000

# Default settings - provide some default settings that are applicable to (most) of the listeners and backends below.

# Statistics listener with authentication - provides stats for the HAProxy instance via a WebUI (optional)
userlist admin
    # WARNING - CHANGE ME TO A REAL PASSWORD OR A SHA512-hashed PASSWORD (with `password` instead of `insecure-password`). IF YOU USE `insecure-password`, MAKE SURE THIS CONFIGURATION IS NOT WORLD-READABLE.
    user admin insecure-password P4ssw0rd

listen stats
    bind :::5555 v4v6
    mode http
    stats enable
    stats uri /
    stats hide-version
    stats refresh 10s
    stats show-node
    stats show-legends
    acl is_admin http_auth(admin)
    http-request auth realm "Admin access required" if !is_admin

# Stick-tables peers configuration
peers keepalived-pair
    peer blb1 10.0.0.3:1023
    peer blb2 10.0.0.4:1023

# LDAP frontend
frontend ldap
    bind :::389 v4v6
    maxconn 1000
    mode tcp
    option tcpka
    default_backend ldap

# PostgreSQL frontend
frontend pgsql
    bind :::5432 v4v6
    maxconn 1000
    mode tcp
    option tcpka
    default_backend pgsql

# LDAP backend
backend ldap
    mode tcp
    option tcpka
    balance leastconn
    server mld1 10.0.0.8:389 check inter 2000 maxconn 64
    server mld2 10.0.0.9:389 check inter 2000 maxconn 64

# PostgreSQL backend using agent check
backend pgsql
    mode tcp
    option tcpka
    option httpchk OPTIONS /master
    http-check expect status 200
    server mpg1 10.0.0.5:5432 maxconn 1000 check agent-check agent-port 5555 inter 1s fall 2 rise 2 on-marked-down shutdown-sessions port 8008
    server mpg2 10.0.0.6:5432 maxconn 1000 check agent-check agent-port 5555 inter 1s fall 2 rise 2 on-marked-down shutdown-sessions port 8008
    server mpg3 10.0.0.7:5432 maxconn 1000 check agent-check agent-port 5555 inter 1s fall 2 rise 2 on-marked-down shutdown-sessions port 8008
```
Once the above configurations are installed on each server, restart the HAProxy service with `sudo systemctl restart haproxy`. Use `sudo hatop -s /var/lib/haproxy/admin.sock` to view the status of the backends, and continue once all are running correctly.
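If a PostgreSQL backend shows as down unexpectedly, it can help to reproduce the HTTP check by hand. The sketch below issues the same `OPTIONS /master` request against the Patroni REST API on port 8008 that the `pgsql` backend uses, and treats a 200 response as "this node is the primary". The helper names are my own, and the injectable `opener` parameter exists only so the logic can be exercised without a live cluster.

```python
import urllib.error
import urllib.request


def is_primary(host, port=8008, timeout=2.0, opener=urllib.request.urlopen):
    """Return True if the Patroni REST API on host answers OPTIONS /master with 200."""
    request = urllib.request.Request(f"http://{host}:{port}/master", method="OPTIONS")
    try:
        with opener(request, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Non-2xx answers raise HTTPError; unreachable hosts raise URLError/OSError.
        return False


def find_primaries(hosts, check=is_primary):
    """Return the hosts that currently report themselves as primary."""
    return [host for host in hosts if check(host)]
```

Calling `find_primaries(["10.0.0.5", "10.0.0.6", "10.0.0.7"])` from one of the load balancers should return exactly one address in a healthy cluster.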
## Step 5 - Install and configure Synapse instances
The core homeserver processes should be configured on all homeserver machines. There are numerous options but
## Step 2 - Configure systemd units
The easiest way to set up workers is to use a template unit file with a series of individual worker configurations. A series of unit files are [provided within the Synapse documentation](https://github.com/matrix-org/synapse/tree/master/docs/systemd-with-workers), which can be used to set up template-based workers.
I decided to modify these somewhat, by replacing the configuration directory at `/etc/matrix-synapse/workers` with `/etc/matrix-synapse/worker.d`, but this is just a personal preference. If you're using official Debian packages (as I am), you will also need to adjust the path to the Python binary. I also adjust the description to be a little more consistent. The resulting template worker unit looks like this:
There is also a generic target unit that should be installed to provide a unified management point for both the primary Synapse process as well as the workers. After some similar tweaks, including adjusting the After condition to use `network-online.target` instead of `network.target`, the resulting file looks like this:
```
[Unit]
Description = Synapse Matrix homeserver target
After = network-online.target
[Install]
WantedBy = multi-user.target
```
Install both of these units, as `matrix-synapse-worker@.service` and `matrix-synapse.target` respectively, to `/etc/systemd/system`, and run `sudo systemctl daemon-reload`.
Once the unit files are prepared, you can begin building each individual worker configuration.
## Step 3 - Configure the individual workers
Each worker is configured via an individual YAML configuration file; with our units, these live under `/etc/matrix-synapse/worker.d`. By design, each worker makes use of `homeserver.yaml` for all global configuration values, then the individual worker configurations override specific settings for the particular worker. The [Synapse documentation on workers](https://github.com/matrix-org/synapse/blob/master/docs/workers.md) provides a good starting point, but some sections are vague, and thus this guide hopes to provide more detailed instructions and explanations.
Each worker is given a specific section below, which includes the full YAML configuration I use, as well as any notes about the configuration that are worth mentioning. They are provided in alphabetical order, rather than the order provided in the documentation above, for clarity.
For any worker which responds to REST, a port must be selected for the worker to listen on. The main homeserver runs by default on port 8008, and I have `ma1sd` running on port 8090, so I chose ports from 8091 to 8097 for the various REST workers in order to keep them in a consistent range.
Finally, the main homeserver must be configured with both TCP and HTTP replication listeners, to provide communication between the workers and the main process. For this I use the ports provided by the Matrix documentation above, 9092 and 9093, with the following configuration in the main `homeserver.yaml` `listeners` section:
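The port assignment used in this guide is simply the REST workers in alphabetical order, numbered upward from 8091. A minimal sketch of that scheme, where `allocate_ports` is an illustrative helper of my own rather than anything Synapse provides:

```python
# The REST-serving workers configured in this guide.
REST_WORKERS = [
    "client_reader",
    "event_creator",
    "federation_reader",
    "frontend_proxy",
    "media_repository",
    "synchrotron",
    "user_dir",
]
FIRST_PORT = 8091


def allocate_ports(workers, first_port=FIRST_PORT):
    """Assign consecutive ports to workers in alphabetical order."""
    return {name: first_port + offset for offset, name in enumerate(sorted(workers))}


if __name__ == "__main__":
    for name, port in allocate_ports(REST_WORKERS).items():
        print(f"{port}  {name}")
```

Keeping the mapping in one place like this makes it easier to keep the worker listeners and the reverse proxy backends in agreement.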
```
listeners:
  - port: 8008
    tls: false
    bind_addresses:
      - '::'
    type: http
    x_forwarded: true
    resources:
      - names: [client, webclient]
        compress: true
  - port: 9092
    bind_addresses:
      - '::'
    type: replication
    x_forwarded: true
  - port: 9093
    bind_addresses:
      - '::'
    type: http
    x_forwarded: true
    resources:
      - names: [replication]
```
There are a couple of adjustments here from the default configuration. First, the `federation` resource has been removed from the primary listener, since this is implemented as a worker below. TLS is disabled here, and `x_forwarded: true` is added to all 3 listeners, since this is handled by a reverse proxy, as discussed later in this guide. All three listeners use a global IPv6+IPv4 bind address of `::` so they will be accessible by other machines on the network, which is important for the final, multi-host setup. As noted in the Matrix documentation, *ensure that the replication ports are not publicly accessible*, since they are unauthenticated and unencrypted; I run these servers on an RFC1918 private network behind a firewall so this is secure, but you will need to provide some sort of firewall if your Synapse instance is directly available on the public Internet.
The configurations below show a hostname, `mlbvip`, for all instances of `worker_replication_host`. This will be explained and discussed further in the reverse proxy section. If you are only interested in running a "single-server" instance, you may use `localhost`, `127.0.0.1`, or `::1` here instead, as these ports will not be managed by the reverse proxy in such a setup.
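One way to double-check that the replication ports are not reachable from outside is to probe them from a machine on the far side of the firewall. The following is a minimal sketch using only the standard library; `port_is_open` and `exposed_ports` are illustrative names of my own, and the host you pass in is whatever public name or address your instance has.

```python
import socket

# The replication ports used in this guide.
REPLICATION_PORTS = (9092, 9093)


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def exposed_ports(host, ports=REPLICATION_PORTS):
    """Return the subset of ports that accept connections on host."""
    return [port for port in ports if port_is_open(host, port)]
```

Run from outside your network, `exposed_ports("<your public address>")` should return an empty list; anything else means the firewall needs attention.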
The `appservice` worker does not service REST endpoints, and thus has a minimal configuration.
```
---
worker_app: synapse.app.appservice
worker_replication_host: mlbvip
worker_replication_port: 9092
worker_replication_http_port: 9093
```
Once the configuration is in place, enable the worker by running `sudo systemctl enable matrix-synapse-worker@appservice.service`. It will be started later in the process.
The `client_reader` worker services REST endpoints, and thus has a listener section, with port 8091 chosen.
```
---
worker_app: synapse.app.client_reader
worker_replication_host: mlbvip
worker_replication_port: 9092
worker_replication_http_port: 9093
worker_listeners:
  - type: http
    port: 8091
    resources:
      - names:
          - client
```
Once the configuration is in place, enable the worker by running `sudo systemctl enable matrix-synapse-worker@client_reader.service`. It will be started later in the process.
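Since the worker files differ only in the application name, the optional listener port, and the resource names, they can also be generated rather than written by hand. The sketch below builds the YAML text with plain string formatting so it needs no third-party YAML library; `render_worker_config` is a hypothetical helper of my own, and the default `mlbvip` replication host matches the examples in this guide.

```python
def render_worker_config(app, port=None, resources=("client",), replication_host="mlbvip"):
    """Return the YAML text for one worker; pass a port for REST-serving workers."""
    lines = [
        "---",
        f"worker_app: synapse.app.{app}",
        f"worker_replication_host: {replication_host}",
        "worker_replication_port: 9092",
        "worker_replication_http_port: 9093",
    ]
    if port is not None:
        lines += [
            "worker_listeners:",
            "  - type: http",
            f"    port: {port}",
            "    resources:",
            "      - names:",
        ]
        lines += [f"          - {name}" for name in resources]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_worker_config("client_reader", port=8091))
```

Writing the result to `/etc/matrix-synapse/worker.d/<name>.yaml` for each worker gives the same files as the hand-written versions in this section.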
The `event_creator` worker services REST endpoints, and thus has a listener section, with port 8092 chosen.
```
---
worker_app: synapse.app.event_creator
worker_replication_host: mlbvip
worker_replication_port: 9092
worker_replication_http_port: 9093
worker_listeners:
  - type: http
    port: 8092
    resources:
      - names:
          - client
```
Once the configuration is in place, enable the worker by running `sudo systemctl enable matrix-synapse-worker@event_creator.service`. It will be started later in the process.
The `federation_reader` worker services REST endpoints, and thus has a listener section, with port 8093 chosen. Note that this worker, in addition to a `client` resource, also provides a `federation` resource.
```
---
worker_app: synapse.app.federation_reader
worker_replication_host: mlbvip
worker_replication_port: 9092
worker_replication_http_port: 9093
worker_listeners:
  - type: http
    port: 8093
    resources:
      - names:
          - client
          - federation
```
Once the configuration is in place, enable the worker by running `sudo systemctl enable matrix-synapse-worker@federation_reader.service`. It will be started later in the process.
The `federation_sender` worker does not service REST endpoints, and thus has a minimal configuration.
```
---
worker_app: synapse.app.federation_sender
worker_replication_host: mlbvip
worker_replication_port: 9092
worker_replication_http_port: 9093
```
Once the configuration is in place, enable the worker by running `sudo systemctl enable matrix-synapse-worker@federation_sender.service`. It will be started later in the process.
The `frontend_proxy` worker services REST endpoints, and thus has a listener section, with port 8094 chosen. This worker has an additional configuration parameter, `worker_main_http_uri`, which allows the worker to direct requests back to the primary Synapse instance. Similar to the `worker_replication_host` value, this uses `mlbvip` in this example, and for "single-server" instances *must* be replaced with `localhost`, `127.0.0.1`, or `::1` instead, as this port will not be managed by the reverse proxy in such a setup.
```
---
worker_app: synapse.app.frontend_proxy
worker_replication_host: mlbvip
worker_replication_port: 9092
worker_replication_http_port: 9093
worker_main_http_uri: http://mlbvip:8008
worker_listeners:
  - type: http
    port: 8094
    resources:
      - names:
          - client
```
Once the configuration is in place, enable the worker by running `sudo systemctl enable matrix-synapse-worker@frontend_proxy.service`. It will be started later in the process.
The `media_repository` worker services REST endpoints, and thus has a listener section, with port 8095 chosen. Note that this worker, in addition to a `client` resource, also provides a `media` resource.
```
---
worker_app: synapse.app.media_repository
worker_replication_host: mlbvip
worker_replication_port: 9092
worker_replication_http_port: 9093
worker_listeners:
  - type: http
    port: 8095
    resources:
      - names:
          - client
          - media
```
Once the configuration is in place, enable the worker by running `sudo systemctl enable matrix-synapse-worker@media_repository.service`. It will be started later in the process.
The `pusher` worker does not service REST endpoints, and thus has a minimal configuration.
```
---
worker_app: synapse.app.pusher
worker_replication_host: mlbvip
worker_replication_port: 9092
worker_replication_http_port: 9093
```
Once the configuration is in place, enable the worker by running `sudo systemctl enable matrix-synapse-worker@pusher.service`. It will be started later in the process.
The `synchrotron` worker services REST endpoints, and thus has a listener section, with port 8096 chosen.
```
---
worker_app: synapse.app.synchrotron
worker_replication_host: mlbvip
worker_replication_port: 9092
worker_replication_http_port: 9093
worker_listeners:
  - type: http
    port: 8096
    resources:
      - names:
          - client
```
Once the configuration is in place, enable the worker by running `sudo systemctl enable matrix-synapse-worker@synchrotron.service`. It will be started later in the process.
The `user_dir` worker services REST endpoints, and thus has a listener section, with port 8097 chosen.
```
---
worker_app: synapse.app.user_dir
worker_replication_host: mlbvip
worker_replication_port: 9092
worker_replication_http_port: 9093
worker_listeners:
  - type: http
    port: 8097
    resources:
      - names:
          - client
```
Once the configuration is in place, enable the worker by running `sudo systemctl enable matrix-synapse-worker@user_dir.service`. It will be started later in the process.
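With ten workers, the `systemctl enable` commands above are repetitive enough to script. A minimal sketch that builds the unit names and commands from the worker list follows; it only prints the commands rather than running them, and the helper names are my own.

```python
import shlex

# Every worker configured in this section, in the same alphabetical order.
WORKERS = [
    "appservice",
    "client_reader",
    "event_creator",
    "federation_reader",
    "federation_sender",
    "frontend_proxy",
    "media_repository",
    "pusher",
    "synchrotron",
    "user_dir",
]


def unit_name(worker):
    """Return the templated systemd unit name for a worker."""
    return f"matrix-synapse-worker@{worker}.service"


def enable_commands(workers=WORKERS):
    """Return one `systemctl enable` argument list per worker."""
    return [["sudo", "systemctl", "enable", unit_name(worker)] for worker in workers]


if __name__ == "__main__":
    for command in enable_commands():
        print(shlex.join(command))
```

Printing first and piping the output to a shell only once it looks right keeps a typo in the worker list from enabling the wrong units.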
## Step 4 - Riot instance
Riot Web is the reference frontend for Matrix instances, allowing a user to access Matrix via a web browser. Riot is an optional, but recommended, feature for your homeserver.
## Step 5 - ma1sd instance
ma1sd is an optional component for Matrix, providing 3PID (e.g. email, phone number, etc.) lookup services for Matrix users. I use ma1sd with my Matrix instance for two main reasons: first, to map nice-looking user data such as full names to my Matrix users, and second, as a RESTful authentication provider to interface Matrix with my LDAP instance. For this guide, I assume that you already have an LDAP instance set up and that you are using it in this manner too.
For this guide, HAProxy was selected as the reverse proxy of choice. This is mostly due to my familiarity with it, but also to a lesser degree for its more advanced functionality and, in my opinion, nicer configuration syntax. This section provides configuration for a "load-balanced", multi-server instance with an additional 2 slave worker servers and with separate proxy servers; a single-server instance with basic split workers can be made by removing the additional servers. This will allow the homeserver to grow to many dozens or even hundreds of users. In this setup, the load balancer is separated out onto a separate pair of servers, with a `keepalived` VIP (virtual IP address) shared between them. The name `mlbvip` should resolve to this IP (the Keepalived example below uses 10.0.0.10, which the Step 1 address table lists as `flbvip`), and all previous worker configurations should use this `mlbvip` hostname as the connection target for the replication directives. Both a reasonable `keepalived` configuration for the VIP and the HAProxy configuration are provided.
The two proxy hosts can be named as desired, in my case using the names `mlb1` and `mlb2`. These names must resolve in DNS, or be specified in `/etc/hosts` on both servers.
The Keepalived configuration below can be used on both proxy hosts, and inline comments provide additional clarification and information as well as indicating any changes required between the hosts. The VIP should be selected from the free IPs of your server subnet.
```
# Global configuration options.
global_defs {
    # Use a dedicated IPv4 multicast group; adjust the last octet if this conflicts within your network.
    vrrp_mcast_group4 224.0.0.21
    # Use VRRP version 3 in strict mode and with no iptables configuration.
    vrrp_version 3
    vrrp_strict
    vrrp_iptables
}

# HAProxy check script, to ensure that this host will not become PRIMARY if HAProxy is not active.
# ASSUMPTION: the check command and interval here are placeholders; substitute your own health check.
vrrp_script chk {
    script "/usr/bin/killall -0 haproxy"
    interval 1
}

# ASSUMPTION: the instance name and interface here are placeholders; adjust them for your hosts.
vrrp_instance VIP_21 {
    interface eth0
    # A dedicated, unique virtual router ID for this cluster; adjust this if required.
    virtual_router_id 21
    # The priority. Set to 200 for the primary (first) server, and to 100 for the secondary (second) server.
    priority 200
    # The (list of) virtual IP address(es) with CIDR subnet mask.
    virtual_ipaddress {
        10.0.0.10/24
    }
    # Use the HAProxy check script for this VIP.
    track_script {
        chk
    }
}
```
Once the above configuration is installed at `/etc/keepalived/keepalived.conf`, restart the Keepalived service with `sudo systemctl restart keepalived` on each host. You should see the VIP become active on the first host.
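To make the failover behaviour concrete: the VIP lands on the first host because it has the higher priority, and moves to the second host only when the first one fails its HAProxy check. The sketch below is a deliberately simplified model of that election, not Keepalived's actual implementation (it ignores advertisement timers and preemption settings); the `Node` class and `elect_primary` function are my own names, and `mlb1`/`mlb2` are the example host names from this section.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    priority: int
    haproxy_ok: bool = True  # outcome of the tracked HAProxy check script


def elect_primary(nodes):
    """Return the name of the healthy node with the highest priority, or None."""
    candidates = [node for node in nodes if node.haproxy_ok]
    if not candidates:
        return None
    return max(candidates, key=lambda node: node.priority).name


if __name__ == "__main__":
    pair = [Node("mlb1", 200), Node("mlb2", 100)]
    print(elect_primary(pair))
    pair[0].haproxy_ok = False
    print(elect_primary(pair))
```

This is why the priority must differ between the two hosts: with equal priorities there is no clear owner for the VIP.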
The HAProxy configuration below can be used verbatim on both proxy hosts, and inline comments provide additional clarification and information to avoid breaking up the configuration snippet. In this example we use `peer` configuration to enable the use of `stick-tables` directives, which ensure that individual user sessions are synchronized between the HAProxy instances during failovers; with this setting, if the hostnames of the load balancers do not resolve, HAProxy will not start. Some additional, advanced features are used in several ACLs to ensure that, for instance, specific users and rooms are always directed to the same workers if possible, which is required by the individual workers as specified in [the Matrix documentation](https://github.com/matrix-org/synapse/blob/master/docs/workers.md).
This configuration uses HAProxy's multi-threaded support (available with HAProxy 1.8+) for optimal performance in high-load situations; adjust `nbthread` as needed for your host's core count (2-4 is optimal).
Once the above configurations are installed on each server, restart the HAProxy service with `sudo systemctl restart haproxy`. You will now have access to the various endpoints on ports 443 and 8448 with a redirection from port 80 to port 443 to enforce SSL from clients.
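The requirement mentioned earlier in this section, that a given user or room is always directed to the same worker where possible, boils down to a deterministic mapping from a key to a backend. The sketch below illustrates the idea with a simple hash; it is not how HAProxy implements stick-tables or its balancing algorithms, and the backend names are assumptions based on the `hsX` hosts and the synchrotron port used in this guide.

```python
import hashlib

# Hypothetical synchrotron backends on the three homeserver hosts.
BACKENDS = ["hs1:8096", "hs2:8096", "hs3:8096"]


def pick_backend(key, backends=BACKENDS):
    """Map a key (e.g. a Matrix user ID) to the same backend on every call."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


if __name__ == "__main__":
    for user in ("@alice:example.org", "@bob:example.org"):
        print(user, "->", pick_backend(user))
```

A plain modulo mapping like this reshuffles most keys when a backend is added or removed, which is one reason real load balancers use more elaborate schemes.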
## Final steps
Now that your proxy is running, test connectivity to your servers. For Riot, visit the bare VIP IP or the Riot subdomain. For Matrix, visit the Matrix subdomain. In both cases, ensure that the page loads properly. Finally, use the [Matrix Homeserver Federation Tester](https://federationtester.matrix.org/) to verify that Federation is correctly configured for your homeserver.
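For a quick scripted check alongside the browser tests, the standard Matrix endpoints `/_matrix/client/versions` (client API) and `/_matrix/federation/v1/version` (federation API, port 8448 in this setup) can be queried directly. This is a minimal sketch assuming a hypothetical server name `matrix.example.org`; the helper names are my own, and the `opener` parameter exists only so the code can be exercised without a live server.

```python
import json
import urllib.request


def client_versions_url(server_name):
    """URL of the client API version listing, served on port 443."""
    return f"https://{server_name}/_matrix/client/versions"


def federation_version_url(server_name, port=8448):
    """URL of the federation API version endpoint."""
    return f"https://{server_name}:{port}/_matrix/federation/v1/version"


def fetch_json(url, timeout=5.0, opener=urllib.request.urlopen):
    """Fetch a URL and decode the JSON body."""
    with opener(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_json(client_versions_url("matrix.example.org"))` with your own server name should return a JSON object rather than raising an error if the proxy and workers are wired up correctly.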
Congratulations, you now have a fully-configured, multi-worker and, if configured, load-balanced Matrix instance capable of handling many dozens or hundreds of users with optimal performance!
If you have any feedback about this post, including corrections, please contact me - you can find me in the [`#synapse:matrix.org`](https://matrix.to/#/!mjbDjyNsRXndKLkHIe:matrix.org) Matrix room, or via email!