+++ class = "post" date = "2020-05-31T00:00:00-04:00" tags = ["systems administration", "development", "matrix"] title = "Building a scalable, redundant Matrix Homeserver" description = "Deploy an advanced, highly-scalable Matrix instance with split-workers and backends from scratch" type = "post" weight = 1 draft = true +++ ## What is Matrix? Matrix is, fundamentally, a combination of the best parts of IRC, XMPP, and Slack-like communication platforms (Discord, Mattermost, Rocketchat, etc.) built to modern standards. In the Matrix ecosystem, users can run their own server instances, called "homeservers", which then federate amongst themselves to create a "fediverse". It is thus fully distributed, allowing users to communicate with each other on their own terms, while providing all the features one would expect of a global chat system, such as large public rooms, as well as standard features of more modern platforms, like small private groups, direct messages, file uploads, and advanced integration and moderation features, such as bots. The reference homeserver application is called "Synapse", written in Python 3, and released under an Apache 2.0 license. In this guide, I seek to provide a document detailing the full steps to deploy a highly-available, redundant, multi-worker Matrix instance, with a fully redundant PostgreSQL database and LDAP authentication and 3PID backend. For those of you who just want to run a quick-and-easy Matrix instance with few advanced features, this guide is probably not for you, and there are numerous guides out there for setting up basic Matrix Synapse instances instead. Most of the concepts in this guide, as well as most of the configuration files given, can be adapted to a single-host but still split-worker instance instead, should the configuration below be deemed too complicated or excessive for your usecase. 
Be sure to carefully read this document and the Matrix documentation if you wish to do so, though most sections can be adapted verbatim.

## The problem with Synapse

The main issue with Synapse in its default configuration, as documented by the Matrix project themselves, is that it is single-threaded and non-redundant. Since many actions inside Synapse require significant CPU resources, especially those related to federation, this can become a significant bottleneck. This is especially true in very large rooms, where there are potentially hundreds of joined users on multiple homeservers that must all be communicated with. Without tweaking, this can manifest as posts to large rooms taking an extraordinarily long time, upwards of 10 seconds, to send, as well as problems joining very large rooms for the first time (significant delays, timeouts, join failures, etc.).

Unfortunately, most homeserver operators aren't running their instance on the fastest possible CPU, so the only way to improve performance in this area is to somehow allow Synapse to use multiple processor threads. Luckily for us, Matrix Synapse, since about version 1.10, supports this via workers. Workers allow one to split various functions out of the main Synapse process, which then allows parallel operation across multiple processes and thus increased performance. The configuration of workers [is discussed in the Synapse documentation](https://github.com/matrix-org/synapse/blob/master/docs/workers.md), however a number of details are glossed over or not mentioned at all. Thus, this blog post will outline some of the specific details involved in tuning workers for maximum performance.

## Step 1 - Prerequisites and planning

The system outlined in this guide is designed to provide a very scalable and redundant Matrix experience. To this end, the entire system is split up into multiple hosts.
In most cases, these should be virtual machines running on at least 2 hypervisors for redundancy at the lower layers, though this is outside the scope of this guide. For our purposes, we will assume that the VMs discussed below are already installed, configured, and operating.

The configuration outlined here makes use of a total of 14 VMs, with 6 distinct roles. Within each role, either 2 or 3 individual VMs are configured to provide redundancy. The roles can be roughly divided into two categories: frontends, which expose services to users, and backends, which expose databases to the frontend instances. The full VM list, with an example naming convention where X is the host "ID" (e.g. 1, 2, etc.), is as follows:

| Quantity | Name | Description |
| --- | --- | --- |
| 2 | flbX | Frontend load balancers running HAProxy, handling incoming requests from clients and federated servers. |
| 2 | rwX | Riot Web instances under Nginx. |
| 3 | hsX | Matrix Synapse homeserver instances running the various workers. |
| 2 | blbX | Backend load balancers running HAProxy, handling database requests from the homeserver instances. |
| 3 | mpgX | PostgreSQL database instances running Patroni with Zookeeper. |
| 2 | mldX | OpenLDAP instances. |

While this setup may seem like overkill, it is, aside from the homeserver instances, the minimum configuration possible while still providing full redundancy. If redundancy is not desired, a smaller configuration, down to as little as one host, is possible, though this is not detailed below.

In addition to these 14 VMs, some sort of shared storage must be provided for the sharing of media files (e.g. uploaded files) between the homeservers. For the purposes of this guide, we assume that this is an NFS export at `/srv/matrix` from a system called `nas`. The configuration of redundant, shared storage is outside the scope of this guide, and thus we will not discuss it beyond this paragraph, though prospective administrators of highly-available Matrix instances should consider this as well.
All the VMs mentioned above should be running the same operating system. I recommend Debian 10.X (Buster) here, both because it is the distribution I run myself, and also because it provides nearly all the required packages with minimal fuss. If you wish to use another distribution, you must adapt the commands and examples below to fit.

Additionally, this guide expects that you are running the systemd init system. This is not the place for continuing the seemingly-endless init system debate, but some advanced features of systemd (such as template units) are used below and in the official Matrix documentation, so we expect this is the init system you are running, and you are on your own if you choose to use an alternative.

For networking purposes, it is sufficient to place all the above servers in a single RFC1918 network. Outbound NAT should be configured to allow all hosts to reach the internet, and a small number of ports should be permitted through a firewall towards the external load balancer VIP (virtual IP address). The following is an example IP configuration in the network `10.0.0.0/24` that can be used for this guide, though you may of course choose a different subnet and host IP allocation scheme if you wish. All these names should resolve in DNS, or be configured in `/etc/hosts` on all machines.

| IP address | Hostname | Description |
| --- | --- | --- |
| 10.0.0.1 | gw | NAT gateway and firewall, upstream router. |
| 10.0.0.2 | blbvip | Floating VIP for blbX instances. |
| 10.0.0.3 | blb1 | blbX host 1. |
| 10.0.0.4 | blb2 | blbX host 2. |
| 10.0.0.5 | mpg1 | mpgX host 1. |
| 10.0.0.6 | mpg2 | mpgX host 2. |
| 10.0.0.7 | mpg3 | mpgX host 3. |
| 10.0.0.8 | mld1 | mldX host 1. |
| 10.0.0.9 | mld2 | mldX host 2. |
| 10.0.0.10 | flbvip | Floating VIP for flbX instances. |
| 10.0.0.11 | flb1 | flbX host 1. |
| 10.0.0.12 | flb2 | flbX host 2. |
| 10.0.0.13 | rw1 | rwX host 1. |
| 10.0.0.14 | rw2 | rwX host 2. |
| 10.0.0.15 | hs1 | hsX host 1. |
| 10.0.0.16 | hs2 | hsX host 2. |
| 10.0.0.17 | hs3 | hsX host 3. |
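If you choose the `/etc/hosts` route rather than DNS, the table above translates directly into a hosts-file fragment to be appended on every machine; this is simply the same example addressing, so adjust it to match your own scheme:

```
10.0.0.1   gw
10.0.0.2   blbvip
10.0.0.3   blb1
10.0.0.4   blb2
10.0.0.5   mpg1
10.0.0.6   mpg2
10.0.0.7   mpg3
10.0.0.8   mld1
10.0.0.9   mld2
10.0.0.10  flbvip
10.0.0.11  flb1
10.0.0.12  flb2
10.0.0.13  rw1
10.0.0.14  rw2
10.0.0.15  hs1
10.0.0.16  hs2
10.0.0.17  hs3
```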
## Step 2 - Installing and configuring OpenLDAP instances

[OpenLDAP](https://www.openldap.org/) is a common LDAP server, which provides centralized user administration as well as the configuration of additional details in a user directory. Installing and configuring OpenLDAP is beyond the scope of this guide, though the Matrix Homeserver configurations below assume that this is operating and that all Matrix users are stored in the LDAP database.

In our example configuration, there are 2 OpenLDAP instances running with replication (`syncrepl`) between them, which are then load-balanced in a multi-master fashion. Since none of the services below will be performing writes to this database, this is fine. The administrator is expected to configure some sort of user management layer of their choosing (e.g. scripts, or a web-based frontend) for managing users, resetting passwords, etc.

While this short section may seem like a cop-out, this is an extensive topic with many potential caveats, and should thus have its own (future) post on this blog. Until then, I trust that the administrator is able to look up and configure this themselves. I include these references only to help guide the administrator towards full-stack redundancy and to explain why there are LDAP sections in the backend load balancer configurations.

## Step 3 - Installing and configuring Patroni instances

[Patroni](https://github.com/zalando/patroni) is a service manager for PostgreSQL which provides automated failover and replication support for a PostgreSQL database. Like OpenLDAP above, the configuration of Patroni is beyond the scope of this guide, and the configurations below assume that this is operating and already configured. In our example configuration, there are 3 Patroni instances, which is the minimum required for quorum among the members.
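To make the backend wiring in Step 4 concrete: the load balancers must send PostgreSQL traffic only to whichever Patroni member currently holds the leader role. A toy sketch of that selection rule follows; it is illustrative only (in practice HAProxy queries each node's Patroni REST API health checks, as shown in the next step), and the static `roles` dict is a stand-in for those live checks:

```python
# Toy sketch: given each Patroni member's self-reported role, pick the
# single node that should receive writes. A real deployment asks each
# node's Patroni REST API instead of consulting a static dict.
def pick_master(roles):
    """Return the hostname of the sole 'master' node, or None if there
    is no master or (during a failover race) more than one."""
    masters = [host for host, role in roles.items() if role == "master"]
    return masters[0] if len(masters) == 1 else None

if __name__ == "__main__":
    roles = {"mpg1": "master", "mpg2": "replica", "mpg3": "replica"}
    print(pick_master(roles))  # mpg1
```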
As above, I do plan to document this in a future post, but until then, I recommend the administrator reference the Patroni documentation as well as [this other post on my blog](https://www.boniface.me/post/patroni-and-haproxy-agent-checks/) for details on setting up the Patroni instances.

## Step 4 - Installing and configuring backend load balancers

While I do not go into details in the previous two steps, this section details how to make use of a redundant pair of HAProxy instances to expose the redundant databases mentioned above to the homeserver instances.

In order to provide a single entrypoint to the load balancers, the administrator should first install and configure Keepalived. The following `/etc/keepalived/keepalived.conf` configuration will set up the `blbvip` floating IP address between the two instances, while providing health checking of the HAProxy instances. The configuration below can be used on both proxy hosts, and inline comments provide additional clarification and information as well as indicating any changes required between the hosts.

```
# Global configuration options.
global_defs {
    # Use a dedicated IPv4 multicast group; adjust the last octet if this conflicts within your network.
    vrrp_mcast_group4 224.0.0.21
    # Use VRRP version 3 in strict mode and with no iptables configuration.
    vrrp_version 3
    vrrp_strict
    vrrp_iptables
}

# HAProxy check script, to ensure that this host will not become PRIMARY if HAProxy is not active.
vrrp_script chk {
    script "/usr/bin/haproxyctl show info"
    interval 5
    rise 2
    fall 2
}

# Primary IPv4 VIP configuration.
vrrp_instance VIP_4 {
    # Initial state, MASTER on both hosts to ensure that at least one host becomes active immediately on boot.
    state MASTER
    # Interface to place the VIP on; this is optional though still recommended on single-NIC machines; replace "ens2" with your actual NIC name.
    interface ens2
    # A dedicated, unique virtual router ID for this cluster; adjust this if required.
    virtual_router_id 21
    # The priority. Set to 200 for the primary (first) server, and to 100 for the secondary (second) server.
    priority 200
    # The (list of) virtual IP address(es) with CIDR subnet mask for the "blbvip" host.
    virtual_ipaddress {
        10.0.0.2/24
    }
    # Use the HAProxy check script for this VIP.
    track_script {
        chk
    }
}
```

Once the above configuration is installed at `/etc/keepalived/keepalived.conf`, restart the Keepalived service with `sudo systemctl restart keepalived` on each host. You should see the VIP become active on the first host.

The HAProxy configuration below can be used verbatim on both proxy hosts, and inline comments provide additional clarification and information to avoid breaking up the configuration snippet. This configuration makes use of an advanced feature for the Patroni hosts [which is detailed in another post on this blog](https://www.boniface.me/post/patroni-and-haproxy-agent-checks/), to ensure that only the active Patroni node is sent traffic and to prevent the other two database hosts from reporting a `DOWN` state all the time.

```
# Global settings - tune HAProxy for optimal performance, administration, and security.
global
    # Send logs to the "local6" service on the local host, via an rsyslog UDP listener. Enable debug logging to log individual connections.
    log ip6-localhost:514 local6 debug
    log-send-hostname
    chroot /var/lib/haproxy
    pidfile /run/haproxy/haproxy.pid
    # Use multi-threaded support (available with HAProxy 1.8+) for optimal performance in high-load situations. Adjust `nbthread` as needed for your host's core count (1/2 is optimal).
    nbproc 1
    nbthread 2
    # Provide a stats socket for `hatop`
    stats socket /var/lib/haproxy/admin.sock mode 660 level admin process 1
    stats timeout 30s
    # Run in daemon mode as the `haproxy` user
    daemon
    user haproxy
    group haproxy
    # Set the global connection limit to 10000; this is certainly overkill but avoids needing to tweak this for larger instances.
    maxconn 10000

# Default settings - provide some default settings that are applicable to (most of) the listeners and backends below.
defaults
    log global
    timeout connect 30s
    timeout client 15m
    timeout server 15m
    log-format "%ci:%cp [%t] %ft %b/%s %Tw/%Tc/%Tt %B %ts %ac/%fc/%bc/%sc/%rc %sq/%bq %bi:%bp"

# Statistics listener with authentication - provides stats for the HAProxy instance via a WebUI (optional)
userlist admin
    # WARNING - CHANGE ME TO A REAL PASSWORD OR A SHA512-hashed PASSWORD (with `password` instead of `insecure-password`). IF YOU USE `insecure-password`, MAKE SURE THIS CONFIGURATION IS NOT WORLD-READABLE.
    user admin insecure-password P4ssw0rd

listen stats
    bind :::5555 v4v6
    mode http
    stats enable
    stats uri /
    stats hide-version
    stats refresh 10s
    stats show-node
    stats show-legends
    acl is_admin http_auth(admin)
    http-request auth realm "Admin access required" if !is_admin

# Stick-tables peers configuration
peers keepalived-pair
    peer blb1 10.0.0.3:1023
    peer blb2 10.0.0.4:1023

# LDAP frontend
frontend ldap
    bind :::389 v4v6
    maxconn 1000
    mode tcp
    option tcpka
    default_backend ldap

# PostgreSQL frontend
frontend pgsql
    bind :::5432 v4v6
    maxconn 1000
    mode tcp
    option tcpka
    default_backend pgsql

# LDAP backend
backend ldap
    mode tcp
    option tcpka
    balance leastconn
    server mld1 10.0.0.8:389 check inter 2000 maxconn 64
    server mld2 10.0.0.9:389 check inter 2000 maxconn 64

# PostgreSQL backend using agent check
backend pgsql
    mode tcp
    option tcpka
    option httpchk OPTIONS /master
    http-check expect status 200
    server mpg1 10.0.0.5:5432 maxconn 1000 check agent-check agent-port 5555 inter 1s fall 2 rise 2 on-marked-down shutdown-sessions port 8008
    server mpg2 10.0.0.6:5432 maxconn 1000 check agent-check agent-port 5555 inter 1s fall 2 rise 2 on-marked-down shutdown-sessions port 8008
    server mpg3 10.0.0.7:5432 maxconn 1000 check agent-check agent-port 5555 inter 1s fall 2 rise 2 on-marked-down shutdown-sessions port 8008
```

Once the above configurations are installed on each
server, restart the HAProxy service with `sudo systemctl restart haproxy`. Use `sudo hatop -s /var/lib/haproxy/admin.sock` to view the status of the backends, and continue once all are running correctly.

## Step 5 - Install and configure Synapse instances

The core homeserver processes should be configured on all homeserver machines. There are numerous installation options, but I use the official Debian packages, and the paths and commands below assume the same.

## Step 6 - Configure systemd units

The easiest way to set up workers is to use a template unit file with a series of individual worker configurations. A set of unit files is [provided within the Synapse documentation](https://github.com/matrix-org/synapse/tree/master/docs/systemd-with-workers), which can be used to set up template-based workers. I decided to modify these somewhat, by replacing the configuration directory at `/etc/matrix-synapse/workers` with `/etc/matrix-synapse/worker.d`, but this is just a personal preference. If you're using the official Debian packages (as I am), you will also need to adjust the path to the Python binary. I also adjust the description to be a little more consistent. The resulting template worker unit looks like this:

```
[Unit]
Description = Synapse Matrix worker %i
PartOf = matrix-synapse.target

[Service]
Type = notify
NotifyAccess = main
User = matrix-synapse
WorkingDirectory = /var/lib/matrix-synapse
EnvironmentFile = /etc/default/matrix-synapse
ExecStart = /usr/bin/python3 -m synapse.app.generic_worker --config-path=/etc/matrix-synapse/homeserver.yaml --config-path=/etc/matrix-synapse/conf.d/ --config-path=/etc/matrix-synapse/worker.d/%i.yaml
ExecReload = /bin/kill -HUP $MAINPID
Restart = on-failure
RestartSec = 3
SyslogIdentifier = matrix-synapse-%i

[Install]
WantedBy = matrix-synapse.target
```

There is also a generic target unit that should be installed to provide a unified management point for both the primary Synapse process as well as the workers.
After some similar tweaks, including adjusting the After condition to use `network-online.target` instead of `network.target`, the resulting file looks like this:

```
[Unit]
Description = Synapse Matrix homeserver target
After = network-online.target

[Install]
WantedBy = multi-user.target
```

Install both of these units, as `matrix-synapse-worker@.service` and `matrix-synapse.target` respectively, to `/etc/systemd/system`, and run `sudo systemctl daemon-reload`. Once the unit files are prepared, you can begin building each individual worker configuration.

## Step 7 - Configure the individual workers

Each worker is configured via an individual YAML configuration file, in our case under `/etc/matrix-synapse/worker.d`. By design, each worker makes use of `homeserver.yaml` for all global configuration values, then the individual worker configurations override specific settings for the particular worker. The [Synapse documentation on workers](https://github.com/matrix-org/synapse/blob/master/docs/workers.md) provides a good starting point, but some sections are vague, and thus this guide hopes to provide more detailed instructions and explanations.

Each worker is given a specific section below, which includes the full YAML configuration I use, as well as any notes about the configuration that are worth mentioning. They are provided in alphabetical order, rather than the order given in the documentation above, for clarity.

For any worker which responds to REST requests, a port must be selected for the worker to listen on. The main homeserver runs by default on port 8008, and I have `ma1sd` running on port 8090, so I chose ports from 8091 to 8097 for the various REST workers in order to keep them in a consistent range.

Finally, the main homeserver must be configured with both TCP and HTTP replication listeners, to provide communication between the workers and the main process.
For this I use the ports provided by the Matrix documentation above, 9092 and 9093, with the following configuration in the main `homeserver.yaml` `listeners` section:

```
listeners:
  - port: 8008
    tls: false
    bind_addresses:
      - '::'
    type: http
    x_forwarded: true
    resources:
      - names: [client, webclient]
        compress: true

  - port: 9092
    bind_addresses:
      - '::'
    type: replication
    x_forwarded: true

  - port: 9093
    bind_addresses:
      - '::'
    type: http
    x_forwarded: true
    resources:
      - names: [replication]
```

There are a couple of adjustments here from the default configuration. First, the `federation` resource has been removed from the primary listener, since this is implemented as a worker below. TLS is disabled here, and `x_forwarded: true` is added to all 3 frontends, since TLS is handled by a reverse proxy, as discussed later in this guide. All three listeners use a global IPv6+IPv4 bind address of `::` so they will be accessible by other machines on the network, which is important for the final, multi-host setup. As noted in the Matrix documentation, *ensure that the replication ports are not publicly accessible*, since they are unauthenticated and unencrypted; I run these servers on an RFC1918 private network behind a firewall so this is secure, but you will need to provide some sort of firewall if your Synapse instance is directly available on the public Internet.

The configurations below show a hostname, `mlbvip`, for all instances of `worker_replication_host`. This will be explained and discussed further in the reverse proxy section. If you are only interested in running a "single-server" instance, you may use `localhost`, `127.0.0.1`, or `::1` here instead, as these ports will not be managed by the reverse proxy in such a setup.

#### `appservice` worker (`/etc/matrix-synapse/worker.d/appservice.yaml`)

The `appservice` worker does not service REST endpoints, and thus has a minimal configuration.
```
---
worker_app: synapse.app.appservice
worker_replication_host: mlbvip
worker_replication_port: 9092
worker_replication_http_port: 9093
```

Once the configuration is in place, enable the worker by running `sudo systemctl enable matrix-synapse-worker@appservice.service`. It will be started later in the process.

#### `client_reader` worker (`/etc/matrix-synapse/worker.d/client_reader.yaml`)

The `client_reader` worker services REST endpoints, and thus has a listener section, with port 8091 chosen.

```
---
worker_app: synapse.app.client_reader
worker_replication_host: mlbvip
worker_replication_port: 9092
worker_replication_http_port: 9093
worker_listeners:
  - type: http
    port: 8091
    resources:
      - names:
          - client
```

Once the configuration is in place, enable the worker by running `sudo systemctl enable matrix-synapse-worker@client_reader.service`. It will be started later in the process.

#### `event_creator` worker (`/etc/matrix-synapse/worker.d/event_creator.yaml`)

The `event_creator` worker services REST endpoints, and thus has a listener section, with port 8092 chosen.

```
---
worker_app: synapse.app.event_creator
worker_replication_host: mlbvip
worker_replication_port: 9092
worker_replication_http_port: 9093
worker_listeners:
  - type: http
    port: 8092
    resources:
      - names:
          - client
```

Once the configuration is in place, enable the worker by running `sudo systemctl enable matrix-synapse-worker@event_creator.service`. It will be started later in the process.

#### `federation_reader` worker (`/etc/matrix-synapse/worker.d/federation_reader.yaml`)

The `federation_reader` worker services REST endpoints, and thus has a listener section, with port 8093 chosen. Note that this worker, in addition to a `client` resource, also provides a `federation` resource.
```
---
worker_app: synapse.app.federation_reader
worker_replication_host: mlbvip
worker_replication_port: 9092
worker_replication_http_port: 9093
worker_listeners:
  - type: http
    port: 8093
    resources:
      - names:
          - client
          - federation
```

Once the configuration is in place, enable the worker by running `sudo systemctl enable matrix-synapse-worker@federation_reader.service`. It will be started later in the process.

#### `federation_sender` worker (`/etc/matrix-synapse/worker.d/federation_sender.yaml`)

The `federation_sender` worker does not service REST endpoints, and thus has a minimal configuration.

```
---
worker_app: synapse.app.federation_sender
worker_replication_host: mlbvip
worker_replication_port: 9092
worker_replication_http_port: 9093
```

Once the configuration is in place, enable the worker by running `sudo systemctl enable matrix-synapse-worker@federation_sender.service`. It will be started later in the process.

#### `frontend_proxy` worker (`/etc/matrix-synapse/worker.d/frontend_proxy.yaml`)

The `frontend_proxy` worker services REST endpoints, and thus has a listener section, with port 8094 chosen. This worker has an additional configuration parameter, `worker_main_http_uri`, which allows the worker to direct requests back to the primary Synapse instance. Similar to the `worker_replication_host` value, this uses `mlbvip` in this example, and for "single-server" instances *must* be replaced with `localhost`, `127.0.0.1`, or `::1` instead, as this port will not be managed by the reverse proxy in such a setup.

```
---
worker_app: synapse.app.frontend_proxy
worker_replication_host: mlbvip
worker_replication_port: 9092
worker_replication_http_port: 9093
worker_main_http_uri: http://mlbvip:8008
worker_listeners:
  - type: http
    port: 8094
    resources:
      - names:
          - client
```

Once the configuration is in place, enable the worker by running `sudo systemctl enable matrix-synapse-worker@frontend_proxy.service`. It will be started later in the process.
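The per-worker YAML files all follow the same pattern, varying only in the app name, listener port, and resources. As an illustration (this is my own throwaway helper, not an official Synapse tool), the repetitive files could be generated from a single table of assignments, using the worker names and ports from this guide:

```python
# Illustrative helper: render /etc/matrix-synapse/worker.d/<name>.yaml
# files from one table. Workers with a port of None serve no REST
# traffic and therefore get no worker_listeners section.
WORKERS = {
    "appservice": (None, []),
    "client_reader": (8091, ["client"]),
    "event_creator": (8092, ["client"]),
    "federation_reader": (8093, ["client", "federation"]),
    "federation_sender": (None, []),
    "frontend_proxy": (8094, ["client"]),
    "media_repository": (8095, ["client", "media"]),
    "pusher": (None, []),
    "synchrotron": (8096, ["client"]),
    "user_dir": (8097, ["client"]),
}

def render_worker(name, port, resources, replication_host="mlbvip"):
    """Render the YAML body for one worker configuration file."""
    lines = [
        "---",
        f"worker_app: synapse.app.{name}",
        f"worker_replication_host: {replication_host}",
        "worker_replication_port: 9092",
        "worker_replication_http_port: 9093",
    ]
    if port is not None:
        lines += ["worker_listeners:", "  - type: http", f"    port: {port}",
                  "    resources:", "      - names:"]
        lines += [f"          - {r}" for r in resources]
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    for name, (port, resources) in WORKERS.items():
        print(f"# worker.d/{name}.yaml")
        print(render_worker(name, port, resources))
```

Note that `frontend_proxy` additionally needs its `worker_main_http_uri` line, so treat this as a starting point rather than a complete generator.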
#### `media_repository` worker (`/etc/matrix-synapse/worker.d/media_repository.yaml`)

The `media_repository` worker services REST endpoints, and thus has a listener section, with port 8095 chosen. Note that this worker, in addition to a `client` resource, also provides a `media` resource.

```
---
worker_app: synapse.app.media_repository
worker_replication_host: mlbvip
worker_replication_port: 9092
worker_replication_http_port: 9093
worker_listeners:
  - type: http
    port: 8095
    resources:
      - names:
          - client
          - media
```

Once the configuration is in place, enable the worker by running `sudo systemctl enable matrix-synapse-worker@media_repository.service`. It will be started later in the process.

#### `pusher` worker (`/etc/matrix-synapse/worker.d/pusher.yaml`)

The `pusher` worker does not service REST endpoints, and thus has a minimal configuration.

```
---
worker_app: synapse.app.pusher
worker_replication_host: mlbvip
worker_replication_port: 9092
worker_replication_http_port: 9093
```

Once the configuration is in place, enable the worker by running `sudo systemctl enable matrix-synapse-worker@pusher.service`. It will be started later in the process.

#### `synchrotron` worker (`/etc/matrix-synapse/worker.d/synchrotron.yaml`)

The `synchrotron` worker services REST endpoints, and thus has a listener section, with port 8096 chosen.

```
---
worker_app: synapse.app.synchrotron
worker_replication_host: mlbvip
worker_replication_port: 9092
worker_replication_http_port: 9093
worker_listeners:
  - type: http
    port: 8096
    resources:
      - names:
          - client
```

Once the configuration is in place, enable the worker by running `sudo systemctl enable matrix-synapse-worker@synchrotron.service`. It will be started later in the process.

#### `user_dir` worker (`/etc/matrix-synapse/worker.d/user_dir.yaml`)

The `user_dir` worker services REST endpoints, and thus has a listener section, with port 8097 chosen.
```
---
worker_app: synapse.app.user_dir
worker_replication_host: mlbvip
worker_replication_port: 9092
worker_replication_http_port: 9093
worker_listeners:
  - type: http
    port: 8097
    resources:
      - names:
          - client
```

Once the configuration is in place, enable the worker by running `sudo systemctl enable matrix-synapse-worker@user_dir.service`. It will be started later in the process.

## Step 8 - Riot instance

Riot Web is the reference frontend for Matrix instances, allowing a user to access Matrix via a web browser. Riot is an optional, but recommended, component for your homeserver.

## Step 9 - ma1sd instance

ma1sd is an optional component for Matrix, providing 3PID (e.g. email, phone number, etc.) lookup services for Matrix users. I use ma1sd with my Matrix instance for two main reasons: first, to map nice-looking user data such as full names to my Matrix users, and second, as a RESTful authentication provider to interface Matrix with my LDAP instance. For this guide, I assume that you already have an LDAP instance set up and that you are using it in this manner too.

## Step 10 - Reverse proxy

For this guide, HAProxy was selected as the reverse proxy of choice. This is mostly due to my familiarity with it, but also, to a lesser degree, for its more advanced functionality and, in my opinion, nicer configuration syntax. This section provides configuration for a "load-balanced", multi-server instance with two additional worker servers and with separate proxy servers; a single-server instance with basic split workers can be made by removing the additional servers. This will allow the homeserver to grow to many dozens or even hundreds of users.

In this setup, the load balancer is separated out onto a separate pair of servers, with a `keepalived` VIP (virtual IP address) shared between them. The name `mlbvip` should resolve to this IP, and all previous worker configurations should use this `mlbvip` hostname as the connection target for the replication directives.
Both a reasonable `keepalived` configuration for the VIP and the HAProxy configuration are provided. The two proxy hosts can be named as desired, in my case using the names `mlb1` and `mlb2`. These names must resolve in DNS, or be specified in `/etc/hosts` on both servers.

The Keepalived configuration below can be used on both proxy hosts, and inline comments provide additional clarification and information as well as indicating any changes required between the hosts. The VIP should be selected from the free IPs of your server subnet.

```
# Global configuration options.
global_defs {
    # Use a dedicated IPv4 multicast group; adjust the last octet if this conflicts within your network.
    vrrp_mcast_group4 224.0.0.22
    # Use VRRP version 3 in strict mode and with no iptables configuration.
    vrrp_version 3
    vrrp_strict
    vrrp_iptables
}

# HAProxy check script, to ensure that this host will not become PRIMARY if HAProxy is not active.
vrrp_script chk {
    script "/usr/bin/haproxyctl show info"
    interval 5
    rise 2
    fall 2
}

# Primary IPv4 VIP configuration.
vrrp_instance VIP_4 {
    # Initial state, MASTER on both hosts to ensure that at least one host becomes active immediately on boot.
    state MASTER
    # Interface to place the VIP on; this is optional though still recommended on single-NIC machines; replace "ens2" with your actual NIC name.
    interface ens2
    # A dedicated, unique virtual router ID for this cluster; this must differ from the backend load balancers' ID above, so adjust if required.
    virtual_router_id 22
    # The priority. Set to 200 for the primary (first) server, and to 100 for the secondary (second) server.
    priority 200
    # The (list of) virtual IP address(es) with CIDR subnet mask.
    virtual_ipaddress {
        10.0.0.10/24
    }
    # Use the HAProxy check script for this VIP.
    track_script {
        chk
    }
}
```

Once the above configuration is installed at `/etc/keepalived/keepalived.conf`, restart the Keepalived service with `sudo systemctl restart keepalived` on each host. You should see the VIP become active on the first host.
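The MASTER/priority interplay above can be confusing: both nodes boot claiming MASTER, then VRRP elects the node with the highest priority among those whose check script passes. A toy model of that election (illustrative only; real VRRP also involves preemption, advertisement timers, and IP-based tiebreaks, none of which are modeled here):

```python
# Toy model of the VRRP election used by Keepalived above: among the
# routers whose HAProxy check passes, the highest priority wins.
def elect_master(routers):
    """routers: dict of name -> (priority, check_ok). Return the
    elected router name, or None if no router is healthy."""
    healthy = {name: prio for name, (prio, ok) in routers.items() if ok}
    if not healthy:
        return None
    return max(healthy, key=lambda name: healthy[name])

if __name__ == "__main__":
    routers = {"mlb1": (200, True), "mlb2": (100, True)}
    print(elect_master(routers))  # mlb1 holds the VIP
    routers["mlb1"] = (200, False)  # mlb1's HAProxy check fails
    print(elect_master(routers))  # the VIP fails over to mlb2
```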
The HAProxy configuration below can be used verbatim on both proxy hosts, and inline comments provide additional clarification and information to avoid breaking up the configuration snippet. In this example we use a `peers` configuration to enable the use of `stick-tables` directives, which ensure that individual user sessions are synchronized between the HAProxy instances during failovers; note that with this setting, if the hostnames of the load balancers do not resolve, HAProxy will not start. Some additional, advanced features are used in several ACLs to ensure that, for instance, specific users and rooms are always directed to the same workers if possible, which is required by the individual workers as specified in [the Matrix documentation](https://github.com/matrix-org/synapse/blob/master/docs/workers.md).

```
global
    # Send logs to the "local6" service on the local host, via an rsyslog UDP listener. Enable debug logging to log individual connections.
    log ip6-localhost:514 local6 debug
    log-send-hostname
    chroot /var/lib/haproxy
    pidfile /run/haproxy/haproxy.pid
    # Use multi-threaded support (available with HAProxy 1.8+) for optimal performance in high-load situations. Adjust `nbthread` as needed for your host's core count (2-4 is optimal).
    nbproc 1
    nbthread 4
    # Provide a stats socket for `hatop`
    stats socket /var/lib/haproxy/admin.sock mode 660 level admin process 1
    stats timeout 30s
    # Run in daemon mode as the `haproxy` user
    daemon
    user haproxy
    group haproxy
    # Set the global connection limit to 10000; this is certainly overkill but avoids needing to tweak this for larger instances.
    maxconn 10000
    # Set default SSL configurations, including a modern highly-secure configuration requiring TLS1.2 client support.
ca-base /etc/ssl/certs crt-base /etc/ssl/private tune.ssl.default-dh-param 2048 ssl-default-bind-ciphers ECDHE-RSA-AES256-GCM-SHA512:DHE-RSA-AES256-GCM-SHA512:ECDHE-RSA-AES256-GCM-SHA384:DHE-RSA-AES256-GCM-SHA384 ssl-default-bind-options ssl-min-ver TLSv1.2 no-tls-tickets ssl-default-server-ciphers ECDHE-RSA-AES256-GCM-SHA512:DHE-RSA-AES256-GCM-SHA512:ECDHE-RSA-AES256-GCM-SHA384:DHE-RSA-AES256-GCM-SHA384 ssl-default-server-options ssl-min-ver TLSv1.2 no-tls-tickets defaults log global option http-keep-alive option forwardfor except 127.0.0.0/8 option redispatch option dontlognull option splice-auto option log-health-checks default-server init-addr libc,last,none timeout client 30s timeout connect 30s timeout server 300s timeout tunnel 3600s timeout http-keep-alive 60s timeout http-request 30s timeout queue 60s timeout tarpit 60s peers keepalived-pair # Peers for site bl0 peer mlb1.i.bonilan.net mlb1.i.bonilan.net:1023 peer mlb2.i.bonilan.net mlb2.i.bonilan.net:1023 resolvers nsX nameserver ns1 10.101.0.61:53 nameserver ns2 10.101.0.62:53 userlist admin user admin password MySuperSecretPassword123 listen stats bind :::5555 v4v6 mode http stats enable stats uri / stats hide-version stats refresh 10s stats show-node stats show-legends acl is_admin http_auth(admin) http-request auth realm "Admin access" if !is_admin frontend http bind :::80 v4v6 mode http option httplog acl url_letsencrypt path_beg /.well-known/acme-challenge/ use_backend letsencrypt if url_letsencrypt redirect scheme https if !url_letsencrypt !{ ssl_fc } frontend https bind :::443 v4v6 ssl crt /etc/ssl/letsencrypt/ alpn h2,http/1.1 bind :::8448 v4v6 ssl crt /etc/ssl/letsencrypt/ alpn h2,http/1.1 mode http option httplog capture request header Host len 64 http-request set-header X-Forwarded-Proto https http-request add-header X-Forwarded-Host %[req.hdr(host)] http-request add-header X-Forwarded-Server %[req.hdr(host)] http-request add-header X-Forwarded-Port %[dst_port] # Method ACLs acl 
http_method_get method GET # Domain ACLs acl host_matrix hdr_dom(host) im.bonifacelabs.ca acl host_element hdr_dom(host) chat.bonifacelabs.ca # URL ACLs # Sync requests acl url_workerX_stick-auth path_reg ^/_matrix/client/(r0|v3)/sync$ acl url_workerX_generic path_reg ^/_matrix/client/(api/v1|r0|v3)/events$ acl url_workerX_stick-auth path_reg ^/_matrix/client/(api/v1|r0|v3)/initialSync$ acl url_workerX_stick-auth path_reg ^/_matrix/client/(api/v1|r0|v3)/rooms/[^/]+/initialSync$ # Federation requests acl url_workerX_generic path_reg ^/_matrix/federation/v1/event/ acl url_workerX_generic path_reg ^/_matrix/federation/v1/state/ acl url_workerX_generic path_reg ^/_matrix/federation/v1/state_ids/ acl url_workerX_generic path_reg ^/_matrix/federation/v1/backfill/ acl url_workerX_generic path_reg ^/_matrix/federation/v1/get_missing_events/ acl url_workerX_generic path_reg ^/_matrix/federation/v1/publicRooms acl url_workerX_generic path_reg ^/_matrix/federation/v1/query/ acl url_workerX_generic path_reg ^/_matrix/federation/v1/make_join/ acl url_workerX_generic path_reg ^/_matrix/federation/v1/make_leave/ acl url_workerX_generic path_reg ^/_matrix/federation/(v1|v2)/send_join/ acl url_workerX_generic path_reg ^/_matrix/federation/(v1|v2)/send_leave/ acl url_workerX_generic path_reg ^/_matrix/federation/(v1|v2)/invite/ acl url_workerX_generic path_reg ^/_matrix/federation/v1/event_auth/ acl url_workerX_generic path_reg ^/_matrix/federation/v1/exchange_third_party_invite/ acl url_workerX_generic path_reg ^/_matrix/federation/v1/user/devices/ acl url_workerX_generic path_reg ^/_matrix/key/v2/query acl url_workerX_generic path_reg ^/_matrix/federation/v1/hierarchy/ # Inbound federation transaction request acl url_workerX_stick-src path_reg ^/_matrix/federation/v1/send/ # Client API requests acl url_workerX_generic path_reg ^/_matrix/client/(api/v1|r0|v3|unstable)/createRoom$ acl url_workerX_generic path_reg ^/_matrix/client/(api/v1|r0|v3|unstable)/publicRooms$ acl 
url_workerX_generic path_reg ^/_matrix/client/(api/v1|r0|v3|unstable)/rooms/.*/joined_members$ acl url_workerX_generic path_reg ^/_matrix/client/(api/v1|r0|v3|unstable)/rooms/.*/context/.*$ acl url_workerX_generic path_reg ^/_matrix/client/(api/v1|r0|v3|unstable)/rooms/.*/members$ acl url_workerX_generic path_reg ^/_matrix/client/(api/v1|r0|v3|unstable)/rooms/.*/state$ acl url_workerX_generic path_reg ^/_matrix/client/v1/rooms/.*/hierarchy$ acl url_workerX_generic path_reg ^/_matrix/client/unstable/org.matrix.msc2716/rooms/.*/batch_send$ acl url_workerX_generic path_reg ^/_matrix/client/unstable/im.nheko.summary/rooms/.*/summary$ acl url_workerX_generic path_reg ^/_matrix/client/(r0|v3|unstable)/account/3pid$ acl url_workerX_generic path_reg ^/_matrix/client/(r0|v3|unstable)/account/whoami$ acl url_workerX_generic path_reg ^/_matrix/client/(r0|v3|unstable)/devices$ acl url_workerX_generic path_reg ^/_matrix/client/versions$ acl url_workerX_generic path_reg ^/_matrix/client/(api/v1|r0|v3|unstable)/voip/turnServer$ acl url_workerX_generic path_reg ^/_matrix/client/(api/v1|r0|v3|unstable)/rooms/.*/event/ acl url_workerX_generic path_reg ^/_matrix/client/(api/v1|r0|v3|unstable)/joined_rooms$ acl url_workerX_generic path_reg ^/_matrix/client/(api/v1|r0|v3|unstable)/search$ # Encryption requests # Note that ^/_matrix/client/(r0|v3|unstable)/keys/upload/ requires `worker_main_http_uri` acl url_workerX_generic path_reg ^/_matrix/client/(r0|v3|unstable)/keys/query$ acl url_workerX_generic path_reg ^/_matrix/client/(r0|v3|unstable)/keys/changes$ acl url_workerX_generic path_reg ^/_matrix/client/(r0|v3|unstable)/keys/claim$ acl url_workerX_generic path_reg ^/_matrix/client/(r0|v3|unstable)/room_keys/ acl url_workerX_generic path_reg ^/_matrix/client/(r0|v3|unstable)/keys/upload/ # Registration/login requests acl url_workerX_generic path_reg ^/_matrix/client/(api/v1|r0|v3|unstable)/login$ acl url_workerX_generic path_reg ^/_matrix/client/(r0|v3|unstable)/register$ acl 
url_workerX_generic path_reg ^/_matrix/client/v1/register/m.login.registration_token/validity$ # Event sending requests acl url_workerX_stick-path path_reg ^/_matrix/client/(api/v1|r0|v3|unstable)/rooms/.*/redact acl url_workerX_stick-path path_reg ^/_matrix/client/(api/v1|r0|v3|unstable)/rooms/.*/send acl url_workerX_stick-path path_reg ^/_matrix/client/(api/v1|r0|v3|unstable)/rooms/.*/state/ acl url_workerX_stick-path path_reg ^/_matrix/client/(api/v1|r0|v3|unstable)/rooms/.*/(join|invite|leave|ban|unban|kick)$ acl url_workerX_stick-path path_reg ^/_matrix/client/(api/v1|r0|v3|unstable)/join/ acl url_workerX_stick-path path_reg ^/_matrix/client/(api/v1|r0|v3|unstable)/profile/ # User directory search requests acl url_workerX_generic path_reg ^/_matrix/client/(r0|v3|unstable)/user_directory/search$ # Pagination requests acl url_workerX_stick-path path_reg ^/_matrix/client/(api/v1|r0|v3|unstable)/rooms/.*/messages$ # Push rules (GET-only) acl url_push-rules path_reg ^/_matrix/client/(api/v1|r0|v3|unstable)/pushrules/ # Directory worker endpoints acl url_directory-worker path_reg ^/_matrix/client/(r0|v3|unstable)/user_directory/search$ # Event persister endpoints acl url_stream-worker path_reg ^/_matrix/client/(api/v1|r0|v3|unstable)/rooms/.*/typing acl url_stream-worker path_reg ^/_matrix/client/(r0|v3|unstable)/sendToDevice/ acl url_stream-worker path_reg ^/_matrix/client/(r0|v3|unstable)/.*/tags acl url_stream-worker path_reg ^/_matrix/client/(r0|v3|unstable)/.*/account_data acl url_stream-worker path_reg ^/_matrix/client/(r0|v3|unstable)/rooms/.*/receipt acl url_stream-worker path_reg ^/_matrix/client/(r0|v3|unstable)/rooms/.*/read_markers acl url_stream-worker path_reg ^/_matrix/client/(api/v1|r0|v3|unstable)/presence/ # Backend directors use_backend synapseX_worker_generic if host_matrix url_workerX_generic use_backend synapseX_worker_generic if host_matrix url_push-rules http_method_get use_backend synapseX_worker_stick-auth if host_matrix 
url_workerX_stick-auth use_backend synapseX_worker_stick-src if host_matrix url_workerX_stick-src use_backend synapseX_worker_stick-path if host_matrix url_workerX_stick-path use_backend synapse0_directory_worker if host_matrix url_directory-worker use_backend synapse0_stream_worker if host_matrix url_stream-worker # Master workers (single-instance) - Federation media repository requests acl url_mediarepository path_reg ^/_matrix/media/ acl url_mediarepository path_reg ^/_synapse/admin/v1/purge_media_cache$ acl url_mediarepository path_reg ^/_synapse/admin/v1/room/.*/media.*$ acl url_mediarepository path_reg ^/_synapse/admin/v1/user/.*/media.*$ acl url_mediarepository path_reg ^/_synapse/admin/v1/media/.*$ acl url_mediarepository path_reg ^/_synapse/admin/v1/quarantine_media/.*$ acl url_mediarepository path_reg ^/_synapse/admin/v1/users/.*/media$ use_backend synapse0_media_repository if host_matrix url_mediarepository # MXISD/MA1SD worker acl url_ma1sd path_reg ^/_matrix/client/(api/v1|r0|unstable)/user_directory acl url_ma1sd path_reg ^/_matrix/client/(api/v1|r0|unstable)/login acl url_ma1sd path_reg ^/_matrix/identity use_backend synapse0_ma1sd if host_matrix url_ma1sd # Webhook service acl url_webhook path_reg ^/webhook use_backend synapse0_webhook if host_matrix url_webhook # .well-known configs acl url_wellknown path_reg ^/.well-known/matrix use_backend elementX_http if host_matrix url_wellknown # Catchall Matrix and RElement use_backend synapse0_master if host_matrix use_backend elementX_http if host_element # Default to Riot default_backend elementX_http frontend ma1sd_http bind :::8090 v4v6 mode http option httplog use_backend synapse0_ma1sd backend letsencrypt mode http server elbvip.i.bonilan.net elbvip.i.bonilan.net:80 resolvers nsX resolve-prefer ipv4 backend elementX_http mode http balance leastconn option httpchk GET /index.html # Force users (by source IP) to visit the same backend server stick-table type ipv6 size 5000k peers keepalived-pair expire 
72h stick on src errorfile 500 /etc/haproxy/sorryserver.http errorfile 502 /etc/haproxy/sorryserver.http errorfile 503 /etc/haproxy/sorryserver.http errorfile 504 /etc/haproxy/sorryserver.http server element1 element1.i.bonilan.net:80 resolvers nsX resolve-prefer ipv4 check inter 5000 cookie element1.i.bonilan.net server element2 element2.i.bonilan.net:80 resolvers nsX resolve-prefer ipv4 check inter 5000 cookie element2.i.bonilan.net backend synapse0_master mode http balance roundrobin option httpchk retries 0 errorfile 500 /etc/haproxy/sorryserver.http errorfile 502 /etc/haproxy/sorryserver.http errorfile 503 /etc/haproxy/sorryserver.http errorfile 504 /etc/haproxy/sorryserver.http server synapse0.i.bonilan.net synapse0.i.bonilan.net:8008 resolvers nsX resolve-prefer ipv4 check inter 5000 backup backend synapse0_directory_worker mode http balance roundrobin option httpchk retries 0 errorfile 500 /etc/haproxy/sorryserver.http errorfile 502 /etc/haproxy/sorryserver.http errorfile 503 /etc/haproxy/sorryserver.http errorfile 504 /etc/haproxy/sorryserver.http server synapse0.i.bonilan.net synapse0.i.bonilan.net:8033 resolvers nsX resolve-prefer ipv4 check inter 5000 backup backend synapse0_stream_worker mode http balance roundrobin option httpchk retries 0 errorfile 500 /etc/haproxy/sorryserver.http errorfile 502 /etc/haproxy/sorryserver.http errorfile 503 /etc/haproxy/sorryserver.http errorfile 504 /etc/haproxy/sorryserver.http server synapse0.i.bonilan.net synapse0.i.bonilan.net:8035 resolvers nsX resolve-prefer ipv4 check inter 5000 backup backend synapse0_media_repository mode http balance roundrobin option httpchk retries 0 errorfile 500 /etc/haproxy/sorryserver.http errorfile 502 /etc/haproxy/sorryserver.http errorfile 503 /etc/haproxy/sorryserver.http errorfile 504 /etc/haproxy/sorryserver.http server synapse0.i.bonilan.net synapse0.i.bonilan.net:8095 resolvers nsX resolve-prefer ipv4 check inter 5000 backup backend synapse0_ma1sd mode http balance roundrobin 
option httpchk errorfile 500 /etc/haproxy/sorryserver.http errorfile 502 /etc/haproxy/sorryserver.http errorfile 503 /etc/haproxy/sorryserver.http errorfile 504 /etc/haproxy/sorryserver.http server synapse0.i.bonilan.net synapse0.i.bonilan.net:8090 resolvers nsX resolve-prefer ipv4 check inter 5000 backend synapse0_webhook mode http balance roundrobin option httpchk GET / server synapse0.i.bonilan.net synapse0.i.bonilan.net:4785 resolvers nsX resolve-prefer ipv4 check inter 5000 backup backend synapseX_worker_generic mode http balance roundrobin option httpchk errorfile 500 /etc/haproxy/sorryserver.http errorfile 502 /etc/haproxy/sorryserver.http errorfile 503 /etc/haproxy/sorryserver.http errorfile 504 /etc/haproxy/sorryserver.http server synapse1.i.bonilan.net synapse1.i.bonilan.net:8030 resolvers nsX resolve-prefer ipv4 check inter 5000 server synapse2.i.bonilan.net synapse2.i.bonilan.net:8030 resolvers nsX resolve-prefer ipv4 check inter 5000 backend synapseX_worker_stick-auth mode http balance roundrobin option httpchk # Force users (by Authorization header) to visit the same backend server stick-table type string len 1024 size 5000k peers keepalived-pair expire 72h stick on hdr(Authorization) errorfile 500 /etc/haproxy/sorryserver.http errorfile 502 /etc/haproxy/sorryserver.http errorfile 503 /etc/haproxy/sorryserver.http errorfile 504 /etc/haproxy/sorryserver.http server synapse1.i.bonilan.net synapse1.i.bonilan.net:8030 resolvers nsX resolve-prefer ipv4 check inter 5000 server synapse2.i.bonilan.net synapse2.i.bonilan.net:8030 resolvers nsX resolve-prefer ipv4 check inter 5000 backend synapseX_worker_stick-path mode http balance roundrobin option httpchk # Force users to visit the same backend server stick-table type string len 1024 size 5000k peers keepalived-pair expire 72h stick on path,word(5,/) if { path_reg ^/_matrix/client/(r0|unstable)/rooms } stick on path,word(6,/) if { path_reg ^/_matrix/client/api/v1/rooms } stick on path errorfile 500 
/etc/haproxy/sorryserver.http errorfile 502 /etc/haproxy/sorryserver.http errorfile 503 /etc/haproxy/sorryserver.http errorfile 504 /etc/haproxy/sorryserver.http server synapse1.i.bonilan.net synapse1.i.bonilan.net:8030 resolvers nsX resolve-prefer ipv4 check inter 5000 server synapse2.i.bonilan.net synapse2.i.bonilan.net:8030 resolvers nsX resolve-prefer ipv4 check inter 5000 backend synapseX_worker_stick-src mode http balance roundrobin option httpchk # Force users (by source IP) to visit the same backend server stick-table type ipv6 size 5000k peers keepalived-pair expire 72h stick on src errorfile 500 /etc/haproxy/sorryserver.http errorfile 502 /etc/haproxy/sorryserver.http errorfile 503 /etc/haproxy/sorryserver.http errorfile 504 /etc/haproxy/sorryserver.http server synapse1.i.bonilan.net synapse1.i.bonilan.net:8030 resolvers nsX resolve-prefer ipv4 check inter 5000 server synapse2.i.bonilan.net synapse2.i.bonilan.net:8030 resolvers nsX resolve-prefer ipv4 check inter 5000 ``` Once the above configurations are installed on each server, restart the HAProxy service with `sudo systemctl restart haproxy`. You will now have access to the various endpoints on ports 443 and 8448 with a redirection from port 80 to port 443 to enforce SSL from clients. ## Final steps Now that your proxy is running, test connectivity to your servers. For Riot, visit the bare VIP IP or the Riot subdomain. For Matrix, visit the Matrix subdomain. In both cases, ensure that the page loads properly. Finally, use the [Matirx Homserver Federation Tester](https://federationtester.matrix.org/) to verify that Federation is correctly configured for your Homserver. Congratulations, you now have a fully-configured, multi-worker and, if configured, load-balanced Matrix instance capable of handling many dozens or hundreds of users with optimal performance! 
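As a final aside, the subtlest part of the proxy configuration is the `stick on path,word(N,/)` rules in the `synapseX_worker_stick-path` backend: they pin each room to a single worker by extracting the room ID segment from the request path. This hypothetical Python sketch of HAProxy's `word(<index>,<delimiter>)` converter (which returns the nth non-empty field, 1-indexed) shows why the index is 5 for `/r0` paths but 6 for the longer `/api/v1` paths:

```python
def haproxy_word(path: str, n: int, sep: str = "/") -> str:
    """Mimic HAProxy's word(n,sep) converter: the n-th non-empty field, 1-indexed."""
    fields = [f for f in path.split(sep) if f]
    return fields[n - 1] if n <= len(fields) else ""

# For /_matrix/client/r0/rooms/<roomid>/send/..., the room ID is the 5th word,
# so `stick on path,word(5,/)` keys the stick-table on the room ID:
path_r0 = "/_matrix/client/r0/rooms/!abc123:example.org/send/m.room.message/0"
print(haproxy_word(path_r0, 5))  # !abc123:example.org

# Under the older /api/v1 prefix the room ID shifts to the 6th word,
# which is why a second rule uses word(6,/):
path_v1 = "/_matrix/client/api/v1/rooms/!abc123:example.org/send/m.room.message/0"
print(haproxy_word(path_v1, 6))  # !abc123:example.org
```

With the room ID as the stick-table key, every event sent to a given room lands on the same worker, which is what the Synapse worker documentation requires for these endpoints.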
If you have any feedback about this post, including corrections, please contact me - you can find me in the [`#synapse:matrix.org`](https://matrix.to/#/!mjbDjyNsRXndKLkHIe:matrix.org) Matrix room, or via email!