====== MQTT Clustering ======
This page focuses on [[https://mosquitto.org/|Mosquitto]]; however, a similar outcome can be achieved with other MQTT implementations.
The aim of clustering MQTT servers is to improve the fault tolerance of the system, covering both the MQTT servers themselves and the services that depend on them. Here the servers are arranged in a Master/Slave configuration, although from the perspective of the clients using them they are functionally equivalent.
Additionally, a mechanism is required to allow clients to connect to either of the available servers. Several options exist, depending on the implementation and features of the client. These include:
  * Virtual IPs (e.g. CARP or VRRP)
  * DNS fail-over
  * DNS SRV records
  * Server load balancer
I have chosen a Virtual IP arrangement using CARP, as it is the simplest option to implement with the majority of the MQTT clients I need to support.
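On Linux, CARP is commonly provided by the userspace ucarp daemon, which only handles the master election and leaves adding or removing the address to up/down scripts. The sketch below shows one way to start it; it assumes ucarp is installed, and the VHID, password, skew values and script paths are hypothetical placeholders. The interface and floating address are taken from the NAT rule on this page. The ''run'' helper and ''DRY_RUN'' variable are hypothetical conveniences that print the command instead of executing it.

```shell
#!/bin/sh
# Sketch: start ucarp to manage the floating MQTT address.
# Hypothetical values: VHID, PASS, the skew values and the script paths.
set -eu

IFACE="eth0.29"     # interface carrying the MQTT VLAN (from the NAT rule)
VIP="192.168.29.2"  # floating address the MQTT clients connect to
VHID="29"           # hypothetical virtual host ID; must match on both servers
PASS="changeme"     # hypothetical shared secret; must match on both servers

# Hypothetical helper: execute a command, or only print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# start_ucarp SRCIP SKEW
#   SRCIP: this server's static address
#   SKEW:  advertisement skew; the server advertising with the lower skew is
#          preferred as master, so give the master the lower value.
start_ucarp() {
    run ucarp --interface="$IFACE" --srcip="$1" --vhid="$VHID" \
        --pass="$PASS" --addr="$VIP" --advskew="$2" --preempt \
        --upscript=/etc/ucarp/vip-up.sh --downscript=/etc/ucarp/vip-down.sh \
        --shutdown --daemonize
}
```

On the slave this might be called as ''start_ucarp 192.168.29.3 100'', with a lower skew on the master. The up and down scripts must add and remove the floating address themselves.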
===== Replication =====
Mosquitto includes a bridge feature that replicates messages to and from another server; for each topic, the replication can run in one direction or in both. Bidirectional replication allows clients to publish to whichever server they can connect to and still reach all subscribers.
On the master server I have added the following to the mosquitto.conf file:
<code>
connection Bridge
address 192.168.29.3
clientid leonard
username mqttbridge
password **REMOVED**
try_private true
topic home/# both 2 "" ""
topic homeassistant/# both 2 "" ""
topic tasmota/# both 2 "" ""
topic tasmotas/# both 2 "" ""
topic tele/# both 2 "" ""
</code>
This configuration causes the master MQTT server to connect to the slave and replicate the home, homeassistant, tasmota, tasmotas and tele topic trees in both directions at QoS 2.
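One way to confirm the bridge works is to publish a test message on one server and wait for it on the other, using the standard ''mosquitto_pub'' and ''mosquitto_sub'' clients. This is a sketch: the test topic ''home/bridge-test'' is hypothetical (chosen to fall inside the bridged ''home/#'' tree), the credentials are read from the hypothetical ''MQTT_USER'' and ''MQTT_PASS'' variables, and ''run''/''DRY_RUN'' are a hypothetical helper that prints commands instead of executing them.

```shell
#!/bin/sh
# Sketch: check that a message published on one server reaches the other.
# Hypothetical: the test topic and the MQTT_USER / MQTT_PASS variables.
set -eu

TOPIC="home/bridge-test"   # hypothetical topic inside the bridged home/# tree

# Hypothetical helper: execute a command, or only print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# check_bridge PUB_HOST SUB_HOST
check_bridge() {
    : "${MQTT_USER:?set MQTT_USER}" "${MQTT_PASS:?set MQTT_PASS}"
    # Wait (at most 10 s) for a single message on the receiving server.
    run mosquitto_sub -h "$2" -u "$MQTT_USER" -P "$MQTT_PASS" \
        -t "$TOPIC" -C 1 -W 10 &
    sub_pid=$!
    sleep 2   # give the subscriber time to connect before publishing
    run mosquitto_pub -h "$1" -u "$MQTT_USER" -P "$MQTT_PASS" \
        -t "$TOPIC" -m "bridge test from $1"
    # The subscriber prints the test message if it arrived through the bridge.
    wait "$sub_pid"
}
```

Running it once in each direction exercises the ''both'' setting on the topic lines.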
Aside from provisioning the bridge username and password on the slave server, no other changes are required. However, to ensure that all clients can authenticate equally against both servers, the password files should be kept in sync.
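A minimal way to keep the password files in sync is to copy the file from the master to the slave and ask Mosquitto there to reload it. This sketch assumes SSH key access to the slave as root, the Debian-style path ''/etc/mosquitto/passwd'' (check the ''password_file'' setting in your ''mosquitto.conf''), and a systemd unit named ''mosquitto'' whose reload action sends SIGHUP, which makes Mosquitto re-read its password file. ''sync_passwords'', ''run'' and ''DRY_RUN'' are hypothetical names.

```shell
#!/bin/sh
# Sketch: push the master's Mosquitto password file to the slave and reload it.
# Assumptions: SSH key access as root, Debian-style path, systemd unit "mosquitto".
set -eu

PASSWD_FILE="/etc/mosquitto/passwd"   # assumed; must match password_file in mosquitto.conf

# Hypothetical helper: execute a command, or only print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# sync_passwords SLAVE_HOST
sync_passwords() {
    # Copy the file, preserving ownership, permissions and timestamps.
    run rsync -a "$PASSWD_FILE" "root@$1:$PASSWD_FILE"
    # The unit's reload action (SIGHUP) makes Mosquitto re-read the file.
    run ssh "root@$1" systemctl reload mosquitto
}
```

The bridge account itself can be added to the file with ''mosquitto_passwd /etc/mosquitto/passwd mqttbridge'', which prompts for the password.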
===== Floating IP Work-arounds =====
Moving a floating IP from one server to another can leave clients in an inconsistent state. The clients believe they still have a persistent connection to the server, based on an established TCP connection, but after a CARP transition the server taking over the IP has no knowledge of these connections. A client may only discover this when it next sends a packet, such as a keep-alive ping.
The simplest solution I have found is to restart the MQTT server process on the machine taking over the floating IP. This is a little crude, but it does ensure that all clients re-establish their connections. It has the added benefit of forcing most clients to re-publish their discovery information, effectively updating Home Assistant with their current status.
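If CARP is provided by ucarp, the restart fits naturally in its up-script, which ucarp runs with the interface name as the first argument and the virtual address as the second when this host becomes master. A sketch, assuming a /24 prefix, a systemd unit named ''mosquitto'' and the hypothetical path ''/etc/ucarp/vip-up.sh''; ''vip_up'', ''run'' and ''DRY_RUN'' are hypothetical names. The matching down-script would remove the address with ''ip addr del''.

```shell
#!/bin/sh
# Sketch of a ucarp up-script (hypothetical path /etc/ucarp/vip-up.sh).
# ucarp calls it with $1 = interface, $2 = virtual address.
# Assumptions: /24 prefix, systemd unit "mosquitto".
set -eu

# Hypothetical helper: execute a command, or only print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

vip_up() {
    # Claim the floating address on the given interface.
    run ip addr add "$2/24" dev "$1"
    # Restart Mosquitto so that clients re-establish their connections.
    run systemctl restart mosquitto
}

# Only act when invoked with the interface and address, as ucarp does.
if [ "$#" -ge 2 ]; then
    vip_up "$1" "$2"
fi
```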
Additionally, Mosquitto is unable to listen on the floating IP address unless that address is present on the system at start-up. A quick work-around is to destination-NAT (DNAT) incoming MQTT connections on this address to the server's static IP address.
<code>
iptables -t nat -A PREROUTING -d 192.168.29.2/32 -i eth0.29 -p tcp -m tcp --dport 1883 -j DNAT --to-destination 192.168.29.3:1883
</code>
This allows clients connecting to the floating IP address to be redirected to the listening socket on the static address.
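Because the rule only matches packets addressed to the floating IP, which only arrive at the server currently holding it, it can typically stay installed on both servers, each pointing at its own static address. When it is added from a boot or up-script, checking for it first with ''iptables -C'' avoids stacking duplicate rules. A sketch with the interface, floating address and port copied from the rule above; ''ensure_mqtt_dnat'', ''run'' and ''DRY_RUN'' are hypothetical names, and persisting the rule across reboots is left to your distribution's usual mechanism.

```shell
#!/bin/sh
# Sketch: add the MQTT DNAT rule only if it is not already present.
# Assumptions: interface eth0.29, floating address 192.168.29.2, port 1883.
set -eu

# Hypothetical helper: execute a command, or only print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# ensure_mqtt_dnat STATIC_IP
ensure_mqtt_dnat() {
    # Same match and target as the rule above, without the -A/-C action.
    set -- PREROUTING -d 192.168.29.2/32 -i eth0.29 -p tcp -m tcp \
        --dport 1883 -j DNAT --to-destination "$1:1883"
    # iptables -C succeeds only if an identical rule already exists.
    if [ -z "${DRY_RUN:-}" ] && iptables -t nat -C "$@" 2>/dev/null; then
        return 0
    fi
    run iptables -t nat -A "$@"
}
```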