This description will focus on Mosquitto, however it is possible to achieve a similar outcome with other MQTT implementations.
The aim of clustering MQTT servers is to improve the fault tolerance of the system, covering both the MQTT servers themselves and the services that make use of them. In this description the servers are arranged in a Master/Slave configuration, although from the perspective of the clients using them they are functionally equivalent.
Additionally, a mechanism is required to allow clients to connect to either of the available servers. A number of options are available here, depending on the implementation and features of the client. These include:
I have chosen to utilise a Virtual IP arrangement using CARP, as this is the simplest to implement with the majority of the MQTT clients I need to service.
Mosquitto can bridge to another server, replicating messages to and from it. In some circumstances this replication can even be bidirectional. This allows clients to publish to whichever server they can connect to and still have their messages reach all subscribers.
On the master server I have added the following to the mosquitto.conf file:
connection Bridge
address 192.168.29.3
clientid leonard
username mqttbridge
password **REMOVED**
try_private true
topic home/# both 2 "" ""
topic homeassistant/# both 2 "" ""
topic tasmota/# both 2 "" ""
topic tasmotas/# both 2 "" ""
topic tele/# both 2 "" ""
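This configuration causes the master MQTT server to connect to the slave (192.168.29.3) and replicate the home, homeassistant, tasmota, tasmotas and tele topic trees in both directions. On each topic line, `both 2 "" ""` sets the direction, the QoS level (2) and empty local and remote prefixes.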
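To check which topics the bridge configuration above will actually replicate, the following minimal sketch parses the `topic` lines and applies MQTT wildcard matching (`#` matches the parent level and everything beneath it, `+` matches exactly one level). The function names are hypothetical, and the sketch ignores the local/remote prefixes because they are empty in this configuration. It also shows why `homeassistant/#` needs its own line: `home/#` only matches topics whose first level is exactly `home`.

```python
import shlex
from dataclasses import dataclass


@dataclass
class BridgeTopic:
    pattern: str        # topic filter, may contain + and # wildcards
    direction: str      # "out", "in" or "both"
    qos: int
    local_prefix: str
    remote_prefix: str


def parse_topic_line(line: str) -> BridgeTopic:
    # e.g. topic home/# both 2 "" ""
    parts = shlex.split(line)  # keeps "" as empty-string fields
    if len(parts) < 2 or parts[0] != "topic":
        raise ValueError(f"not a bridge topic line: {line!r}")
    fields = parts[1:]
    # Optional fields fall back to mosquitto's defaults (direction out, QoS 0).
    return BridgeTopic(
        pattern=fields[0],
        direction=fields[1] if len(fields) > 1 else "out",
        qos=int(fields[2]) if len(fields) > 2 else 0,
        local_prefix=fields[3] if len(fields) > 3 else "",
        remote_prefix=fields[4] if len(fields) > 4 else "",
    )


def topic_matches(topic_filter: str, topic: str) -> bool:
    f_levels = topic_filter.split("/")
    t_levels = topic.split("/")
    # Filters starting with a wildcard do not match $-topics such as $SYS/...
    if topic.startswith("$") and f_levels[0] in ("+", "#"):
        return False
    for i, level in enumerate(f_levels):
        if level == "#":
            return True  # matches the parent level and everything below it
        if i >= len(t_levels):
            return False
        if level != "+" and level != t_levels[i]:
            return False
    return len(f_levels) == len(t_levels)


def is_forwarded(config_text: str, topic: str, direction: str) -> bool:
    # direction is "in" or "out", seen from the broker holding the bridge config
    for line in config_text.splitlines():
        line = line.strip()
        if not line.startswith("topic "):
            continue
        entry = parse_topic_line(line)
        if entry.direction in (direction, "both") and topic_matches(entry.pattern, topic):
            return True
    return False


BRIDGE_TOPICS = """
topic home/# both 2 "" ""
topic homeassistant/# both 2 "" ""
topic tasmota/# both 2 "" ""
topic tasmotas/# both 2 "" ""
topic tele/# both 2 "" ""
"""
```

For example, `is_forwarded(BRIDGE_TOPICS, "tele/sonoff/STATE", "out")` returns `True`, while a topic outside the listed trees, such as `zigbee2mqtt/bridge/state`, is not replicated at all.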
Aside from provisioning the bridge's username and password on the slave server (the remote end of the bridge), no other changes are required. However, to ensure that all clients can access both servers equally, the password files should be kept in sync.
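The password files can drift apart if users are added on one server only. The following minimal sketch compares two copies of the file and reports usernames present on only one side, or whose stored hashes differ. It assumes the standard Mosquitto format of one `username:hash` entry per line and that both files have already been fetched locally; the function names are hypothetical.

```python
from __future__ import annotations


def parse_password_file(text: str) -> dict[str, str]:
    """Map each username to its stored hash (one 'username:hash' entry per line)."""
    entries: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        user, sep, pw_hash = line.partition(":")
        if not sep:
            raise ValueError(f"malformed password line: {line!r}")
        entries[user] = pw_hash
    return entries


def password_file_drift(master_text: str, slave_text: str) -> dict[str, list[str]]:
    """Report how two password files differ, by username."""
    master = parse_password_file(master_text)
    slave = parse_password_file(slave_text)
    shared = master.keys() & slave.keys()
    return {
        "only_on_master": sorted(master.keys() - slave.keys()),
        "only_on_slave": sorted(slave.keys() - master.keys()),
        # Hashes are salted, so the same password set separately on each
        # server also shows up here; copying one file to both avoids that.
        "hash_differs": sorted(u for u in shared if master[u] != slave[u]),
    }
```

Because the hashes are salted, the simplest way to keep the servers consistent is to maintain the file on one server and copy it to the other, in which case any entry under `hash_differs` indicates drift.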
Failover is only required if one of the servers fails. In my setup this is further complicated by the use of a floating IP address: clients believe they have a persistent connection to the server, but in the event of a CARP transition the server taking over the IP has no knowledge of these connections.
The simplest solution I have found is to restart the MQTT server process on the machine taking over the floating IP. This is a little crude, but it does ensure that all clients successfully re-establish their connections. It has the added benefit of forcing most clients to re-publish their discovery information, effectively informing Home Assistant of their status.
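The restart can be driven from whatever hook the platform offers for CARP state changes (on FreeBSD, for instance, devd can run a command when a CARP interface changes state). The following sketch assumes such a hook invokes it with the new state as its first argument, and that the broker is restarted with `service mosquitto restart`; both are assumptions to adapt to the actual host. It only acts on a transition to MASTER, i.e. on the machine taking over the floating IP.

```python
from __future__ import annotations

import subprocess
import sys
from typing import Callable, Sequence

# Assumed restart command; adjust for the host's service manager.
RESTART_CMD = ["service", "mosquitto", "restart"]


def run_command(cmd: Sequence[str]) -> None:
    subprocess.run(list(cmd), check=True)


def handle_carp_transition(
    new_state: str,
    run: Callable[[Sequence[str]], None] = run_command,
    cmd: Sequence[str] = RESTART_CMD,
) -> bool:
    """Restart the broker only when this host becomes CARP MASTER.

    Returns True if a restart was issued.
    """
    if new_state.strip().upper() != "MASTER":
        return False  # BACKUP/INIT: the floating IP is not ours, nothing to do
    run(cmd)
    return True


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical hook usage: called with the new CARP state, e.g. "MASTER".
    handle_carp_transition(sys.argv[1])
```

Passing `run` as a parameter keeps the decision logic testable without actually restarting anything.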