FusionPBX Database Replication for High Availability (HA)

markjcrane

Well-Known Member
Staff member
Jul 22, 2018
FusionPBX supports PostgreSQL's native logical multi-master replication, whether you're setting up a 2-node cluster, a 3-node cluster, or additional nodes for more redundancy.

The FusionPBX project was an early adopter of BDR (Bi-Directional Replication), and we taught many of you how to use it. 2ndQuadrant's BDR was released in 2016. Over time, around 80 to 90% of BDR was donated as open source and added to PostgreSQL. The new logical multi-master replication is based on that work. When it was released in PostgreSQL 16, the FusionPBX team was one of the first in the open source PBX community to adopt it and learn it, and we immediately started training people to use it. If you know how to do this newer replication, it's likely you learned it directly or indirectly from the FusionPBX project.

This new replication is better than the original BDR in multiple ways. BDR took a snapshot of the entire database and then copied it over to the new node. If the database was large, it could take a while before you saw any data on the new node. The newer native PostgreSQL logical multi-master replication replicates each table separately, and the replication begins immediately.
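As a rough illustration of how the native approach is wired together, the sketch below generates the SQL for a full mesh in which every node publishes its tables and subscribes to every other node. The node names, addresses, database name, publication name, and the `mesh_sql` helper are all hypothetical, and this is not the FusionPBX project's actual procedure. It only assumes PostgreSQL 16 or later, where `origin = none` on a subscription keeps a change from being sent back around the mesh. Whether `copy_data` should be on depends on whether the subscribing node already holds the data, so it is left as a parameter.

```python
from itertools import permutations


def mesh_sql(nodes, dbname="fusionpbx", pub="pub_all", copy_data=False):
    """Return {node_name: [sql, ...]} for a full-mesh logical replication setup.

    nodes maps a (hypothetical) node name to its host address.
    dbname and pub are assumed names, not values taken from the post.
    """
    copy = "true" if copy_data else "false"
    # Every node publishes all of its tables.
    sql = {name: [f"CREATE PUBLICATION {pub} FOR ALL TABLES;"] for name in nodes}
    # Every node subscribes to every other node: n * (n - 1) subscriptions.
    for subscriber, publisher in permutations(nodes, 2):
        conninfo = f"host={nodes[publisher]} dbname={dbname}"
        # origin = none: receive only changes that originated on the publisher,
        # not changes the publisher itself received from another node.
        sql[subscriber].append(
            f"CREATE SUBSCRIPTION sub_{subscriber}_from_{publisher} "
            f"CONNECTION '{conninfo}' PUBLICATION {pub} "
            f"WITH (origin = none, copy_data = {copy});"
        )
    return sql


if __name__ == "__main__":
    cluster = {"db1": "10.0.0.1", "db2": "10.0.0.2", "db3": "10.0.0.3"}
    for node, statements in mesh_sql(cluster).items():
        print(f"-- run on {node}")
        print("\n".join(statements))
```

For three nodes this yields one publication and two subscriptions per node, six subscriptions in total. A real deployment also needs credentials in the connection string, matching schemas on every node, and `wal_level = logical`, none of which the sketch handles.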

At first, it took time to work out how to fix the problems clients were experiencing with this new replication. However, we have put in the time and gained the experience needed to fix every issue our clients have faced up to this point. With the help of our client base, we have built that experience faster, driven by real-world demand.

Why Choose a 3-Node Database Cluster?

While a 2-node cluster provides redundancy and failover capabilities, adding a node offers several significant advantages:

1. Fault Tolerance: In a 2-node setup, if one server fails or becomes unavailable, the one remaining server is left without any redundancy. In a 3-node setup, even if one node fails, the other two nodes can continue to operate without interruption and still back each other up. This keeps your system fully functional and minimizes downtime. Why have a 2-node cluster when you can have the additional peace of mind of a 3-node cluster?

2. Geographical Distribution: A 3-node cluster allows for more flexible geographical distribution of your database servers. You can place nodes in different locations to improve geographic redundancy, enhancing disaster recovery capabilities.

3. Larger Workload: A 3-node or larger cluster can handle a larger workload.
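The fault-tolerance point in item 1 comes down to simple arithmetic, sketched below. The `surviving_redundancy` helper is hypothetical and counts only whole-node failures; it says nothing about partial failures or network partitions.

```python
def surviving_redundancy(total_nodes, failed_nodes):
    """Return (nodes still running, spare nodes beyond the one serving traffic)."""
    if not 0 <= failed_nodes <= total_nodes:
        raise ValueError("failed_nodes must be between 0 and total_nodes")
    running = total_nodes - failed_nodes
    # One running node carries the load; anything beyond that is redundancy.
    spares = max(running - 1, 0)
    return running, spares


for total in (2, 3):
    running, spares = surviving_redundancy(total, failed_nodes=1)
    print(f"{total}-node cluster, 1 failure: {running} running, {spares} spare")
```

With one failure, the 2-node cluster is down to a single server with no spare, while the 3-node cluster still has one spare.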

The FusionPBX team set up a 3-node cluster for someone last week. It's not our first time; we have done it many times.

Additional third-party references on why you should consider more redundancy:
- https://www.acronis.com/en/blog/posts/backup-rule/
- https://www.ipmcomputers.com/the-3-2-1-backup-rule-explained-and-why-its-still-relevant-in-2025/
- https://www.impossiblecloud.com/blog/the-golden-rules-of-backup-strategy-from-3-2-1-to-3-2-1-1-0

Even AWS has outages. Cloudflare, a company considered by many to be one of the safest, recently had a major outage. For those paying attention, this further drove home the point that more redundancy is a very good thing.
 
In a 3-node setup, it’s important to remember that one server will typically handle more traffic, because it has to replicate to two other nodes. On a busy VoIP system, that extra replication traffic can matter—you generally want to preserve as much bandwidth and system resources as possible for call traffic.

Database replication is effectively near real-time. By default it isn’t throttled or run through any special queue, so it competes with the rest of the workload on the server. If you also replicate files such as call recordings, voicemails, and faxes, that adds even more network and I/O load. All of this needs to be carefully engineered, and the “right” design will vary by environment.
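To put a rough number on this, here is a back-of-the-envelope sketch of the outbound replication traffic from one node. It assumes a full mesh in which every locally written change is sent once to each peer, with no compression or protocol overhead. The `replication_egress_mbps` helper and the rates passed to it are hypothetical, not measurements from any real system.

```python
def replication_egress_mbps(nodes, db_change_mbps, file_mbps=0.0):
    """Estimate outbound replication traffic from one node, in Mbit/s.

    Assumes each locally written change (database or file) is sent once
    to every other node in the cluster.
    """
    if nodes < 1:
        raise ValueError("a cluster needs at least one node")
    peers = nodes - 1
    return peers * (db_change_mbps + file_mbps)


# Illustrative inputs only: 1 Mbit/s of database changes, 0.5 Mbit/s of files.
for n in (2, 3):
    print(n, "nodes:", replication_egress_mbps(n, 1.0, 0.5), "Mbit/s outbound")
```

Under these assumptions, going from two nodes to three doubles the outbound replication traffic on a node that takes writes. Whether that is significant depends on how it compares with the bandwidth the call traffic needs.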

In my experience, most deployments are perfectly well-served by a 2-node setup. If you find yourself needing more than that, it’s often a sign that something else might be misconfigured or could be optimized before adding extra nodes.

One more thing worth mentioning: replication is not a backup strategy. It’s a failover strategy. You still need proper backups, and in many cases you’ll want multiple backup copies as described in the articles above. Designing redundant VoIP infrastructure is not the same as protecting your data, and the two should never be confused.

Another important point: even if you deploy 3, 5, or 10 nodes, none of that matters if your authoritative DNS is down. If your DNS is hosted on AWS Route 53, Cloudflare, or another provider, and they go down, your traffic won’t route to any of your nodes because clients simply can’t resolve your domain. Redundancy must extend beyond your servers.
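A simple way to keep an eye on this is to check that your domain still resolves, as in the minimal sketch below. It uses only the system resolver on the machine that runs it, so a cached answer can hide an authoritative outage and it cannot tell you which provider failed. The `resolvable` helper and the hostname in the example are hypothetical.

```python
import socket


def resolvable(hostname, resolver=socket.getaddrinfo):
    """Return the sorted unique addresses a name resolves to, or [] on failure.

    resolver defaults to the system resolver; it is a parameter so the
    check can be exercised without touching the network.
    """
    try:
        results = resolver(hostname, None)
    except OSError:  # socket.gaierror is a subclass of OSError
        return []
    # Each result is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string.
    return sorted({info[4][0] for info in results})


if __name__ == "__main__":
    name = "pbx.example.com"  # hypothetical domain
    addresses = resolvable(name)
    print(name, "->", addresses if addresses else "DOES NOT RESOLVE")
```

Running a check like this from more than one network gives a better picture than a single vantage point.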
 
A 3-node cluster does not use a lot of extra bandwidth, and it's not a concern for most people with servers in a data center. Call recordings use MP3 rather than WAV, as an MP3 is many times smaller. A 2-node cluster is popular and inexpensive. Some are using 3 nodes, and a few clusters are larger.
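To give a sense of "many times smaller", here is a quick size comparison. The audio parameters are assumptions for illustration (8 kHz, 16-bit, mono PCM for the WAV and a 16 kbit/s MP3), not settings taken from FusionPBX, and file headers are ignored.

```python
def pcm_bytes_per_minute(sample_rate_hz, bits_per_sample, channels=1):
    """Size of one minute of uncompressed PCM audio, ignoring the WAV header."""
    return sample_rate_hz * bits_per_sample // 8 * channels * 60


def bitrate_bytes_per_minute(kbit_per_s):
    """Size of one minute of audio encoded at a constant bitrate."""
    return kbit_per_s * 1000 // 8 * 60


wav = pcm_bytes_per_minute(8000, 16)   # assumed: 8 kHz, 16-bit, mono
mp3 = bitrate_bytes_per_minute(16)     # assumed: 16 kbit/s MP3
print(f"WAV: {wav} bytes/min, MP3: {mp3} bytes/min, ratio: {wav / mp3:.1f}x")
```

With these assumed settings a minute of WAV is 960,000 bytes and a minute of MP3 is 120,000 bytes, a factor of 8. The real ratio depends on the bitrates actually configured.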

Nodes in a cluster are set up to act as one server. However, each one is a distinct server, and each one backs up to itself. If something were to happen to the database, restoring it is faster because the server is ready to be used at any time. Multiple data centers and multiple providers are a very good thing because all providers have outages.

I suspect that the reasoning for discouraging this is more about steering people to a solution you prefer or want to support. Which one is the right choice depends on multiple factors.