FS PBX High Availability (HA) Setup Scripts — Database + File Replication

pbxgeek

Active Member
Jan 19, 2021
We’ve added documentation and new scripts for High Availability (HA) deployments in FS PBX.
With two FS PBX nodes, you can now run your platform in Primary–Standby mode — both servers stay synchronized, ready for instant cutover if one fails.

Overview​

The HA setup uses bi-directional replication for both:
  1. PostgreSQL Database
    A bash automation script configures PostgreSQL logical replication in both directions.
    It sets up publications, subscriptions, replication slots, and peer firewall rules — creating a true master-to-master database link between two servers.
  2. Syncthing File Replication
    Another script installs and configures Syncthing on both servers, pairs them automatically, and shares the core FS PBX directories (recordings, voicemails, sounds, and cache).
    Files stay mirrored in near real time.
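
To give a rough idea of what bi-directional logical replication involves, here is a minimal sketch that prints the kind of SQL each node would need. This is not the FS PBX automation script. The object names, the `fspbx` database name, the `replicator` role, and the `origin = none` option (PostgreSQL 16 and later, used so replicated changes are not sent back to the node they came from) are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str  # short identifier used in SQL object names (hypothetical)
    host: str  # DNS name the peer connects to


def replication_sql(local: Node, peer: Node,
                    dbname: str = "fspbx",        # assumed database name
                    user: str = "replicator"):    # assumed replication role
    """Return the SQL to run on `local`: publish its own changes and
    subscribe to the publication that `peer` exposes."""
    conninfo = f"host={peer.host} port=5432 dbname={dbname} user={user}"
    return [
        f"CREATE PUBLICATION pub_{local.name} FOR ALL TABLES;",
        # CREATE SUBSCRIPTION also creates the replication slot on the peer
        # by default. copy_data = false assumes both databases already hold
        # the same data. The password is expected to come from ~/.pgpass.
        f"CREATE SUBSCRIPTION sub_{local.name}_from_{peer.name} "
        f"CONNECTION '{conninfo}' "
        f"PUBLICATION pub_{peer.name} "
        f"WITH (copy_data = false, origin = none);",
    ]


if __name__ == "__main__":
    node1 = Node("node1", "server1.fspbx.com")
    node2 = Node("node2", "server2.fspbx.com")
    for local, peer in ((node1, node2), (node2, node1)):
        print(f"-- run on {local.host}")
        print("\n".join(replication_sql(local, peer)))
```

Both publications have to exist before either subscription is created, so in practice the first statement runs on both nodes before the second one does.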

⚙️ Typical Architecture​

Role          DNS Record          Example
Primary Node  server1.fspbx.com   Handles live calls and GUI
Standby Node  server2.fspbx.com   Continuously synced
Floating DNS  pbx.fspbx.com       Points to the active node
The floating record can be flipped manually or via a small health probe script (e.g., with Cloudflare or Route 53 API).
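
A health probe of that kind could look roughly like the sketch below. It assumes the PBX accepts TCP connections on port 5060, and the "stay on the current node while it is healthy" policy is just one possible choice. The actual DNS update is deliberately left out, since it depends on the provider's API; all function names here are hypothetical.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or DNS lookup failed
        return False


def choose_active(current: str, other: str,
                  current_up: bool, other_up: bool) -> str:
    """Keep the current node unless it is down and the other one is up.

    Staying put when both are healthy avoids flapping and leaves
    failback as a deliberate, manual step.
    """
    if current_up or not other_up:
        return current
    return other


def probe_once(current: str, other: str, port: int = 5060) -> str:
    """Probe both nodes once and return the host the floating record
    should point to. Updating the record is left to the caller."""
    return choose_active(current, other,
                         port_is_open(current, port),
                         port_is_open(other, port))
```

Run from cron or a small loop on a third machine, for example `probe_once("server1.fspbx.com", "server2.fspbx.com")`, and call the DNS provider only when the returned host differs from what the floating record currently holds.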

Requirements​

  • SSH key-based access between both nodes
  • Open ports:
    • 5432 (Postgres replication)

Documentation​

Step-by-step guides are now live in our documentation:
Each article includes the full automation script and detailed explanations of what it does.

Result​

Once configured:
  • Both servers continuously replicate databases + files
  • Failover takes only a DNS change
  • The standby can take over immediately with minimal interruption
Perfect for geo-redundant or mission-critical deployments.


Check out the full HA setup guide in the FS PBX Docs, and feel free to share your experiences or improvements in this thread!
 
This is not the only way to do it, just one of them, and it works well when the DNS TTL is set very low, for example 60 seconds. Phones then re-register to the backup server quickly.

How do you do it? @s2svoip
 
Nice, I was going to have a crack at this myself sometime soon. This is exactly how I have been doing it in FusionPBX for the last 10 years, using the DNS method and Route 53.

I usually leave my TTL at 120 seconds, which works just fine. I have Route 53 health checks on port 5060 of the primary; if they fail, Route 53 switches the IP to the secondary. I have never had a real failure, but I have done switchovers for maintenance simply by stopping FreeSWITCH on the primary.
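
For a rough sense of how TTL and health checks add up, here is a back-of-the-envelope sketch. The 30-second check interval and the threshold of 3 failed checks are assumed example values, not settings taken from this thread, and the phone-side delay is left as a parameter because it varies by handset.

```python
def worst_case_failover_seconds(ttl: int,
                                check_interval: int = 30,    # assumed
                                failure_threshold: int = 3,  # assumed
                                reregister_delay: int = 0) -> int:
    """Upper-bound estimate: time to detect the failure, plus the time
    for cached DNS answers to expire, plus any delay on the phone side."""
    detection = check_interval * failure_threshold
    return detection + ttl + reregister_delay


if __name__ == "__main__":
    for ttl in (60, 120):
        print(f"TTL {ttl}s -> up to {worst_case_failover_seconds(ttl)}s")
```

With these assumed values the detection time (90 seconds) is comparable to the TTL itself, so lowering the TTL alone only helps up to a point.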

Failback to the primary is sometimes a pain. Yealinks usually handle it fine and switch over quickly once the primary comes back online.

Ciscos seem particularly awkward: if the server they are registered to is still up, they will not switch, even though DNS points to the primary again. I usually use iptables to temporarily block their source IPs, and this usually does the trick. Actually, thinking about it, stopping FreeSWITCH on the secondary may achieve the same.
 
Is there any way to tweak the HA script to allow for 3 servers rather than 2? Or can I run the script on the primary FS PBX server twice?
 
Handling 3 nodes is more complex than it seems. In logical replication with three nodes, you must replicate A to B, B to C, and A to C. It's complicated and needs special attention. It's possible, but it's best to ask for support to ensure proper setup. If you know enough about database replication, you can also do it yourself manually.
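
To see why a third node adds more work than it seems, the sketch below lists the one-way replication links needed, assuming every pair of nodes replicates in both directions as in the two-node setup. The node labels are placeholders.

```python
from itertools import permutations


def replication_links(nodes):
    """Return every ordered (publisher, subscriber) pair in a full mesh."""
    return list(permutations(nodes, 2))


if __name__ == "__main__":
    for cluster in (["A", "B"], ["A", "B", "C"]):
        links = replication_links(cluster)
        print(len(cluster), "nodes:", len(links), "one-way links")
        for publisher, subscriber in links:
            print(f"  {publisher} -> {subscriber}")
```

Two nodes need 2 one-way links, while three nodes need 6, and each of those is a subscription that has to be created, monitored, and kept from echoing changes back to their origin.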
 
Exactly how I’ve done it! After some experimenting with edited versions of the scripts, I now have replication working across all the nodes, and the same goes for Syncthing.
 
After working on this for many years, I have learned that two redundant servers are usually enough. However, they should be located in different geographic areas and hosted by two separate companies. Recent global outages at both AWS and Cloudflare demonstrate how important it is to keep your eggs in different baskets.
 