Upgrade and Resilience Advice - FusionPBX 4.5.10 to Latest

Status
Not open for further replies.

finnsloss

New Member
Oct 17, 2019
Hi,

I currently have a standalone FusionPBX virtual server running version 4.5.10 (Switch Version 1.10.1) on Debian 9.11 with Postgres 12.1.
I'd like some advice from the community on how best to bring the server up to date and improve the resilience of the system.
I have been considering whether it is worthwhile setting up a High Availability cluster of FusionPBX servers to mitigate a server failure.

Should I upgrade the system 'in-place'?
  • Update FusionPBX to latest version
  • Update host operating system to latest version - is this a stable option?
Should I install a fresh instance of FusionPBX and migrate the database?
  • Perhaps the safer option to upgrade?
  • What are people's experiences with database migration?
Should I look to have a cluster of servers to mitigate server failure?
I was initially drawn to the BDR option, but this appears to work only on Postgres version 9.4, and if you want live call failover you can only use UDP.
I'd take the hit on live call failover; just being able to "try the call again" and have it work in the event of failure is acceptable.
I ran into trouble when migrating the database, as you can't downgrade from Postgres 12.1 to 9.4 for the BDR setup.

Should I look at Multi Master Replication?

Should I just bring up the virtual machine at an alternative data centre in the event of failure?


Any advice from the community on the best way forward for this server would be greatly appreciated. We have around 700 handsets on the system, so migrating the database is essential; re-entering the data would be tiresome (even with CSVs)!

Thanks in advance,
Finn
 

mingus

Member
Mar 23, 2018
In an effort to save you tons of time, here is what I have found.
  • Live call recovery/failover only works with UDP. As soon as the TCP/TLS connection is lost, the call is terminated, leaving nothing to recover but a cold and lonely CDR.

  • Live call recovery usually takes several seconds. Assuming that you've set your timeout to a relatively short 3 seconds, it will theoretically take about another 2 to 5 seconds for everything to fail over. As such, your callers will hear dead air for 5 seconds at best (3 + 2), but likely closer to 8 seconds (3 + 5), assuming everything works as expected. In that time, one of the call parties will likely hang up. So are the time, the cost, the vulnerability of the call, and the likelihood that the callers will hang up worth implementing call recovery? It wasn't for me.

  • Never upgrade a production environment without testing. Never. It may work 95% of the time, but the 5% it doesn't will hurt. A lot.

  • Create a staging environment that matches your production environment and test the upgrade there. Test all of your PBX functions on staging. If all is well, you should be able to upgrade your production environment during a non-peak time without any surprises.

  • It's always a good idea to have a live backup system at a different data center. There is an argument for using separate cloud providers, but I haven't seen an issue where all of the data centers run by a single cloud provider go down simultaneously. Nonetheless, I guess it could happen. However, using separate cloud providers will likely require using proxies, which isn't a bad idea anyway.

  • Automate the backup of your database. Make sure it happens, check it periodically, and test recovery periodically. When your system goes down hard, remembering how to recover a Postgres database with BDR or Bucardo installed is not fun. Create scripts that you've tested for recovery.

  • Do not rely on your replicated database as your backup. Mistakes are also replicated.

  • Multimaster replication is great, but it's not trivial in the long run. How you manage 2 servers is not how you manage 10 servers. How will you grow the storage? BDR is tied to a proprietary Postgres version fixed on version 9.4. Bucardo can be temperamental. Choose wisely. Please don't ask me which I prefer. Both give me indigestion.

  • TablePlus is your friend. Consider buying a license.

  • If your servers are on the Internet, look into placing Kamailio or OpenSIPS between the public internet and your FreeSWITCH server. The learning curve is steep, but your life will improve immeasurably. Please don't ask me which I prefer. I have used both for very different tasks.

  • Never use default ports. In particular, never use ports 5060 or 5061.

  • Consider paying for a FusionPBX membership. Especially if you are earning a living using FusionPBX. You'll learn a lot from the training sessions.

  • FreeSWITCH is an incredibly robust platform. However, consider registrations, presence, and each call leg as separate sessions when architecting your system. Encryption adds about 11x the CPU load to the media and signaling. Each memcache, database, and file write request adds to the load. Transcoding takes a considerable amount of CPU load. Transcoding Opus takes more than you might expect!

    Your FusionPBX/FreeSWITCH Application server will run a significantly smaller number of simultaneous calls than you think it will. Make sure you test and scale appropriately.

  • Don't be a jerk.
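
On the database backup point above, here is a minimal sketch of what an automated dump with a basic sanity check could look like. It is only an illustration, not a tested recovery procedure: the database name `fusionpbx`, the directory `/var/backups/fusionpbx`, and all function names are assumptions, and it assumes `pg_dump` and `pg_restore` are on the PATH and can authenticate without a prompt. Note that `pg_restore --list` only confirms the archive's table of contents is readable; it does not replace a periodic full restore onto a scratch server.

```python
import subprocess
from datetime import datetime
from pathlib import Path


def dump_path(backup_dir, db_name, now):
    """Build a timestamped file name so older dumps are never overwritten."""
    stamp = now.strftime("%Y%m%d-%H%M%S")
    return Path(backup_dir) / f"{db_name}-{stamp}.dump"


def dump_command(db_name, out_file):
    """pg_dump in custom format, which pg_restore can read back."""
    return ["pg_dump", "--format=custom", f"--file={out_file}", db_name]


def verify_command(out_file):
    """List the archive contents; a non-zero exit means the dump is unreadable."""
    return ["pg_restore", "--list", str(out_file)]


def run_backup(backup_dir="/var/backups/fusionpbx", db_name="fusionpbx"):
    """Dump the database, then check the archive; raises if either step fails."""
    out_file = dump_path(backup_dir, db_name, datetime.now())
    out_file.parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(dump_command(db_name, out_file), check=True)
    subprocess.run(verify_command(out_file), check=True,
                   stdout=subprocess.DEVNULL)
    return out_file
```

Calling `run_backup()` from a cron job or systemd timer would cover the "make sure it happens" part, since a failed dump or an unreadable archive raises `subprocess.CalledProcessError` instead of passing silently. Copying the resulting file off the server is left out here.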
 