Backup and replication

sly

New Member
Feb 21, 2021
I enabled backup in default settings, and it creates backup files in var/backups/fspbx.
How many backup files are kept in this folder before they get deleted automatically? I didn't see an option to set that anywhere.
I want to do a test restore of this backup file to a new server. How do I go about doing that? I just want to document disaster recovery plans that go beyond a normal VPS snapshot restore.
Is there real-time server replication on the roadmap?

Thank you!
 
Hi @sly
It keeps 2 days of tarball file backups and 7 days of database backups.
If you think adding a variable to the default settings is a good idea, I can add it in the next release.
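For anyone who needs a different retention in the meantime, here is a rough sketch of age-based pruning in shell. This is not how FS PBX does it internally. The function name `prune_backups`, the `*.tgz` and `*.sql` patterns, and the `BACKUP_DIR` variable are assumptions for illustration only, so check the real file names in your backup folder before using anything like it.

```shell
#!/bin/sh
# Sketch only: delete backup files older than a given number of days.
# The patterns used in the examples are hypothetical, not FS PBX's naming.
prune_backups() {
    dir="$1"        # backup directory
    pattern="$2"    # file-name glob, e.g. '*.tgz' (assumed)
    keep_days="$3"  # -mtime +N matches files whose age in whole days is greater than N
    # -maxdepth and -delete are GNU/BSD find extensions; -print lists what is removed.
    find "$dir" -maxdepth 1 -type f -name "$pattern" -mtime +"$keep_days" -print -delete
}

# Example calls (commented out so nothing is deleted by accident):
# prune_backups "$BACKUP_DIR" '*.tgz' 2
# prune_backups "$BACKUP_DIR" '*.sql' 7
```

Running it once with `-delete` removed from the `find` line is a safe way to see what would be removed.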

For disaster recovery, I would do the following:
1) Install a fresh copy of FS PBX.
2) Restore the database.
3) Restore files from the tarball.
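Steps 2 and 3 could look roughly like the sketch below. Treat it as an outline under assumptions, not the official procedure: the helper name `restore_files`, the staging directory, `DB_NAME`, and the dump format are all placeholders that the thread does not specify. The database commands are left as comments for that reason.

```shell
#!/bin/sh
# Sketch only. File names, the staging directory and the database name
# are placeholders; check your own backup folder for the real ones.

# Extract a file backup into a staging directory so it can be inspected
# before anything is copied over a live install.
restore_files() {
    tarball="$1"   # path to the gzip-compressed tar backup
    dest="$2"      # existing directory to extract into
    # Listing the archive first catches a missing or corrupt file early.
    tar -tzf "$tarball" >/dev/null 2>&1 || { echo "cannot read $tarball" >&2; return 1; }
    tar -xzf "$tarball" -C "$dest"
}

# Database step, shown as comments because the dump format is an assumption:
#   plain SQL dump:      sudo -u postgres psql -d DB_NAME -f /path/to/dump.sql
#   custom-format dump:  sudo -u postgres pg_restore -d DB_NAME /path/to/dump
```

Extracting to a staging directory first is a deliberate choice here: you can compare the contents against the fresh install before overwriting anything.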

FS PBX is fully production-ready and is used by many VoIP providers. We have a database logical replication script that helps you create two-way replication between two servers. Multi-node is possible too, with some limitations from Postgres; we have deployed it successfully on a few installs.
Run this script, which will walk you through setting up two-way replication between two nodes. The servers need to be able to reach each other over SSH for the script to run successfully.

Code:
cd /var/www/fspbx
sh install/setup_logical_replication.sh

For file replication, I recommend Syncthing. It's been working great on all of our deployments. I don't have a tutorial posted anywhere online yet, but it's pretty straightforward. Reach out if you get stuck.
 
I have no doubts that it's production-ready; it's in production here now, replacing Fusion.

The main ISP in my area is Spectrum cable. It's solid for the most part, but once or twice a year it likes to lose connection to our East Coast data center. It just point-blank refuses even to ping anything; only Spectrum customers are affected, and it generally lasts 1 to 6 hours and then comes back like nothing happened. Because of that, we set up another PBX instance in the West Coast data center, so when these shenanigans happen with Spectrum, we switch affected customers to it for the remainder of the day and flip them back to the East Coast once Spectrum is done playing games. I think replication will come in handy here.

I think the variable in the default settings for how many backups to keep could be helpful in some cases, depending on the size of their drives. I have plenty of storage, so your 2/7 setting is fine with me.

What would be nice to have is a restore button in the GUI. Say you install a fresh copy of FS PBX and click Restore: it lets you select and/or upload backup files and handles the whole restore process for you. The same would work within a production instance, if you want to restore from a couple of days ago for some reason: click Restore and select the file you want to restore from. I'm not very good with Linux commands personally. I can install a fresh copy of anything; it's restoring files over SSH where I can get lost sometimes. I managed to restore Fusion that way while learning, but it took me a day or two. I had to play with the DB password in different areas, which was pretty overwhelming for me. However, I got it done in the end. Don't drop everything to make this happen; I'm just saying it would be nice to see it work like that one day.

Thank you for all the help you provide here, very much appreciated!
 
@sly I understand your point—a web-based restore feature would certainly be convenient. The challenge is that, by design, the web user (www-data) has very limited access outside of the web directory. That’s why many operations currently require SSH access. If we can design a secure method to achieve this in the future without compromising system security, we’ll definitely consider implementing it.

For your situation, what you really need is a standby failover server. With file and database replication between the primary and backup servers, you won’t have to worry about losing data. A simple DNS failover can automatically reroute phones to register with the backup server and then back to the primary once it’s available—without losing CDRs, voicemails, call recordings, or any settings applied during failover. Probes can be used to monitor health and direct traffic to the active node automatically.

We also offer professional support plans for added peace of mind. These include phone support, remote sessions, installations, implementations, and the design of robust, fault-tolerant VoIP networks, as well as any general troubleshooting for anything related to VoIP. So whenever you feel stuck, we’re only a phone call away.
 
What if we upload the file to the backup directory via SSH/SFTP? In other words, files to be restored must be uploaded that way, in order to avoid www-data. Once the files are there (or existing backups, for that matter), you just select one from a drop-down in the GUI and a script handles the rest, like asking for any needed DB passwords and giving hints on where to find them. Something like that would be a pretty good start to help paranoid admins sleep better, not pointing fingers at anyone. Anyway, just an idea.

Seems like DNS failover is something we will be looking into. Our endpoints would have to support it, as that's the only way to detect an outage at a specific ISP. And it's not even an internet outage; it's just the routing from this ISP to our data center, and everything else still functions.

We are aware of your support plans. They're priced the same as Fusion's, and honestly you could even charge more if you wanted to, because your support is worth it. As mentioned earlier in DM, we can't afford it yet, but one day we will for sure.
 
Any process executed through the portal runs under the www-data user. Because this user has limited permissions, it does not have the access required to perform a full restore.

For your situation, a DNS failover will work very well. Health probes should be configured to check for valid SIP responses. If the primary server does not respond, traffic will automatically fail over to the secondary IP. Endpoints will then resolve to the new IP address. While most devices handle this transition smoothly, some may require a manual reboot to re-register. This type of automatic failover covers internet outages, data center issues, and server issues. Virtually any issue you may encounter should be resolved without any manual intervention. This gives you time to address the problem and then gracefully fail back to the main server.

This approach is widely used and proven effective for both small and large providers.
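To illustrate what "check for valid SIP responses" could mean in a probe, here is a small shell sketch. It only covers the decision step: given the first line of the reply to a SIP OPTIONS request, is it a well-formed SIP status line? The function name `is_valid_sip_response` is made up for this example, and how you obtain the reply (a tool such as sipsak, or your DNS provider's built-in probe) is left open. Treating any status code as "alive" is also an assumption; a stricter probe might accept only 2xx.

```shell
#!/bin/sh
# Sketch only: classify the first line of a reply to a SIP OPTIONS probe.
# A SIP status line has the form "SIP/2.0 <3-digit code> <reason phrase>".
is_valid_sip_response() {
    # Strip the carriage return that terminates SIP header lines.
    line=$(printf '%s' "$1" | tr -d '\r')
    case "$line" in
        # Any 1xx-6xx status means a SIP stack answered (assumed "healthy").
        "SIP/2.0 "[1-6][0-9][0-9]|"SIP/2.0 "[1-6][0-9][0-9]" "*) return 0 ;;
        # Empty reply (timeout) or anything that is not SIP counts as down.
        *) return 1 ;;
    esac
}

# Example use inside a probe script:
# if is_valid_sip_response "$first_line"; then echo up; else echo down; fi
```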
 
This all makes sense; endpoint failover can be completely automated. The more interesting part is how we get DID failover to work, since our provider's end somehow has to detect that specific endpoints lost their connection to the primary server, while the provider itself can still ping our main server just fine and the DIDs are pointed to it. This might be beyond my capabilities, but that's okay. I can still handle everything manually, as I have been. Once or twice a year won't be a problem for me.
 
Normally, trunk providers also have a failover that you can set up in standby mode. They send a SIP INVITE to your main server, and if it doesn't respond, they send the INVITE to the backup server. Pretty much every provider has this option. Please don't confuse it with load balancing; that's something different. You need to set it up so the primary server is always tried first. Even if your outages happen once or twice a year, which is typical for a lot of FS PBX users, it's still a must-have feature.
 
Yes, of course, my DID provider has a failover server option, but I don't think it will help in my use case here.
Maybe I didn't explain this correctly, or I'm unsure what's happening, but my server is always on, so the provider's SIP INVITE won't fail over to the secondary server. It's the endpoints that lose connection to my server: the ones on a specific ISP, Spectrum in my case, while the endpoints on other ISPs stay connected. Yes, it's an odd issue, to be honest. Possibly either Spectrum or the data center temporarily loses some backbone provider that is responsible for routing between the two, but I'll blame Spectrum.
 
Oh, my bad, I understand what's happening now. Yeah, server failover won't help the endpoints. You might want to consider a backup internet connection; otherwise, manual switching is the only option you have left.
 