HA Across the Country

ict2842 · Mar 23, 2021

I am not sure how to do it yet, and it is very likely NY will be set up down the road, but how does this look? Heck, I barely understand what Kamailio does yet, but it will be toyed with and learned eventually.

Scubadave112 · Mar 30, 2021

My two cents, feel free to ignore if needed

It's not HA, but I have around 400 users on server A (a VPS with DO) and use BDR/Syncthing with server B (a VPS with DO in a different region), as documented in the membership docs. I am using AWS Route 53 for failover: the domain phones.contoso.com points to server A, and if Route 53 goes 1 minute without being able to reach it, it redirects all queries to server B. For my customer domains I use CX1.phones.contoso.com, CX2.phones.contoso.com, etc., and those all point to phones.contoso.com. I set this up with about 100 lite users to make sure everything works, and it passed my test. When I test, I take server A down, and within 3-5 minutes I see all the phones registered to server B and can make and receive test calls.

So this isn't HA, but I seriously doubt it would result in more than 5 minutes of downtime. If it did, I would just make up some BS to the customer about how they may have had a network interruption that caused the issue and ask them to reboot their phones (to clear their DNS cache), and then they will connect just fine.

I only have two customers who run call centers, and they wouldn't complain about 3-5 minutes of downtime. Also, both servers are hosted in DO, and I have yet to see a single outage for even 1 second with my DO servers.

I would love to have a real HA solution, but I don't have the customers who can justify the effort to set this up so that server A can fail over to server B without interrupting active calls or anything like that.

One other thing I want to point out is that I use Skyetel, and servers A and B are part of an endpoint group. The phone numbers all point to that group, so everything goes to both servers.
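
For anyone following along, here is a rough Python model of the failover behaviour described above. It is not how Route 53 is configured (that is done in AWS itself); the IP addresses are documentation placeholders and `failure_threshold` is a made-up parameter, so treat it as a sketch of the logic only.

```python
from dataclasses import dataclass


@dataclass
class FailoverRecord:
    """Toy model of a DNS failover record with a primary and a secondary target."""

    primary: str
    secondary: str
    # Hypothetical: consecutive failed health probes before failing over.
    failure_threshold: int = 2

    def answer(self, probe_history: list[bool]) -> str:
        """Return the address to serve, given probe results (oldest first)."""
        recent = probe_history[-self.failure_threshold:]
        primary_down = (
            len(recent) == self.failure_threshold and not any(recent)
        )
        return self.secondary if primary_down else self.primary


record = FailoverRecord(primary="203.0.113.10", secondary="198.51.100.20")
print(record.answer([True, True, True]))    # primary is healthy
print(record.answer([True, False, False]))  # two misses in a row: secondary
```

Once the primary passes a probe again, this model immediately answers with the primary; whether a real setup fails back automatically depends on how the records are configured.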

ict2842 · Mar 31, 2021

Scubadave112 said:
So when I test, I take Server A down and within 3-5 minutes I will see all the phones registered to Server B and can make and receive test calls and everything

To make sure I am understanding this part correctly: the phone has to send another DNS query for the A record in order to register on the second server? What if there is DNS caching somewhere along the route from the phone?

Scubadave112 · Mar 31, 2021

ict2842 said:
To make sure I am understanding this part correctly: the phone has to send another DNS query for the A record in order to register on the second server? What if there is DNS caching somewhere along the route from the phone?

I keep my TTL at 30 seconds. During my testing I did notice that a couple of oddball customers never updated, which has since been resolved. I didn't troubleshoot why; I just assumed something was not updating, so I provisioned the phones to use 1.1.1.1 and 8.8.8.8 as their DNS servers, and the issue never happened again.

I also want to say the logic of pointing cx1.phones.contoso.com at phones.contoso.com was that both records are in AWS, and when phones.contoso.com redirects to the secondary server in the event of a failure, the CX1.phones.* record never changes, so it is less dependent on other variables to propagate the DNS change.
I'm not a network engineer, but it sounded logical in my head and has worked out great. I have done multiple tests and have yet to notice any serious issues.
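
As a back-of-the-envelope check on the 3-5 minute figure, the delay can be broken into three parts: how long the DNS provider takes to notice the outage, how long a resolver may keep the old answer, and how long a phone waits before it re-resolves and re-registers. The sketch below just adds them up as a rough upper bound. The 60 s and 30 s values come from this thread; the 120 s re-registration interval is an assumed value, not something measured here.

```python
def worst_case_failover_seconds(detect_s: int, ttl_s: int, reregister_s: int) -> int:
    """Rough upper bound on how long a phone may stay pointed at the dead server.

    detect_s:     time for the DNS provider to notice the outage
    ttl_s:        how long a resolver may keep serving the old A record
    reregister_s: how long the phone waits before it re-resolves and re-registers
    """
    return detect_s + ttl_s + reregister_s


# 60 s detection and 30 s TTL are from the posts above;
# 120 s is an assumed re-registration interval for illustration.
total = worst_case_failover_seconds(detect_s=60, ttl_s=30, reregister_s=120)
print(f"worst case: {total} s (~{total / 60:.1f} min)")
```

Because the per-customer CNAME never changes, only the TTL on the one record that flips matters in this estimate.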

Scubadave112 · Mar 31, 2021

Sorry for the grammar and spelling mistakes; I'm typing on my phone while my 1-year-old smacks my face.

ict2842 · Mar 31, 2021

Scubadave112 said:
I also want to say the logic of pointing cx1.phones.contoso.com at phones.contoso.com was that both records are in AWS, and when phones.contoso.com redirects to the secondary server in the event of a failure, the CX1.phones.* record never changes, so it is less dependent on other variables to propagate the DNS change.
I'm not a network engineer, but it sounded logical in my head and has worked out great. I have done multiple tests and have yet to notice any serious issues.

It makes sense. Can you use a wildcard for CNAMEs? That'd make it super easy to do...
I need to work on BDR and once I have Dallas working I can mess with more stuff, like DNS.

Scubadave112 · Mar 31, 2021

ict2842 said:
It makes sense. Can you use a wildcard for CNAMEs? That'd...

I guess I can test tonight and report back. I just can't imagine why you would use a wildcard: every tenant gets a domain, so I can't see a reason for one. But to answer your question, I put the kids down in about 3 hours, and then I can lab it out and report back.

ict2842 · Mar 31, 2021

Scubadave112 said:
I guess I can test tonight and report back. I just can't imagine why you would use a wildcard: every tenant gets a domain, so I can't see a reason for one. But to answer your question, I put the kids down in about 3 hours, and then I can lab it out and report back.

I can test using Cloudflare. The reason I want to use a wildcard is to cut down on the number of records and on my work when creating and removing a client.
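
To illustrate the idea, here is a small Python toy (not a real resolver) where one wildcard CNAME covers every tenant name and a single A record is the only thing that changes on failover. The zone contents are placeholders, and the matching is simplified to "exact match first, then a wildcard one label up"; real wildcard rules (RFC 4592) have more cases.

```python
from typing import Optional

# Toy zone: one wildcard CNAME for all tenants, one A record that failover flips.
# Names and addresses are placeholders, not a real zone.
zone = {
    ("*.phones.contoso.com", "CNAME"): "phones.contoso.com",
    ("phones.contoso.com", "A"): "203.0.113.10",
}


def lookup(name: str, rtype: str) -> Optional[str]:
    """Exact match first, then a wildcard one label up (simplified)."""
    name = name.lower()
    if (name, rtype) in zone:
        return zone[(name, rtype)]
    _, _, parent = name.partition(".")
    return zone.get((f"*.{parent}", rtype))


def resolve(name: str) -> Optional[str]:
    """Follow CNAMEs until an A record is found (with a hop limit)."""
    for _ in range(8):
        target = lookup(name, "CNAME")
        if target is None:
            return lookup(name, "A")
        name = target
    return None


print(resolve("cx1.phones.contoso.com"))
zone[("phones.contoso.com", "A")] = "198.51.100.20"  # simulated failover
print(resolve("cx2.phones.contoso.com"))
```

One trade-off this makes visible: with a wildcard, any name under phones.contoso.com resolves, including tenants that were removed or never existed, so the PBX itself has to reject unknown domains.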

ict2842 · Apr 8, 2021

I tried to follow the Docs for the Active/Passive DB and to modify the install as needed to only install the components needed for the PBX VMs and DB VMs.
I have HAProxy set up and am currently getting the error `FATAL: password authentication failed for user "fusionpbx"` (it appears twice). I modified the config at /etc/fusionpbx/config.php to include the proper password, yet it still fails. I am entering the password in plain text, just as I am setting it inside Postgres... is that an issue?

ict2842 · Apr 8, 2021

At this point, I am open to paying someone to do the installation so it is done right. I reached out to Fusion but 8 days later I still have no response from them...

Scubadave112 · Apr 8, 2021

I mean, I dunno why this is such a pain. I literally just followed the member videos you get with the cheapest membership, and I had things up and running within two hours from two fresh installs. Literally no technical knowledge required if you just follow the steps.

Also, you are using the specific phrase "VMs" and not "VPS". Are you using VMs to do this through your own hypervisor, like ESXi or Hyper-V? Both of those come with their own HA technology, which is literally 10x easier than this and more reliable. I mean, a failover cluster is braindead easy, with 100s of YouTube videos. The only reason I recommend this method is because I am using VPSs (hosted cloud VMs through DO, AWS and Tiernet).

If you want to tell me what you're willing to pay, lemme know and I'll spin up two servers for you tomorrow on a DO account in two different regions, spin up an AWS account with Route 53, and make sure that you and your end users never experience more than 5 minutes of downtime.

ict2842 · Apr 8, 2021

The Dallas servers will run on two servers I have colocated, running Proxmox. The part I am struggling with here is separating the database to be its own "entity".
I don't have servers in NY, so I would get two VPSs from a provider and, months down the road, migrate the VPSs to VMs on my own servers that I send out.

I've gone through the documentation and watched most of the March 2019 videos, but I didn't see anything about installing the DB separately. It's one install script. I commented out parts and ran it on each of the four VMs (2 FreeSWITCH and 2 DB) so that I wouldn't be installing the obviously unneeded components.
Like you said...it shouldn't be this difficult and it should only take a few hours from start. If I left everything on one server, it would likely work without issue, but I would only come back and separate them later (whether in a few weeks or months).

I modified the config at /etc/fusionpbx/config.php to include the proper password, yet it still fails. I am entering the password in plain text, just as I am setting it inside Postgres...is that an issue?

I modified the password in the file and the two DBs to use a shorter password of [a-z], [A-Z], and [0-9] and it still failed. I modified the resources/require.php file to tell me which config file was being used, and /etc/fusionpbx/config.php is indeed being used. I am lost on why it won't take it.
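
One way to narrow this down is to test the exact same credentials outside of PHP. The Python sketch below pulls the `$db_*` values out of a config.php-style text and prints a `psql` command that uses them. The variable names and values in `SAMPLE` are assumptions for illustration, so compare them with what is really in /etc/fusionpbx/config.php.

```python
import re
from urllib.parse import quote

# Sample text in the style of /etc/fusionpbx/config.php; the variable names
# and values here are assumptions, so check them against the real file.
SAMPLE = """
$db_host = '127.0.0.1';
$db_port = '5432';
$db_name = 'fusionpbx';
$db_username = 'fusionpbx';
$db_password = 'Example123';
"""


def read_db_settings(php_text: str) -> dict:
    """Pull simple `$db_x = 'value';` assignments out of PHP source."""
    pattern = re.compile(r"""^\s*\$db_(\w+)\s*=\s*(['"])(.*?)\2\s*;""", re.M)
    return {m.group(1): m.group(3) for m in pattern.finditer(php_text)}


def psql_uri(cfg: dict) -> str:
    """Build a connection URI, percent-encoding the user and password."""
    user = quote(cfg["username"], safe="")
    password = quote(cfg["password"], safe="")
    return f"postgresql://{user}:{password}@{cfg['host']}:{cfg['port']}/{cfg['name']}"


cfg = read_db_settings(SAMPLE)
print(repr(cfg["password"]))  # repr() exposes stray spaces or quotes
print(f'psql "{psql_uri(cfg)}" -c "select 1"')
```

If the printed `psql` command also fails from the PBX VM, the problem is probably not in FusionPBX but on the database side, for example the role's password on the node actually being reached, or the rules in pg_hba.conf.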

As for someone setting it up, there is no magic number, it's what you want to charge me and if I can justify the amount.

Scubadave112 · Apr 8, 2021

OHHH, I apologize. I don't know how I missed it, but I didn't realize you were trying to separate the DB from Fusion. I know I have seen people on the forum discuss having done this before; however, I don't believe I have ever seen a thorough step-by-step on how to do it anywhere. I used to wonder why people do this, and I believe it is required if you want true HA (unlike my setup, which is poor man's HA, LoL). I wish I wasn't in the middle of another project, otherwise I would do this with you in my labs now, but I'm roped into a cloud storage project for security cams.

I may look at this next week, and I'm thinking of upping my membership to Purple so I can use my three hours of hands-on support to get some official assistance with this. We are doing some big pitches next week, and if I could officially say I have a setup where my servers can fail in the middle of calls and those calls wouldn't drop, that would be amazing.

ict2842 · Apr 8, 2021

Scubadave112 said:
OHHH, I apologize. I don't know how I missed it, but I didn't realize you were trying to separate the DB from Fusion. I know I have seen people on the forum discuss having done this before; however, I don't believe I have ever seen a thorough step-by-step on how to do it anywhere. I used to wonder why people do this, and I believe it is required if you want true HA (unlike my setup, which is poor man's HA, LoL). I wish I wasn't in the middle of another project, otherwise I would do this with you in my labs now, but I'm roped into a cloud storage project for security cams.

I want to separate it for two reasons: Mark suggests it, and it allows the switch and database to be scaled independently. Like I said, if I don't set it up like this today, I would do it down the road with active clients, which may not end well and would cause downtime. To avoid the future maintenance window, and because I own the servers, I am not worried about having another VM for the databases. The cost is there no matter how I do it.

I think I bricked the servers trying to figure things out, so I'm likely going to start over again and try things a bit differently.

Scubadave112 · Apr 8, 2021

ict2842 said:
I want to separate it for two reasons: Mark suggests it, and it allows the switch and database to be scaled independently. Like I said, if I don't set it up like this today, I would do it down the road with active clients, which may not end well and would cause downtime. To avoid the future maintenance window, and because I own the servers, I am not worried about having another VM for the databases. The cost is there no matter how I do it.

I think I bricked the servers trying to figure things out, so I'm likely going to start over again and try things a bit differently.

I mean, how big of an environment do you think you would scale to 1 year from launch? Right now people are pushing these things with 500+ concurrent on small servers, so what kind of numbers are you predicting you will encounter? Also, I could very well be wrong about this, but isn't there some sort of 1000-concurrent PHP limitation or something? I dunno. Right now I'm at around 500 end users on my main server (I build dedicated servers for my big customers) and it never has more than 20 concurrent, so if I get to a point where I even start to see 150-200 concurrent, I will have an insane budget to hire developers to make me a customized solution, lol, with a big smile on my face, and the downtime to migrate everyone to a new server would be minimal. I completed a migration a couple of months ago. I just migrate everything, test, and then at 2am point their dedicated domain to the new server. As long as you ensure short TTLs on all your domains, it is minimal downtime.

But this isn't to say this would be proper for your needs; it's just my take on why I don't concern myself with this issue. If you've got the time and resources to do everything scalable the first time, then go for it. I just like compartmentalization and separation of all these things, so I'm also not dependent on a single topology for the whole environment, because I don't trust myself to properly build and maintain it. With all the different servers I have, I won't be able to break everyone and everything so easily, LoL.

ict2842 · Apr 8, 2021

Scubadave112 said:
I mean, how big of an environment do you think you would scale to 1 year from launch?

Small. I don't see myself being big anytime soon, but migrations are always a pain.

Also, I plan to target smaller businesses and not the large businesses/call centers. I have put a little bit of thought into multiple clusters, but I am so far away from that I'd like to forget about it. But hopefully having the three servers will stop me from breaking it all because I do have that habit.

Scubadave112 · Apr 8, 2021

Well, if that's your plan, I would only suggest you don't let this desire get in the way of starting sooner rather than later. When I first started (2 years ago) I knew nothing. A guy I knew asked if I could help him get a new phone system for 20 users. I said I would charge 25/month per handset (because 8x8 only charged me 18), and I would just put the account in my name, make a new bill for them, and take my cut off the top. I told my friend, and he told me about Fusion. I bought a Dell R410 off eBay and colocated it for 300/month (1/4 rack). The customer was thrilled because I offered them human, hands-on support for everything. Fast forward to now: I just quit working for Microsoft (3 weeks ago) to do this full time and am just now starting my website and everything else. I guarantee to beat anyone with a similar billing structure by 10% and usually onboard 1 new customer a month (in three years we only lost 2 customers). My point is, don't get caught up trying to create a perfect system that will never fail, because in the end the best use of your time is to just sell. Create two servers in two regions with BDR, use DDNS for failover, and just be done with it. Make that money, then make your perfect system.

I say this because in the end your system can still fail. Last month Skyetel, who we have all 900+ of our CX phone numbers with, went down for half a day. Something 100% out of our control failed for half a day, and surprisingly, our customers were annoyed but most of them don't even remember it today (I thought we were going to lose our biggest customer, LoL). When I was working at Microsoft last month we had insane issues with a zero-day, but surprisingly most people were chill. Same thing a year ago: all Azure West Coast DNS servers went down, and most people don't even remember, lol. As angry as they get, at the end of the day people understand shit happens.

ict2842 · Apr 9, 2021

Scubadave112 said:
Well, if that's your plan, I would only suggest you don't let this desire get in the way of starting sooner rather than later.

You win. I have two servers in Dallas set up and I am at the 2 hour mark of video 2, trying to see what other goodies I should add to the servers. I had a few hiccups, caused by running commands incorrectly or in the wrong order/on the wrong server, but I've decided to take snapshots along the way to give me easy restore points... I've had enough of reinstalling Debian for 2021.

ict2842 · Apr 11, 2021

@Scubadave112 Sorry, I am working on this when I have the time and not in a straight shot...
Did you ever encounter the error "error: SQLSTATE[08006] [7] server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request." I am getting it when accessing FusionPBX through a browser after configuring haproxy and switching /etc/fusionpbx/config.php from port 5432 to 5433. If I switch it back to 5432, everything works. I am looking through the forums and am only seeing one person from 2017 or so that had the error, but in a different place. I am trying to sort through things and find the solution to try out.

# netstat -ant | grep 5433
tcp 0 0 127.0.0.1:5433 0.0.0.0:* LISTEN

I can confirm BDR is working because any modification on one server is replicated to the second. I also got syncthing working after some trouble and referencing the documentation for additional assistance.

--TL;DR--
I looked over every part of the haproxy.cfg except for the ports I was using....I mistyped them. Issue resolved.
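
Since the culprit was a mistyped port, a quick way to catch that kind of slip is to list every `bind` and `server` address in haproxy.cfg side by side. The Python sketch below does that for a sample config. The excerpt is hypothetical (section name, backend addresses and the deliberately wrong 5423 are all made up); only the 5433 listener and the 5432 Postgres port come from this thread.

```python
import re

# Hypothetical excerpt; section names, addresses and ports are placeholders.
SAMPLE_CFG = """
listen pgsql
    bind 127.0.0.1:5433
    mode tcp
    server db1 10.0.0.11:5432 check
    server db2 10.0.0.12:5423 check backup
"""


def list_endpoints(cfg_text: str) -> list[tuple[str, str, int]]:
    """Return (kind, host, port) for every bind and server line."""
    found = []
    for line in cfg_text.splitlines():
        words = line.split("#", 1)[0].split()  # drop comments, then tokenize
        if not words or words[0] not in ("bind", "server"):
            continue
        # bind <addr:port> ...   /   server <name> <addr:port> ...
        idx = 1 if words[0] == "bind" else 2
        if len(words) <= idx:
            continue
        match = re.fullmatch(r"(.*):(\d+)", words[idx])
        if match:
            found.append((words[0], match.group(1), int(match.group(2))))
    return found


for kind, host, port in list_endpoints(SAMPLE_CFG):
    flag = "" if port in (5432, 5433) else "  <-- unexpected port?"
    print(f"{kind:6} {host}:{port}{flag}")
```

To check a real file, read it with `open("/etc/haproxy/haproxy.cfg").read()` (the usual location, but verify it on your install) and pass the text to `list_endpoints`.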

Scubadave112 · Apr 22, 2021

ict2842 said:
@Scubadave112 Sorry, I am working on this when I have the time and not in a straight shot...
Did you ever encounter the error "error: SQLSTATE[08006] [7] server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request." I am getting it when accessing FusionPBX through a browser after configuring haproxy and switching /etc/fusionpbx/config.php from port 5432 to 5433. If I switch it back to 5432, everything works. I am looking through the forums and am only seeing one person from 2017 or so that had the error, but in a different place. I am trying to sort through things and find the solution to try out.

I can confirm BDR is working because any modification on one server is replicated to the second. I also got syncthing working after some trouble and referencing the documentation for additional assistance.

--TL;DR--
I looked over every part of the haproxy.cfg except for the ports I was using....I mistyped them. Issue resolved.

So sorry for the late reply. No, I never encountered this issue before. I literally followed the membership videos step by step and didn't run into any issues, except for when I fat-fingered a command. Sorry, man.
