what type of HA is mostly everyone doing

djacob · Aug 10, 2021

Hey All,

Trying to design the system and figure out what kind of HA to do. I wanted to do domain-based load balancing, but from what I can see here no one is really doing it anymore. Is there a reason why?

Is it the presence issue?

Does it even work anymore?

What are your network layouts?

I see in the member section of Fusion there is some info on domain-based load balancing, but nothing up to date and nothing for the Kamailio setup. I asked Mark about it and he said it's not really something he supports since it's not Fusion. I can understand that; it's an add-on to his platform.

I have made the script work with Kamailio 5.4 and will be testing presence tomorrow. I only have 2 phones and 2 Fusion boxes up.

Just want to see if anyone is doing it anymore or not.

Thanks
Dave

DigitalDaz · Aug 10, 2021

Hi Dave,

No reason to believe Kamailio wouldn't scale it out.

I'm personally a much smaller provider and with hardware being relatively cheap these days, I just do not find the need for Kamailio.

I just use pairs of servers and route53 to switch over.

pbz · Feb 16, 2022

DigitalDaz said:
I just use pairs of servers and route53 to switch over.

Do you give each tenant/domain its own subdomain, for example t1.domain.com, t2.domain.com, or do you use multi-tenant setups with subdomains like west1.domain.com, west2.domain.com?

Also, how quickly do Route 53 DNS changes take effect? I find that with DNS, even if you set the TTL very low, you have to worry about every other DNS server caching the data for much longer than your TTL. When I have made DNS changes to a live server in the past, there was a painful propagation period of a few hours. That propagation period seems like a problem if you are trying to use DNS for failover.

DigitalDaz · Feb 16, 2022

I use the t1.domain.com, t2.domain.com scenario.

I set all my TTLs to 120 and Route 53 just uses A records. If the primary fails, the primary A record is removed. I have never had any problem with failover DNS-wise.
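To put a rough number on the propagation question, here is a back-of-the-envelope sketch in Python. The TTL of 120 is from the post above; the 30-second check interval and the failure threshold of 3 are assumed values for illustration and are not stated anywhere in this thread. The function name is made up.

```python
def worst_case_failover_seconds(ttl, check_interval, failure_threshold):
    """Rough upper bound on how long a TTL-respecting resolver can keep
    handing out the dead primary's address."""
    # Time for the health check to fail enough times in a row to be marked unhealthy.
    detection = check_interval * failure_threshold
    # After the record is pulled, a compliant cache may hold the old answer for one full TTL.
    return detection + ttl


# TTL from the thread; interval and threshold are assumptions.
print(worst_case_failover_seconds(ttl=120, check_interval=30, failure_threshold=3))  # 210
```

This only covers resolvers that honor the TTL. Resolvers or phones that cache longer than the TTL are outside the estimate.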

pbz · Feb 22, 2022

Are you using the Route53 health checks for automatic failover or are you doing it manually?

DigitalDaz · Feb 22, 2022

I use route 53 health checks.
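For anyone wanting to try something similar, a minimal sketch of what a primary/secondary A record pair with a health check could look like, assuming the Route 53 failover routing policy (the thread does not say which routing policy is actually in use). The IP addresses, the health check ID, and the helper names are hypothetical. The dict follows the `ChangeBatch` shape that boto3's `change_resource_record_sets` accepts; the sketch only builds the payload and does not call AWS.

```python
def failover_record(name, ip, role, health_check_id=None, ttl=120):
    """Build one failover A record set. role is "PRIMARY" or "SECONDARY"."""
    record = {
        "Name": name,
        "Type": "A",
        "SetIdentifier": f"{name}-{role.lower()}",  # must be unique per record in the set
        "Failover": role,
        "TTL": ttl,
        "ResourceRecords": [{"Value": ip}],
    }
    if health_check_id is not None:
        # Ties the record to a health check so it is withdrawn when the check fails.
        record["HealthCheckId"] = health_check_id
    return record


def failover_change_batch(name, primary_ip, secondary_ip, primary_check_id):
    """Build an UPSERT batch for a primary/secondary pair."""
    return {
        "Comment": "primary/secondary PBX pair (illustrative)",
        "Changes": [
            {"Action": "UPSERT",
             "ResourceRecordSet": failover_record(name, primary_ip, "PRIMARY", primary_check_id)},
            {"Action": "UPSERT",
             "ResourceRecordSet": failover_record(name, secondary_ip, "SECONDARY")},
        ],
    }


# Hypothetical values: documentation-range IPs and a placeholder health check ID.
batch = failover_change_batch(
    "t1.domain.com.", "192.0.2.10", "198.51.100.10", "hypothetical-health-check-id"
)
```

With boto3 installed and credentials configured, the batch would be passed as `ChangeBatch=batch` together with your own `HostedZoneId`.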

gflow · Feb 23, 2022

I'm just about to start testing out Telium: https://telium.io/en/high-availability-for-freeswitch/

OhSeeGee · Feb 23, 2022

I think people have different definitions of HA. Sometimes HA is confused with load balancing. To me HA means one box taking over telephony services when the other box fails. But the devil is in the details.

There are lots of simple tests you can do with scripting, like checking that the process is running, the OS is alive, external ping (Route 53), though when you test enough stuff that script/code gets pretty big. The bigger problem is when FreeSWITCH fails but the box/process/OS are still there ticking away. A great example is the OS running out of file handles. Calls may not bridge anymore, but to external checks everything looks OK. (That's also why generic heartbeat + cluster tools aren't terribly useful for telephony-specific outages.)
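As one small example of an internal check that an external ping would miss, here is a sketch of a file handle check in Python. It assumes a Linux box, where `/proc/sys/fs/file-nr` holds three numbers: allocated handles, allocated-but-unused handles, and the system-wide maximum. The function names and the 0.9 threshold are made up for illustration.

```python
def file_handle_usage(file_nr_text):
    """Return the fraction of system-wide file handles in use,
    given the contents of /proc/sys/fs/file-nr."""
    allocated, _unused, maximum = (int(field) for field in file_nr_text.split())
    return allocated / maximum


def check_file_handles(path="/proc/sys/fs/file-nr", threshold=0.9):
    """Return (healthy, usage). healthy is False once usage reaches the threshold."""
    with open(path) as f:
        usage = file_handle_usage(f.read())
    return usage < threshold, usage
```

This looks at the system-wide limit only. A per-process limit on the FreeSWITCH process could be hit well before that, so a fuller check would look at that too.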

IMHO, HA means serious health checking. Next comes intelligent negotiation between the 2 boxes in case they both come up, fail alternately, etc. I see STONITH was popular for a while, but I don't agree with it. If this is for a small install then this may not matter, but preventing one dying PBX from corrupting the other is a big deal in large call centers. Keeping the boxes in sync without the dying one corrupting the healthy one is a tough nut to crack.

So Route 53 health checks are a great start. You really have to figure out the cost/impact of an outage and scale your solution to meet the needs of the customer. For large/critical call centers commercial solutions may be appropriate; for a mom & pop shop (or home use) some DIY scripting or Route 53 health checks are probably ample.

DigitalDaz · Feb 23, 2022

For me, Route 53 is perfectly adequate. I also have other health checks going on, with Nagios doing SIP OPTIONS pings, disk space tests, etc.
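For readers who have not set one up, a SIP OPTIONS ping can be sketched in a few lines of Python. This is a minimal illustration over UDP and not the actual Nagios plugin in use here; the function names, the `probe` user, and the 2-second timeout are assumptions.

```python
import socket
import uuid


def build_options(target_host, local_host, local_port):
    """Build a minimal SIP OPTIONS request as text."""
    branch = "z9hG4bK" + uuid.uuid4().hex  # branch must start with this magic cookie
    lines = [
        f"OPTIONS sip:{target_host} SIP/2.0",
        f"Via: SIP/2.0/UDP {local_host}:{local_port};branch={branch}",
        "Max-Forwards: 70",
        f"From: <sip:probe@{local_host}>;tag={uuid.uuid4().hex[:8]}",
        f"To: <sip:{target_host}>",
        f"Call-ID: {uuid.uuid4().hex}@{local_host}",
        "CSeq: 1 OPTIONS",
        f"Contact: <sip:probe@{local_host}:{local_port}>",
        "Content-Length: 0",
    ]
    return "\r\n".join(lines) + "\r\n\r\n"


def parse_status(response):
    """Return the numeric status code from a SIP response, or None if it is not one."""
    parts = response.split("\r\n", 1)[0].split(" ", 2)
    if len(parts) >= 2 and parts[0] == "SIP/2.0" and parts[1].isdigit():
        return int(parts[1])
    return None


def sip_options_ping(host, port=5060, timeout=2.0):
    """Send one OPTIONS request and return the status code, or None on no answer."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.connect((host, port))  # fixes the peer so getsockname() reports our address
        local_host, local_port = sock.getsockname()
        sock.send(build_options(host, local_host, local_port).encode())
        try:
            data = sock.recv(4096)
        except OSError:  # includes timeouts and ICMP port-unreachable errors
            return None
    return parse_status(data.decode(errors="replace"))
```

A Nagios-style wrapper would exit 0 when `sip_options_ping(...)` returns a status code and 2 when it returns None. Any reply shows that the SIP stack is answering, though not that calls will actually bridge.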

It has served me well for a few years now. When I do maintenance I usually get a very smooth failover. I may have to use iptables to block ping access to a few stragglers, but that's been the only problem, and the same goes for failing back over.

I think that far more important is reliable hardware and datacenters.

OhSeeGee · Feb 23, 2022

I worked with an ITSP who built their own HA software (running on hundreds of clusters). It worked OK, but the constant effort to upgrade, maintain, fix, etc. made it really uneconomical. And every time their HA software failed they updated it for the new "use case". When you run 500-1000 PBXs you discover all the interesting ways in which telephony services can fail. And you see why simplistic solutions don't cover all the possible scenarios. When you have a legal requirement to deliver phone services (in case of emergency etc.) I think the penalties can be pretty big too. Their hardware was excellent (and even had a lot of multi-cloud clusters), but node failure was rarely due to hardware failure.

Eventually they gave up and bought a commercial product. But it has to be worthwhile for your case. If this is your own cluster, or phones are not critical (you're not losing a thousand dollars a minute with an outage), I think DIY is a reasonable approach. It's always hard to spend $ on an add-on for a FOSS product (at least in principle). But business realities kick in, and as with the ITSP, the right decision just hit them in the wallet.

I too avoid commercial products if FOSS does the job. I resisted MS Office for a long time and tried to convince our own company that LibreOffice was good enough. But I have to admit that the accumulation of little technical issues caused by the free option was costing us more than just buying MS Office. I'm actually a champion of FOSS, and no fan of certain big tech companies, but I swallow the pill and hand over the money when necessary.
