Experimenting with a full load-sharing cluster

itia

New Member
May 29, 2020
USA
SRV
sip.aws.domain.com
10 50 5060 server-0.aws.domain.com
10 50 5060 server-1.aws.domain.com

I don't think you need this record... As far as I've read, SRV record names need to start with a service and protocol prefix such as _sip._tcp. If you're trying to use it for HTTP load balancing, I'm afraid no browsers honor SRV records. Kind of sucks.

Thanks guys for mentioning that there can be multiple A records for the same domain address. This was very helpful.

Now it seems that just having the A records and the SRV records isn't enough... I'm just getting a round robin across the A records (this is on one of the new Grandstream GRP2614 desk phones, even though I specifically configured it to use SRV for DNS resolution).

I'm going to try adding a NAPTR record to see whether that gets the phone to honor the SRV priority and weight parameters.
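For reference, RFC 2782 describes how a client is supposed to use the SRV priority and weight fields: try the lowest priority value first, and within one priority pick targets at random in proportion to their weight. Below is a minimal Python sketch of that ordering rule, written from the RFC and not taken from any phone's firmware; whether a particular phone (such as the GRP2614) implements it is up to the vendor. The usage reuses the two records from this thread.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SrvRecord:
    priority: int
    weight: int
    port: int
    target: str


def order_srv(records, rng=random):
    """Order SRV records the way RFC 2782 describes.

    Lower priority values come first. Within one priority level, targets are
    drawn at random with probability proportional to their weight.
    """
    ordered = []
    for prio in sorted({r.priority for r in records}):
        # RFC 2782: zero-weight records go at the start of the list before
        # the running-sum selection (sorted() is stable, so order is kept).
        pool = sorted((r for r in records if r.priority == prio),
                      key=lambda rec: rec.weight != 0)
        while pool:
            total = sum(r.weight for r in pool)
            pick = rng.randint(0, total)  # uniform, inclusive on both ends
            running = 0
            for i, r in enumerate(pool):
                running += r.weight
                if running >= pick:
                    ordered.append(pool.pop(i))
                    break
    return ordered


records = [
    SrvRecord(10, 50, 5060, "server-0.aws.domain.com"),
    SrvRecord(10, 50, 5060, "server-1.aws.domain.com"),
]
# With equal priority and weight, either server may come first on each lookup.
print([r.target for r in order_srv(records)])
```

With "10 50" on both records, a compliant client spreads first attempts roughly evenly across the two servers, so from the outside it can look a lot like A-record round robin; the difference only shows once priorities or weights differ.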
 

gflow

Member
Aug 25, 2019
OK, thanks phonesimon. I tested this and can confirm it works. I was not expecting it to work when both servers happened to bind to the same IP on the internal profile; however, calls do go to both servers.

Now the only thing I cannot figure out how to test is client-side "failover". Testing on Bria Mobile, if I crash one of the servers during an active call, the app doesn't seem to move on to the next server in the SRV list. However, if I hang up and immediately place another call, it reaches the server that is still up.

Are there any DNS or FreeSWITCH settings that would encourage the other SRV servers to pick up the failed server's connections, and the client to retry them there? Here are the DNS settings so far:

SRV
_sip._tcp.sip.aws.domain.com
10 50 5060 server-0.aws.domain.com
10 50 5060 server-1.aws.domain.com

SRV
_sip._udp.sip.aws.domain.com
10 50 5060 server-0.aws.domain.com
10 50 5060 server-1.aws.domain.com

SRV
sip.aws.domain.com
10 50 5060 server-0.aws.domain.com
10 50 5060 server-1.aws.domain.com

A
sip.aws.domain.com
1.2.3.4
2.3.4.5
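A rough Python model of the client-side behaviour being asked about: a client that honors SRV walks the ordered target list for each new request and moves on when a target does not answer. The `probe` callable is a hypothetical stand-in for sending a SIP request and waiting for a reply; it is an assumption for illustration, not a Bria or FreeSWITCH API. Because the walk happens per new request, this is consistent with what was observed above: the established call is not moved, while a freshly placed call lands on the surviving server.

```python
def first_reachable(targets, probe):
    """Return the first (host, port) in `targets` that `probe` reports as up.

    `targets` should already be in SRV order (priority, then weight).
    `probe(host, port)` is a hypothetical stand-in for sending a SIP request
    and waiting for a reply: it returns True on success, False on timeout,
    or raises OSError on a network error. Returns None if every target fails.
    """
    for host, port in targets:
        try:
            if probe(host, port):
                return host, port
        except OSError:
            pass  # treat a socket error like a timeout and try the next target
    return None


# Example: pretend server-0 has crashed, so a *new* call lands on server-1.
down = {"server-0.aws.domain.com"}
targets = [("server-0.aws.domain.com", 5060), ("server-1.aws.domain.com", 5060)]
print(first_reachable(targets, lambda host, port: host not in down))
# ('server-1.aws.domain.com', 5060)
```

Keeping an in-progress call alive when its server dies is a separate problem from this target selection, which is what the call-tracking link below addresses.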


Check out this link; it might help keep calls alive during failover: https://freeswitch.org/confluence/display/FREESWITCH/High+Availability#HighAvailability-TrackCalls
 

mydigitalself

Member
Oct 20, 2019
When you did this initial test, was it on a cloud provider such as AWS and did you need to bind to a non-local IP address?

I see in the logs that the xml_handler wants to send the call to the other server, but it doesn't, and nothing shows up in sngrep or tcpdump as an attempt.
 

Mikey

New Member
Feb 10, 2020

AWS, and initially I was binding to the public IP. However, I ended up switching to a private IP on the cluster and putting OpenSIPS in front of it as the only internet-facing servers.
 

kt351b

New Member
Feb 24, 2020
Test Results (so far)

Test setup:
  • a domain called "test.example.com" set up in FusionPBX
  • extensions 1000 and 1001
  • conference room 2000
  • DNS SRV for test.example.com pointing to node1 and node2 with equal weight/priority
Registered 1000 and 1001 and forced them to use node1 (1000) and node2 (1001) by specifying the proxy setting in the SIP client. So we are testing cross-cluster domain calls.

Works
  • extension-to-extension calls (both directions)
  • call hold/resume
  • blind transfer
  • attended transfer
  • conference
    • whoever starts the conference by dialing 2000 first hosts the conference on his node; when the other extension calls in, his call is routed over to that node to join the conference
  • call park and park retrieval
  • inbound calls from PSTN provider to either server (using DNS SRV pointing to external profile port 5080)
(More to come)
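The cross-node routing in these results boils down to a small decision rule: send the call to whichever node holds the callee's registration, and for a conference, let the first caller's node host the room and forward later callers there. Below is a hypothetical Python model of that observed behaviour (the function names and the node1/node2 mapping come from the test setup, not from FusionPBX's actual dialplan logic).

```python
def route_extension(ext, local_node, registrations):
    """Decide where a call to `ext` should go, given ext -> node registrations."""
    node = registrations.get(ext)
    if node is None:
        return ("unavailable", None)   # not registered anywhere in the cluster
    if node == local_node:
        return ("bridge_local", node)  # callee is registered on this node
    return ("forward", node)           # send the call to the node holding the registration


def route_conference(room, local_node, active_conferences):
    """The first caller's node hosts the room; later callers are sent there."""
    host = active_conferences.setdefault(room, local_node)
    if host == local_node:
        return ("bridge_local", host)
    return ("forward", host)


registrations = {"1000": "node1", "1001": "node2"}
print(route_extension("1001", "node1", registrations))  # ('forward', 'node2')

conferences = {}
print(route_conference("2000", "node1", conferences))   # ('bridge_local', 'node1')
print(route_conference("2000", "node2", conferences))   # ('forward', 'node1')
```

This toy model never removes a room from `active_conferences` when it empties, so it only captures the "who hosts it" decision, not conference lifetime.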
Thank you for your notes; they inspired me to try a FusionPBX master-master setup for my new VoIP project. I've run into a strange problem, and maybe you can explain what I am doing wrong.
1) I want to install FusionPBX with FreeSWITCH on server A, the same on server B, and a PostgreSQL database on server C.
So: servers A and B run FusionPBX, server C runs the database.
2) On server C I created the databases from the install script, added rules to pg_hba.conf, and so on.
3) I installed FusionPBX on servers A and B, changing the database credentials in the config.sh script before installation. The installation completed without errors, and I checked that I can run SQL queries against server C from servers A and B. Servers A and B use the same credentials (user fusionpbx and the same password).
4) I did everything as you described in comments #1-#6.
5) Now I register two extensions, 1000 and 1002, in a softphone on my PC and one extension, 1001, in a softphone on my mobile phone. Two of them are registered on server A and the other one on server B:
1000, 1002 - registered at server A
1001 - registered at server B
I can call from 1000 to 1002.
But I can't reach extension 1001 registered on my mobile phone. I see the registration in the FusionPBX Registrations menu and in the freeswitch.registrations table, but I get this error in fs_cli:

[ERR] switch_core_sqldb.c:1369 SQL ERR: [SELECT hostname FROM registrations WHERE reg_user = '1004' AND realm = 'fortest' AND to_timestamp(expires) > NOW()] no such function: NOW

When I run this query from the server's shell, I get:
CLI: psql -h 10.20.30.1 -U fusionpbx -d freeswitch -c"SELECT hostname FROM registrations WHERE reg_user = '1001' AND realm = 'fortest' AND to_timestamp(expires) > NOW();"
Password for user fusionpbx:
hostname
-----------
fusion149
(1 row)

So I can get this record from the server using the credentials from ${dsn}.
Running eval ${dsn} in fs_cli also shows me this variable.

But when I register all of these extensions on my PC, everything works! I tried registering from the mobile network (thinking it might be a NAT issue) and tried different softphones (GS Wave, CSipSimple), with no luck.
I added the server names and IP addresses to /etc/hosts, as they appear in freeswitch.registrations.host.
I thought I had made a mistake while creating the database, so I installed FusionPBX on another server, made an SQL dump, and restored it on the database server, but no luck: the same problem occurs.

Sometimes it works and I can make a call from the mobile softphone to my PC, but then the problem comes back...
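One detail that may be worth checking: "no such function: NOW" is the wording SQLite uses for an unknown function, whereas PostgreSQL does provide both NOW() and to_timestamp(). That suggests, though it does not prove, that this particular lookup ran against a local SQLite database rather than the PostgreSQL server queried with psql above. The Python sketch below reproduces the error with the standard-library sqlite3 module on a hypothetical, cut-down registrations table (only the columns the query uses), and shows an equivalent condition written with SQLite's own date function.

```python
import sqlite3
import time


def demo():
    """Run the PostgreSQL-style query and a SQLite-compatible equivalent."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical, cut-down table: only the columns the query uses.
    conn.execute("CREATE TABLE registrations "
                 "(reg_user TEXT, realm TEXT, hostname TEXT, expires INTEGER)")
    conn.execute("INSERT INTO registrations VALUES (?, ?, ?, ?)",
                 ("1001", "fortest", "fusion149", int(time.time()) + 3600))

    pg_sql = ("SELECT hostname FROM registrations WHERE reg_user = '1001' "
              "AND realm = 'fortest' AND to_timestamp(expires) > NOW()")
    try:
        conn.execute(pg_sql)
        error = None
    except sqlite3.OperationalError as exc:
        error = str(exc)  # stock SQLite has neither to_timestamp() nor NOW()

    # Same condition using SQLite's strftime() to get epoch seconds.
    sqlite_sql = ("SELECT hostname FROM registrations WHERE reg_user = ? "
                  "AND realm = ? AND expires > CAST(strftime('%s', 'now') AS INTEGER)")
    rows = conn.execute(sqlite_sql, ("1001", "fortest")).fetchall()
    conn.close()
    return error, rows


error, rows = demo()
print(error)
print(rows)  # [('fusion149',)]
```

If the same error shows up only for some calls, it may be worth confirming which database handle FreeSWITCH is using at the point where it runs that query.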
 