SOLVED Cluster issue: Inbound calls to wrong server

SilkBC · Jul 27, 2018

Hello,

I am playing around a bit with my two-node cluster and have actually started noticing that my inbound calls are coming in on my secondary PBX. If I reboot or disconnect the networking from the secondary PBX, then incoming calls go to my primary, but within a few minutes inbound calls start going into the secondary again.

I did recently upgrade the cluster from 4.2 to 4.4 and everything seemed to work fine (and I am quite certain incoming calls were coming in to the primary PBX no problem before), but I am puzzled why this would be the case.

I did not change anything with my SRV records; my primary PBX still has the lower priority number ("10", secondary has priority "20").
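
A quick way to confirm what the outside world sees is to query the SRV records and check which target has the lowest priority number. This is a minimal sketch, assuming `dig` is installed; `_sip._udp.example.com` is a placeholder record name and `preferred_srv_target` is a hypothetical helper, not something from the thread.

```shell
#!/bin/sh
# Print the target with the lowest SRV priority number (the preferred one).
# Input on stdin: lines in `dig +short SRV` format: "priority weight port target".
preferred_srv_target() {
    sort -n -k1,1 | awk 'NR == 1 { print $4 }'
}

# Usage (placeholder record name):
#   dig +short SRV _sip._udp.example.com | preferred_srv_target
```

If this prints the primary PBX, the DNS side matches the intended priorities and the cause is more likely elsewhere.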

Thoughts on why this would be the case?

Thanks!

DigitalDaz · Jul 28, 2018

I have now switched to Route53 failover of the A record instead.

But this should absolutely not be a PBX issue. If the carrier is not sending the calls to the primary, then they need to tell you why. I doubt very much that the call is being rejected at the PBX, but if it is, it should be in the logs.

SilkBC · Sep 1, 2018

DigitalDaz said:
If the carrier is not sending them to the primary then they need to tell you why. I doubt very much the call is being rejected at the PBX but if it is, it should be in the logs.

I spun up a brand new 2-node cluster today and the exact same behaviour is happening. fs_cli running on both nodes shows no attempt to put the call on the primary node; the call lands on the secondary node instead. On the primary node, if I go into "Status > SIP Status" and restart the external profile, calls start coming in to the primary node again, but as before, shortly thereafter, calls start going to the secondary node instead.

My testing SIP trunk is through VoIP.ms. Registration is via username and password. Since this info is kept in the database, I imagine it is getting synced over and causing the conflict. Is it recommended in the case of a 2-node cluster to register the SIP trunk by IP authentication instead of username/password? I think VoIP.ms has an either/or option.

SilkBC · Sep 1, 2018

OK, after playing with this a bit more, I *think* I stumbled upon the solution: I need to have the gateway stopped on the secondary node, then it will stop trying to reg. I didn't realise you could have the gateway stopped on one node while still running on the other (I thought maybe the status was inherited via database sync)

Anyway, I have stopped the gateway on node 2 and node 1 gateway is showing "REGD", so I will keep an eye on it and see if the gateway on node 2 starts up on its own.

A manual failover step would be to start the gateway on node 2. Alternatively, there could be a script that runs every minute and pings node 1 (maybe sends 10 pings); if node 1 is completely down (100% packet loss), it starts the gateway(s) from the CLI, and when node 1 comes back up, it fires off a CLI command to stop the gateway(s)...?
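
The ping-based idea above could be sketched roughly as follows, to be run from cron on node 2. The address, profile name, and gateway name are placeholders, and `parse_loss`, `decide_action`, and `main` are hypothetical helpers. It assumes Linux-style `ping` summary output, and that `sofia profile <profile> rescan` loads a gateway that is not running while `sofia profile <profile> killgw <gateway>` stops it; check both against your own install before relying on them.

```shell
#!/bin/sh
# Sketch of the ping-based failover idea; intended to run on node 2.
# All values below are placeholders, not taken from the thread.
PRIMARY="${PRIMARY:-192.0.2.10}"     # node 1 address (hypothetical)
PROFILE="${PROFILE:-external}"       # SIP profile that owns the gateway
GATEWAY="${GATEWAY:-my-gateway}"     # gateway name (hypothetical)
FS_CLI="${FS_CLI:-/usr/bin/fs_cli}"

# Extract the integer packet-loss percentage from ping's summary (stdin).
parse_loss() {
    sed -n 's/.*[ ,]\([0-9][0-9]*\)% packet loss.*/\1/p'
}

# Only total loss counts as "node 1 is down"; anything else stops the gateway.
decide_action() {
    if [ "$1" = "100" ]; then echo start; else echo stop; fi
}

main() {
    loss=$(ping -c 10 -q "$PRIMARY" 2>/dev/null | parse_loss)
    # No summary line (e.g. ping failed to run): do nothing this round.
    [ -n "$loss" ] || return 0
    case $(decide_action "$loss") in
        start) "$FS_CLI" -x "sofia profile $PROFILE rescan" ;;
        stop)  "$FS_CLI" -x "sofia profile $PROFILE killgw $GATEWAY" ;;
    esac
}

# Only act when invoked as: script.sh run
if [ "${1:-}" = "run" ]; then main; fi
```

A cron entry calling the script with the `run` argument every minute would match the schedule described. A single ping target cannot tell "node 1 is down" apart from "the link between the nodes is down", so both nodes could end up registered at once.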

DigitalDaz · Sep 1, 2018

No, registration is no good in these scenarios, and don't mess with IP auth either. Just leave the trunk with username/password but set register to false.

In VoIP.ms you will see a facility to create SIP URIs for your DIDs. Use that to send them to did@domain:5080.

krooney · Sep 3, 2018

Hi @DigitalDaz, I created a SIP URI as per your example (did@domain:5080) and changed the gateway to not register, but the call is not hitting my PBX. Is there any other setting that needs to be changed in order to use a SIP URI for inbound calls?

DigitalDaz · Sep 3, 2018

No. Did you change the actual DID at VoIP.ms to use the new SIP URI you created? You don't just create the URI; you then have to choose it as the routing method for that particular DID.

krooney · Sep 3, 2018

Thank you for the quick response. I had just found it, thanks!

inform11 · Sep 6, 2018

SilkBC said:
OK, after playing with this a bit more, I *think* I stumbled upon the solution: I need to have the gateway stopped on the secondary node, then it will stop trying to reg. I didn't realise you could have the gateway stopped on one node while still running on the other (I thought maybe the status was inherited via database sync)

Anyway, I have stopped the gateway on node 2 and node 1 gateway is showing "REGD", so I will keep an eye on it and see if the gateway on node 2 starts up on its own.

A manual failover step would be to start the gateway on node 2. Alternatively, there could be a script that runs every minute and pings node 1 (maybe sends 10 pings); if node 1 is completely down (100% packet loss), it starts the gateway(s) from the CLI, and when node 1 comes back up, it fires off a CLI command to stop the gateway(s)...?

Try this command on the secondary node:
sofia global standby on
This will stop the node, and entering the command sofia global standby off instantly revives it.
I run trunks with registration this way without problems.

I use UCARP

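Ucarp script up:
#!/bin/sh
# Runs when this node becomes master; ucarp passes the interface name as $1.
/sbin/ifup "$1:ucarp"
/sbin/ifup eth1:0
# Take Sofia out of standby, then recover calls that were in progress.
/usr/bin/fs_cli -x 'sofia global standby off'
/usr/bin/fs_cli -x 'sofia recover'

Ucarp script down:
#!/bin/sh
# Runs when this node gives up the master role.
/sbin/ifdown "$1:ucarp"
/sbin/ifdown eth1:0
# Put Sofia back into standby.
/usr/bin/fs_cli -x 'sofia global standby on'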

You can use Keepalived instead of UCARP.
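
For context, up/down scripts like these are normally handed to ucarp on its command line. The sketch below only prints such a command for review. The interface, addresses, password, and script paths are placeholders, and `ucarp_cmd` is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: build a ucarp command line that wires in the up/down scripts.
# Addresses, password, and script paths are placeholders.
ucarp_cmd() {
    # $1 = interface, $2 = this node's real IP, $3 = shared virtual IP
    echo ucarp --interface="$1" --srcip="$2" --addr="$3" \
        --vhid=1 --pass=changeme \
        --upscript=/etc/ucarp/vip-up.sh \
        --downscript=/etc/ucarp/vip-down.sh
}

# Print the command for review; run the printed line as root to start ucarp.
ucarp_cmd eth0 192.0.2.11 192.0.2.100
```

Both nodes would run the same command with their own `--srcip`, and the `--vhid` and `--pass` values must match on both.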

SilkBC · Sep 11, 2018

inform11 said:
Try this command on the secondary node:
sofia global standby on
This will stop the node, and entering the command sofia global standby off instantly revives it.
I run trunks with registration this way without problems.

What ramifications does that command have on the slave? Does it just affect the trunk registration? Would the devices that are set to fail over to the slave still be able to register regardless (they just wouldn't be able to make/receive calls until the trunk is registered, obviously)?

I assume all the other synchronisations would still occur from the master, otherwise?

inform11 said:
I use UCARP

I haven't got to that level of sophistication yet, but it would definitely be more elegant than my proposed ping script.

SilkBC · Sep 11, 2018

inform11 said:
I use UCARP

I just realised that UCARP or Keepalived probably wouldn't work in the environment I plan to run the cluster in. UCARP and Keepalived seem to assume the two servers are in the same datacenter/network segment, but each member of the cluster would in fact be located at a different datacenter.

inform11 · Sep 18, 2018

SilkBC said:
I just realised that UCARP or Keepalived probably wouldn't work in the environment I plan to run the cluster in. UCARP and Keepalived seem to assume the two servers are in the same datacenter/network segment, but each member of the cluster would in fact be located at a different datacenter.

Well, then that's not an option for you. I have a High Availability cluster.
https://freeswitch.org/confluence/display/FREESWITCH/High+Availability

inform11 · Sep 18, 2018

SilkBC said:
What ramifications does that command have on the slave? Does it just affect the trunk registration? Would the devices that are set to fail over to the slave still be able to register regardless (they just wouldn't be able to make/receive calls until the trunk is registered, obviously)?

I assume all the other synchronisations would still occur from the master, otherwise?

Yes. The master makes changes to the registration and call status database. The slave will also make changes if you do not put it in standby, and it would then interfere with the operation of the cluster.
