Extensions register/unregister at begining of work time

Status
Not open for further replies.

M.GH

New Member
Jan 28, 2020
10
0
1
Hi.
I'm supervisor of network services in my office.
There is about 8 months that i'm working on migrating from our commercial voip product to opensource fusionpbx.
My team and I did all tasks and test them but there are two times that when we switch to fusionpbx as voip server and when work time starts, all extensions(endpoints) start to unreg and reg and different types of errors occur. I realy don't know where is the problem.
My system and configuration specifications are:
- Debian 9.1 X64
-Server: HP
- Ram: 16G
- CPUs: 8
Extensions: 2914
Phones: Yealink T19-T20-T22-T27-T28-T53 , ...
- Provisioning phones through DHCP and tftp (setting their sip server and expire , ...)
Features:
- IVRs
-Queues
- SRTP (rtp_secure_media=optional and SRTP=compolsury in phone web interface)
- codec= alaw,ulaw
- BLF
- aut-acl per extension to limit ip address of endpoint and extension number(ex: 172.21.6.6/32)
- 4 gateway (2 E1 gw, 2 FXO gw)
- setting prefix and gateway per extension based on it's number (scripting in dialplan)
- implementing Boss/Secretary feature and valid lists to call bosses (scripting in dialplan and defining boss and secretary number and valid list to call boss per boss)
- Inbound routes (one rule for all extensions DID and multiple lines for FXOs)
- connecting fusion and gateways via same network adapter which clients register (172.20.1.8)
- All gateways and phones and fusion server have local ip address
- Disabled Fail2ban adn iptables
- Changing Max sessions to 150000 and Max concurrent per sec to 30000 (switch.conf.xml)
- No Nat and UDP in phone web interface
- Expire: 60 in phone web interface
- But in Registeration page of fusion interface I see expression "UDP-NAT (unknown)" in front of each extension row

PROBLEM: I switch to fusion server via DNS at middle night and all extensions after about 1 hour register to this server. But tomorrow morning (7 am) when work time starts, after about 1 hour at 8 am, number of registered extensions reduce and increase frequently and calls drop with different cause errors (forbiden, user_not_register, ...). and there are many IPPhone@172.a.b.c logs instead of EXT@IP.
whould you help me please to solve this problem. I really confused that what is the problem.
 

Adrian Fretwell

Well-Known Member
Aug 13, 2017
1,388
364
83
First, when registrartions start to drop, check your fail2ban logs, you could have a rouge device that is causing your IP to be put in a drop list (f2b jail).

iptables -nvL will show you at a glance what jails are active, what IP address are in the jails and how may packets have been matched by any given rule.

You can also use the fail2ban tools like:
fail2ban-client status to show the jails
fail2ban-client status freeswitch-ip to show the status of a single jail.
 

M.GH

New Member
Jan 28, 2020
10
0
1
Hi adrian. thanks for your reply but I disabled iptables and fail2ban and all extensions were registered at night but in the morning they started to unreg and reg again.
 

Adrian Fretwell

Well-Known Member
Aug 13, 2017
1,388
364
83
Hmm... I wonder if it could be associated with load, do you know what the loading is when the problems start? Do you ever see entries in the freeswitch log (depending on loglevel) similar to:
2020-01-26 13:04:12.677862 [WARNING] switch_core_db.c:92 SQLite is BUSY, ...
 

M.GH

New Member
Jan 28, 2020
10
0
1
Hi guys.
PLEASE HELP ME!!!!
Sorry for delay in my response. There was no busy error about database in logs.
I wrote a script to count each extension's REGISTER and INVITE logs and all of extensions had at least 4 REGISTER logs in that period of 2 hours. for exaple my extension 3242 logs was like below:
The first line is for the time that I restarted freeswitch (without iptables and fail2ban) and it's normal but I don't know the reason for the other lines:
James! I don't know how should I extract phone logs and these are from freeswitch logs on those days:

2020-01-22 06:43:10.772757 [WARNING] sofia_reg.c:1793 SIP auth challenge (REGISTER) on sofia profile 'internal' for [3242@voip.domain.com] from ip 172.21.6.81
2020-01-22 07:58:26.612759 [WARNING] sofia_reg.c:1793 SIP auth challenge (REGISTER) on sofia profile 'internal' for [3242@voip.domain.com] from ip 172.21.6.81
2020-01-22 08:09:50.932823 [WARNING] sofia_reg.c:1793 SIP auth challenge (REGISTER) on sofia profile 'internal' for [3242@voip.domain.com] from ip 172.21.6.81
2020-01-22 08:10:37.072758 [WARNING] sofia_reg.c:1793 SIP auth challenge (REGISTER) on sofia profile 'internal' for [3242@voip.domain.com] from ip 172.21.6.81
2020-01-22 08:12:18.192755 [WARNING] sofia_reg.c:1793 SIP auth challenge (REGISTER) on sofia profile 'internal' for [3242@voip.domain.com] from ip 172.21.6.81
2020-01-22 08:33:47.392832 [WARNING] sofia_reg.c:1793 SIP auth challenge (REGISTER) on sofia profile 'internal' for [3242@voip.domain.com] from ip 172.21.6.81

Also there were a lot of ACL logs which occured in a short and frequentlyperiod of time. Are they OK? (I just paste part of them for my phone)
NOTE: I have restricted each extension to it's phone Ip address with "auth-acl" in extenen's variables

2020-01-22 08:16:52.292758 [DEBUG] sofia_reg.c:3043 IP [172.21.6.81] passed ACL check [172.21.6.81/32]
2020-01-22 08:16:57.212759 [DEBUG] sofia_reg.c:3043 IP [172.21.6.81] passed ACL check [172.21.6.81/32]
2020-01-22 08:08:15.112760 [DEBUG] sofia_reg.c:3043 IP [172.21.6.81] passed ACL check [172.21.6.81/32]
2020-01-22 08:08:24.692776 [DEBUG] sofia_reg.c:3043 IP [172.21.6.81] passed ACL check [172.21.6.81/32]
2020-01-22 08:08:51.712814 [DEBUG] sofia_reg.c:3043 IP [172.21.6.81] passed ACL check [172.21.6.81/32]
2020-01-22 08:08:58.792819 [DEBUG] sofia_reg.c:3043 IP [172.21.6.81] passed ACL check [172.21.6.81/32]
2020-01-22 08:09:16.232757 [DEBUG] sofia_reg.c:3043 IP [172.21.6.81] passed ACL check [172.21.6.81/32]
2020-01-22 08:10:31.932821 [DEBUG] sofia_reg.c:3043 IP [172.21.6.81] passed ACL check [172.21.6.81/32]
2020-01-22 08:10:37.812755 [DEBUG] sofia_reg.c:3043 IP [172.21.6.81] passed ACL check [172.21.6.81/32]
2020-01-22 08:10:44.212759 [DEBUG] sofia_reg.c:3043 IP [172.21.6.81] passed ACL check [172.21.6.81/32]
2020-01-22 08:11:55.372758 [DEBUG] sofia_reg.c:3043 IP [172.21.6.81] passed ACL check [172.21.6.81/32]
2020-01-22 08:12:20.132757 [DEBUG] sofia_reg.c:3043 IP [172.21.6.81] passed ACL check [172.21.6.81/32]
2020-01-22 08:12:40.532759 [DEBUG] sofia_reg.c:3043 IP [172.21.6.81] passed ACL check [172.21.6.81/32]
2020-01-22 08:13:20.232759 [DEBUG] sofia_reg.c:3043 IP [172.21.6.81] passed ACL check [172.21.6.81/32]
2020-01-22 08:04:08.132760 [DEBUG] sofia_reg.c:3043 IP [172.21.6.81] passed ACL check [172.21.6.81/32]
2020-01-22 08:04:08.292792 [DEBUG] sofia_reg.c:3043 IP [172.21.6.81] passed ACL check [172.21.6.81/32]
2020-01-22 08:04:28.012760 [DEBUG] sofia_reg.c:3043 IP [172.21.6.81] passed ACL check [172.21.6.81/32]
2020-01-22 08:04:38.232825 [DEBUG] sofia_reg.c:3043 IP [172.21.6.81] passed ACL check [172.21.6.81/32]
2020-01-22 08:04:57.732758 [DEBUG] sofia_reg.c:3043 IP [172.21.6.81] passed ACL check [172.21.6.81/32]
2020-01-22 08:05:08.652904 [DEBUG] sofia_reg.c:3043 IP [172.21.6.81] passed ACL check [172.21.6.81/32]
2020-01-22 08:05:09.532760 [DEBUG] sofia_reg.c:3043 IP [172.21.6.81] passed ACL check [172.21.6.81/32]
2020-01-22 08:05:29.612756 [DEBUG] sofia_reg.c:3043 IP [172.21.6.81] passed ACL check [172.21.6.81/32]
.......

==============================================================
and output of show registerations was like below:

Call-ID: 0_1694769552@172.24.6.241
User: 6983@voip.domain.com
Contact: "6983" <sip:6983@172.24.6.241:5060;line=f43cf97e42ee605;fs_nat=yes;fs_path=sip%3A6983%40172.24.6.241%3A5060%3Bline%3Df43cf97e42ee605>
Agent: Yealink SIP-T23 44.80.0.5
Status: Registered(UDP-NAT)(unknown) EXP(2020-01-21 16:39:40) EXPSECS(103)
Ping-Status: Reachable
Ping-Time: 0.00
Host: voip
IP: 172.24.6.241
Port: 5060
Auth-User: 6983
Auth-Realm: voip.domain.com
MWI-Account: 6983@voip.domain.com

Call-ID: 1439376350@172.21.70.151
User: 2541@voip.domain.com
Contact: "2541" <sip:2541@172.21.70.151:5062;fs_nat=yes;fs_path=sip%3A2541%40172.21.70.151%3A5062>
Agent: Yealink SIP-W52P 25.50.0.20
Status: Registered(UDP-NAT)(unknown) EXP(2020-01-21 16:39:32) EXPSECS(95)
Ping-Status: Reachable
Ping-Time: 0.00
Host: voip
IP: 172.21.70.151
Port: 5062
Auth-User: 2541
Auth-Realm: voip.domain.com
MWI-Account: 2541@voip.domain.com
=========================================================
which part of my works is wrong?
- I don't used NAT on the phones and all phone's serverexpire is set to 60 with 3 retry but what is the meaning of diffrenet EXPSECs and (UDP-NAT)?
I used :
- auth-acl per extension to restrict them.
- 172.0.0.0/8 in domain's ACL (for my phones). I don't know this is important or not
- srtp
-codec: PCMA,PCMU
- BLF (but I disabled *8 in dialplan beacause of possiblity of misconfiguration on some phones) but they can see BLF light without transfer the call and answer it.
and alot of INVITE logs for them.
- I don't know where does WMI account is used and what is it's functionality.(there are settings in our phone whith current voip system and set by it's company)
.....

PLEASE HELP ME !!!!
thanks.
 

bdmonsey

Member
Jul 23, 2019
146
6
18
42
What router are you using for this network? is sip alg enabled?
did you try extension on TCP?
 

M.GH

New Member
Jan 28, 2020
10
0
1
Hmm... I wonder if it could be associated with load, do you know what the loading is when the problems start? Do you ever see entries in the freeswitch log (depending on loglevel) similar to:
2020-01-26 13:04:12.677862 [WARNING] switch_core_db.c:92 SQLite is BUSY, ...

Hi Adrian.
As I mentioned in my last post there was no BUSY log about db.
But what does this line mean:
2020-01-22 08:16:52.292758 [DEBUG] sofia_reg.c:3043 IP [172.21.6.81] passed ACL check [172.21.6.81/32]
there are a lot of logs like this in that period of time (per each extension's ip). Do you know when this log ocure? thanks
I set 172.21.6.81/32 for my extension (3242) aut-acl field in fusionpbx to restrict this extension to it's IP. Is this approach OK for restricting extension to ip?
 

Adrian Fretwell

Well-Known Member
Aug 13, 2017
1,388
364
83
Hi,
2020-01-22 08:16:52.292758 [DEBUG] sofia_reg.c:3043 IP [172.21.6.81] passed ACL check [172.21.6.81/32]
You are seeing this message because you have set auth-acl per extension. There is nothing wrong with this, but your extensions are on a RFC1918 private subnet so what is the risk of not having it?

To answer your other question
I don't used NAT on the phones and all phone's serverexpire is set to 60 with 3 retry but what is the meaning of diffrenet EXPSECs and (UDP-NAT)?
EXPSEC is the number of seconds left before the registration will expire. The UDP-NAT flag will appear if the phone advertises a different IP or port to the actual IP and port that the registration request was received from. (UNKNOWN) just refers to the presence status.

My guess at the moment is that your issues are being caused by the loading of your system. As you say, everything is OK until people start coming in to work. How many endpoints do you have?

Each SIP profile handles SIP messages with a single CPU thread, this is a limitation in the Sofia SIP library. With a lot of SIP messages there can be a bottle neck here. You could try to get a packet capture while the issue is going on and examine it for SIP retries, if you see a lot, then you do have a bottleneck.

You could try setting the registration expiry to 900 instead of 60, this will reduce some of the registration traffic. A phone (depending on manufacturer) will normally try to re-register when half of its registration expiry time has elapsed.

If you do prove you have a bottleneck with SIP messaging, consider creating more internal SIP profiles on different ports and splitting the phones between them.
 

M.GH

New Member
Jan 28, 2020
10
0
1
Hi,

You are seeing this message because you have set auth-acl per extension. There is nothing wrong with this, but your extensions are on a RFC1918 private subnet so what is the risk of not having it?

To answer your other question

EXPSEC is the number of seconds left before the registration will expire. The UDP-NAT flag will appear if the phone advertises a different IP or port to the actual IP and port that the registration request was received from. (UNKNOWN) just refers to the presence status.

My guess at the moment is that your issues are being caused by the loading of your system. As you say, everything is OK until people start coming in to work. How many endpoints do you have?

Each SIP profile handles SIP messages with a single CPU thread, this is a limitation in the Sofia SIP library. With a lot of SIP messages there can be a bottle neck here. You could try to get a packet capture while the issue is going on and examine it for SIP retries, if you see a lot, then you do have a bottleneck.

You could try setting the registration expiry to 900 instead of 60, this will reduce some of the registration traffic. A phone (depending on manufacturer) will normally try to re-register when half of its registration expiry time has elapsed.

If you do prove you have a bottleneck with SIP messaging, consider creating more internal SIP profiles on different ports and splitting the phones between them.

-------------------------------
Adrian! Thanks for your perfect answer. there are about 2700 phones in my office. I changed Max sessions to 150,000 on those days. is it good or bad ? and my next question is that, what does happen when number of sessions reach max session. (realy I don't understand max session meaning. max session for what period of time? and what does happen when it reach max ?) because there were more than 10K sessions per half an hour in my office.
Thanks for your attension.
 

Adrian Fretwell

Well-Known Member
Aug 13, 2017
1,388
364
83
You system sounds pretty busy, when it comes to scaling and performance tuning there are no easy answers and certainly no quick fixes. Simply setting max sessions to 150,000 won't make it happen, your hardware and system configuration need to be able to support that number. The limits are there for you to set, once you know the capability of your system, so your system does not get over loaded.

Everything you do has a cost in terms of CPU cycles and disk I/O, the simple fact that you have your loglevel set to DEBUG will have a significant impact on a busy system like yours.

As I said earlier, get a packet capture so you can understand exactly what is going on.

This is not something that can be easily worked through with forum posts, I would encourage you to seek some paid support from the FusionPBX team. https://www.fusionpbx.com

You can also get a support contract with the Freeswitch people if required.
 
Status
Not open for further replies.