SOLVED Nightmare_xfer on transfers

hfoster

Hey everyone.

We have a client who is experiencing problems with failed transfers, and I'm just wondering if anyone has any bright ideas about what the typical causes of the nightmare_xfer scenario tend to be:

Code:
4c5f0ee2-0a68-4a98-9f7d-2775eabd310b 2025-06-02 12:34:59.578055 100.00% [DEBUG] sofia.c:9012 Process REFER to [105@X.X.X.131]
4c5f0ee2-0a68-4a98-9f7d-2775eabd310b 2025-06-02 12:34:59.578055 100.00% [DEBUG] sofia.c:9037 Replaces: [0_1386011598@10.0.1.22]
4c5f0ee2-0a68-4a98-9f7d-2775eabd310b 2025-06-02 12:34:59.578055 100.00% [DEBUG] sofia.c:9427 REFER from 4c5f0ee2-0a68-4a98-9f7d-2775eabd310b replaces 0_1386011598@10.0.1.22 (105@X.X.X.131) with 191a78aa-f79a-4837-9942-e1dfb977b468 on another server
4c5f0ee2-0a68-4a98-9f7d-2775eabd310b 2025-06-02 12:34:59.578055 100.00% [DEBUG] sofia.c:9545 Exporting replaces URL header [Replaces:0_1386011598@10.0.1.22;to-tag=4vK20DeU3j7SD;from-tag=1385895279]
4c5f0ee2-0a68-4a98-9f7d-2775eabd310b 2025-06-02 12:34:59.578055 100.00% [DEBUG] sofia.c:9587 Good Luck, you'll need it......
4c5f0ee2-0a68-4a98-9f7d-2775eabd310b 2025-06-02 12:34:59.598054 100.00% [DEBUG] sofia.c:8820 1 .. 2 .. Freddies commin for you...

I've spared you the rest of the log, but essentially it ends in a 404 (route not found), so the transfer fails.
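For anyone comparing notes, here is a rough sketch for pulling the Replaces details out of a debug log like the one above, so the Call-ID and tags can be checked against what a SIP trace shows for the leg being replaced. It assumes the log lines look exactly like the "Exporting replaces URL header" line in this post; the function name `find_replaces` is just illustrative and not part of FreeSWITCH.

```python
import re

# Matches the "Exporting replaces URL header [Replaces:...]" debug line.
# Assumption: the line starts with the 36-character channel UUID, as in the log above.
REPLACES_RE = re.compile(
    r"^(?P<uuid>[0-9a-f-]{36}) .*Exporting replaces URL header "
    r"\[Replaces:(?P<call_id>[^;\]]+)(?P<params>[^\]]*)\]"
)


def find_replaces(lines):
    """Yield (channel_uuid, call_id, params) for each exported Replaces header."""
    for line in lines:
        match = REPLACES_RE.search(line)
        if not match:
            continue
        # Parameters follow the Call-ID as ;name=value pairs (to-tag, from-tag).
        params = dict(
            part.split("=", 1)
            for part in match.group("params").split(";")
            if "=" in part
        )
        yield match.group("uuid"), match.group("call_id"), params


if __name__ == "__main__":
    import sys

    # Usage: python find_replaces.py < freeswitch.log
    for uuid, call_id, params in find_replaces(sys.stdin):
        print(uuid, call_id, params.get("to-tag"), params.get("from-tag"))
```

If the Call-ID or tags printed here don't match any leg FreeSWITCH actually has, that would be consistent with the "on another server" message, though this only surfaces the values and doesn't tell you what changed them.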

It doesn't come up a lot, and it's very hard to reproduce, as it typically disappears when you move the end user devices to different routers, networks, etc. I'm not exactly sure why FreeSWITCH thinks that the UUIDs are on different servers. It's quite a difficult problem to research too, as the only examples you see on the mailing list are where people are wrangling with their own unique FreeSWITCH setup, typically with failover.

Tried all the usuals, like disabling the SIP ALG (even though the Cisco implementation typically seems to be quite friendly). The endpoints are just Yealink T42S's provisioned as per usual. The major deviation from an out-of-the-box FusionPBX install is using the PostgreSQL core database for FreeSWITCH, but I would expect a lot more problems to manifest if there were a discrepancy in core databases.

If nothing else, when I do figure it out, it will be a useful resource for future FusionPBX users!
 
Well, I discovered what it was: the Cisco SIP ALG. I never thought to clear the NAT translations after disabling it, and the Cisco kept pushing SIP traffic through the ALG long after a typical UDP timeout. I presume it just tracks them like TCP sessions.

You have to wonder why router manufacturers still keep including these ALGs!