[Tutorial] Creating a two node FusionPBX cluster the easy way.

Status
Not open for further replies.

fortissimus

New Member
"Are you sure your BDR is functioning correctly? Those folders on mine are very small and more or less the same size."

Thanks DigitalDaz.
I think BDR is functional: at least it is doing what it's supposed to do. Changes made on NodeA replicate to NodeB and vice versa.
Pardon my ignorance, but is there a good way to validate that BDR is fully functional?
 

DigitalDaz

Administrator
Staff member
Well, to start with, on each server and within the postgres db:

Code:
select * from pg_replication_slots;
 

fortissimus

New Member
Seems like something is not right here. NodeA (master) has 4 rows while NodeB has 2: shouldn't they be the same?

NodeA (Master)
fusionpbx=# select * from pg_replication_slots;
                slot_name                | plugin | slot_type | datoid |  database  | active | xmin | catalog_xmin | restart_lsn
-----------------------------------------+--------+-----------+--------+------------+--------+------+--------------+-------------
 bdr_16386_6521986482759153011_1_16386__ | bdr    | logical   |  16386 | freeswitch | f      |      |      3662913 | 1/75330790
 bdr_16385_6506717661537256898_1_16385__ | bdr    | logical   |  16385 | fusionpbx  | t      |      |     13602767 | 9/CD775598
 bdr_16386_6506717661537256898_1_16386__ | bdr    | logical   |  16386 | freeswitch | t      |      |     13602767 | 9/CD775598
 bdr_16385_6521986482759153011_1_16385__ | bdr    | logical   |  16385 | fusionpbx  | f      |      |      3662913 | 1/7532A578
(4 rows)

NodeB (Slave)

fusionpbx=# select * from pg_replication_slots;
                slot_name                | plugin | slot_type | datoid |  database  | active | xmin | catalog_xmin | restart_lsn
-----------------------------------------+--------+-----------+--------+------------+--------+------+--------------+-------------
 bdr_16385_6506716560380556778_1_16385__ | bdr    | logical   |  16385 | fusionpbx  | t      |      |     13639588 | 8/B4E498C0
 bdr_16386_6506716560380556778_1_16386__ | bdr    | logical   |  16386 | freeswitch | t      |      |     13639588 | 8/B4E498C0
(2 rows)
 

DigitalDaz

Administrator
Staff member
I would export the db, drop that BDR cluster, and recreate it. There is definitely something not right there.
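
One way to see which slots are the odd ones out is to filter that output for active = f. In the NodeA listing, the two inactive slots both carry 6521986482759153011 in their names, which neither active slot does; I am assuming that number identifies a peer node that no longer exists. A rough sketch, assuming the default pipe-delimited psql output shown earlier in the thread; inactive_slots is a made-up helper name, not part of BDR or the cluster script:

```shell
# Hypothetical helper: read the psql output of
#   select * from pg_replication_slots;
# on stdin and print "slot_name database" for every slot whose
# active column is f.
inactive_slots() {
  awk -F'|' '
    NF >= 6 {
      # Trim the padding psql puts around each column.
      for (i = 1; i <= NF; i++) gsub(/^[ \t]+|[ \t]+$/, "", $i)
      # Column 6 is "active", column 1 is slot_name, column 5 is database.
      if ($6 == "f") print $1, $5
    }
  '
}
```

Something like `sudo -u postgres psql -d fusionpbx -c "select * from pg_replication_slots;" | inactive_slots` on each node should then list only the slots that are not streaming, assuming psql can be run as the postgres user there.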
 

Nikola

New Member
Hi,

I followed your guide and installed 2 nodes on 2 clean Debian 8.10 machines. Everything went fine without errors, but when I check the log viewer I see lots of SQL errors:
[Screenshot: 1524743983772.png]

I have also noticed that Switch Variables is completely empty.
I have tried a clean install a couple of times, on both Debian 8.3 and Debian 8.10, always with the same result.
Installation logs from both master and slave are attached.

Thanks
 

Attachments

  • master.txt
    113.7 KB · Views: 18
  • slave.txt
    110.2 KB · Views: 9

DigitalDaz

Administrator
Staff member
Hmmm...

Interesting, that looks to me like something from callcenter. Maybe the new version, 4.4, has broken my cluster builder.

I'll see if I can spin a couple up today and see what's what.
 

Nikola

New Member
Hi,

OK, so I found a way to fix it:
I had to manually enter all the switch variables using the GUI (if I edit vars.xml directly from the CLI, the changes are not displayed in the GUI).
I also had to add a switch variable for the DSN: "dsn=pgsql://hostaddr=127.0.0.1 dbname=freeswitch user=fusionpbx password='somepassword'"
The db password comes from "/etc/fusionpbx/config.php".
I hope you find a way to fix it in the script...

Thank you
 

DigitalDaz

Administrator
Staff member
Hold on a sec, I do not think my script replicates the freeswitch DB. Are you saying that you added this AFTER the script installation and that's what is causing the problem?
 

Nikola

New Member
No, it was not working from the start. I followed your tutorial exactly and made no changes to the configuration. The first time I logged in after installation and checked the logs, I saw the error messages and an empty switch variables page.
After that I investigated the dsn variable and the empty vars.xml file, and manually entered all the variables from another working FusionPBX installation.
 

mjoubert

New Member
Just to second what Nikola says: the same thing happens for me as well. I ended up taking bits and pieces of the script and doing the replication manually after the default fusion install. I did copy your configs and the other stuff you had automated in it, which helped out. I haven't located where things go awry yet; if I do, I'll post an update.
 

abraham

New Member
Any intention of fixing the script?

I got it half working by inserting the variables from an older version, but inbound calls don't seem to work. I can't work out why it won't match the user. I suspect the tweaks I've made to get it this far have broken something else.
 

Roget Hoffman

New Member
I have this 2 node master/failover setup working well. It took a bit of time to dial in file syncing and the stopping/starting of services during failover (no reason to run FS on the slave while the master is good), so thank you for the help and the scripts! I do, however, have an issue that I hope is an easy fix:

When the slave takes over (if I manually take down the master), the phones take a minute or so to re-register. They register fine, but when I look at the front end (now on the slave) I see registrations from both the master and the slave (by hostname). I also notice that when the registrations reach zero on the timer, the ones marked with the "master" host start counting into negative numbers and never clear off the screen. As you know, if I "UNREGISTER" any phone, I lose both the master and the slave entries and the phone reboots, so that's not the answer. Also, when I bring the master back up, I still have 2 registrations for every phone, 1 for the master and 1 for the slave, and the condition mentioned above is still present.

90% of the time, this doesn't have any adverse effects. Phones work, inbound and outbound. But today I had one phone which didn't receive inbound calls until I rebooted it (which cleared the "old" registration listed from the slave host).

If I look into the freeswitch db, I do see records in the registration table. Maybe that should be turned off? If so, I'm not 100% sure how to disable the data going into freeswitch. I also have the setting track_calls = disabled.

Any suggestions would be great - thanks
 

inform11

New Member
There is a command in fs_cli to put the backup node into standby: sofia global standby on. To switch the node back to working mode:
sofia global standby off. Then there is no need to shut down FS on the slave. Maybe this will help you. I found this command by accident myself.
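
For a failover script, the two commands could be wrapped roughly like this. This is only a sketch: it assumes fs_cli is on the PATH and can connect with its default settings, and the function names and the FS_CLI variable are made up for illustration.

```shell
# Hypothetical wrappers around the two fs_cli commands above.
# FS_CLI can be overridden, e.g. to point at a different fs_cli binary.
FS_CLI="${FS_CLI:-fs_cli}"

# Put this node into standby (backup role).
node_standby() {
  "$FS_CLI" -x "sofia global standby on"
}

# Return this node to normal working mode (active role).
node_active() {
  "$FS_CLI" -x "sofia global standby off"
}
```

The idea would be to call node_standby on the slave while the master is healthy and node_active on it when it takes over, instead of stopping and starting FS.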

https://freeswitch.org/confluence/display/FREESWITCH/High+Availability:
The hostname on both machines needs to be the same, as the sql query to recover calls selects by hostname. To do this, simply set the following parameter in switch.conf.xml on both FS instances to the same value:

<param name="switchname" value="freeswitch"/>

This will make the transition from the first node to the second node invisible for most phones. Only Cisco 79xx phones will reboot.
 

inform11

New Member
Today I set up a cluster. FusionPBX 4.4 does not work. The error is the same as in the picture above.
I had to fix the installation script: /usr/src/fusionpbx-sce-install/debian/resources/fusionpbx.sh (line 25)
# FUSION_VERSION=$FUSION_MAJOR.$FUSION_MINOR
FUSION_VERSION=4.2
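
If you would rather script that edit than do it by hand, something along these lines should work. It is a sketch only: pin_fusion_version is a made-up helper, and it assumes the script still contains a line starting with FUSION_VERSION= as shown above.

```shell
# Hypothetical helper: pin FUSION_VERSION in the given install script,
# keeping the original assignment as a comment (same as the manual edit).
# Usage: pin_fusion_version <script> <version>
pin_fusion_version() {
  script=$1
  version=$2
  tmp=$(mktemp) || return 1
  # Comment out the first FUSION_VERSION= line and add the pinned value
  # right after it; every other line is copied unchanged.
  if awk -v v="$version" '
       /^FUSION_VERSION=/ && !done {
         print "# " $0
         print "FUSION_VERSION=" v
         done = 1
         next
       }
       { print }
     ' "$script" > "$tmp"; then
    # Write back with cat so the script keeps its owner and permissions.
    cat "$tmp" > "$script"
  fi
  rm -f "$tmp"
}
```

For example: `pin_fusion_version /usr/src/fusionpbx-sce-install/debian/resources/fusionpbx.sh 4.2`, run before starting the install.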


4.2 installs and works well.
 