Experimenting with a full load-sharing cluster

Overview

I wanted to share my general notes on what I have working so far, and how, to achieve a full load-sharing FusionPBX cluster. I am hoping that we can collaborate in this thread to work up a document showing what's possible and what's not.

My goal is to have a PBX cluster with separated data layer and service layer so that any number of FreeSWITCH nodes (service layer) can be joined in and the configs and live data will all exist in the database, which does not necessarily have to coexist with FreeSWITCH. That said, there's value in having a local DB on every FS machine to reduce latency between the DB and the things that are accessing it (FS, lua scripts, FusionPBX web interface). So what I had in mind is just to do a full install of FusionPBX on every node and link the databases.

I'm not too comfortable with BDR yet, so at this point in my experiments I have two nodes, both pointing to a single node for their shared fusionpbx and freeswitch databases. I believe the effective result is the same as if we were running BDR with each FusionPBX accessing its local clustered DB instance.

The FS nodes should be able to service any endpoints or incoming calls routed from a provider. By way of the shared freeswitch database they should be able to route internal calls between nodes in order to do extension-to-extension calling, call park and retrieval, conferencing, queues, etc.

Of course, this should work with the multi-tenant domain model.

I'll add posts to the thread to write about some of the specifics.

NOTE! There are likely errors in my ideas and methods so please point them out. Ultimately I would like a refined and accurate how-to guide to come out of this.
 
Lab Setup

Two nodes on DigitalOcean, in two different datacenters in the same region. I launched two Debian 8 instances with 1 GB RAM each.

On both nodes I did standard installs of FusionPBX using the installer at https://github.com/fusionpbx/fusionpbx-install.sh . Before running debian/install.sh, I edited debian/resources/config.sh and made the following changes:
  • system_branch=master
  • database_repo=2ndquadrant
(I chose the 2ndquadrant repo so that I could change this over to BDR later.)

Run the installs.

I decided "node1" would be my DB host so I basically ignored the DB installation on node2 at this point.

Configure postgresql on node1 so that it listens on all interfaces (listen_addresses = '*' in postgresql.conf) and grant access to the other node in pg_hba.conf. Also add a pg_hba.conf line for node1's own IP. Adjust iptables to allow access from node2 to postgres port 5432 on node1.
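As a concrete sketch (the placeholder IPs are mine; file paths and the auth method may differ on your install):

```
# postgresql.conf on node1
listen_addresses = '*'

# pg_hba.conf on node1 (replace the placeholder IPs with your node addresses)
host    all    all    NODE2_IP/32    md5
host    all    all    NODE1_IP/32    md5

# iptables on node1: allow node2 into port 5432
iptables -A INPUT -p tcp -s NODE2_IP --dport 5432 -j ACCEPT
```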

DB

Edit /etc/fusionpbx/config.lua on both nodes. Set the database.system and database.switch DSNs to point to the public address of node1, using the database password from node1's config file.

While in this file, find the xml_handler.fs_path line and set it to true, as this will be needed for cluster routing.
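Roughly, the relevant lines end up looking like this (the DSN format follows a stock config.lua; the placeholder address and password are mine):

```lua
-- /etc/fusionpbx/config.lua on both nodes
database.system = "pgsql://hostaddr=NODE1_IP port=5432 dbname=fusionpbx user=fusionpbx password=NODE1_DB_PASSWORD options=''";
database.switch = "pgsql://hostaddr=NODE1_IP port=5432 dbname=freeswitch user=fusionpbx password=NODE1_DB_PASSWORD options=''";

-- needed for cluster routing
xml_handler.fs_path = true;
```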

Edit /etc/fusionpbx/config.php in the same manner as above. Point both nodes to the database on node1.
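For reference, the config.php variables look something like this (variable names taken from a stock install; the placeholders are mine):

```php
// /etc/fusionpbx/config.php on both nodes
$db_type = 'pgsql';
$db_host = 'NODE1_IP';
$db_port = '5432';
$db_name = 'fusionpbx';
$db_username = 'fusionpbx';
$db_password = 'NODE1_DB_PASSWORD';
```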

Now, from the web interface of node1, you can log in as admin with the password shown at the end of setup. From the web interface of node2, you can log in as admin@node1ipaddress with node1's password.
 
Telling FS to use PGSQL as its core DB

Edit /etc/freeswitch/autoload_configs/switch.conf.xml on both nodes.

Uncomment the "core-db-dsn" param which has the value $${dsn}.

Uncomment the "core-dbtype" param and set the value to "pgsql".
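After uncommenting, the two params should look like this:

```xml
<!-- /etc/freeswitch/autoload_configs/switch.conf.xml on both nodes -->
<param name="core-db-dsn" value="$${dsn}" />
<param name="core-dbtype" value="pgsql" />
```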

In FusionPBX, go to Advanced -> Variables and add a new variable:
  • name: dsn
  • value: the same DSN string used in the config.lua file with dbname=freeswitch
  • enabled: true
Go to the SIP Profiles section and edit each SIP profile. Enable the "odbc-dsn" parameter, which already points to $${dsn}. (Note: I don't know exactly why ODBC settings need to be specified on the SIP profiles when PostgreSQL is already enabled for the core.)
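Taken together, the variable and the profile param should come out roughly like this (I'm assuming FusionPBX renders Variables into vars.xml as an X-PRE-PROCESS line; the DSN placeholders are mine):

```xml
<!-- the dsn variable, as rendered into vars.xml -->
<X-PRE-PROCESS cmd="set" data="dsn=pgsql://hostaddr=NODE1_IP port=5432 dbname=freeswitch user=fusionpbx password=NODE1_DB_PASSWORD options=''"/>

<!-- in each SIP profile -->
<param name="odbc-dsn" value="$${dsn}"/>
```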

At this point, to reload all the configs, I restarted memcached and freeswitch on both nodes. Using psql you should be able to check the freeswitch database and see that the schema was populated; selecting from the interfaces table should show entries for both nodes.
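A quick check, run from psql while connected to the freeswitch database, should return both node hostnames (column name is a guess from a stock FreeSWITCH core schema; adjust if yours differs):

```sql
SELECT DISTINCT hostname FROM interfaces;
```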
 
Configure FusionPBX for DB storage

My goal is to store as much as possible in the database so that we don't have to worry about data on the filesystem and can add and remove service nodes somewhat freely. There are likely performance considerations in doing this, but I don't know yet. Doing filesystem replication with corosync among nodes is the other option and has been written about elsewhere on the forum.

FusionPBX -> Advanced -> Default Settings
  • Recordings storage_type = base64, Enabled = true (store recordings in the database)
  • Voicemail storage_type = base64, Enabled = true (store voicemail messages in the database rather than in the filesystem)
  • Fax storage_type = base64, Enabled = true

lua-sql is needed for this. Install it on Debian like this:
  • apt-get install libpq-dev lua-sql-postgres-dev
 
Permitting intra-cluster calling

Some intra-cluster calling will work without making any changes; for example, extension-to-extension calling when the extensions are registered on different servers. If you watch the FreeSWITCH consoles you will see that the call is initiated (for example) on node1, finds the registered target on node2, and is sent there. Node2 then sees the call and, because it arrives on the internal profile (SIP 5060), issues an auth challenge, which the caller can answer because he is a member of the domain.

I worked out something different to eliminate the auth challenge and handle situations where this wouldn't work.

In Advanced -> Access Controls I edited the Domains ACL to hold all of the cluster nodes; mine contains the IP addresses of the two nodes. NOTE: if you do this then you can't use this ACL for your providers, so make another one for that purpose, or send your providers to the external profile as you should.

Then I edited SIP Profiles -> Internal and set apply-inbound-acl = domains:cluster-in (rather than just domains). According to https://freeswitch.org/confluence/display/FREESWITCH/ACL , this syntax sends any call matching the ACL to the context "cluster-in" rather than the profile's default context, which is "public".
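In the generated sofia profile XML, that setting should come out as:

```xml
<param name="apply-inbound-acl" value="domains:cluster-in"/>
```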

Now create the cluster-in context:

Mine looks like this in Dialplan Manager. It figures out where the call is destined based on the call's domain and transfers it into that part of the dialplan; pretty simple.

[Screenshot: the cluster-in context in Dialplan Manager]
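In case the screenshot doesn't survive, here is a rough equivalent of the idea in raw dialplan XML. This is my own sketch of a "route by domain" context, not a copy of the screenshot, so treat it as a starting point:

```xml
<context name="cluster-in">
  <extension name="route-by-domain">
    <condition field="destination_number" expression="^(.+)$">
      <!-- hand the call to the called domain's own part of the dialplan -->
      <action application="transfer" data="$1 XML ${domain_name}"/>
    </condition>
  </extension>
</context>
```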

I'm still testing it out but this little piece of dialplan seems to handle what I have thrown at it so far.
 
Test Results (so far)

Test setup:
  • a domain called "test.example.com" set up in FusionPBX
  • extensions 1000 and 1001
  • conference room 2000
  • DNS SRV for test.example.com pointing to node1 and node2 with equal weight/priority
I registered 1000 and 1001, forcing 1000 to use node1 and 1001 to use node2 by specifying the outbound proxy in each SIP client. So we are testing cross-node calls within a domain.
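For reference, the SRV records look roughly like this (hostnames are placeholders; equal priority and weight so clients split between the nodes):

```
_sip._udp.test.example.com. 300 IN SRV 10 50 5060 node1.example.com.
_sip._udp.test.example.com. 300 IN SRV 10 50 5060 node2.example.com.
```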

Works
  • extension-to-extension calls (both directions)
  • call hold/resume
  • blind transfer
  • attended transfer
  • conference
    • whoever starts the conference by dialing 2000 first hosts it on their node; when the other extension calls in, that call is routed over to the hosting node to join the conference
  • call park and park retrieval
  • inbound calls from PSTN provider to either server (using DNS SRV pointing to external profile port 5080)
(More to come)
 
Great work. I would be interested to see what happens if you have the two DBs clustered and one node fails or stops responding.

Do both nodes have the exact same hostname? Since you have two IPs, you must be relying on NAPTR/SRV for the load balancing, correct?
 
Yes, we use SIP SRV records for the load sharing/balancing on each SIP domain; the individual hostnames are different. I think the DB is the most sensitive part. It would be nice if there were a failover/LB option for DB connections. With MySQL I have used mysql-proxy and mysql-router in the past; I don't know what the PostgreSQL equivalent is.
 
Thanks for the info. I think I will try this out myself.
 
This is pretty cool. I've got WebRTC clients connecting to a pool of servers behind a load balancer. The connections establish randomly to the servers in the pool, and registration is shared in the common pgsql database. Inter-cluster calling works with fs_path, but for inbound calls I had to create a dedicated context to regex the number (depending on where it originated) before transferring to the right context.

Works like a charm. Conferencing doesn't work, so I just created a dedicated conference server. The rest seems to work.
 
Great work, I will give this a try within Microsoft Azure. Azure has a managed PostgreSQL service cluster offering a 99.99% SLA starting at under 30 USD/month. It's only available for versions 9.5 through 10. Do you have any experience with FreeSWITCH and PostgreSQL 9.5?