Configuring Servers for Active-Active Failover

Active-Active failover is a scalable and fault-tolerant configuration of a Horizontal Cluster in which multiple Iotellect servers, referred to as nodes, actively provide services simultaneously. This configuration ensures that even if one node or system fails, services remain available without significant disruption.

The following tutorial shows how to use the Cluster Coordinator plugin to configure three servers as the basis for a scalable cluster.

Architecture

The system to be configured will have the following servers:

  • Two Primary Nodes, which will perform the usual functions of a single Iotellect server.

  • A single Cluster Coordinator, which will supervise the cluster, allocate resources, implement the designated failover logic, and ensure the coordinated functioning of all other cluster elements.

Server Setup

Start three Iotellect servers. For ease of reference, they will be referred to as Cluster Coordinator, Primary Node A, and Primary Node B.

For all three servers, note the following information, which will be needed for adding them to the cluster:

  • IP address of the Iotellect server.

  • Port where the Iotellect server can be accessed.

  • Login and Password for the admin user.

For all of the servers, set a Server ID under the General Settings of the Global Configuration Options. Note that each Server ID must be unique within the cluster.
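
For example, the following Server ID values could be used (these are illustrative placeholders; the ID for Primary Node A matches the identifier used in the allocation rule later in this tutorial):

  • Cluster Coordinator: coordinatorIdentifier

  • Primary Node A: primaryNodeAIdentifier

  • Primary Node B: primaryNodeBIdentifier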

For all servers, check the Enabled property in the Horizontal Cluster section of the Global Configuration Options.

In the Primary Nodes, configure the Address, Port, Login, and Password properties of the Horizontal Cluster, using values that establish a connection to the Cluster Coordinator server.
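
For example, assuming the Cluster Coordinator is reachable at the hypothetical address 192.168.1.10, the Horizontal Cluster properties on each Primary Node could look like the following (substitute the address, port, and credentials noted during server setup):

  • Address: 192.168.1.10 (IP address of the Cluster Coordinator)

  • Port: the port noted for the Cluster Coordinator during server setup

  • Login: admin

  • Password: the admin password of the Cluster Coordinator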

Restart the Primary Nodes in order for the Server ID values to take effect.

In Primary Node A, create a Virtual Device (or some other device) named virtual, resulting in the context path users.admin.devices.virtual. This device will later be used to illustrate how resources are allocated by the Cluster Coordinator.

The server that will serve as the Cluster Coordinator will be restarted during the next step, so there is no need to restart it now.

Configuring the Cluster Coordinator

Navigate to the server that will serve as the Cluster Coordinator, and open the Cluster Coordinator plugin configurations.

The first group of properties, aptly named “Cluster Coordinator”, contains an Enable option that designates the server as the Cluster Coordinator.

Additional properties in this group allow databases to be specified for the Coordinator Cache and Coordinator Storage. The default options are sufficient for most use cases, and for this tutorial.

Restart the server to activate coordinator-related databases and enable additional functions in the Cluster Coordinator plugin.

Designating Primary Nodes in the Cluster Coordinator

After the restart is complete, open the context menu of the Cluster Coordinator plugin on the Cluster Coordinator server.

Call the Add Tenant action of the Cluster Coordinator plugin. Specify a unique tenant ID to identify the tenant in the cluster. This value will be used in all places that call for a Tenant ID.

Call the Add Primary Node function from the context menu for each node. Use the IP address, login details, and Server ID for each of the Primary Nodes as input for the function when prompted.
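
As an illustration, using the placeholder values from the earlier examples, the inputs for Primary Node A could be:

  • Address: 192.168.1.11 (hypothetical IP address of Primary Node A)

  • Login: admin

  • Password: the admin password of Primary Node A

  • Server ID: primaryNodeAIdentifier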

Confirming the Addition of Primary Nodes

To ensure that the Primary Nodes have been added to the Cluster Coordinator, open the Primary Nodes table in the Cluster Coordinator plugin.

Both Primary Nodes should appear in the table, each listed with its IP address.

From the Cluster Coordinator plugin context menu, call the function View Status, and confirm that detailed information about each Primary Node is displayed.

Allocation Rules

To improve resource usage by the cluster, Allocation Rules are used to determine which contexts and resources are registered with the Cluster Coordinator, and which Primary Node should be the default location for these resources.

Suppose that the unique ID for Primary Node A is the string "primaryNodeAIdentifier".

Suppose that the context path of the resource to be allocated contains the string "devices.virtual".

Add a single row to the Allocation Rules table, with the following data:

  • Name: ExampleRule
    A name for reference purposes.

  • Expression: "primaryNodeAIdentifier"
    This expression must return the node ID of the Primary Node to which a resource should be allocated. In this example the node ID is hardcoded, but it would typically be an expression that can return different node IDs depending on the resource being evaluated; a more dynamic variant is sketched below.

  • Condition: contains({resourcePath[0]},"devices.virtual")
    The standard reference {resourcePath[0]} retrieves the context path of the resource being evaluated, that is, the value of the field ‘resourcePath’ in the first row of the default data table. The string processing function contains(), as used here, returns True when that context path contains the string "devices.virtual".

  • Commentary: (no value required)
    For user reference purposes only.

With this Allocation Rule, all resources with a context path containing the string "devices.virtual" will be allocated to Primary Node A.
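
In a real deployment, the Expression would typically distribute resources across several nodes rather than hardcode a single ID. As a minimal sketch, assuming Primary Node B's Server ID is the hypothetical value "primaryNodeBIdentifier", that the rule's Condition is broadened to match all resources (for example, by setting it to true), and that the expression language's conditional (? :) operator is used, the following Expression would allocate resources whose context path contains "devices.virtual" to Primary Node A and everything else to Primary Node B:

    contains({resourcePath[0]}, "devices.virtual") ? "primaryNodeAIdentifier" : "primaryNodeBIdentifier"

Whatever form the Expression takes, it must return the Server ID of an existing Primary Node for every resource that matches the rule's Condition.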

Registering Node Resources with the Cluster Coordinator

From the context menu of the Cluster Coordinator plugin, call the Scan Resources function to register the resources in the Cluster Coordinator storage. A resource must match at least one of the allocation rules in order to be registered.

Testing Failover Actions

To see how the Cluster Coordinator manages a failing server, shut down the server hosting Primary Node A.

The device resource that was on Primary Node A has the context path users.admin.devices.virtual, so the allocation Condition returns True when the Cluster Coordinator evaluates this resource.

However, the Expression of the allocation rule now returns the ID of an unavailable node, since the server for Primary Node A has been shut down. The Cluster Coordinator therefore reallocates the resource to the remaining available node, Primary Node B.

Log into Primary Node B and navigate to the resource that was allocated to Primary Node A, in this case the device with context path users.admin.devices.virtual, to confirm that it is now available there.

Restart Primary Node A, and check both nodes. The resource should have been reallocated back to Primary Node A, and removed from Primary Node B.
