Working with External Cassandra Database

The embedded Cassandra service used by the Iotellect server is sufficient for a wide range of use cases, but exceptionally large or specialized systems may require unique database configurations. Iotellect offers the flexibility to connect with an external Cassandra instance, giving administrators detailed control over the database infrastructure.

Deploying a High-Performance External Cassandra Instance

By default, the Iotellect server configures the embedded Cassandra service by applying databaseCassandra* parameters from its own configuration (server.xml by default) and some hard-coded values.

To configure a high-performance Cassandra instance, use the external configuration file cassandra.yaml. This file is the standard way to configure a standalone installation of Cassandra. All supported parameters are described in the official Cassandra documentation.

To use this feature:

  1. Install Java 8 (JRE or JDK).

  2. Install (unpack) the latest version of Cassandra.

  3. Edit conf/cassandra.yaml:

    • Configure data directory and commitlog directory (preferably to different physical drives).

    • Set start_rpc to true.

    • Specify the IP address of the Cassandra server in seeds, listen_address and rpc_address.

      If you modify the listen_address in your cassandra.yaml file to an IP address other than 127.0.0.1, you must also specify this address in the Iotellect server server.xml with the following option:

      <databaseCassandraHost>your_cassandra_ip_address</databaseCassandraHost>

  4. Set the JAVA_HOME environment variable to the JRE/JDK root folder.

  5. Set the CASSANDRA_HOME environment variable to the Cassandra installation folder.

  6. Edit bin/cassandra

    • Increase the Xms and Xmx settings, leaving at least 2-4 GB of RAM for the operating system.

  7. Launch bin/cassandra and make sure Cassandra has started listening for incoming connections.

  8. Enable the “Use External YAML Configuration File” flag on the Database tab of the Server Configuration page and restart the server.
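
Step 7 can be verified with a quick connectivity check. The following is a minimal Python sketch, assuming Cassandra accepts client connections on port 9042 of the local host (adjust the host and port to match your cassandra.yaml). The is_listening helper is a hypothetical name used only for illustration.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection raises OSError if the port is closed or unreachable.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example (assumed host and port):
#   is_listening("127.0.0.1", 9042)
```

A True result only shows that something accepts TCP connections on that port; it does not prove that the node is healthy.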

After restarting, the server uses the external cassandra.yaml file and ignores all the databaseCassandra* parameters in its own configuration (server.xml by default). Note that this switch may cause issues with an existing database. For example, if cassandra.yaml contains a different value for the cluster_name parameter, the server will not start at all.

You should not normally change the cluster name. However, if this is unavoidable, you can add the following JVM options to the ag_server.vmoptions file (or another corresponding file):
-Dcassandra.ignore_rack=true
-Dcassandra.ignore_dc=true
This prevents the server from failing when it starts with a different cluster name.

Also note that the name and location of the cassandra.yaml file are not configurable. The file must be located in the server home (installation) directory. It is provided as part of the server installation bundle and, unlike server.xml, is never changed by the server (that is, it is used in read-only mode).

The cassandra.yaml file in Iotellect is a modified copy of the default configuration file of standalone Cassandra. Each modification is marked with a prominent comment header and a description, like this:

########################################################################################################################
# NOTE
########################################################################################################################
# This parameter was changed from default 'true' to 'false' according to internal config
########################################################################################################################

Because changes to this file can cause Cassandra to stop working or to perform poorly, the Iotellect distribution bundle also contains a backup of the working cassandra.yaml, named cassandra.default.yaml, with all the initial values kept intact. This file is not used by Iotellect and is intended as a rollback point for the working file in case of failure.

Optimizing Cassandra Performance for Write-Heavy Workloads

A write-heavy workload means many insert operations. The faster you insert data, the faster you need to compact in order to keep the sstable count down. You will therefore need to edit the following parameters in cassandra.yaml:

  • Increase the number of concurrent_compactors.

  • Increase the value of compaction_throughput_mb_per_sec or disable throttling by setting this parameter to zero.

You may also need to change the compaction strategy of particular tables. The following table parameters might need to be changed:

  • compaction - the preferred strategy is TimeWindowCompactionStrategy. It was designed specifically for time series and expiring TTL workloads. To customize it, you only need to change compaction_window_unit and compaction_window_size. In addition, unchecked_tombstone_compaction must be set to true to make Cassandra drop expired sstables in real time.

  • gc_grace_seconds - should be decreased, or set to zero only if you are using a single-node cluster.

To change compaction parameters, execute a query in the CQL interactive terminal. Launch bin/cqlsh to connect to your current Cassandra node and run:

USE aggregate;
ALTER TABLE <table_name> 
  WITH compaction = {'class' : 'TimeWindowCompactionStrategy', 'compaction_window_unit' : 'HOURS', 'compaction_window_size' : '1', 'unchecked_tombstone_compaction' : 'true'} 
  AND gc_grace_seconds = 0;
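
When several tables need the same settings, the statement above can be generated rather than typed by hand. The following Python sketch only builds the CQL text; it assumes the aggregate keyspace used above, and the function name and its defaults are hypothetical. The allowed window units (MINUTES, HOURS, DAYS) reflect the TimeWindowCompactionStrategy options as we understand them, so check them against the Cassandra documentation for your version.

```python
ALLOWED_UNITS = {"MINUTES", "HOURS", "DAYS"}  # assumed set of valid window units


def twcs_alter_statement(table: str, window_unit: str = "HOURS",
                         window_size: int = 1, gc_grace_seconds: int = 0,
                         keyspace: str = "aggregate") -> str:
    """Build an ALTER TABLE statement that switches a table to TWCS."""
    if window_unit not in ALLOWED_UNITS:
        raise ValueError(f"unsupported compaction_window_unit: {window_unit}")
    compaction = {
        "class": "TimeWindowCompactionStrategy",
        "compaction_window_unit": window_unit,
        "compaction_window_size": str(window_size),
        "unchecked_tombstone_compaction": "true",
    }
    # Render the options as a CQL map literal: {'key' : 'value', ...}
    options = ", ".join(f"'{k}' : '{v}'" for k, v in compaction.items())
    return (f"ALTER TABLE {keyspace}.{table} "
            f"WITH compaction = {{{options}}} "
            f"AND gc_grace_seconds = {gc_grace_seconds};")


# Example: print one statement per table and paste the output into cqlsh.
for name in ("ag_events",):
    print(twcs_alter_statement(name))
```

Keep gc_grace_seconds at 0 only for a single-node cluster, as noted above.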

Before tuning your database with the parameters listed above, check the Apache Cassandra documentation. The appropriate tuning may be different in your particular case.

Network Timeout Settings

Consider changing the timeout settings when you plan to perform time-consuming operations. Otherwise, you might get a TimedOutException while working with the database.

Parameter                      Description
range_request_timeout_in_ms    How long the coordinator should wait for seq or index scans to complete.
read_request_timeout_in_ms     How long the coordinator should wait for read operations to complete.
write_request_timeout_in_ms    How long the coordinator should wait for writes to complete.
request_timeout_in_ms          The default timeout for other, miscellaneous operations.

Be judicious when raising timeout values, and understand the memory and CPU usage of the operations being performed. Timeouts protect clients from waiting too long on operations that never complete because of an error or underlying resource constraints.
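
Before raising any of these values, it helps to see what is currently set. The following Python sketch extracts the four parameters from the text of a cassandra.yaml file using a simple line-based match instead of a full YAML parser. It assumes each setting appears as an uncommented top-level "name: integer" line; read_timeouts is a hypothetical helper name.

```python
import re

TIMEOUT_KEYS = (
    "range_request_timeout_in_ms",
    "read_request_timeout_in_ms",
    "write_request_timeout_in_ms",
    "request_timeout_in_ms",
)


def read_timeouts(yaml_text: str) -> dict:
    """Return the timeout settings (in milliseconds) found in cassandra.yaml text."""
    found = {}
    for key in TIMEOUT_KEYS:
        # Match only lines that start with the key, so commented-out
        # entries and longer names ending in the same suffix are skipped.
        pattern = rf"^{key}:[ \t]*(\d+)[ \t]*(?:#.*)?$"
        match = re.search(pattern, yaml_text, re.MULTILINE)
        if match:
            found[key] = int(match.group(1))
    return found

# Example (the path is an assumption; use your server home directory):
#   with open("cassandra.yaml", encoding="utf-8") as f:
#       print(read_timeouts(f.read()))
```

Parameters that are missing or commented out are omitted from the result, which means Cassandra falls back to its built-in defaults for them.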

Troubleshooting with Nodetool

nodetool is a built-in Cassandra tool for getting various insights from the Cassandra nodes. It can be extremely useful for hunting down performance issues.

To use nodetool with the Iotellect embedded Cassandra service:

  1. Download the Cassandra 3.11 installation bundle and extract it to any folder.

  2. Make sure the Iotellect server is running with the embedded Cassandra service.

  3. Make sure that JAVA_HOME points to Java 8 or later, or that a corresponding java executable is available in your PATH. You can check the latter by running the java -version command.

  4. Navigate to the extracted bin/ directory and run the following command:

$ ./nodetool --port 11111 status

The --port option refers to the JMX port of the embedded Cassandra. If you run nodetool from a different machine, you also need to specify the --host option with the Iotellect IP address or host name.

The output of the command should look like:

Datacenter: datacenter1
========================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load      Tokens  Owns (effective)  Host ID                               Rack
UN  127.0.0.1  1.35 MiB  256     100,0%            38512adb-b868-4ea1-9593-bf3b9950e687  rack1

The basic information commands of nodetool are status, version and info. There are many other helpful commands described in the documentation and in the tool itself (just run it without options or with the help command).
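
For automated health checks, the status output can be reduced to the two-letter code in front of each node. The following Python sketch parses text in the format shown above; node_states is a hypothetical helper, and it assumes that each node row starts with a status letter (U or D) followed by a state letter (N, L, J or M), as the legend in the output indicates.

```python
def node_states(status_output: str) -> dict:
    """Map each node address to its status/state code, for example 'UN'."""
    states = {}
    for line in status_output.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        code = fields[0]
        # Node rows begin with a code such as UN (Up/Normal) or DN (Down/Normal);
        # header and legend lines do not match this shape.
        if len(code) == 2 and code[0] in "UD" and code[1] in "NLJM":
            states[fields[1]] = code
    return states

# Example: capture the output of the command above and check every node.
#   import subprocess
#   out = subprocess.run(["./nodetool", "--port", "11111", "status"],
#                        capture_output=True, text=True).stdout
#   unhealthy = {a: s for a, s in node_states(out).items() if s != "UN"}
```

An empty unhealthy dictionary means that every listed node reports Up/Normal.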

Direct Querying with CQLSH

cqlsh is an interactive tool for running queries against a Cassandra database node (the name stands for “CQL Shell”). It can be used to interactively select or update the data inside the database instance.

Like nodetool, the shell is also provided as a part of Cassandra standalone installation.

To use it with the Iotellect embedded Cassandra service:

  1. Download the Cassandra 3.11 installation bundle and extract it to any folder.

  2. Make sure the Iotellect server is running with the embedded Cassandra service.

  3. Make sure Python 2.7 is available in your PATH environment variable. You can check this by running the python --version command in a console. Note that Python 3+ is not compatible.

  4. Navigate to the extracted bin/ directory and run the following command:

$ ./cqlsh

This will try to connect to Cassandra deployed at localhost:9042. If necessary, the host and port can be specified either by $CQLSH_HOST and $CQLSH_PORT environment variables or by appending them to the command itself, e.g. $ ./cqlsh localhost 9042 (note that the pair is separated by a space, not a colon).

Note that the previous command assumes the default username and password. If you need to specify them, add the following options:

$ ./cqlsh -u cassandra -p cassandra

In either case the output should look like:

Connected to Test Cluster at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.10-SNAPSHOT | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.
WARNING: pyreadline dependency missing. Install to enable tab completion.
cqlsh>

To test a query, try this:

cqlsh> select count(*) from aggregate.ag_events;
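
The same query can also be run non-interactively, which is convenient in scripts. The following Python sketch only assembles the command line; it relies on the -e (execute) option of cqlsh together with the -u and -p options and the space-separated host and port shown above. The cqlsh_command helper and its parameters are hypothetical names.

```python
def cqlsh_command(query, host=None, port=None, user=None, password=None):
    """Build an argument list that runs a single CQL statement through cqlsh."""
    cmd = ["./cqlsh"]
    if user:
        cmd += ["-u", user]
    if password:
        cmd += ["-p", password]
    cmd += ["-e", query]
    if host:
        # Host and port are separate positional arguments, not host:port.
        cmd.append(host)
        if port:
            cmd.append(str(port))
    return cmd

# Example: run from the extracted bin/ directory.
#   import subprocess
#   subprocess.run(cqlsh_command("select count(*) from aggregate.ag_events;"),
#                  check=True)
```

Passing the arguments as a list avoids shell quoting problems with the semicolon and parentheses in the query.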

For more details on cqlsh and on the syntax of CQL, see the official Cassandra documentation.
