Working with External Cassandra Database
The embedded Cassandra service used by the Iotellect server is sufficient for a wide range of use cases, but exceptionally large or specialized systems may require unique database configurations. Iotellect offers the flexibility to connect with an external Cassandra instance, giving administrators detailed control over the database infrastructure.
Deploying a High-Performance External Cassandra Instance
By default, the Iotellect server configures the embedded Cassandra service by applying the databaseCassandra* parameters from its own configuration (server.xml by default) together with some hard-coded values.
To configure a high-performance Cassandra instance, use the external configuration file cassandra.yaml. This file is the standard way to configure a standalone installation of Cassandra. All supported parameters are described in the official Cassandra documentation.
To use this feature:
1. Install Java 8 (JRE or JDK).
2. Install (unpack) the latest version of Cassandra.
3. Edit conf/cassandra.yaml:
   - Configure the data directory and the commitlog directory (preferably on different physical drives).
   - Set start_rpc to true.
   - Specify the IP address of the Cassandra server in seeds, listen_address and rpc_address.
   - If you change listen_address in your cassandra.yaml file to an IP address other than 127.0.0.1, you must also specify this address in the Iotellect server's server.xml with the following option: <databaseCassandraHost>your_cassandra_ip_address</databaseCassandraHost>
4. Set the JAVA_HOME environment variable to the JRE/JDK root folder.
5. Set the CASSANDRA_HOME environment variable to the Cassandra installation folder.
6. Edit bin/cassandra: change the Xms and Xmx settings to a higher value, leaving at least 2-4 GB of RAM for the operating system.
7. Launch bin/cassandra and make sure Cassandra has started listening for incoming connections.
8. Enable the "Use External YAML Configuration File" flag in the Database tab of the Server Configuration page and restart the server.
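The environment and heap-sizing steps above can be sketched in shell as follows. This is only an illustration under stated assumptions: both default paths are placeholders, and suggest_heap_mb is a hypothetical helper (not part of Cassandra or Iotellect) that applies the "leave 2-4 GB for the operating system" rule as plain arithmetic.

```shell
# Placeholder locations (assumptions); point them at your real installation.
export JAVA_HOME="${JAVA_HOME:-/opt/java8}"
export CASSANDRA_HOME="${CASSANDRA_HOME:-/opt/cassandra}"

# Hypothetical helper: heap size in MB for a machine with TOTAL_MB of RAM,
# keeping RESERVE_MB (4096 by default) free for the operating system.
suggest_heap_mb() {
  total_mb="$1"
  reserve_mb="${2:-4096}"
  if [ "$total_mb" -le "$reserve_mb" ]; then
    echo "not enough RAM to reserve ${reserve_mb} MB for the OS" >&2
    return 1
  fi
  echo $((total_mb - reserve_mb))
}

# Example: print JVM flags for a 16 GB machine (values are illustrative).
heap_mb="$(suggest_heap_mb 16384)" && echo "-Xms${heap_mb}M -Xmx${heap_mb}M"
```

The printed flags are only a starting point; the right heap size depends on the workload and on the garbage collector in use.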
After restarting, the server will use the external cassandra.yaml file and ignore all the databaseCassandra* parameters in its own configuration (server.xml by default). Note that such a switch may cause issues with an existing database. For example, if cassandra.yaml contains a different value for the cluster_name parameter, the server won't start at all.
Note: You usually should not change the cluster name. However, if this is unavoidable, you can add the following JVM options to
Also note that the name and location of the cassandra.yaml file are not configurable. The file must be in the server home (installation) directory. It is provided as part of the server's installation bundle and, unlike server.xml, is never changed by the server (i.e. it is used in read-only mode).
The cassandra.yaml file in Iotellect is a modified copy of standalone Cassandra's default configuration file. Each modification is marked with a prominent comment header and a description like this:
########################################################################################################################
# NOTE
########################################################################################################################
# This parameter was changed from default 'true' to 'false' according to internal config
########################################################################################################################
Because changes to this file can make Cassandra stop working or perform poorly, the Iotellect distribution bundle also contains a backup of the working cassandra.yaml, named cassandra.default.yaml, with all the initial values kept intact. This backup is not used by Iotellect and is intended as a rollback point for the working file in case of failure.
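A quick way to see how far the working file has drifted from the backup is to compare individual settings. The sketch below assumes both files sit in the current directory; yaml_value and same_setting are hypothetical helpers written for this illustration, and they read simple top-level "key: value" lines only (they are not a YAML parser).

```shell
# Print the raw value of a top-level "key: value" line.
# Line-based sketch: nested keys and multi-line values are ignored.
yaml_value() {
  sed -n "s/^$2:[[:space:]]*//p" "$1" | head -n 1
}

# Succeed when KEY (third argument) has the same value in both files.
same_setting() {
  [ "$(yaml_value "$1" "$3")" = "$(yaml_value "$2" "$3")" ]
}

# Usage, run from the server home directory (not executed here):
#   same_setting cassandra.yaml cassandra.default.yaml cluster_name \
#     || echo "cluster_name differs from the shipped default"
#   diff cassandra.default.yaml cassandra.yaml   # review every change
#   cp cassandra.default.yaml cassandra.yaml     # roll back to the backup
```

Checking cluster_name first is worthwhile because, as noted earlier, a changed value there can prevent the server from starting.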
Optimizing Cassandra Performance for Write-Heavy Workloads
A write-heavy workload means many insert operations. The faster you insert data, the faster you need to compact in order to keep the SSTable count down. Thus you will need to edit the following parameters in cassandra.yaml:
Increase the number of concurrent_compactors.
Increase the value of compaction_throughput_mb_per_sec or disable throttling by setting this parameter to zero.
You may also need to change the compaction strategy of particular tables. The following table parameters might need to be changed:
compaction - the preferred strategy is TimeWindowCompactionStrategy, which was designed specifically for time series and expiring-TTL workloads. To customize it, you only need to change compaction_window_unit and compaction_window_size. Also, unchecked_tombstone_compaction must be set to true to make Cassandra drop expired SSTables in real time.
gc_grace_seconds - should be decreased or set to zero (only if you're using a single-node cluster).
To change compaction parameters, execute a query using the CQL interactive terminal. Launch bin/cqlsh to connect to your current Cassandra node:
USE aggregate;
ALTER TABLE <table_name>
WITH compaction = {'class' : 'TimeWindowCompactionStrategy', 'compaction_window_unit' : 'HOURS', 'compaction_window_size' : '1', 'unchecked_tombstone_compaction' : 'true'}
AND gc_grace_seconds = 0;
Note: Before tuning your database using the listed parameters, please check the Apache Cassandra documentation. The appropriate tuning may be different in your particular case.
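When several tables need the same change, the statement above can be generated rather than typed by hand. The following is a minimal sketch: twcs_statement is a hypothetical helper, the table name ag_events is used purely as an illustration, and gc_grace_seconds = 0 is kept from the example above, so it applies to single-node clusters only.

```shell
# Build the ALTER TABLE statement shown above for one table.
# Arguments: keyspace, table, window unit, window size.
twcs_statement() {
  printf "ALTER TABLE %s.%s WITH compaction = {'class': 'TimeWindowCompactionStrategy', 'compaction_window_unit': '%s', 'compaction_window_size': '%s', 'unchecked_tombstone_compaction': 'true'} AND gc_grace_seconds = 0;\n" "$1" "$2" "$3" "$4"
}

# Print the statement for review before running it.
twcs_statement aggregate ag_events HOURS 1

# To apply it (not executed here), pass it to cqlsh with its -e option:
#   ./cqlsh -e "$(twcs_statement aggregate ag_events HOURS 1)"
```

Printing the statement first makes it easy to review the window unit and size before anything is changed in the database.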
Network Timeout Settings
Consider changing the timeout settings in cassandra.yaml when you plan to perform time-consuming operations. Otherwise you might get a TimedOutException while working with the database.
Parameter | Description |
---|---|
range_request_timeout_in_ms | How long the coordinator should wait for sequential or index scans to complete. |
read_request_timeout_in_ms | How long the coordinator should wait for read operations to complete. |
write_request_timeout_in_ms | How long the coordinator should wait for writes to complete. |
request_timeout_in_ms | The default timeout for other, miscellaneous operations. |
Note: Be judicious when raising timeouts, and understand the memory and CPU usage of the operations being performed. Timeouts are a protection against clients waiting too long on operations that never complete because of an error or underlying resource constraints.
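Before changing anything, it helps to see the values currently in effect. The sketch below lists the four settings from the table as they appear in a cassandra.yaml file; show_timeouts is a hypothetical helper, and the default file path is an assumption.

```shell
# List the four timeout settings from the table above as they appear in a
# cassandra.yaml file (first argument; defaults to ./cassandra.yaml).
# Commented-out lines and other *_timeout_in_ms settings are not matched.
show_timeouts() {
  grep -E '^(range_|read_|write_)?request_timeout_in_ms:' "${1:-cassandra.yaml}"
}

# Usage (not executed here):
#   show_timeouts /path/to/cassandra.yaml
```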
Troubleshooting with Nodetool
nodetool is a built-in Cassandra tool for getting various insights from Cassandra nodes. It can be extremely useful for hunting down performance issues.
To use nodetool with the Iotellect embedded Cassandra service:
1. Download the Cassandra 3.11 installation bundle and extract it to any folder.
2. Make sure the Iotellect server is running with the embedded Cassandra service.
3. Make sure that your JAVA_HOME environment variable points to Java 8 or later, or that a corresponding java executable is in your PATH variable. You can check the latter by issuing the java -version command.
4. Navigate to the extracted bin/ directory and run the following command:
$ ./nodetool --port 11111 status
The --port option refers to the embedded Cassandra's JMX port. If you run nodetool from a different machine, you also need to specify the --host option with the Iotellect server's IP address or host name.
The output of the command should look like:
Datacenter: datacenter1
========================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 127.0.0.1 1.35 MiB 256 100,0% 38512adb-b868-4ea1-9593-bf3b9950e687 rack1
The basic information commands of nodetool are status, version and info. There are many other helpful commands described in the documentation and in the tool itself (just run it without options or with the help command).
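The three basic commands can be collected into one report. The sketch below assumes it is run from the extracted bin/ directory; basic_report and the NODETOOL and JMX_PORT variables are hypothetical names introduced for this illustration, while --host and --port are the nodetool options described above.

```shell
# Overridable defaults; 11111 is the embedded JMX port used in the command above.
NODETOOL="${NODETOOL:-./nodetool}"
JMX_PORT="${JMX_PORT:-11111}"

# Run the three basic information commands in turn, stopping at the first failure.
# Pass a host name or IP address as the first argument when running remotely.
basic_report() {
  host="${1:-}"
  for cmd in status version info; do
    echo "== nodetool $cmd =="
    if [ -n "$host" ]; then
      "$NODETOOL" --host "$host" --port "$JMX_PORT" "$cmd" || return 1
    else
      "$NODETOOL" --port "$JMX_PORT" "$cmd" || return 1
    fi
  done
}

# Usage (not executed here):
#   basic_report               # local Iotellect server
#   basic_report 192.0.2.10    # remote server (example address)
```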
Direct Querying with CQLSH
cqlsh is an interactive tool for running queries against a Cassandra database node (the name stands for "CQL Shell"). It can be used to interactively select or update the data inside the database instance.
Like nodetool, the shell is provided as part of the standalone Cassandra installation.
To use it with Iotellect embedded Cassandra:
1. Download the Cassandra 3.11 installation bundle and extract it to any folder.
2. Make sure the Iotellect server is running with the embedded Cassandra service.
3. Make sure you have Python 2.7 available in your PATH environment variable. You can check it by issuing the python --version command in a console. Note that Python 3+ is not compatible.
4. Navigate to the extracted bin/ directory and run the following command:
$ ./cqlsh
This will try to connect to Cassandra deployed at localhost:9042. If necessary, the host and port can be specified either by the $CQLSH_HOST and $CQLSH_PORT environment variables or by appending them to the command itself, e.g. $ ./cqlsh localhost 9042 (note that the pair is separated by a space, not a colon).
Note that the previous command assumes the default username and password. If you need to specify them, add the following options:
$ ./cqlsh -u cassandra -p cassandra
In either case the output should look like:
Connected to Test Cluster at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.10-SNAPSHOT | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.
WARNING: pyreadline dependency missing. Install to enable tab completion.
cqlsh>
To test a query, try this:
cqlsh> select count(*) from aggregate.ag_events;
For more details on cqlsh and on the CQL syntax, see the official Cassandra documentation.
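For scripted checks, a query can also be run without opening the interactive prompt. This is a minimal sketch assuming the extracted bin/ directory is the current directory; run_cql and the CQLSH variable are hypothetical names, while --execute is a standard cqlsh option that runs one statement and exits.

```shell
# Overridable defaults matching the connection described above.
CQLSH="${CQLSH:-./cqlsh}"
CQLSH_HOST="${CQLSH_HOST:-localhost}"
CQLSH_PORT="${CQLSH_PORT:-9042}"

# Run one CQL statement and exit instead of opening the interactive prompt.
run_cql() {
  "$CQLSH" --execute "$1" "$CQLSH_HOST" "$CQLSH_PORT"
}

# Usage (not executed here):
#   run_cql "select count(*) from aggregate.ag_events;"
```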