Storage
Time Series Storage and Aggregation
Machines are the world's most active generators of data, most of which is time series data showing how different metrics change over time. If we poll ten metrics from just one thousand devices every minute and keep the data for ten years, we'll need to store more than 50 billion samples in our database!
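The sample count in this scenario can be checked with a few lines of arithmetic. This is a back-of-envelope sketch. The 16-byte sample size is an assumption made only for illustration (an 8-byte timestamp plus an 8-byte float, with no compression or index overhead). It is not a figure for any particular database.

```python
# Back-of-envelope check of the sample count for the polling scenario above.
DEVICES = 1_000
METRICS_PER_DEVICE = 10
POLLS_PER_DAY = 24 * 60   # one poll per minute
DAYS = 365 * 10           # ten years, ignoring leap days

# Assumption: 8-byte timestamp + 8-byte float per sample, no compression or index overhead
BYTES_PER_SAMPLE = 16


def total_samples(devices: int, metrics: int, polls_per_day: int, days: int) -> int:
    """Number of raw samples accumulated over the retention period."""
    return devices * metrics * polls_per_day * days


samples = total_samples(DEVICES, METRICS_PER_DEVICE, POLLS_PER_DAY, DAYS)
raw_bytes = samples * BYTES_PER_SAMPLE

print(f"{samples:,} samples")             # 52,560,000,000 samples
print(f"about {raw_bytes / 1e9:,.0f} GB")  # about 841 GB
```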
Depending on subscription parameters, a single Iotellect instance can process up to a billion data samples per day and keep this raw data for weeks.
However, we're rarely interested in per-minute statistics that are five years old. For such old data, we'd normally like to know hourly or even daily averages, and Iotellect offers solutions for this case as well.
Statistics
Iotellect can store long-term time series data in a Round-Robin Database (RRD). The data is stored in a circular buffer, so the storage footprint remains small and constant over time. RRD-based storage also ensures extremely fast access to the historical data for any time period.
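The circular-buffer idea behind RRD storage is sketched minimally below. The sketch assumes fixed-length time steps, with one slot per step. The slot index is computed directly from the timestamp, which is why lookups are fast and memory stays constant. The class and method names are hypothetical. They are not Iotellect's storage format or API.

```python
from typing import List, Optional, Tuple


class RoundRobinArchive:
    """Fixed-size circular buffer that averages samples per time step.

    Hypothetical sketch for illustration; not Iotellect's storage format or API.
    """

    def __init__(self, step: int, rows: int) -> None:
        self.step = step  # seconds covered by one slot
        self.rows = rows  # slot count; memory never grows beyond this
        self.sums: List[float] = [0.0] * rows
        self.counts: List[int] = [0] * rows
        self.slot_start: List[Optional[int]] = [None] * rows  # period held by each slot

    def _locate(self, timestamp: int) -> Tuple[int, int]:
        """Map a timestamp directly to (slot index, start of its period)."""
        period_start = timestamp - timestamp % self.step
        return (period_start // self.step) % self.rows, period_start

    def update(self, timestamp: int, value: float) -> None:
        index, period_start = self._locate(timestamp)
        current = self.slot_start[index]
        if current is not None and period_start < current:
            return  # late sample for a period that has already been overwritten
        if current != period_start:
            # A newer period reuses the slot, discarding the oldest data (wrap-around)
            self.slot_start[index] = period_start
            self.sums[index] = 0.0
            self.counts[index] = 0
        self.sums[index] += value
        self.counts[index] += 1

    def fetch(self, timestamp: int) -> Optional[float]:
        """O(1) lookup of the average for the period containing `timestamp`."""
        index, period_start = self._locate(timestamp)
        if self.slot_start[index] != period_start:
            return None  # never stored, or already overwritten
        return self.sums[index] / self.counts[index]
```

Consider an archive created with `step=60, rows=3`. It holds only the three most recent minutes. A write for the fourth minute silently reuses the first slot, so the footprint never grows.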
Other features of RRD-based statistics include automatic calculation of rates from counters (e.g., flow rate or traffic), configurable degradation of precision for older values, and efficient operation in parallel with raw data storage.
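Two of these features can be illustrated with a short sketch. The first derives a per-second rate from two readings of a cumulative counter. The second degrades precision by averaging runs of fine-grained values into coarser ones. The sketch assumes a 32-bit counter that wraps at most once between readings, and it uses simple average consolidation. The actual wrap size and consolidation function used by Iotellect may differ.

```python
from typing import List, Optional, Tuple

COUNTER_MODULUS = 2**32  # assumption: a 32-bit counter that wraps back to zero


def counter_rate(prev: Tuple[float, int], curr: Tuple[float, int],
                 modulus: int = COUNTER_MODULUS) -> Optional[float]:
    """Per-second rate between two (timestamp_seconds, counter_value) readings."""
    (t0, c0), (t1, c1) = prev, curr
    if t1 <= t0:
        return None  # out-of-order or duplicate reading: no meaningful rate
    delta = c1 - c0
    if delta < 0:
        delta += modulus  # counter wrapped; assume it wrapped exactly once
    return delta / (t1 - t0)


def consolidate(values: List[float], factor: int) -> List[float]:
    """Average every `factor` consecutive values into one coarser value."""
    chunks = [values[i:i + factor] for i in range(0, len(values), factor)]
    return [sum(chunk) / len(chunk) for chunk in chunks]
```

For example, calling `consolidate` with `factor=60` turns a list of per-minute averages into hourly averages. This is the kind of precision reduction that suits data several years old.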
Granulation
The granulation module splits continuous time into sections (granules) and calculates and stores various aggregate values for each granule. It is similar to RRD-based statistics, but instead of fixed-length intervals it can slice the time axis in arbitrarily complex ways:
- Calendar days in a specific time zone, accounting for daylight saving time
- Morning, day, evening and night hours of every day
- Calendar months of varying length, including February in leap years
- Flexible company work shifts
- Weekdays, weekends and holidays
- Any more complex time slicing
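The first item shows why granules cannot simply be fixed-length intervals. A local calendar day lasts 23 or 25 hours when daylight saving time starts or ends. The minimal sketch below computes local-day granule boundaries and groups UTC samples into them. It uses Python's standard `zoneinfo` module, with "Europe/Berlin" as an example zone. The function names are hypothetical and do not belong to the Iotellect API.

```python
from collections import defaultdict
from datetime import date, datetime, time, timedelta, timezone
from typing import Dict, Iterable, Tuple
from zoneinfo import ZoneInfo  # Python 3.9+; needs the system tz database or the tzdata package


def day_granule_bounds(day: date, tz: ZoneInfo) -> Tuple[datetime, datetime]:
    """UTC start and end of one local calendar day."""
    start_local = datetime.combine(day, time.min, tzinfo=tz)
    end_local = datetime.combine(day + timedelta(days=1), time.min, tzinfo=tz)
    # Converting to UTC makes end - start the real elapsed time, so DST days come out as 23 h or 25 h
    return start_local.astimezone(timezone.utc), end_local.astimezone(timezone.utc)


def aggregate_by_local_day(samples: Iterable[Tuple[datetime, float]],
                           tz: ZoneInfo) -> Dict[date, Dict[str, float]]:
    """Group timezone-aware (timestamp, value) samples into local-day granules."""
    buckets: Dict[date, list] = defaultdict(list)
    for ts, value in samples:
        buckets[ts.astimezone(tz).date()].append(value)
    return {
        day: {"min": min(v), "max": max(v), "avg": sum(v) / len(v), "count": len(v)}
        for day, v in sorted(buckets.items())
    }
```

In 2024, the Berlin granule for March 31 spans 23 hours and the granule for October 27 spans 25 hours. A sample taken at 23:30 UTC on March 30 falls into the March 31 granule, because it is already 00:30 local time.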
The greater flexibility of granulation compared to the round-robin database comes at the cost of slower data updates and retrieval and higher disk space consumption.
Granules use a regular (NoSQL or SQL) storage facility to keep any user-defined data for every time slice. Here are some examples:
- Average, minimum and maximum value
- Sum of values or any other equation-based result
- First and last values in the time slice, as well as their timestamps
- Counts of samples with different quality (good, bad, unreliable, etc.)
- Granule-wide marks, such as "data not available"
- Any custom numeric, textual or binary data
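A single granule record can hold several of the values listed above and update them one sample at a time. The sketch below is a minimal illustration of that idea. The field names, the default "good" quality label, and the `close()` hook are hypothetical. They are not Iotellect's granule schema.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Granule:
    """Aggregates accumulated for one time slice (hypothetical layout, not Iotellect's schema)."""
    count: int = 0
    total: float = 0.0
    minimum: Optional[float] = None
    maximum: Optional[float] = None
    first: Optional[float] = None
    first_ts: Optional[float] = None
    last: Optional[float] = None
    last_ts: Optional[float] = None
    quality_counts: Dict[str, int] = field(default_factory=dict)
    marks: Set[str] = field(default_factory=set)

    def add(self, ts: float, value: float, quality: str = "good") -> None:
        """Fold one sample into the running aggregates."""
        self.count += 1
        self.total += value
        self.minimum = value if self.minimum is None else min(self.minimum, value)
        self.maximum = value if self.maximum is None else max(self.maximum, value)
        # Track first/last by timestamp, so out-of-order arrival does not matter
        if self.first_ts is None or ts < self.first_ts:
            self.first, self.first_ts = value, ts
        if self.last_ts is None or ts >= self.last_ts:
            self.last, self.last_ts = value, ts
        self.quality_counts[quality] = self.quality_counts.get(quality, 0) + 1

    def close(self) -> None:
        """Called when the time slice ends; flags granules that received no data."""
        if self.count == 0:
            self.marks.add("data not available")

    @property
    def average(self) -> Optional[float]:
        return self.total / self.count if self.count else None
```

Each field updates in constant time per sample, so the granule never needs to re-read its raw samples.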
Persistent Storage Facilities
All persistent data stored by Iotellect fall into just a few major groups: configuration, events, binary blocks, statistics, and topologies. This simple division makes it possible to add new types of devices and business objects to your application without changing, or even knowing, the structure of the data storage facility.
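The sketch below shows why such a division needs no schema changes. Records are keyed only by group, object path and record name, and their payloads are stored as opaque documents. This is a hypothetical in-memory illustration under that assumption. It is not Iotellect's actual storage layout, and the paths such as "devices.pump7" are invented examples.

```python
import json
from enum import Enum
from typing import Any, Dict, List, Tuple


class DataGroup(Enum):
    """The major persistent data groups named in the text."""
    CONFIGURATION = "configuration"
    EVENTS = "events"
    BINARY_BLOCKS = "binary blocks"
    STATISTICS = "statistics"
    TOPOLOGIES = "topologies"


class GenericStore:
    """Hypothetical store keyed by (group, object path, record name) with opaque payloads."""

    def __init__(self) -> None:
        self._records: Dict[Tuple[DataGroup, str, str], str] = {}

    def put(self, group: DataGroup, path: str, name: str, payload: Dict[str, Any]) -> None:
        # The payload is serialized as-is, so a new object type needs no new table or schema
        self._records[(group, path, name)] = json.dumps(payload, sort_keys=True)

    def get(self, group: DataGroup, path: str, name: str) -> Dict[str, Any]:
        return json.loads(self._records[(group, path, name)])

    def names(self, group: DataGroup, path: str) -> List[str]:
        return sorted(n for (g, p, n) in self._records if g == group and p == path)
```

A meter and a pump can store configuration payloads with completely different fields side by side. Adding a third device type only adds records.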
Depending on the delivery model, an Iotellect instance may work with some or all of the following storage facility types:
NoSQL Database
The integrated NoSQL database engine is the primary storage facility. It offers very high insertion rates, failover clustering, and storage-level horizontal scalability.
Relational Database
This storage option offers limited time series processing performance due to the inherent limitations of SQL databases. However, it’s the preferred choice for keeping topological data with multiple relations between data elements.
Graph Database
Although it is not suitable for time series data, a graph storage facility is well suited to large-scale topological structures. These can be network topologies, hierarchies of services, configuration management databases, electrical and piping schemes, and more.
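Graph storage suits these structures because typical topology questions require following relations over many hops. One example is finding everything that depends on a failed element. The sketch below shows such a traversal as a breadth-first search over an in-memory adjacency list. It illustrates the query pattern only. It does not use any graph database's query language, and the node names are invented.

```python
from collections import deque
from typing import Dict, List, Set


def downstream(graph: Dict[str, List[str]], start: str) -> Set[str]:
    """Return every node reachable from `start` by following dependency edges (breadth-first)."""
    seen: Set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child not in seen:  # also guards against cycles in the topology
                seen.add(child)
                queue.append(child)
    return seen


topology = {
    "core-switch": ["access-switch-1", "access-switch-2"],
    "access-switch-1": ["billing-service"],
    "billing-service": ["customer-portal"],
}
```

If "access-switch-1" fails, this traversal reports both "billing-service" and "customer-portal" as affected. The second of these is two hops away, which is awkward to express with a single relational join.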
Round-Robin Database
RRD is a time series storage facility that keeps numeric values aggregated by time period. It offers a constant disk and memory footprint and extremely fast storage and retrieval rates.
File-Based Storage
The plain file storage facility is normally used by edge gateways with extremely limited resources. It minimizes the CPU and memory consumption required to store configuration and binary data. However, its time series storage is limited to buffering data on its way to the cloud or a higher-tier server.
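Buffering data on its way upstream can be as simple as an append-only file. New samples are appended to the file. Once the server confirms receipt, the acknowledged samples are removed. The sketch below assumes one JSON object per line, and the class name is hypothetical. It is not the Iotellect gateway's actual file format.

```python
import json
import os
from typing import Any, Dict, List


class FileBuffer:
    """Hypothetical append-only file buffer for samples awaiting upload to a higher tier."""

    def __init__(self, path: str) -> None:
        self.path = path

    def append(self, sample: Dict[str, Any]) -> None:
        # One JSON object per line keeps writes cheap: no index, no in-memory cache
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(sample) + "\n")

    def pending(self) -> List[Dict[str, Any]]:
        """Samples not yet acknowledged by the server, oldest first."""
        if not os.path.exists(self.path):
            return []
        with open(self.path, encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]

    def acknowledge(self, count: int) -> None:
        """Drop the first `count` samples once the server has confirmed receipt."""
        remaining = self.pending()[count:]
        tmp = self.path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            for sample in remaining:
                f.write(json.dumps(sample) + "\n")
        os.replace(tmp, self.path)  # rename is atomic on POSIX, so a crash never leaves a half-written buffer
```

In this design, data is deleted only after it is acknowledged. A lost connection therefore delays delivery but does not lose samples.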