Flow Data Aggregation
A NetFlow collection server in a typical large-scale network receives thousands of flows per second. Storing all flows in a database, keeping them persistent for a significant time, and loading them quickly on demand for visualization would require enormous server resources (CPU power, memory, and disk space).
To find a compromise between detailed flow statistics and server resource consumption, Iotellect Network Manager uses a complex multi-level flow caching and aggregation system. This system includes two primary tiers:
- In-memory flow cache
- Multi-level flow aggregation in the server database
Flow data aggregation settings can be configured in the global configuration of the NetFlow plugin. To access this configuration:
- Expand the Drivers/Plugins node in the System Tree.
- Double-click the NetFlow plugin.
In-Memory Flow Cache
The in-memory flow cache is the first cache that each received flow enters. It is also the fastest cache, much faster than the lower-level database flow storages. The caveat is high server RAM usage.
The memory cache is controlled by three parameters:
Property | Description |
--- | --- |
Maximum Memory Cache Size | Specifies the maximum number of flows that can be collected in memory before a first-level aggregation round starts. |
Memory Cache Period | Specifies the period of first-level aggregation. The aggregation is performed every period regardless of whether the number of flows in the memory cache has reached Maximum Memory Cache Size. |
Discarded Traffic Volume, % | Specifies the percentage of total traffic volume whose flows are discarded during each first-level aggregation round. |
All flows received from devices are initially stored in the memory cache. All cache elements are processed by the server in a so-called aggregation round (see the sketch after this list) that starts:
- At the end of every Memory Cache Period
- When the number of flows in the cache exceeds Maximum Memory Cache Size
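The following sketch illustrates the trigger logic described above. It is only an illustration: the class, field, and method names are hypothetical and do not reflect the actual NetFlow plugin internals.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative sketch: shows when a first-level aggregation round would be
 * triggered. Names and structure are hypothetical, not actual plugin code.
 */
public class MemoryFlowCache {
    private final int maxCacheSize;        // Maximum Memory Cache Size
    private final long cachePeriodMillis;  // Memory Cache Period

    private final List<Object> flows = new ArrayList<>();
    private long lastAggregationTime = System.currentTimeMillis();

    public MemoryFlowCache(int maxCacheSize, long cachePeriodMillis) {
        this.maxCacheSize = maxCacheSize;
        this.cachePeriodMillis = cachePeriodMillis;
    }

    public synchronized void addFlow(Object flow) {
        flows.add(flow);
        // In the real plugin the period-based trigger would also fire on a timer,
        // even if no new flows arrive; this sketch checks both conditions on insert.
        if (shouldStartAggregationRound()) {
            startAggregationRound();
        }
    }

    // A round starts at the end of every Memory Cache Period
    // or when the cache exceeds Maximum Memory Cache Size.
    private boolean shouldStartAggregationRound() {
        boolean periodElapsed =
            System.currentTimeMillis() - lastAggregationTime >= cachePeriodMillis;
        boolean sizeExceeded = flows.size() > maxCacheSize;
        return periodElapsed || sizeExceeded;
    }

    private void startAggregationRound() {
        // Join equal flows, discard the least important ones, and push the rest
        // to the first-level database storage (see the next section).
        lastAggregationTime = System.currentTimeMillis();
        flows.clear();
    }
}
```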
Flow Aggregation Rounds
A flow aggregation round includes several steps:
- Joining "equal" flows into a single flow. All flows in the memory cache or a database storage are processed. The server finds flows with the same source/destination addresses, ports, traffic types, and other metrics, and combines them into a single flow by summing their traffic volumes. The total number of flows in the cache thus decreases.
- Discarding the least important flows. Flows reflecting a small number of transferred bytes are deemed less important than flows representing large traffic volumes. The server calculates the total traffic volume represented by all flows in the cache, sorts the aggregated flows by traffic volume, and discards flows with the smallest traffic counters one by one until the total traffic volume represented by the remaining flows has decreased by the Discarded Traffic Volume percentage compared to the original volume. In practice, discarding just 1-3 percent of traffic volume typically drops 80-90% of flows. Those flows belong to applications that do not generate much traffic and are therefore often unimportant for analysis (for example, mail traffic, messenger applications, and traffic caused by network monitoring). Because interface "0" data shows the total amount of traffic transferred through all other interfaces, all traffic that has interface "0" as both its inbound and outbound interface is also discarded to avoid duplication of traffic. A code sketch of the joining and discarding steps follows this list.
- Relocation of flows to a lower-level cache. Once flows are aggregated and less important flows are discarded, the number of flows is in most cases greatly reduced. The remaining flows are then moved to a lower-level cache:
- Flows contained in the memory cache are moved to the first-level database storage
- Flows contained in a database storage are moved to the next lower-level storage (e.g. from the first level to the second level)
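The sketch below illustrates the joining and discarding steps under simplified assumptions: the flow key and field names are hypothetical, and a flow is reduced to a key plus a byte counter. It mimics the behavior described above rather than reproducing the actual implementation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch of one aggregation round; names and types are hypothetical. */
public class AggregationRoundSketch {

    /** Simplified flow record: a key (addresses, ports, traffic type, ...) plus a byte counter. */
    record Flow(String key, long bytes) {}

    static List<Flow> aggregate(List<Flow> flows, double discardedTrafficPercent) {
        // Step 1: join "equal" flows by summing their traffic volumes.
        Map<String, Long> joined = new HashMap<>();
        for (Flow f : flows) {
            joined.merge(f.key(), f.bytes(), Long::sum);
        }

        // Step 2: discard the least important flows while the discarded
        // traffic stays within the configured percentage of the total volume.
        List<Flow> sorted = new ArrayList<>();
        joined.forEach((k, v) -> sorted.add(new Flow(k, v)));
        sorted.sort(Comparator.comparingLong(Flow::bytes)); // smallest first

        long totalBytes = sorted.stream().mapToLong(Flow::bytes).sum();
        long discardBudget = (long) (totalBytes * discardedTrafficPercent / 100.0);

        long discarded = 0;
        int firstKept = 0;
        while (firstKept < sorted.size()
                && discarded + sorted.get(firstKept).bytes() <= discardBudget) {
            discarded += sorted.get(firstKept).bytes();
            firstKept++;
        }

        // Step 3 (not shown): the remaining flows would be relocated
        // to the next, lower-level storage.
        return sorted.subList(firstKept, sorted.size());
    }
}
```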
Database Flow Storages
Database flow storages are dedicated database tables that contain flows stored persistently. There are multiple storages called "levels"; each storage is in many senses similar to the In-Memory Flow Cache.
Each storage is configured by:
- Storage Period, i.e. the period of flow aggregation rounds for this storage. This option is similar to the Memory Cache Period of the In-Memory Flow Cache.
- Discarded Traffic Volume, i.e. the percentage of traffic volume dropped as less important flows during each aggregation round.
The database flow storages are configured via the Database Storage Configuration table. Here is how a default storage configuration may look:
Storage Period | Discarded Traffic Volume, % |
--- | --- |
5 Minutes | 1 |
1 Day | 5 |
1 Month | 5 |
1 Year | 5 |
After an aggregation round of the in-memory flow cache is completed, all flows are pushed to the first-level database storage. Every five minutes all flows in that storage are aggregated, flows representing the least important 1% of traffic volume are discarded, and the remaining flows are moved to the second-level storage, where they remain for 1 day, and so on.
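As a rough illustration of this cascade, the sketch below models the default storage configuration from the table above as a list of levels, each with its own period and discarded traffic percentage. Class and method names are hypothetical, and the month/year durations are approximated in days.

```java
import java.time.Duration;
import java.util.List;

/** Illustrative sketch of the multi-level storage cascade; not actual plugin code. */
public class StorageCascadeSketch {

    /** One database storage level: aggregation period and Discarded Traffic Volume, %. */
    record StorageLevel(Duration period, double discardedTrafficPercent) {}

    // Mirrors the default Database Storage Configuration table above.
    static final List<StorageLevel> LEVELS = List.of(
        new StorageLevel(Duration.ofMinutes(5), 1.0),  // first level
        new StorageLevel(Duration.ofDays(1), 5.0),     // second level
        new StorageLevel(Duration.ofDays(30), 5.0),    // roughly 1 month
        new StorageLevel(Duration.ofDays(365), 5.0)    // roughly 1 year
    );

    // Conceptually, each level runs its own aggregation round at the end of its
    // period and pushes the surviving flows to the next level, if any.
    static void runRound(int level) {
        // 1. Join equal flows stored at this level.
        // 2. Discard flows until LEVELS.get(level).discardedTrafficPercent()
        //    of the total traffic volume is dropped.
        // 3. Move the remaining flows to level + 1 (unless this is the last level).
    }
}
```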
Note: Flow Data Aggregation does not currently work with the embedded NoSQL storage because it relies on native functionality of the database engines.