Data Aggregation

Comparison of real-time and historical data

In addition to the real-time telemetry data that flows into our IoT platform via MQTT, OSF files play a crucial role in the OptiMEAS ecosystem. OSF files are binary files in the Optimeas Streaming Format. They are not transferred from the connected devices to optiCLOUD in real time, but at configurable intervals of typically 5 to 15 minutes. MQTT data, on the other hand, are snapshots of all incoming data in JSON format, which the OptiMEAS devices transmit to the cloud at a maximum rate of 1 Hz. This section explains the differences between these two data types and why both play an important role in our platform.

Differences between MQTT data and .OSF data

In optiCLOUD, there are differences between the data that is transmitted via MQTT in real time and the OSF data that is sent at periodic intervals. These differences are crucial for the quality, reliability and usability of the data. Some of the differences between these two types of data are explained below:

Time delay and real-time transmission

MQTT data (real-time): Data transmitted via MQTT(s) is available in real time. It provides immediate insight into the current status of networked devices and systems, which makes it particularly useful for real-time monitoring and control of connected assets.
.OSF data (delayed): In contrast, .OSF data is transmitted via HTTP(s) at configurable intervals of typically 5 to 15 minutes. The data is therefore updated with an inherent delay, which is why we do not refer to it as real-time data. Under this strict definition of live data (strict because television broadcasts, for example, are also called "live" even though they are sometimes artificially delayed by a few minutes), OSF data is not suitable for real-time monitoring.

In this example, the OSF data is uploaded every 2 minutes, which is an unusually short interval, but you can still see the difference between the two data types.

Reliability

MQTT data (real-time):
Real-time telemetry data can be lost under certain circumstances, especially if there is no internet connection. This is because it is transmitted at QoS (Quality of Service) level 0, which involves no delivery confirmation, and it is not cached on the device. If the connected edge device loses its internet connection, the data for the duration of the interruption is lost as well.
OSF data (delayed):
OSF data is guaranteed to be free of gaps, because it continues to be recorded while the internet connection is interrupted and is uploaded as soon as a connection is re-established. Data loss can occur only after a genuine device failure or a connection interruption that lasts long enough for the entire ring memory on the SD card to be overwritten.

In this example, you can see the data gap that occurs when the internet connection of the connected asset is lost. The gap can be seen in the MQTT data, while the OSF data was still recorded without a gap and then transferred to optiCLOUD afterwards.
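The difference between the two paths can be sketched with a small simulation. This is a simplified model, not the device firmware: the function name `simulate`, the per-step upload and the buffer sizes are assumptions made for illustration. The live path drops every sample taken while the link is down, while the buffered path keeps samples in a ring buffer and flushes them once the link is back.

```python
from collections import deque

def simulate(samples, online, ring_capacity):
    """Return (live, uploaded) for one run.

    samples: one measured value per time step
    online:  one bool per time step, True while the link is up
    ring_capacity: hypothetical size of the on-device ring buffer
    """
    live = []                           # fire-and-forget path, no caching
    uploaded = []                       # buffered path, flushed when online
    ring = deque(maxlen=ring_capacity)  # oldest entries are overwritten when full
    for value, connected in zip(samples, online):
        ring.append(value)              # always recorded locally
        if connected:
            live.append(value)          # delivered only while the link is up
            uploaded.extend(ring)       # upload everything still buffered
            ring.clear()
    return live, uploaded

samples = list(range(10))
online = [True, True, False, False, False, True, True, True, True, True]

live, uploaded = simulate(samples, online, ring_capacity=100)
print(live)      # steps 2 to 4 are missing
print(uploaded)  # complete record

# With a buffer that is too small for the outage, the oldest samples are lost.
_, uploaded_small = simulate(samples, online, ring_capacity=2)
print(uploaded_small)
```

The last call illustrates the one failure mode of the buffered path described above: an outage that outlasts the ring buffer.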

Data quality

MQTT data (real-time):
Because telemetry data is sent to optiCLOUD via MQTT at a comparatively low sampling rate, information can be lost and aliasing can occur due to undersampling, depending on the measured signal or process. Live data alone is therefore not suitable for drawing reliable conclusions about the underlying process, especially in the subsequent analysis of fast-running processes.
OSF data (delayed):
The sampling rate at which the measurement data is written to OSF files is configurable and can therefore be adapted to fast-running processes. This avoids the loss of information that undersampling can cause, and the data is ideally suited for later analysis.

This example shows the effect of the different sampling rates. While the MQTT data was sent to optiCLOUD at 1 Hz, the OSF data was recorded at 10 Hz.
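The aliasing effect can be reproduced with a few lines of Python. The 0.9 Hz test signal is an assumed value chosen for illustration. Sampled at 1 Hz, it yields exactly the same values as a much slower 0.1 Hz oscillation, so the two cannot be told apart from the 1 Hz data alone. At 10 Hz the sampling rate is well above twice the signal frequency, so the signal is captured correctly.

```python
import math

def sample(signal, rate_hz, duration_s):
    """Sample signal(t) at rate_hz for duration_s seconds."""
    count = int(rate_hz * duration_s)
    return [signal(i / rate_hz) for i in range(count)]

def fast(t):
    """Assumed fast process: a 0.9 Hz sine wave."""
    return math.sin(2 * math.pi * 0.9 * t)

def alias(t):
    """The slow 0.1 Hz oscillation that the fast process is mistaken for."""
    return -math.sin(2 * math.pi * 0.1 * t)

fast_at_1hz = sample(fast, 1, 20)    # undersampled, as on the 1 Hz live path
alias_at_1hz = sample(alias, 1, 20)
fast_at_10hz = sample(fast, 10, 20)  # adequately sampled, as in the 10 Hz recording

# At 1 Hz the two different signals produce the same samples.
same = all(abs(a - b) < 1e-9 for a, b in zip(fast_at_1hz, alias_at_1hz))
print(same)  # True
```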

Conclusion

Overall, OSF data is the preferred source for downstream analysis and accurate mapping of measured processes due to its configurable sampling rates and reliability. It is particularly well suited for retrospective evaluations and detailed insights into the underlying processes and monitored equipment. MQTT data, on the other hand, is valuable for real-time monitoring, where an up-to-date view of the processes is needed in order to react quickly to current events. The choice between these two data sources depends on the specific requirements and objectives of your IoT application.

When configuring the connected devices, it is therefore advisable to decide which data should be transmitted live and which data does not need to be live but must be of high quality so that it can be used in subsequent analyses. For some slow-running or non-critical processes, this decision is not critical, as the quality of the MQTT data is often entirely sufficient in these cases. For fast-running processes, or those in which data gaps must not occur under any circumstances, it is advisable to specify for each measurement channel which of the two recording and transmission paths should be used.

The transfer of .OSF files

The .OSF files generated by the connected devices contain the data that can be used to analyse and visualize your IoT infrastructure. optiCLOUD takes on the task of requesting, receiving, processing and saving these files. This is done automatically, without you as the user having to take any manual steps. This process ensures that all relevant data is available in our platform for further use.

Background processing

The raw data contained in the OSF files is often recorded at a high frequency in the kilohertz range. The resulting data volume is so large that it could not be visualized within an acceptable time, since transmitting that amount of data alone would take too long. Nor does it make sense to display, for example, 10,000 data points on a screen that is 1000 px wide: the points would overlap and merge into a single filled area, and plotting that area would still consume considerable processing power and cause long waiting times. To display large amounts of data in optiCLOUD in a performant and meaningful way, algorithms calculate intervals from the raw data in a kind of data pyramid. These interval calculations significantly reduce the amount of data while retaining the information content, which is crucial for loading and displaying large time ranges efficiently. The resulting intervals serve as the basis for creating dashboards and performing initial visual analyses.
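A rough calculation shows the orders of magnitude involved. The 1 kHz sampling rate, the one-hour time range and the 60-second interval are assumed values for illustration; only the 1000 px screen width is taken from the text above.

```python
def points_per_pixel(rate_hz, duration_s, width_px):
    """Average number of raw data points that fall onto one pixel column."""
    return rate_hz * duration_s / width_px

rate_hz = 1_000     # assumed raw sampling rate (1 kHz)
duration_s = 3_600  # assumed dashboard time range (one hour)
width_px = 1_000    # plot width from the text
interval_s = 60     # assumed aggregation interval

raw_points = rate_hz * duration_s
intervals = duration_s // interval_s

print(raw_points)                                       # 3600000
print(points_per_pixel(rate_hz, duration_s, width_px))  # 3600.0
print(intervals)                                        # 60
```

Under these assumptions, one hour of raw data would put 3600 points on every pixel column, whereas 60 precomputed intervals fit comfortably on the screen.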

Data pyramid and visualization

As soon as the OSF files are uploaded from the connected devices, our platform starts this background processing. It processes the data based on the aggregation levels defined in the aggregation settings. For each time interval configured there, optiCLOUD determines the MIN, MAX and LATEST values from the incoming data and saves them in the databases.
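The per-interval calculation can be sketched as follows. This is a minimal illustration of the idea, not optiCLOUD's actual implementation: the function name `aggregate`, the `(timestamp, value)` input format and the dictionary output are assumptions.

```python
def aggregate(samples, interval_s):
    """Group (timestamp_s, value) pairs into fixed intervals.

    Returns a dict that maps each interval start time to its
    min, max and latest value. Samples are assumed to be sorted
    by timestamp, so the last value seen is the latest one.
    """
    buckets = {}
    for timestamp, value in samples:
        start = (timestamp // interval_s) * interval_s
        bucket = buckets.get(start)
        if bucket is None:
            buckets[start] = {"min": value, "max": value, "latest": value}
        else:
            bucket["min"] = min(bucket["min"], value)
            bucket["max"] = max(bucket["max"], value)
            bucket["latest"] = value
    return buckets

raw = [(0, 3.0), (1, 7.0), (2, 5.0), (3, 1.0), (4, 4.0), (5, 2.0)]

print(aggregate(raw, 3))
# {0: {'min': 3.0, 'max': 7.0, 'latest': 5.0}, 3: {'min': 1.0, 'max': 4.0, 'latest': 2.0}}
print(aggregate(raw, 6))
# {0: {'min': 1.0, 'max': 7.0, 'latest': 2.0}}
```

Running the same function with several interval lengths gives the levels of a data pyramid: each coarser level holds fewer entries, but the extreme values of the raw data remain visible.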

As additional OSF files arrive, larger intervals can be calculated, which drastically reduces the amount of data to be visualized later. If you select a correspondingly large time range on a dashboard, the system determines in the background which existing time interval best suits this time range, taking into account the configured "Maximum number of data points to be displayed". The user is then shown the interval that comes closest to this maximum without exceeding it. This ensures that optiCLOUD users receive, within seconds, the most informative view of their data that is still visually meaningful for the selected time period. The actual raw data is loaded and visualized only when the selected time range is small enough for plotting it to make sense.
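The selection rule described above can be sketched as a small function. The name `choose_level`, the example levels and the fallback to the coarsest level are assumptions for illustration; the text only states that the chosen interval should come as close as possible to the maximum number of data points without exceeding it, and that raw data is used for small time ranges.

```python
import math

def choose_level(range_s, raw_rate_hz, levels_s, max_points):
    """Pick what to plot for a time range of range_s seconds.

    Returns "raw" if the raw data fits within max_points, otherwise the
    finest aggregation interval (in seconds) whose point count fits.
    If no level fits, the coarsest level is returned (assumed fallback).
    """
    if range_s * raw_rate_hz <= max_points:
        return "raw"
    for level in sorted(levels_s):
        if math.ceil(range_s / level) <= max_points:
            return level
    return max(levels_s)

levels = [1, 60, 3600]  # hypothetical aggregation levels in seconds

print(choose_level(1, 1000, levels, 1000))       # raw
print(choose_level(600, 1000, levels, 1000))     # 1
print(choose_level(86_400, 1000, levels, 1000))  # 3600
```

In the last call, a full day at the 60-second level would need 1440 points, which exceeds the limit of 1000, so the 3600-second level with 24 points is used instead.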

Selection of the aggregation level

The appropriate aggregation level depends on the underlying processes. Fast processes that are measured in the kilohertz range call for different aggregation levels than slower processes. For the exact selection

Warning

Once an aggregation level has been created, it can only be changed if the database is empty, as otherwise the entire history of data that has already been recorded would have to be recalculated.
