Criteria and requirements
The evaluation of the data formats takes into account the following main criteria:
- Application area: How well the data format fits the specific requirements of rail transportation, including the acquisition and archiving of measurement and diagnostic data.
- Performance: Efficiency of the format in processing large amounts of data.
- Packing density: Efficiency of data compression and the resulting file size.
- Metadata: Support and flexibility of the metadata structure.
- Simplicity: Effort required to implement and use the format.
- Robustness: Resilience of the format to interruptions such as power outages.
- Tool support: Availability and quality of tools for processing and analyzing the data.
- Documentation: Completeness and comprehensibility of format documentation.
- Dissemination: Acceptance and use of the format in the industry.
- Human readability: Readability of the format for developers and technicians.
- Extensibility and longevity: Ability of the format to integrate future requirements and changes.
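The robustness criterion can be made concrete with a small experiment. The sketch below assumes a hypothetical append-only layout in which each record is a 4-byte length prefix followed by its payload; this is not the actual layout of any format evaluated here, and the sample payloads are invented for illustration. Because each record is self-delimiting, a reader can still recover every complete record from a file that was cut off mid-write, for example by a power outage.

```python
import struct

# Hypothetical layout: 4-byte little-endian payload length, then the payload.
HEADER = struct.Struct("<I")


def encode_records(payloads):
    """Serialize payloads as length-prefixed records, one after another."""
    return b"".join(HEADER.pack(len(p)) + p for p in payloads)


def recover_records(blob):
    """Return every complete record; stop at the first truncated one."""
    records = []
    offset = 0
    while offset + HEADER.size <= len(blob):
        (length,) = HEADER.unpack_from(blob, offset)
        start = offset + HEADER.size
        end = start + length
        if end > len(blob):  # payload was cut off mid-write
            break
        records.append(blob[start:end])
        offset = end
    return records


# Invented sample payloads, for illustration only.
samples = [b"speed=81.5", b"axle_temp=42.0", b"brake_pressure=4.9"]
stream = encode_records(samples)

# Simulate a power outage that cuts off the last 5 bytes of the file.
truncated = stream[:-5]
recovered = recover_records(truncated)
print(recovered)  # the first two records survive, the third is discarded
```

A production format would typically add per-record checksums and control when data is flushed to disk; the sketch only shows why record-level framing limits the damage of an interruption to the last, incomplete record.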
Ranking of data formats
- OSF4 (optiMEAS Streaming Format)
- Application area: Very suitable for measurement and automation data.
- Performance: High, efficient in data processing.
- Packing density: Low storage requirements, compressible.
- Metadata: Flexible and extensively supported.
- Simplicity: Comparatively easy to implement.
- Robustness: Extremely robust against interruptions.
- Tool support: Limited so far; the range of available tools still has room to grow.
- Documentation: Well documented.
- Dissemination: Currently still low, but growing.
- Human readability: Acceptable.
- Extensibility: Highly extensible and future-proof.
- MDF4 (Measurement Data Format)
- Application area: Designed primarily for measurement data in the automotive industry; highly customizable.
- Performance: Very high.
- Packing density: Good.
- Metadata: Extensive and detailed.
- Simplicity: More complex than OSF4.
- Robustness: Less robust against power loss or abrupt shutdown.
- Tool support: Widely used and well supported.
- Documentation: Very good.
- Dissemination: Widely used in the automotive industry.
- Human readability: Low.
- Extensibility: Good extensibility.
- HDF5 (Hierarchical Data Format)
- Application area: Wide application, especially in science.
- Performance: Good, but not leading.
- Packing density: Good.
- Metadata: Very well supported.
- Simplicity: Complex to implement.
- Robustness: Not very robust against power loss or abrupt shutdown.
- Tool support: Very good in the scientific and research community.
- Documentation: Excellent.
- Dissemination: Very common in the scientific community.
- Human readability: Low.
- Extensibility: Very extensible.
- Parquet (Apache Parquet)
- Application area: Good for tabular and structured data.
- Performance: Good for large, dense tables.
- Packing density: Efficient for large files.
- Metadata: Supported, but less flexible.
- Simplicity: Comparatively simple.
- Robustness: Less robust for continuous data streams.
- Tool support: Very good in the Hadoop ecosystem.
- Documentation: Good.
- Dissemination: Widely used in the Big Data space.
- Human readability: Medium.
- Extensibility: Good, but with limitations for sparse tables.
- CSV/TSV
- Application area: Basic and broadly applicable.
- Performance: Low.
- Packing density: Very poor.
- Metadata: Very limited.
- Simplicity: Very simple.
- Robustness: Not robust against interruptions.
- Tool support: Very widely used.
- Documentation: Simple and widely available.
- Dissemination: Very widespread.
- Human readability: Very high.
- Extensibility: Very limited.
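The packing-density gap between a text format such as CSV and a binary encoding can be checked with a short experiment. The following is a minimal sketch using invented sample data (millisecond timestamps and a slowly varying value). It compares uncompressed sizes only, and the fixed two-`float64` binary record is an assumption for illustration, not the layout of any format ranked above.

```python
import struct

# Illustrative samples only: (timestamp in seconds, measured value).
samples = [(1700000000.0 + i * 0.001, 20.0 + (i % 100) * 0.01) for i in range(10_000)]

# Text encoding: one "timestamp,value" line per sample, as in CSV.
# repr() keeps enough digits to round-trip each float exactly.
csv_bytes = "".join(f"{t!r},{v!r}\n" for t, v in samples).encode("ascii")

# Binary encoding: two little-endian float64 values per sample (16 bytes).
record = struct.Struct("<dd")
bin_bytes = b"".join(record.pack(t, v) for t, v in samples)

print(f"CSV:    {len(csv_bytes)} bytes")
print(f"binary: {len(bin_bytes)} bytes")
print(f"ratio:  {len(csv_bytes) / len(bin_bytes):.2f}")
```

The result depends heavily on the data: values written with few digits make CSV lines shorter, and general-purpose compression narrows the gap, so this comparison should be read as an indication rather than a benchmark.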