Criteria and requirements

The evaluation of the data formats takes into account the following main criteria:

  1. Application area: How well the data format fits the specific requirements of rail transportation, including the acquisition and archiving of measurement and diagnostic data.
  2. Performance: Efficiency of the format in processing large amounts of data.
  3. Packing density: Efficiency of data compression and the resulting file size.
  4. Metadata: Support and flexibility of the metadata structure.
  5. Simplicity and complexity: Effort required to implement and use the format.
  6. Robustness: Resilience of the format to interruptions such as power outages.
  7. Tool support: Availability and quality of tools for processing and analyzing the data.
  8. Documentation: Completeness and comprehensibility of format documentation.
  9. Dissemination: Acceptance and use of the format in the industry.
  10. Human readability: Readability of the format for developers and technicians.
  11. Extensibility and longevity: Ability of the format to integrate future requirements and changes.
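
One way to turn the eleven criteria into a single comparable figure is a weighted mean of per-criterion ratings. The following is a minimal sketch in Python, not the method used for the ranking in this document: the 1-to-5 rating scale, the key names, the `weighted_score` function, and the placeholder ratings for `format_a` are all assumptions made for illustration.

```python
from typing import Dict

# The eleven criteria from the list above, as short keys (names are illustrative).
CRITERIA = [
    "application_area", "performance", "packing_density", "metadata",
    "simplicity", "robustness", "tool_support", "documentation",
    "dissemination", "human_readability", "extensibility",
]


def weighted_score(ratings: Dict[str, float], weights: Dict[str, float]) -> float:
    """Return the weighted mean of the per-criterion ratings.

    Criteria that do not appear in `weights` get a default weight of 1.0.
    """
    missing = [c for c in CRITERIA if c not in ratings]
    if missing:
        raise ValueError(f"missing ratings for: {missing}")
    total_weight = sum(weights.get(c, 1.0) for c in CRITERIA)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(ratings[c] * weights.get(c, 1.0) for c in CRITERIA)
    return weighted_sum / total_weight


# Placeholder ratings on an assumed 1-to-5 scale; these are NOT values
# taken from the ranking in this document.
format_a = {c: 3.0 for c in CRITERIA}
format_a["robustness"] = 5.0

# Hypothetical weighting for an on-board recorder where power loss is common.
weights = {"robustness": 3.0}

print(f"format_a score: {weighted_score(format_a, weights):.2f}")
```

Raising the weight of a single criterion, as done here for robustness, shows how strongly the result depends on the priorities of the target application.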

Ranking list of data formats

  1. OSF4 (optiMEAS Streaming Format)
    1. Application area: Very suitable for measurement and automation data.
    2. Performance: High, efficient in data processing.
    3. Packing density: Low storage requirements, compressible.
    4. Metadata: Flexible and extensively supported.
    5. Simplicity: Comparatively easy to implement.
    6. Robustness: Extremely robust against interruptions.
    7. Tool support: Still limited, with room to grow.
    8. Documentation: Well documented.
    9. Dissemination: Currently still low, but growing.
    10. Human readability: Acceptable.
    11. Extensibility: Very expandable and future-proof.
  2. MDF4 (Measurement Data Format)
    1. Application area: Especially for measurement data in the automotive industry, highly customizable.
    2. Performance: Very high.
    3. Packing density: Good.
    4. Metadata: Extensive and detailed.
    5. Simplicity: More complex than OSF4.
    6. Robustness: Less robust against power loss.
    7. Tool support: Widely used and well supported.
    8. Documentation: Very good.
    9. Dissemination: Widely used in the automotive industry.
    10. Human readability: Low.
    11. Extensibility: Good extensibility.
  3. HDF5 (Hierarchical Data Format)
    1. Application area: Wide application, especially in science.
    2. Performance: Good, but not leading.
    3. Packing density: Good.
    4. Metadata: Very well supported.
    5. Simplicity: Complex to implement.
    6. Robustness: Not very robust against power loss.
    7. Tool support: Very good in the scientific and research community.
    8. Documentation: Excellent.
    9. Dissemination: Very common in the scientific community.
    10. Human readability: Low.
    11. Extensibility: Very extensible.
  4. Parquet (Apache Parquet)
    1. Application area: Good for tabular and structured data.
    2. Performance: Good for large, dense tables.
    3. Packing density: Efficient for large files.
    4. Metadata: Supported, but less flexible.
    5. Simplicity: Comparatively simple.
    6. Robustness: Less robust for continuous data streams.
    7. Tool support: Very good in the Hadoop ecosystem.
    8. Documentation: Good.
    9. Dissemination: Widely used in the Big Data space.
    10. Human readability: Medium.
    11. Extensibility: Good, but with limitations for sparse tables.
  5. CSV/TSV
    1. Application area: Basic and broadly applicable.
    2. Performance: Low.
    3. Packing density: Very poor.
    4. Metadata: Very limited.
    5. Simplicity: Very simple.
    6. Robustness: Not robust.
    7. Tool support: Very widely available.
    8. Documentation: Simple and widely available.
    9. Dissemination: Very widespread.
    10. Human readability: Very high.
    11. Extensibility: Very limited.
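
The packing density criterion can be made concrete by encoding the same samples once as CSV text and once as fixed-width binary records. The following is a rough sketch under stated assumptions: the samples are synthetic, the record layout (two little-endian float64 values per sample) is hypothetical, and the binary encoding is a plain `struct` layout that does not represent OSF4, MDF4, HDF5, or Parquet. Actual sizes depend heavily on the data and on the format's own encoding and compression.

```python
import csv
import io
import random
import struct
import zlib

# Synthetic samples (timestamp in seconds, measured value); not real measurement data.
rng = random.Random(0)
samples = [(i * 0.001, rng.gauss(20.0, 0.5)) for i in range(10_000)]

# Text encoding: a header line plus one CSV row per sample.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["t_s", "value"])
writer.writerows(samples)
csv_bytes = buf.getvalue().encode("utf-8")

# Binary encoding: two little-endian float64 values (16 bytes) per sample.
record = struct.Struct("<dd")
bin_bytes = b"".join(record.pack(t, v) for t, v in samples)

# Report raw and zlib-compressed sizes for both encodings.
for name, data in (("csv", csv_bytes), ("binary", bin_bytes)):
    compressed = zlib.compress(data)
    print(f"{name:>6}: {len(data):>8} bytes raw, {len(compressed):>8} bytes compressed")
```

With full-precision floating-point values, each CSV row needs more characters than the 16 bytes of the binary record, which is one reason CSV/TSV rates poorly on packing density.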