From Opaque Streams to Explainable Systems: Semantic MQTT Integration at the Edge
Niklas Doerner, Maria Maleshkova
Industrial systems increasingly rely on MQTT-based message streaming to enable automated, data-driven production processes at the network edge. While semantic models such as the SSN/SOSA ontology enable machine-interpretable descriptions of observations and actuations, an explicit model of message transport is rarely considered. Consequently, MQTT-based communication remains opaque, particularly regarding information processing, which hinders the semantic analysis of application-specific topic structures and of transport protocol behavior. To close this gap, this work introduces the revised MQTT4SSN ontology as a key contribution, extending existing semantic models with protocol-aware representations of MQTT entities, control packets, and transport-level interactions. MQTT4SSN enables end-to-end semantic traceability, from sensor observations and actuator controls to the underlying message transmission within distributed systems. Building on this contribution, the MQTT2RDF integration framework incorporates MQTT4SSN as its core to capture live MQTT traffic and represent both payload meaning and transport-level provenance within an RDF knowledge graph. This work presents a novel approach for representing edge computing and information processing over MQTT, addressing two key challenges. First, the framework supports semantic interpretation of topic hierarchies and provides configurable mappings between MQTT topics, payload structures, and observation or actuation semantics. This approach facilitates the setup of edge computing systems and enables context-aware subscription management and structured data formatting, thereby improving interoperability between heterogeneous deployments. Second, transport-level provenance analytics provide a semantic basis for query-based detection, classification support, and diagnostic analysis of malformed or incomplete MQTT communication.
The approach provides explainable, traceable information processing through transport provenance, which is essential for safety-critical industrial environments. The contributions are validated through an industrial use case from a production environment, demonstrating their applicability for system monitoring, troubleshooting, and semantic analytics of MQTT-based infrastructures.
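
To make the idea of a configurable mapping between MQTT topics, payload structures, and observation semantics concrete, the following is a minimal sketch and not the MQTT2RDF implementation. It assumes a hypothetical four-level topic layout (`site/machine/sensor/property`), a hypothetical JSON payload with `value` and `ts` fields, and a placeholder `http://example.org/` namespace. Only the SOSA terms are taken from the published ontology; the `receivedOnTopic` property is an illustrative stand-in and is not claimed to be MQTT4SSN vocabulary.

```python
import json
import re
from urllib.parse import quote

SOSA = "http://www.w3.org/ns/sosa/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD = "http://www.w3.org/2001/XMLSchema#"
EX = "http://example.org/"  # hypothetical namespace, for illustration only

# Assumed topic layout: <site>/<machine>/<sensor>/<property>
TOPIC_PATTERN = re.compile(
    r"^(?P<site>[^/]+)/(?P<machine>[^/]+)/(?P<sensor>[^/]+)/(?P<prop>[^/]+)$"
)


def _literal(text):
    """Escape backslashes and double quotes for an N-Triples string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def topic_payload_to_triples(topic, payload, obs_id):
    """Map one MQTT message to N-Triples lines describing a SOSA observation."""
    match = TOPIC_PATTERN.match(topic)
    if match is None:
        raise ValueError(f"topic does not fit the assumed layout: {topic!r}")

    data = json.loads(payload)  # assumed payload: {"value": <number>, "ts": <ISO time>}
    # Percent-encode topic levels so they are safe inside an IRI.
    site, machine, sensor_id, prop_id = (
        quote(match[name], safe="") for name in ("site", "machine", "sensor", "prop")
    )

    obs = f"<{EX}observation/{quote(str(obs_id), safe='')}>"
    sensor = f"<{EX}sensor/{site}/{machine}/{sensor_id}>"
    prop = f"<{EX}property/{prop_id}>"
    value = _literal(str(data["value"]))
    timestamp = _literal(str(data["ts"]))

    return [
        f"{obs} <{RDF_TYPE}> <{SOSA}Observation> .",
        f"{obs} <{SOSA}madeBySensor> {sensor} .",
        f"{obs} <{SOSA}observedProperty> {prop} .",
        f'{obs} <{SOSA}hasSimpleResult> "{value}"^^<{XSD}double> .',
        f'{obs} <{SOSA}resultTime> "{timestamp}"^^<{XSD}dateTime> .',
        # Hypothetical transport-level link from the observation to its topic.
        f'{obs} <{EX}receivedOnTopic> "{_literal(topic)}" .',
    ]
```

Called with a topic such as `plant1/press3/temp1/temperature` and a matching JSON payload, the function returns six N-Triples lines that could be loaded into any RDF store. A topic that does not fit the assumed layout raises `ValueError`, which loosely mirrors the idea of flagging malformed communication instead of silently dropping it.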