Description
This dataset is associated with the master's dissertation entitled “AirCEP – An Application for Monitoring Air Quality with Complex Event Processing”, developed at the Institute of Computing, University of Campinas (UNICAMP), Brazil, in 2025.
Study Overview
The study addresses the challenges of air quality monitoring in regions with limited infrastructure and high operational costs, proposing a modular and scalable architecture called AirCEP. The system integrates Edge Computing, configurable data filtering mechanisms, Complex Event Processing (CEP), and real-time visualization to reduce network traffic, computational resource consumption, and alert latency.
AirCEP was designed to:
- Minimize network bandwidth usage through configurable data filters applied at the edge.
- Reduce CPU and memory consumption during real-time stream processing.
- Detect complex environmental events from continuous air quality measurements.
- Generate real-time alerts for adverse air quality conditions.
- Provide a monitoring dashboard for visualization and decision support.
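The first of these goals relies on filtering at the edge before transmission. A minimal sketch of one such configurable filter (a dead-band filter with a hypothetical per-pollutant delta; the dissertation's actual filter parameters may differ) illustrates the idea:

```python
# Sketch of a configurable edge filter (hypothetical parameters, not the
# dissertation's exact implementation): a reading is forwarded only if it
# differs from the last transmitted value by more than a configured delta,
# trading some fidelity for reduced network traffic.
class DeadBandFilter:
    def __init__(self, delta: float):
        self.delta = delta        # minimum change required to transmit
        self.last_sent = None     # last value actually forwarded

    def should_send(self, value: float) -> bool:
        if self.last_sent is None or abs(value - self.last_sent) > self.delta:
            self.last_sent = value
            return True
        return False

# Example: a PM2.5 stream filtered with an assumed 2.0 µg/m³ dead band.
f = DeadBandFilter(delta=2.0)
readings = [10.0, 10.5, 11.9, 13.0, 13.4, 20.0]
sent = [r for r in readings if f.should_send(r)]
# Only 3 of the 6 readings are transmitted: [10.0, 13.0, 20.0]
```

Tightening or relaxing the delta is what makes the filter "configurable": a larger delta suppresses more readings and saves more bandwidth at the cost of coarser data downstream.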
The architecture is composed of three main components:
- Data Router (Edge Layer): Applies configurable filters to reduce data volume before transmission.
- Stream Processing Engine: Implements Complex Event Processing using Apache Flink to analyze continuous data streams and detect patterns of interest.
- Visualization Dashboard: Built with Grafana to display real-time metrics, historical data, and alerts.
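The Stream Processing Engine's CEP rules match patterns over the incoming stream. The dissertation implements this with Apache Flink; the plain-Python sketch below only illustrates the kind of pattern a CEP rule can express (the threshold and run length here are hypothetical, not the study's actual rules):

```python
from collections import deque

# Illustration of a CEP-style pattern (hypothetical rule): raise an alert
# when a pollutant exceeds a threshold in N consecutive readings, which
# filters out single-reading spikes.
def detect_alerts(stream, threshold=25.0, run_length=3):
    window = deque(maxlen=run_length)   # sliding window of recent values
    alerts = []
    for ts, value in stream:
        window.append(value)
        if len(window) == run_length and all(v > threshold for v in window):
            alerts.append(ts)           # timestamp completing the run
    return alerts

# Timestamped (ts, value) events; only ts=2..4 form 3 consecutive exceedances.
events = [(1, 10.0), (2, 30.0), (3, 31.0), (4, 29.0), (5, 12.0)]
alerts = detect_alerts(events)          # → alert at ts=4
```

In the actual architecture this matching runs continuously inside Flink, and fired alerts are forwarded to the Grafana dashboard.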
Dataset Description
The dataset includes the measurements and experimental results used to evaluate the AirCEP architecture. It contains:
- Air quality measurements from monitoring stations and/or simulated sensors.
- Pollutants monitored:
  - PM2.5
  - PM10
  - SO₂ (Sulfur Dioxide)
  - NO₂ (Nitrogen Dioxide)
  - O₃ (Ozone)
  - CO (Carbon Monoxide)
- Timestamped readings collected as continuous data streams.
- Experimental logs from two deployment scenarios:
  - Local environment: All components deployed on the same machine.
  - Remote environment: Sensors physically separated from the processing unit.
- Performance metrics collected during experiments:
  - Network traffic (bytes transmitted)
  - CPU utilization
  - Memory consumption
  - End-to-end latency
- Filter configurations and CEP rule definitions used for event detection.
Data Structure
The dataset is structured to support reproducibility and reuse:
- Time-series formatted records (timestamp + pollutant measurements).
- System performance logs aligned with experimental runs.
- Configuration files defining filtering thresholds and CEP rules.
- Scenario identification metadata (local vs. remote deployment).
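A time-series record of the shape described above can be read with standard tooling. The column names in this sketch are hypothetical (the dataset's own files may use different headers); it only shows the timestamp-plus-pollutants layout:

```python
import csv
import io

# Hypothetical record layout (assumed column names; check the dataset's
# actual headers): one timestamp column followed by one column per pollutant.
sample = """timestamp,pm25,pm10,so2,no2,o3,co
2025-01-01T00:00:00Z,12.4,30.1,4.0,21.5,48.2,0.6
2025-01-01T00:05:00Z,13.1,29.8,4.1,22.0,47.9,0.6
"""

rows = list(csv.DictReader(io.StringIO(sample)))
pm25 = [float(r["pm25"]) for r in rows]   # extract one pollutant series
```

Replaying such rows in timestamp order reproduces the continuous-stream conditions under which the experiments were run.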
Methodology Context
The experiments compare:
- Baseline transmission (without filtering) versus filtered transmission.
- Resource consumption in different deployment architectures.
- Latency impact of physical distance between sensors and processing nodes.
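The baseline-versus-filtered comparison reduces to relative differences over the collected counters. A small sketch of that computation (the byte counts here are made up for illustration, not the experiment's measured values):

```python
# Relative reduction between a baseline run and a filtered run, computed
# from raw counters such as bytes transmitted or CPU time.
# The numbers below are illustrative only, not the study's actual data.
def pct_reduction(baseline: float, filtered: float) -> float:
    return 100.0 * (baseline - filtered) / baseline

traffic_saving = pct_reduction(baseline=1_000_000, filtered=700_000)
# e.g. 700 kB sent instead of 1 MB → a 30% reduction in network traffic
```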
Results demonstrate:
- Up to ~30% reduction in network traffic.
- Up to ~19% reduction in CPU consumption.
- Latency influenced more strongly by the physical distance between sensors and processing nodes than by the filtering itself.
Reuse Potential
This dataset can be reused for:
- Research in real-time stream processing.
- Complex Event Processing evaluation.
- Edge Computing performance studies.
- Network optimization experiments.
- Benchmarking of air quality monitoring systems.
- Smart city and IoT research.
- Comparative analysis between cloud-centric and edge-based architectures.
Researchers may replicate the experimental setup, validate performance trade-offs, benchmark alternative stream processing engines, or test new event detection rules.