Analyze ThingsBoard IoT Data in JupyterLab Using PySpark

After filtering and transforming telemetry from ThingsBoard into a clean Delta Lake table, you can begin exploring the data interactively using JupyterLab and PySpark.

This guide walks through installing required packages, connecting to Spark, and performing aggregation, health classification, and outlier detection.


Prerequisites

Before starting, make sure:

device_id, device_name, ts_datetime (timestamp), event_date, temperature, humidity

  1. Install Required Packages

At the top of your notebook, install required packages:

import sys
!{sys.executable} -m pip install --quiet pyspark==3.5.6 pandas matplotlib

  1. Start SparkSession
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("ThingsBoard IoT Analysis") \
    .enableHiveSupport() \
    .getOrCreate()

  1. Daily Aggregation

Aggregate temperature and humidity per device per day:

daily_df = spark.sql("""
SELECT
  device_id,
  device_name,
  event_date,
  AVG(temperature) AS avg_temperature,
  AVG(humidity) AS avg_humidity
FROM thingsboard.temp_humidity_processed
GROUP BY device_id, device_name, event_date
""")

daily_df.show(10)

  1. Hourly Aggregation

Summarize values per device per hour:

hourly_df = spark.sql("""
SELECT
  device_id,
  device_name,
  DATE(ts_datetime) AS event_date,
  HOUR(ts_datetime) AS event_hour,
  AVG(temperature) AS avg_temperature,
  AVG(humidity) AS avg_humidity
FROM thingsboard.temp_humidity_processed
GROUP BY
  device_id,
  device_name,
  DATE(ts_datetime),
  HOUR(ts_datetime)
""")

hourly_df.show(10)

  1. Outlier Detection

Flag readings outside expected range:

outlier_df = spark.sql("""
SELECT *
FROM thingsboard.temp_humidity_processed
WHERE temperature NOT BETWEEN 10 AND 40
   OR humidity NOT BETWEEN 20 AND 90
""")

outlier_df.show(10)

  1. Compare Device Behavior

Rank devices by average/min/max temperature and humidity:

behavior_df = spark.sql("""
SELECT
  device_name,
  AVG(temperature) AS avg_temp,
  MIN(temperature) AS min_temp,
  MAX(temperature) AS max_temp,
  AVG(humidity) AS avg_humid
FROM thingsboard.temp_humidity_processed
GROUP BY device_name
ORDER BY avg_temp DESC
""")

behavior_df.show()

  1. Visualize

Plot average temperature per device:

import pandas as pd
import matplotlib.pyplot as plt

pandas_df = behavior_df.toPandas()

pandas_df.plot(kind="bar", x="device_name", y="avg_temp", legend=False)
plt.ylabel("Avg Temperature (°C)")
plt.title("Average Temperature per Device")
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

Leave a Comment

Your email address will not be published. Required fields are marked *

Scroll to Top