Scope: wipe & rebuild the full ThingsBoard data path
Keyspace: thingsboard (drop + recreate)
Kafka topics: tb.thingsboard.ts_kv_cf, tb.thingsboard.ts_kv_partitions_cf (auto-create enabled)
Delta root: /delta (bronze/, silver/, _checkpoints/)
Checkpoint to reset: /delta/_checkpoints/driver/cdc_fanout_upsert
Debezium state dir: /opt/debezium/state
This guide cleans everything produced by the pipeline in my earlier build (Cassandra 5.0.5 CDC → Kafka 4.0 (SASL/SSL) → Spark 3.5.6 → Delta Lake), then recreates ThingsBoard’s keyspace and restarts the flow. We start ThingsBoard before Spark so that the Kafka topics have already been auto-created by Debezium’s first publish when the Spark job subscribes, which avoids UnknownTopicOrPartition errors.
Destructive. Dropping the keyspace deletes all ThingsBoard data. Back up first.
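One way to take that backup is a tagged Cassandra snapshot before the drop. This is only a sketch: it assumes nodetool is on the PATH and can reach the local node, and the pre_wipe_ tag prefix is a hypothetical naming scheme. Snapshots are stored on the same node, so copy them elsewhere if you need a real off-host backup.

```shell
# snapshot_tag: build a unique, sortable tag (hypothetical naming scheme).
snapshot_tag() {
  printf 'pre_wipe_%s\n' "$(date +%Y%m%d_%H%M%S)"
}

# Only attempt the snapshot where nodetool is installed.
if command -v nodetool >/dev/null 2>&1; then
  tag=$(snapshot_tag)
  # Snapshot every table in the thingsboard keyspace under one tag.
  nodetool snapshot -t "$tag" thingsboard && echo "snapshot tag: $tag"
fi
```

Keep the printed tag; it is what you would look for under each table's snapshots/ directory when restoring.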
TL;DR — Exact Order
- Stop ThingsBoard
- Stop Spark streaming + Standalone services
- Stop Debezium
- DROP keyspace thingsboard (while Cassandra is running)
- Stop Cassandra
- Clear CDC logs (/var/lib/cassandra/cdc_raw/*, /var/lib/cassandra/cdc_relocate/*)
- Start Cassandra
- Delete Kafka topics over SASL_SSL (HAProxy 9093)
- Delete Spark checkpoint (/delta/_checkpoints/driver/cdc_fanout_upsert)
- Delete Delta tables (only ts_kv_cf, ts_kv_partitions_cf) under bronze/ & silver/
- Delete Debezium state (/opt/debezium/state + offsets files)
- Recreate keyspace & TS schema
- Enable CDC on ts_kv_cf & ts_kv_partitions_cf
- Start Debezium (producer)
- Start ThingsBoard (writers → Debezium publishes → topics auto-create)
- Start Spark (consumer; startingOffsets as needed)
- Stop ThingsBoard (writers)
sudo systemctl stop thingsboard
systemctl is-active thingsboard # expect: inactive
- Stop Spark (Standalone via systemd)
# If you wrapped your app in a unit, stop it first
# sudo systemctl stop my-spark-stream.service
sudo systemctl stop spark-worker.service
sudo systemctl stop spark-connect.service
sudo systemctl stop spark-master.service
ps -ef | grep -E 'spark-submit|SparkSubmit' | grep -v grep | awk '{print $2}' | xargs -r kill
- Stop Debezium
sudo systemctl stop debezium-cassandra
- DROP the entire ThingsBoard keyspace
cqlsh -e "DROP KEYSPACE IF EXISTS thingsboard;"
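Before stopping Cassandra it can be worth confirming that the drop actually took effect. The sketch below assumes cqlsh connects with default host and credentials, as in the command above; keyspace_listed is a hypothetical helper name that scans the whitespace-separated output of DESCRIBE KEYSPACES.

```shell
# keyspace_listed NAME: read DESCRIBE KEYSPACES output on stdin and
# succeed if NAME appears as a whole keyspace name.
keyspace_listed() {
  tr -s '[:space:]' '\n' | grep -Fxq -- "$1"
}

# Only run the live check where cqlsh is installed.
if command -v cqlsh >/dev/null 2>&1; then
  if cqlsh -e "DESCRIBE KEYSPACES" | keyspace_listed thingsboard; then
    echo "thingsboard keyspace is still present" >&2
  else
    echo "thingsboard keyspace dropped"
  fi
fi
```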
- Stop Cassandra
sudo systemctl stop cassandra
- Clear Cassandra CDC logs
sudo rm -rf /var/lib/cassandra/cdc_raw/*
sudo rm -rf /var/lib/cassandra/cdc_relocate/*
- Start Cassandra
sudo systemctl start cassandra
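systemctl start returns before Cassandra accepts CQL connections, so the later schema steps can fail if they run too early. A minimal retry sketch, assuming cqlsh connects with default settings; wait_for is a hypothetical helper, and the 30 attempts at 5-second intervals are arbitrary values to adjust:

```shell
# wait_for ATTEMPTS DELAY CMD...: rerun CMD until it succeeds,
# sleeping DELAY seconds between tries; fail after ATTEMPTS tries.
wait_for() {
  attempts=$1
  delay=$2
  shift 2
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Only run the live check where cqlsh is installed.
if command -v cqlsh >/dev/null 2>&1; then
  if wait_for 30 5 cqlsh -e "SELECT release_version FROM system.local;"; then
    echo "Cassandra is accepting CQL"
  else
    echo "Cassandra did not come up in time" >&2
  fi
fi
```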
- Delete Kafka topics (SASL_SSL via HAProxy 9093)
Create /home/administrator/client.properties (match your working producer):
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="user1" password="password1";
ssl.endpoint.identification.algorithm=https
# If using a private CA/self-signed, add a proper truststore:
# ssl.truststore.location=/path/to/truststore.jks
# ssl.truststore.password=<password>
Delete & verify:
/opt/kafka/bin/kafka-topics.sh \
--bootstrap-server kafka.maksonlee.com:9093 \
--command-config /home/administrator/client.properties \
--delete --topic tb.thingsboard.ts_kv_cf
/opt/kafka/bin/kafka-topics.sh \
--bootstrap-server kafka.maksonlee.com:9093 \
--command-config /home/administrator/client.properties \
--delete --topic tb.thingsboard.ts_kv_partitions_cf
/opt/kafka/bin/kafka-topics.sh \
--bootstrap-server kafka.maksonlee.com:9093 \
--command-config /home/administrator/client.properties \
--list | grep '^tb\.thingsboard\.' || echo "✓ topics deleted (auto-create on first publish)"
- Delete Spark checkpoint (reset offsets)
sudo rm -rf /delta/_checkpoints/driver/cdc_fanout_upsert
(Remove the entire folder: offsets/, sources/, commits/, metadata, possibly state/.)
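If you script this step, a guard around rm -rf helps against a mistyped or empty path variable. The sketch below is one possible approach, not part of the original procedure: safe_rm_under is a hypothetical helper that deletes a target only when it sits strictly below an expected root. sudo is omitted, so run it with whatever privileges the checkpoint directory needs.

```shell
# safe_rm_under ROOT TARGET: delete TARGET only if it is a strict
# descendant of ROOT; refuse empty roots, ROOT itself, and '..' paths.
safe_rm_under() {
  root=${1%/}
  target=${2%/}
  if [ -z "$root" ]; then
    echo "refusing: empty or '/' root" >&2
    return 1
  fi
  case "$target" in
    */../*|*/..) echo "refusing: '$2' contains '..'" >&2; return 1 ;;
  esac
  case "$target" in
    "$root"/?*) rm -rf -- "$target" ;;
    *) echo "refusing: '$2' is not under '$root'" >&2; return 1 ;;
  esac
}

# Only act where the checkpoint root from this guide exists.
if [ -d /delta/_checkpoints ]; then
  safe_rm_under /delta/_checkpoints /delta/_checkpoints/driver/cdc_fanout_upsert
fi
```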
- Delete Delta tables (only these two)
# Inspect first
find /delta/bronze -type d \( -name 'ts_kv_cf' -o -name 'ts_kv_partitions_cf' \) -print
find /delta/silver -type d \( -name 'ts_kv_cf' -o -name 'ts_kv_partitions_cf' \) -print
# Delete targeted
sudo find /delta/bronze /delta/silver -type d \
\( -name 'ts_kv_cf' -o -name 'ts_kv_partitions_cf' \) -prune -exec rm -rf {} +
(If registered in a metastore (Hive/Glue), drop those entries too.)
- Delete Debezium local state (global cold-start)
sudo rm -rf /opt/debezium/state
sudo rm -f /opt/debezium/offsets.dat \
/opt/debezium/commitlog_offset.properties \
/opt/debezium/snapshot_offset.properties
- Recreate keyspace & time-series schema
cqlsh -f /usr/share/thingsboard/data/cassandra/schema-keyspace.cql
cqlsh -f /usr/share/thingsboard/data/cassandra/schema-ts.cql
- Enable CDC on the two tables
cqlsh -e "ALTER TABLE thingsboard.ts_kv_cf WITH cdc = true;"
cqlsh -e "ALTER TABLE thingsboard.ts_kv_partitions_cf WITH cdc = true;"
(Enable CDC on any other TB tables you plan to stream.)
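To confirm the flag took effect, you can read it back from system_schema.tables, which carries a cdc column per table. A sketch, assuming cqlsh prints booleans as True/False in its tabular output; cdc_enabled_for is a hypothetical helper that parses that output:

```shell
CDC_QUERY="SELECT table_name, cdc FROM system_schema.tables WHERE keyspace_name = 'thingsboard';"

# cdc_enabled_for TABLE...: read cqlsh rows ("name | True") on stdin and
# succeed only if every named table has cdc = True.
cdc_enabled_for() {
  out=$(cat)
  for t in "$@"; do
    printf '%s\n' "$out" \
      | grep -Eq "^[[:space:]]*${t}[[:space:]]*\|[[:space:]]*True[[:space:]]*\$" \
      || return 1
  done
}

# Only run the live check where cqlsh is installed.
if command -v cqlsh >/dev/null 2>&1; then
  if cqlsh -e "$CDC_QUERY" | cdc_enabled_for ts_kv_cf ts_kv_partitions_cf; then
    echo "CDC enabled on both tables"
  else
    echo "CDC is not enabled on both tables" >&2
  fi
fi
```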
- Start Debezium
sudo systemctl start debezium-cassandra
- Start ThingsBoard (writers → Debezium publishes → topics auto-create)
sudo systemctl start thingsboard
systemctl is-active thingsboard # expect: active
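Because the topics only appear after ThingsBoard has written to each table and Debezium has published, it may help to wait for them before starting Spark. The sketch below reuses the kafka-topics.sh path, bootstrap address, and client.properties from the deletion step; topics_present is a hypothetical helper, and the 30 polls at 10-second intervals are arbitrary.

```shell
# topics_present TOPIC...: read a topic list (one name per line) on stdin
# and succeed only if every named topic is in it (exact match).
topics_present() {
  list=$(cat)
  for t in "$@"; do
    printf '%s\n' "$list" | grep -Fxq -- "$t" || return 1
  done
}

KAFKA_TOPICS=/opt/kafka/bin/kafka-topics.sh

# Only poll where the Kafka CLI from this guide is installed.
if [ -x "$KAFKA_TOPICS" ]; then
  n=0
  until "$KAFKA_TOPICS" --bootstrap-server kafka.maksonlee.com:9093 \
          --command-config /home/administrator/client.properties --list \
        | topics_present tb.thingsboard.ts_kv_cf tb.thingsboard.ts_kv_partitions_cf
  do
    n=$((n + 1))
    if [ "$n" -ge 30 ]; then
      echo "topics still missing; has ThingsBoard written telemetry yet?" >&2
      break
    fi
    sleep 10
  done
fi
```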
- Start Spark
sudo systemctl start spark-master.service
sudo systemctl start spark-worker.service
sudo systemctl start spark-connect.service
Run the streaming job:
spark-submit --master spark://spark.maksonlee.com:7077 \
--deploy-mode client \
cdc_to_delta.py