Global Reset: Cassandra CDC → Debezium → Kafka (SASL/SSL via HAProxy) → Spark → Delta Lake

Scope: wipe & rebuild the full ThingsBoard data path
Keyspace: thingsboard (drop + recreate)
Kafka topics: tb.thingsboard.ts_kv_cf, tb.thingsboard.ts_kv_partitions_cf (auto-create enabled)
Delta root: /delta (bronze/, silver/, _checkpoints/)
Checkpoint to reset: /delta/_checkpoints/driver/cdc_fanout_upsert
Debezium state dir: /opt/debezium/state

This guide cleans everything produced by the pipeline in my earlier build — Cassandra 5.0.5 CDC → Kafka 4.0 (SASL/SSL) → Spark 3.5.6 → Delta Lake — then recreates ThingsBoard’s keyspace and restarts the flow. We start ThingsBoard before Spark so Debezium can auto-create Kafka topics, avoiding UnknownTopicOrPartition.

Destructive. Dropping the keyspace deletes all ThingsBoard data. Back up first.


TL;DR — Exact Order

  1. Stop ThingsBoard
  2. Stop Spark streaming + Standalone services
  3. Stop Debezium
  4. DROP keyspace thingsboard (while Cassandra is running)
  5. Stop Cassandra
  6. Clear CDC logs (/var/lib/cassandra/cdc_raw/*, /var/lib/cassandra/cdc_relocate/*)
  7. Start Cassandra
  8. Delete Kafka topics over SASL_SSL (HAProxy 9093)
  9. Delete Spark checkpoint (/delta/_checkpoints/driver/cdc_fanout_upsert)
  10. Delete Delta tables (only ts_kv_cf, ts_kv_partitions_cf) under bronze/ & silver/
  11. Delete Debezium state (/opt/debezium/state + offsets files)
  12. Recreate keyspace & TS schema
  13. Enable CDC on ts_kv_cf & ts_kv_partitions_cf
  14. Start Debezium (producer)
  15. Start ThingsBoard (writers → Debezium publishes → topics auto-create)
  16. Start Spark (consumer; startingOffsets as needed)

  1. Stop ThingsBoard (writers)
sudo systemctl stop thingsboard
systemctl is-active thingsboard   # expect: inactive

  1. Stop Spark (Standalone via systemd)
# If you wrapped your app in a unit, stop it first
# sudo systemctl stop my-spark-stream.service

sudo systemctl stop spark-worker.service
sudo systemctl stop spark-connect.service
sudo systemctl stop spark-master.service
ps -ef | grep -E 'spark-submit|SparkSubmit' | grep -v grep | awk '{print $2}' | xargs -r kill

  1. Stop Debezium
sudo systemctl stop debezium-cassandra

  1. DROP the entire ThingsBoard keyspace
cqlsh -e "DROP KEYSPACE IF EXISTS thingsboard;"

  1. Stop Cassandra
sudo systemctl stop cassandra

  1. Clear Cassandra CDC logs
sudo rm -rf /var/lib/cassandra/cdc_raw/*
sudo rm -rf /var/lib/cassandra/cdc_relocate/*

  1. Start Cassandra
sudo systemctl start cassandra

  1. Delete Kafka topics (SASL_SSL via HAProxy 9093)

Create /home/administrator/client.properties (match your working producer):

security.protocol=SASL_SSL
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="user1" password="password1";
ssl.endpoint.identification.algorithm=https
# If using a private CA/self-signed, add a proper truststore:
# ssl.truststore.location=/path/to/truststore.jks
# ssl.truststore.password=<password>

Delete & verify:

/opt/kafka/bin/kafka-topics.sh \
  --bootstrap-server kafka.maksonlee.com:9093 \
  --command-config /home/administrator/client.properties \
  --delete --topic tb.thingsboard.ts_kv_cf

/opt/kafka/bin/kafka-topics.sh \
  --bootstrap-server kafka.maksonlee.com:9093 \
  --command-config /home/administrator/client.properties \
  --delete --topic tb.thingsboard.ts_kv_partitions_cf

/opt/kafka/bin/kafka-topics.sh \
  --bootstrap-server kafka.maksonlee.com:9093 \
  --command-config /home/administrator/client.properties \
  --list | grep '^tb\.thingsboard\.' || echo "✓ topics deleted (auto-create on first publish)"


  1. Delete Spark checkpoint (reset offsets)
sudo rm -rf /delta/_checkpoints/driver/cdc_fanout_upsert

(Remove the entire folder: offsets/, sources/, commits/, metadata, possibly state/.)


  1. Delete Delta tables (only these two)
# Inspect first
find /delta/bronze -type d \( -name 'ts_kv_cf' -o -name 'ts_kv_partitions_cf' \) -print
find /delta/silver -type d \( -name 'ts_kv_cf' -o -name 'ts_kv_partitions_cf' \) -print

# Delete targeted
sudo find /delta/bronze /delta/silver -type d \
  \( -name 'ts_kv_cf' -o -name 'ts_kv_partitions_cf' \) -prune -exec rm -rf {} +

(If registered in a metastore (Hive/Glue), drop those entries too.)


  1. Delete Debezium local state (global cold-start)
sudo rm -rf /opt/debezium/state
sudo rm -f  /opt/debezium/offsets.dat \
            /opt/debezium/commitlog_offset.properties \
            /opt/debezium/snapshot_offset.properties

  1. Recreate keyspace & time-series schema
cqlsh -f /usr/share/thingsboard/data/cassandra/schema-keyspace.cql
cqlsh -f /usr/share/thingsboard/data/cassandra/schema-ts.cql

  1. Enable CDC on the two tables
ALTER TABLE thingsboard.ts_kv_cf            WITH cdc = true;
ALTER TABLE thingsboard.ts_kv_partitions_cf WITH cdc = true;

(Enable CDC on any other TB tables you plan to stream.)


  1. Start Debezium
sudo systemctl start debezium-cassandra

  1. Start ThingsBoard (writers → Debezium publishes → topics auto-create)
sudo systemctl start thingsboard
systemctl is-active thingsboard   # expect: active

  1. Start Spark
sudo systemctl start spark-master.service
sudo systemctl start spark-worker.service
sudo systemctl start spark-connect.service

Run streaming job:

spark-submit --master spark://spark.maksonlee.com:7077 \
  --deploy-mode client \
  cdc_to_delta.py

That’s the complete, copy-pasteable post matching your environment, including dropping the keyspace, SASL/SSL Kafka topic management, Debezium state wipe, Delta cleanup, and your exact spark-submit command.

Leave a Comment

Your email address will not be published. Required fields are marked *

Scroll to Top