Goal of this post
Document the exact debugging path we followed when our Delta table started showing more rows than Cassandra—step by step—until we proved it was caused by per-cell TTL on specific value columns. Only after proving it did we conclude this behavior matches our architecture (warehouse keeps all history).
Stack & context
ThingsBoard CE 4.x on Cassandra (thingsboard.ts_kv_cf) → CDC (Debezium) → Kafka → Spark 4.0 → Delta Lake. For audits we used cqlsh COPY (no Spark Cassandra connector in this env).
- Freeze the pipeline (no new writes)
We stopped ThingsBoard first to quiesce writes. With the source idle, Cassandra counts and the CDC tail stay stable while we compare.
- Symptom: counts diverge after a few days
- Cassandra (LOCAL_QUORUM): ≈ 1,298,381
- Delta (Spark): ≈ 1,301,300
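Both figures are just COUNT(*) on each side, along these lines (a sketch; cqlsh may need --request-timeout raised for a full-table count):
-- cqlsh
CONSISTENCY LOCAL_QUORUM;
SELECT COUNT(*) FROM thingsboard.ts_kv_cf;
-- spark-sql
SELECT COUNT(*) FROM ts_kv_cf;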
- Quick sanity check: is it just duplicates?
Are we double-writing rows in Delta?
-- spark-sql
WITH dup AS (
  SELECT entity_type, entity_id, key, ts, COUNT(*) AS c
  FROM ts_kv_cf
  GROUP BY entity_type, entity_id, key, ts
  HAVING COUNT(*) > 1
)
SELECT
  SUM(c - 1) AS extra_rows_due_to_dups,
  COUNT(*) AS duplicate_groups
FROM dup;
-- Result in our case: 0 duplicates
Duplicates ruled out. Move on.
- Establish ground truth: snapshot Cassandra, load into Spark
Export from Cassandra
-- cqlsh
CONSISTENCY LOCAL_QUORUM;
COPY thingsboard.ts_kv_cf (
entity_type, entity_id, key, partition, ts, str_v, long_v, dbl_v, bool_v, json_v
) TO '/tmp/ts_kv_cf_full.csv' WITH HEADER=TRUE;
Load snapshot in Spark
-- spark-sql
CREATE OR REPLACE TEMP VIEW cass_full (
entity_type STRING, entity_id STRING, key STRING, partition BIGINT, ts BIGINT,
str_v STRING, long_v BIGINT, dbl_v DOUBLE, bool_v BOOLEAN, json_v STRING
)
USING csv
OPTIONS (path '/tmp/ts_kv_cf_full.csv', header 'true', inferSchema 'false');
SELECT COUNT(*) AS cass_rows FROM cass_full; -- ~1,298,381
SELECT COUNT(*) AS delta_rows FROM ts_kv_cf; -- ~1,301,300
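Because the snapshot round-trips through CSV, a cheap extra check here (a sketch, not strictly required) is to confirm the join-key columns parsed cleanly before the anti-joins in the next step:
-- spark-sql
SELECT COUNT(*) AS bad_key_rows
FROM cass_full
WHERE entity_type IS NULL OR entity_id IS NULL OR key IS NULL OR ts IS NULL;
-- expect 0; anything else means the CSV export/parse needs a second look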
- Diff the two sets (who has what?)
-- Delta-only (present in Delta, missing in Cassandra)
WITH d AS (SELECT entity_type, entity_id, key, ts FROM ts_kv_cf),
c AS (SELECT entity_type, entity_id, key, ts FROM cass_full)
SELECT COUNT(*) AS delta_only
FROM d LEFT ANTI JOIN c
ON d.entity_type=c.entity_type AND d.entity_id=c.entity_id AND d.key=c.key AND d.ts=c.ts; -- 2,919
-- Cassandra-only
WITH d AS (SELECT entity_type, entity_id, key, ts FROM ts_kv_cf),
     c AS (SELECT entity_type, entity_id, key, ts FROM cass_full)
SELECT COUNT(*) AS cass_only
FROM c LEFT ANTI JOIN d
  ON c.entity_type=d.entity_type AND c.entity_id=d.entity_id AND c.key=d.key AND c.ts=d.ts; -- 0
All the drift is Delta-only: 2,919 rows.
- What exactly are those 2,919 rows?
CREATE OR REPLACE TEMP VIEW delta_only AS
SELECT d.*
FROM ts_kv_cf d
LEFT ANTI JOIN cass_full c
ON d.entity_type=c.entity_type AND d.entity_id=c.entity_id AND d.key=c.key AND d.ts=c.ts;
SELECT key, COUNT(*) AS cnt
FROM delta_only
GROUP BY key
ORDER BY cnt DESC
LIMIT 30;
Result (each exactly 417 rows):
cpuCount 417
totalMemory 417
memoryUsage 417
clusterMode 417
totalDiscSpace 417
discUsage 417
cpuUsage 417
Seven keys × 417 = 2,919 → the gap is entirely ThingsBoard system metrics (not business telemetry).
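One more slice worth looking at before heading to cqlsh (a sketch): how old is the newest Delta-only row per key? If something expires these rows at the source, even the newest missing row should be older than the retention window.
-- spark-sql
SELECT key,
       MAX(ts) AS newest_ts_ms,
       ROUND((unix_millis(current_timestamp()) - MAX(ts)) / 86400000.0, 1) AS newest_age_days
FROM delta_only
GROUP BY key
ORDER BY key;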
- Verify in Cassandra: do these keys use TTL (and which column has it)?
In ts_kv_cf, each value is stored in exactly one of five typed columns:
long_v (integers), dbl_v (doubles), str_v (strings), bool_v (booleans), json_v (JSON).
TTL in Cassandra is per cell. TTL(column) returns the remaining seconds for that cell; NULL means the cell has no TTL.
-- cqlsh
SELECT ts,
       WRITETIME(long_v) AS wt_us,
       TTL(long_v) AS ttl_l,
       TTL(dbl_v)  AS ttl_d,
       TTL(str_v)  AS ttl_s,
       TTL(bool_v) AS ttl_b,
       TTL(json_v) AS ttl_j,
       toUnixTimestamp(now()) AS now_ms
FROM thingsboard.ts_kv_cf
WHERE key='cpuUsage'
LIMIT 50 ALLOW FILTERING;
What we saw
- ttl_l (i.e., TTL(long_v)) had positive values (e.g., 17, 88, 150, …).
- ttl_d/ttl_s/ttl_b/ttl_j were NULL.
Conclusion: for cpuUsage, the value is in long_v, and that cell has TTL.
- Recover the original TTL (not just remaining)
TTL(column) is remaining seconds. Combine with WRITETIME(column) and now() to estimate original TTL:
orig_ttl_s ≈ TTL(column) + (now_ms/1000 − WRITETIME(column)/1e6)
Example (real row): WRITETIME(long_v)=1755345861005858, TTL(long_v)=17, now_ms=1755950644989 → ≈ 604,801 s ≈ 7 days (7×86,400 = 604,800).
Multiple rows clustered around ~7 days, confirming a consistent per-cell TTL.
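The same arithmetic as a throwaway query, with the literals from the example row above:
-- spark-sql
SELECT 17 + (1755950644989 / 1000.0 - 1755345861005858 / 1e6) AS orig_ttl_s; -- ≈ 604,801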
- Confirm table settings & CDC behavior
From DESCRIBE TABLE thingsboard.ts_kv_cf:
PRIMARY KEY ((entity_type, entity_id, key, partition), ts)
default_time_to_live = 0
cdc = true
- default_time_to_live = 0 → no table-wide TTL; the TTL we observed is set at write time on the value cell (ThingsBoard Save Timeseries node or metadata.TTL); a per-cell TTL write is sketched after this list.
- cdc = true → TTL expiry does not emit CDC delete events. Debezium/Kafka won’t see deletes when a cell ages out.
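For context, a per-cell TTL write in CQL looks like the sketch below (placeholder values; not the exact statement ThingsBoard generates). Only the cells touched by that statement carry the TTL.
-- cqlsh (illustration only; placeholder values)
INSERT INTO thingsboard.ts_kv_cf (entity_type, entity_id, key, partition, ts, long_v)
VALUES ('TENANT', 13814000-1dd2-11b2-8080-808080808080, 'cpuUsage', 1755216000000, 1755345861005, 42)
USING TTL 604800; -- 604,800 s = 7 days, applied to exactly these cells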
- The “aha”: why Delta > Cassandra
Only after the steps above did we accept the explanation:
- ThingsBoard writes those system metrics with per-cell TTL ≈ 7 days.
- When TTL expires, Cassandra silently stops returning those cells.
- CDC emits no delete on TTL expiry.
- Our Spark/Delta sink is append-only—we do not delete from the warehouse.
Therefore Delta keeps historical rows that Cassandra has already expired → over time Delta > Cassandra.
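This also gives a lake-side cross-check (a sketch; the key list and the 7-day window come from the findings above): Delta rows for those keys that are older than the TTL window should account for roughly the whole gap.
-- spark-sql
SELECT COUNT(*) AS rows_past_ttl_window
FROM ts_kv_cf
WHERE key IN ('cpuCount','totalMemory','memoryUsage','clusterMode',
              'totalDiscSpace','discUsage','cpuUsage')
  AND ts < unix_millis(current_timestamp()) - 7 * 86400 * 1000;
-- should land near the Delta-only count (2,919 when we measured)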
- So… is this a problem? Our decision
After proving the mechanics, we concluded that this is neither a bug nor a limitation; it reflects our design:
- Operational store (Cassandra): hot data with TTL on selected keys (system metrics) to control footprint.
- Data warehouse (Delta): append-only, keeps all history for audits/replay. We don’t delete in Spark/Delta.
- CDC reality: TTL expiry doesn’t produce deletes; the lake won’t mirror removals and that’s fine for us.
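One practical consequence for ongoing reconciliation (a sketch using the key list found above): when comparing row counts between the two stores, exclude the TTL'd system-metric keys so the check compares like with like.
-- spark-sql (apply the same filter to the Cassandra side / snapshot)
SELECT COUNT(*) AS delta_rows_excl_ttl_keys
FROM ts_kv_cf
WHERE key NOT IN ('cpuCount','totalMemory','memoryUsage','clusterMode',
                  'totalDiscSpace','discUsage','cpuUsage');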