TTL with secondary index

[Question posted by a user on YugabyteDB Community Slack]

We need to expire data using a TTL on a table that has a secondary index. How can we remove data that would normally expire through a TTL, given that the TTL can’t be used because of the secondary index?

You will need to do explicit deletes.

  1. These deletes can be point deletes or, say, range deletes within a partition key.
  2. For range deletes within a partition key, there was an issue where we scanned the whole partition instead of just the requested range. This was recently fixed and backported to 2.1.x/2.2/etc. around last week.
  3. Whether you use range deletes or point deletes, it is still better to control the number of rows touched by each range or batch; we generally prefer to avoid very large multi-row operations. A batch size of 256 or 512 would be ideal.
  4. You can write a utility that runs the scan/delete job in parallel using a set of worker threads. This can be modeled on how we do “row counts” in parallel using partition_hash.
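
As a rough illustration of point 3, here is a minimal Python sketch of capping the batch size. It is driver-agnostic: `delete_batch` is a hypothetical callable that you would supply to send one batch of keys to the database (for example, by wrapping your driver's batch or prepared-statement API), and the function names are mine, not part of any YugabyteDB tool.

```python
def chunked(items, size):
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def delete_in_batches(keys, delete_batch, batch_size=256):
    """Delete `keys` in bounded batches and return how many keys were submitted.

    `delete_batch` is a caller-supplied (hypothetical) callable that takes a
    list of keys and issues the deletes for exactly that list.
    """
    submitted = 0
    for batch in chunked(keys, batch_size):
        delete_batch(batch)  # one bounded multi-row operation per call
        submitted += len(batch)
    return submitted
```

With 600 expired keys and the default batch size, this would call `delete_batch` three times, with 256, 256, and 88 keys.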

On #4, I recently responded to a community user's question as follows: Using the partition_hash function (the YCQL equivalent of the token function in Apache Cassandra) to split the 0…64K hash space of a table is a reliable way (stable API) to partition the work among a set of worker tasks. Here’s an example Python program that uses the same concept to count the total number of rows in a table using a configurable number of worker threads. A Go version of the same is available as ycrc in the yugabyte/yb-tools repository on GitHub.
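
To make the idea concrete, here is a minimal sketch (not the example program mentioned above) of how the hash space could be split and handed to worker threads. It assumes partition_hash values fall in 0 to 65535; the table and column names in the usage note are hypothetical, and `process_range` is a callable you would write yourself to scan one sub-range and delete the expired rows it finds.

```python
from concurrent.futures import ThreadPoolExecutor

HASH_SPACE = 65536  # assumption: partition_hash values fall in [0, 65535]


def split_hash_space(num_ranges):
    """Split [0, 65535] into `num_ranges` contiguous (lo, hi) pairs, both inclusive."""
    if not 1 <= num_ranges <= HASH_SPACE:
        raise ValueError("num_ranges must be between 1 and 65536")
    step, extra = divmod(HASH_SPACE, num_ranges)
    ranges = []
    lo = 0
    for i in range(num_ranges):
        width = step + (1 if i < extra else 0)  # spread the remainder evenly
        ranges.append((lo, lo + width - 1))
        lo += width
    return ranges


def range_query(table, partition_cols, lo, hi):
    """Build a YCQL SELECT that scans only the rows hashing into [lo, hi]."""
    cols = ", ".join(partition_cols)
    return (f"SELECT {cols} FROM {table} "
            f"WHERE partition_hash({cols}) >= {lo} AND partition_hash({cols}) <= {hi}")


def run_parallel(num_ranges, num_workers, process_range):
    """Call process_range(lo, hi) for every sub-range using a thread pool."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(lambda r: process_range(*r), split_hash_space(num_ranges)))
```

For example, `range_query("ks.events", ["id"], 0, 16383)` builds the scan for the first of four equal sub-ranges. Using more sub-ranges than workers keeps each scan small and evens out the load when some ranges hold more rows than others.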

#2 was addressed in issue: