[Question posted by a user on YugabyteDB Community Slack ]
What’s the best way to count the number of rows in a YCQL table with some degree of parallelism?
One option for this kind of parallel processing (reporting or analytic queries) on a YCQL table is to use Apache Spark.
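As a rough sketch of that route, a PySpark job can read the table through a Cassandra-compatible Spark connector and count it; this assumes the YugabyteDB fork of the Spark Cassandra connector (or a compatible connector jar) is supplied to Spark, and reuses the keyspace/table names from the examples below:

from pyspark.sql import SparkSession

# Point the connector at one or more YCQL contact points.
spark = (SparkSession.builder
         .appName("ycql-row-count")
         .config("spark.cassandra.connection.host", "127.0.0.1")
         .getOrCreate())

# Load the YCQL table as a DataFrame and let Spark parallelize the count.
row_count = (spark.read
             .format("org.apache.spark.sql.cassandra")
             .options(keyspace="ybdemo_keyspace", table="test")
             .load()
             .count())

print("rows:", row_count)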
For the simpler SELECT COUNT(*), if you don’t want to fire up a Spark job just to get table counts and need something lighter weight, you can try this Python script.
You’ll need to update these params (at least the first two) in the script:
cluster = Cluster(['127.0.0.1']) --> change to a comma-separated list of a few node IPs in the cluster
keyspace_name="ybdemo_keyspace" --> change to the keyspace you want to count
num_tasks_per_table=4096 --> can be left as is
num_parallel_tasks=8 --> can be left as is
The script will find all tables in the YCQL keyspace; for each table it will discover the partition key columns, issue SELECT COUNT(*) queries of the form below for each “sub-task”/partition slice with the maximum parallelism you specified above, and aggregate the sums.
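A minimal sketch of the discovery step, assuming the cassandra-driver package and the system_schema catalog tables that YCQL exposes (same names as in Apache Cassandra):

from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1'])   # from the params above
session = cluster.connect()
keyspace_name = "ybdemo_keyspace"  # from the params above

# List all tables in the keyspace.
tables = session.execute(
    "SELECT table_name FROM system_schema.tables WHERE keyspace_name = %s",
    (keyspace_name,))

for t in tables:
    cols = session.execute(
        "SELECT column_name, kind FROM system_schema.columns "
        "WHERE keyspace_name = %s AND table_name = %s",
        (keyspace_name, t.table_name))
    # Only partition key columns can be used inside partition_hash().
    partition_cols = [c.column_name for c in cols if c.kind == 'partition_key']
    print(t.table_name, partition_cols)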
The queries issued will be of the form:
SELECT count(*) as rows FROM k2.test
WHERE partition_hash(partition_key_col) >= ? AND partition_hash(partition_key_col) <= ?
for successive partition slices such as (0, 15), (16, 31), and so on (with num_tasks_per_table=4096, the 65536-value hash space splits into 4096 slices of 16 values each).
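Putting it together, here is a minimal sketch of the slice-and-sum logic for a single table, assuming the cassandra-driver package, the k2.test table and partition_key_col column from the example above, and that partition_hash() ranges over the 16-bit hash space [0, 65535]; a ThreadPoolExecutor caps the parallelism:

from concurrent.futures import ThreadPoolExecutor

from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1'])
session = cluster.connect()

num_tasks_per_table = 4096  # slices per table, as in the params above
num_parallel_tasks = 8      # max concurrent COUNT(*) queries

# partition_hash() values span [0, 65535]; split that range into
# num_tasks_per_table contiguous, non-overlapping slices.
HASH_SPACE = 64 * 1024
step = HASH_SPACE // num_tasks_per_table
slices = [(lo, lo + step - 1) for lo in range(0, HASH_SPACE, step)]

def count_slice(bounds):
    lo, hi = bounds
    rs = session.execute(
        "SELECT COUNT(*) AS rows FROM k2.test "
        "WHERE partition_hash(partition_key_col) >= %s "
        "AND partition_hash(partition_key_col) <= %s",
        (lo, hi))
    return rs.one().rows

# Run up to num_parallel_tasks slice counts at a time and sum the results.
with ThreadPoolExecutor(max_workers=num_parallel_tasks) as pool:
    total = sum(pool.map(count_slice, slices))

print("total rows in k2.test:", total)

Because the slices partition the hash space exactly, each row is counted once, so summing the per-slice counts yields the table’s total row count.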