I'm not sure increasing the timeout is the solution. The data size could grow and invalidate whatever timeout we pick, and the current load on the cluster can make execution times unstable.
Figuring out smaller batches to delete, using the strategies already mentioned, is the way to go.
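To make it concrete, here's a minimal sketch of one common batching pattern (a bounded DELETE via a LIMITed subquery) in Python with psycopg2; the connection string, `events` table, and cutoff condition are all hypothetical, so adapt them to your schema:

```python
import psycopg2

conn = psycopg2.connect("dbname=app")  # hypothetical connection string
conn.autocommit = True  # each batch commits as its own small transaction
BATCH = 5000

with conn.cursor() as cur:
    while True:
        # The subquery caps how many rows each statement touches.
        cur.execute(
            "DELETE FROM events WHERE id IN "
            "(SELECT id FROM events WHERE created_at < %s LIMIT %s)",
            ("2020-01-01", BATCH),
        )
        if cur.rowcount == 0:
            break  # nothing left to delete
conn.close()
```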
I can certainly try that, but I wonder how many small batches would produce less stress on the cluster than a few large ones. Sure, RAM can run out if a batch becomes too large, but 50,000 DELETE statements only amount to a few MB.
At the end of the day, the amount of work to be done is the same. But yeah, I'll keep it in mind and try that.
The total work will actually be greater, because you're fetching rows to the client and sending many queries instead of a single DELETE.
But the database is generally optimized for many small concurrent operations rather than a few big ones; think OLTP vs. OLAP as an example.
This approach is also resumable from the client, assuming you keep track of the “start_key”. And it can delete much faster when the dataset is spread out.
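For example, a minimal sketch of the resumable loop, under the same hypothetical assumptions as above (string primary keys, an `events` table, and a checkpoint file name I made up):

```python
import os

import psycopg2

CHECKPOINT = "delete_checkpoint.txt"  # hypothetical file persisting the start_key
BATCH = 5000

def load_start_key():
    # Empty string sorts before any key, assuming string primary keys.
    return open(CHECKPOINT).read().strip() if os.path.exists(CHECKPOINT) else ""

def save_start_key(key):
    with open(CHECKPOINT, "w") as f:
        f.write(key)

conn = psycopg2.connect("dbname=app")  # hypothetical connection string
conn.autocommit = True  # each batch commits on its own
start_key = load_start_key()
with conn.cursor() as cur:
    while True:
        # Keyset pagination: fetch the next batch of keys past the checkpoint.
        cur.execute(
            "SELECT id FROM events WHERE id > %s AND created_at < %s "
            "ORDER BY id LIMIT %s",
            (start_key, "2020-01-01", BATCH),
        )
        keys = [row[0] for row in cur.fetchall()]
        if not keys:
            break
        cur.execute("DELETE FROM events WHERE id = ANY(%s)", (keys,))
        start_key = keys[-1]
        save_start_key(start_key)  # crash-safe: a restart resumes from here
conn.close()
```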
You can also have one thread reading and several threads sending the deletes. The delete queries will be spread over many threads on the server side and won't interfere with each other.
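Here's a sketch of that reader/worker pattern, again with hypothetical table and connection details:

```python
import queue
import threading

import psycopg2

DSN = "dbname=app"  # hypothetical connection string
WORKERS = 4
BATCH = 1000
batches = queue.Queue(maxsize=8)  # bounded, so the reader can't run far ahead

def reader():
    # A single reader walks the key space and hands batches to the workers.
    conn = psycopg2.connect(DSN)
    with conn.cursor() as cur:
        start_key = ""  # assumes string primary keys
        while True:
            cur.execute(
                "SELECT id FROM events WHERE id > %s ORDER BY id LIMIT %s",
                (start_key, BATCH),
            )
            keys = [row[0] for row in cur.fetchall()]
            if not keys:
                break
            batches.put(keys)
            start_key = keys[-1]
    conn.close()
    for _ in range(WORKERS):
        batches.put(None)  # poison pill: tells one worker to stop

def deleter():
    # Each worker gets its own connection, so its deletes run as
    # independent small transactions on the server side.
    conn = psycopg2.connect(DSN)
    conn.autocommit = True
    with conn.cursor() as cur:
        while True:
            keys = batches.get()
            if keys is None:
                break
            cur.execute("DELETE FROM events WHERE id = ANY(%s)", (keys,))
    conn.close()

workers = [threading.Thread(target=deleter) for _ in range(WORKERS)]
for t in workers:
    t.start()
reader()
for t in workers:
    t.join()
```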