In my model, I want to store a variable-sized circular buffer for each key. The buffer size may go up to 2k elements. I was wondering how that would be possible.
This will depend on exactly when the circular buffer will wrap around. Can you please explain that part a bit?
For example, in a fixed-size buffer this is relatively easy to implement, since you know when the wrap-around happens (at a certain element count being added).
In my use case, another table would hold the length of the buffer. We can assume both keys are the same or easily derivable from each other. The length of the buffer changes dynamically based on user behaviour.
Got it… so the circular buffer for one key needs to hold `n` entries at any time (the value of `n` can be dynamic). Let's say you're implementing the buffer for key `k1`.
NOTE: I assume the app is distributed, but if exactly one instance of the app is running, things become simpler. Both solutions are shown below.
Distributed app technique
- The table `circular_buffer_info` will store the following for the key `k1`: `buffer_length: n` and the `max_entry_count: ...`. The value of `max_entry_count` is incremented each time an entry is inserted.
- The table `circular_buffer` will store the actual values for the key. The keys for this table will be `(k1, <entry_count>)`. For every insert into the buffer, we increment `entry_count` for this key and update it in the `circular_buffer_info` table (`max_entry_count` column).
- Each `select` query does a `LIMIT n` along with it, so it implicitly returns only the entries in this buffer.
- Periodically, issue a delete such as `delete from circular_buffer where entry_count <= max_entry_count - n` to flush the older values (keeping the newest `n` entries). Depending on the read volume, you can do this with every read as well.
- If you know the values will get refilled in a certain time period, you can simply set a TTL for auto-expiry.
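A minimal in-memory sketch of this two-table scheme, with plain Python dicts standing in for the database tables (the `CircularBufferStore` class and its method names are illustrative, not from any library):

```python
# Illustrative in-memory model of the two-table scheme described above.
# In a real deployment, `info` and `entries` would be database tables.

class CircularBufferStore:
    def __init__(self):
        # circular_buffer_info: key -> {buffer_length, max_entry_count}
        self.info = {}
        # circular_buffer: (key, entry_count) -> value
        self.entries = {}

    def set_length(self, key, n):
        meta = self.info.setdefault(
            key, {"buffer_length": 0, "max_entry_count": 0})
        meta["buffer_length"] = n

    def insert(self, key, value):
        meta = self.info.setdefault(
            key, {"buffer_length": 0, "max_entry_count": 0})
        meta["max_entry_count"] += 1  # monotonically increasing counter
        self.entries[(key, meta["max_entry_count"])] = value

    def read(self, key):
        # "select ... LIMIT n": only the newest buffer_length entries.
        meta = self.info.get(key)
        if meta is None:
            return []
        rows = sorted(((c, v) for (k, c), v in self.entries.items()
                       if k == key), reverse=True)
        return [v for _, v in rows[: meta["buffer_length"]]]

    def prune(self, key):
        # Delete everything older than the newest buffer_length entries.
        meta = self.info.get(key)
        if meta is None:
            return
        cutoff = meta["max_entry_count"] - meta["buffer_length"]
        stale = [k for k in self.entries if k[0] == key and k[1] <= cutoff]
        for k in stale:
            del self.entries[k]

store = CircularBufferStore()
store.set_length("k1", 3)
for v in range(5):
    store.insert("k1", v)
# read("k1") now returns only the newest three values
```

The same structure maps directly onto two tables: `prune` is the periodic delete, and `read` is the `LIMIT n` select.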
Single app instance technique
- The table `circular_buffer_lengths` will store the key-value pair `(k1, n)`.
- The table `circular_buffer` will store the actual values for the key. The keys for this table will be `k1-0`, `k1-1`, `k1-2`, … `k1-(n-1)`; the values for these keys are stored in the value column. In the app, you store (or read) the next index to write, and once it reaches `n`, you roll the next entry over to `k1-0`.
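The single-instance roll-over can be sketched like this (an in-memory illustration; the `slots` dict stands in for the `circular_buffer` table, and the write index lives in the app since only one instance runs):

```python
# Single-instance sketch: keys k1-0 .. k1-(n-1), with a write index
# that wraps back to 0 once it reaches n.

class SingleInstanceBuffer:
    def __init__(self, key, n):
        self.key = key
        self.n = n            # circular_buffer_lengths: (k1, n)
        self.next_index = 0   # kept in the app (single instance)
        self.slots = {}       # circular_buffer: "k1-<i>" -> value

    def insert(self, value):
        self.slots[f"{self.key}-{self.next_index}"] = value
        # Roll over: after writing slot n-1, the next write hits k1-0.
        self.next_index = (self.next_index + 1) % self.n

buf = SingleInstanceBuffer("k1", 3)
for v in ["a", "b", "c", "d"]:
    buf.insert(v)
# "d" has overwritten the oldest slot, k1-0
```

This only works safely with one writer, which is why the distributed variant avoids a shared write index entirely.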
My use case is distributed in nature (multiple app instances are running).
So if I understand this correctly, my `entry_count` is the index of the buffer? I'll have to use transactions to keep the two tables in sync?
Can you be a bit more clear on the fields in each table? Isn't `buffer_length` the same as `n`? How do we roll over or reset the `entry_count` in the `circular_buffer` table?
I have simplified things a bit here; we can deal with `max_entry_count` if we need it later on.
The `circular_buffer_info` table has the following columns:
- `key`, which represents the queue we are dealing with. This could just be the user id.
- `buffer_length`, which is the current length of the circular buffer.
The `circular_buffer` table contains the following fields:
- `key`, which represents the partitioning key (user id).
- `entry_id`, which is a monotonically increasing number for each newly added entry. In practice, this can simply be the `timestamp` of insertion of the entry (millisecond resolution) or a `timeuuid` (if you expect multiple inserts in the same millisecond).
- `value`, which is the value stored in the circular buffer.
Note that the primary key in the `circular_buffer` table is `((key), entry_id)`, where `key` is the partitioning column and `entry_id` is the clustering column with `DESC` sort order.
The transactional guarantees you need, and how often the circular buffer length changes, will shape the queries you perform. Typically, in such applications, you need consistency of data but may not need transactional guarantees while the buffer length is changing; this depends on your use case. Here is a simple version:
Inserts
Simply insert data into the circular_buffer
table. Each insert is an append into the table.
Reads
- First read the size of the buffer. This is stored in the `buffer_length` column of the `circular_buffer_info` table. Let us say this value is `N`.
- Select the last `N` entries from the `circular_buffer` table. This should give you all the entries in the circular buffer. You can select fewer than that to access fewer items.
Pruning
You would have to clean up entries in the `circular_buffer` table, since it grows forever. A simple way to accomplish this is to delete all entries that are more than `N` entries behind the newest one. Depending on the read or write volume, you can perform this along with reads/writes, or you could run the query periodically. If you know that the data is regenerated periodically (say every week or so) and old data is not useful, you can set a TTL to expire the older data.
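Putting the three operations together, here is a minimal in-memory sketch (dicts stand in for the tables, and an integer counter stands in for the timestamp/timeuuid `entry_id`; the `"user42"` key is just an example):

```python
import itertools

# In-memory stand-in for the two tables. entry_id is a monotonically
# increasing integer here; the real schema would use a timestamp or timeuuid.
_entry_id = itertools.count(1)
circular_buffer_info = {"user42": {"buffer_length": 3}}  # key -> length
circular_buffer = {}                                     # (key, entry_id) -> value

def insert(key, value):
    # Inserts: each insert is a plain append into the table.
    circular_buffer[(key, next(_entry_id))] = value

def read(key):
    # Reads: look up N, then take the newest N entries
    # (the ORDER BY entry_id DESC LIMIT N query).
    n = circular_buffer_info[key]["buffer_length"]
    rows = sorted((e for e in circular_buffer if e[0] == key),
                  key=lambda e: e[1], reverse=True)
    return [circular_buffer[e] for e in rows[:n]]

def prune(key):
    # Pruning: delete everything more than N entries behind the newest.
    n = circular_buffer_info[key]["buffer_length"]
    rows = sorted((e for e in circular_buffer if e[0] == key),
                  key=lambda e: e[1], reverse=True)
    for e in rows[n:]:
        del circular_buffer[e]

for v in range(5):
    insert("user42", v)
# read("user42") returns the newest three values; prune drops the rest
```

Note that `read` and `prune` stay correct even if `buffer_length` shrinks between calls, which is what makes the dynamically sized buffer work without transactions on the hot path.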
I get it now. Makes sense. Thanks.
Might be worth clarifying in your post above that the primary key in the `circular_buffer` table is `((key), entry_id)`, where `key` is the partitioning column and `entry_id` the clustering column with `DESC` order (so that queries like `... ORDER BY entry_id DESC LIMIT n` are efficient).
Great suggestion, done!