How to store the count of unread messages?

Hello. How should I store the number of unread messages?

I have a chat application. A user can have many chats. I need to display how many unread messages there are in each chat.

It’s also important for me to store this in the existing “users” table. I don’t want to count the number of unread messages each time.

In other words, I am thinking about a JSON object that stores an array of chats and how many unread messages each of them has.
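
For example, something like this is the rough idea (only a sketch; the unread_counts column name is just an example):

ALTER TABLE users ADD COLUMN unread_counts JSONB DEFAULT '[]';

-- e.g. unread_counts = '[{"chat_id": 101, "unread": 5}, {"chat_id": 102, "unread": 0}]'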

What is the best way to implement this? How do I update the unread counter so that it stays synchronized? For example, in the same split second, a new message may arrive while the user reads an old one.

hi @ivan_s

A user can have many chats.

Are these private channels with a limited number of users?
Or public channels with hundreds/thousands of users?

Depending on this we can choose the best way to implement it.

I don’t want to count the number of unread messages each time.

Why not?
Do you think counting the rows will be inefficient?
Or will there be a long time between checks, so the query might need to count many rows to get the total unread count?

Yes, these are private channels with a limited number of users.

Usually, a user has 20-30 chats and they almost always have unread messages.
We used to have a real-time calculation in MySQL.

At first, we tried to read messages starting from a certain date.
Then we began to count the number of messages that were marked as “unread” (where is_unread = True).
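
Roughly a query like this, run on every page load (the table and column names here are only illustrative):

SELECT chat_id, COUNT(*) AS unread_count
FROM messages
WHERE recipient_id = 42      -- the current user
  AND is_unread = TRUE
GROUP BY chat_id;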

Also, the user now has several personal accounts, and we need to display the number of unread messages in each account.
We want to store all of this in one field because read operations are much more common, so it is better to precompute the unread counts in advance.

In addition, a lot of unread messages (100-1000) accumulate in chats. In total, one personal account can have 30 chats with 500 unread messages each (on average); that is 15,000 unread messages. Also, on average, each user has 3 personal accounts, which means 45,000 unread messages that would need to be counted regularly (on each page load).

We could use sockets and add +1 to the counter when a message arrives. But we worry that this is unreliable and the counter will drift out of sync.

Ok, let’s try something:

Chats with few users (how many new messages per second?):
We don’t want to store all unreads in one row; that would greatly increase the number of concurrent updates/increments on the same row, which is an antipattern. Keeping multiple rows will actually give better performance on reads/writes, especially writes.

We create a table to keep the unreads per-channel per-user:

CREATE TABLE user_chat_unread(user_id BIGINT, chat_id BIGINT, unreads BIGINT DEFAULT 0, PRIMARY KEY (user_id, chat_id));  -- id column types are up to you

When sending a new message, we also increment the unreads column for all participants. Here we can use an UPSERT statement.
For the user who is sending the message, we don’t increment but instead delete their row.
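
For example, something along these lines in YSQL (only a sketch, assuming the user_chat_unread table above; :chat_id, :sender_id and :recipient_id are placeholders your application fills in, and the increment is repeated for every recipient):

-- +1 unread for one recipient of the new message
INSERT INTO user_chat_unread (user_id, chat_id, unreads)
VALUES (:recipient_id, :chat_id, 1)
ON CONFLICT (user_id, chat_id)
DO UPDATE SET unreads = user_chat_unread.unreads + 1;

-- the sender’s own row is removed instead of incremented
DELETE FROM user_chat_unread
WHERE user_id = :sender_id AND chat_id = :chat_id;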

You should check how writes perform in this scenario: doing a distributed write on many rows (as many as there are participants) may be slow, while reads should be very fast for a user.

Are there chats with a lot of messages per second?

Thank you for the answer!

Why do we use the ‘BIGINT’ type and not ‘INT’?

We don’t want to store all unreads in one row; that would greatly increase the number of concurrent updates/increments on the same row, which is an antipattern.

But what about JSONB?
I will explain why I want to store it in one row.

In fact, I simplified the task to make it easier to understand.
Our application is arranged a little differently.
We have thousands of chats per user. Each chat has a number of unread messages. Also, each chat is stored in its category.
And the category should display how many unread chats it contains.
And every time we have to display all these categories.

Therefore, in order not to recount all these categories every time, I wanted to store it in one row.


Are there chats with a lot of messages per second?

No. Only categories can receive a lot of chats per second: ~10 chats/second max, usually 0-2 per second.

Suppose we store each row separately.
What is the best way to update the number of unread messages (±1) if, at the same moment, we read one chat (-1 unread) and a new message arrives in another chat (+1 unread)?

I use BIGINT just to always be safe. In your case you can use INT.

JSONB will contain multiple key/value pairs inside one row. If you have many writes (like increments) to those values, it is better to store them as separate rows.

The reason: on each update, we LOCK the row. Any other update query must wait (or abort and retry). Storing multiple rows spreads the locking across multiple rows.

Example: there are cases where even a single row gets locked very frequently. In those cases you have to partition that row into several rows that can each be locked separately.

Ok, so we also want to keep the number of unreads per category (and not per chat!), correct?

That’s why you need to do (add message + increment unreads) inside a single transaction.
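
Roughly like this (just a sketch; the messages table and its columns are whatever you already have, and the placeholders are illustrative):

BEGIN;
    INSERT INTO messages (...) VALUES (...);
    INSERT INTO user_chat_unread (user_id, chat_id, unreads)
        VALUES (:recipient_id, :chat_id, 1)
        ON CONFLICT (user_id, chat_id)
        DO UPDATE SET unreads = user_chat_unread.unreads + 1;
COMMIT;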

When you read a chat, you should delete the (user_id, chat_id, unreads) row for the user that read the chat, no?

Yes, and we have unread messages for every chat.

When you read a chat, you should delete the (user_id, chat_id, unreads) row for the user that read the chat, no?

Yes, we have to delete the chat row.
But we have categories, and we must reduce the number of unread chats in the category.

But how do we know when it becomes too much?

This will depend on the replication factor, latency between servers, transaction size, etc. You will get an error like “Restart read required” or “Transaction expired”, and you have to retry the write.

Then we can do the same with categories too.
If there are a lot of concurrent updates for categories too, then we can partition into (category_id, partition_id, unreads) rows. We decide on, say, 4 partitions per category (this too will depend on the amount of concurrent updates).

When you do an increment/decrement, you just do it on a random partition_id for the category. When you read, you get all partitions and sum the unreads. The sum will be the true value. Makes sense?
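
A sketch of what that could look like (the table and column names are just examples, and it keeps the per-user key from before; :random_partition is a value 0-3 your application picks at random):

CREATE TABLE category_unread (
    user_id BIGINT,
    category_id BIGINT,
    partition_id INT,
    unreads BIGINT DEFAULT 0,
    PRIMARY KEY (user_id, category_id, partition_id)
);

-- increment: upsert a random partition so concurrent writers rarely collide
INSERT INTO category_unread (user_id, category_id, partition_id, unreads)
VALUES (:user_id, :category_id, :random_partition, 1)
ON CONFLICT (user_id, category_id, partition_id)
DO UPDATE SET unreads = category_unread.unreads + 1;

-- read: the true value is the sum of all partitions
SELECT SUM(unreads) AS unreads
FROM category_unread
WHERE user_id = :user_id AND category_id = :category_id;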

Wow. Partition_id is beautiful. Thank you.

Which query is better for updating the ‘unreads’ value? Do I need to use a transaction (SELECT, then UPDATE)?
If there are two update requests, will one of them be queued, or will one of them be rejected?

@dorian_yugabyte ? 🙂

Tell me all of your queries (insert message, update unreads, etc.) and I will suggest an optimal query for each.

Hi @ivan_s,

It might be easier to get on a call to discuss your application; I am trying to piece the details together but it seems a little hard. Here’s what I understand so far.

You have a chat application, which has:

  • users
    • Let’s say these are u1, u2, …
    • There can be many users (millions)
  • categories
    • Each user can chat with other users in multiple categories. As an example, users u1 and u2 can have separate chat histories in categories c1 and c2.
    • There are a handful of categories, say 100 or so.
  • messages
    • Each chat in a category will have a list of messages. For example, (user u1, category c1) can have a chat history for (u1-u2), while (user u1, category c2) can have a separate chat history for (u1-u2).
  • Each chat history and each category would have an unread count for a user, which you are trying to make efficient.

If the above is correct, here is a proposal. Note that the following proposal uses the hash and range sharding features of YugabyteDB, which help you achieve scale and performance respectively.

Schema

Let’s go with 2 tables:

  • user_category_overview which contains the unread message counts for each user per category
  • user_category_messages which contains the list of messages, etc for each user per category
CREATE TABLE user_category_overview (
    user_id UUID,
    category_id INT, 
    unread_count INT DEFAULT 0,
    PRIMARY KEY (user_id HASH, category_id ASC)
);

CREATE TABLE user_category_messages (
    user_id UUID,
    category_id INT,
    message_id BIGINT,        -- use epoch time as the message id
    is_unread INT DEFAULT 1,  -- 0 if message is read and 1 for unread
    ...                       -- message_body, etc.
    PRIMARY KEY (user_id HASH, category_id ASC, message_id DESC)
);

Note that the tables above have one column (user_id) hash-sharded for scalability, while the other key columns are sorted for ordered retrieval, which brings the best balance between scale and performance.

Inserts

To insert a new message, you would update both tables in a single transaction:

BEGIN TRANSACTION
    UPDATE user_category_overview 
        SET unread_count = unread_count + 1 
        WHERE user_id = '...' AND category_id = '...';
    INSERT INTO user_category_messages (...) VALUES (...);
COMMIT

If a message is read, you would perform the following:

BEGIN TRANSACTION
    UPDATE user_category_overview 
        SET unread_count = unread_count - 1 
        WHERE user_id = '...' AND category_id = '...';
    UPDATE user_category_messages
        SET is_unread = 0
        WHERE user_id = '...' AND category_id = '...' AND message_id = '...';
COMMIT

If messages can be deleted while still unread in your application, you would have to perform that as a two-table transaction as well.
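
Such a delete could look roughly like this (a sketch in YSQL, following the same pattern as above; the EXISTS guard just makes sure the counter is only decremented when the deleted message was still unread):

BEGIN TRANSACTION
    UPDATE user_category_overview
        SET unread_count = unread_count - 1
        WHERE user_id = '...' AND category_id = '...'
          AND EXISTS (SELECT 1 FROM user_category_messages
                      WHERE user_id = '...' AND category_id = '...'
                        AND message_id = '...' AND is_unread = 1);
    DELETE FROM user_category_messages
        WHERE user_id = '...' AND category_id = '...' AND message_id = '...';
COMMIT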

Selects

For the unread count per category, simply do:

SELECT category_id, unread_count FROM user_category_overview 
    WHERE user_id = '...';

The above will fetch a small number of rows (only the categories where the user has messages).

For total unread counts, you can sum the above on the client side, or you can also do:

SELECT SUM(unread_count) FROM user_category_overview 
    WHERE user_id = '...';

Which API to use

Note that all of the above is possible in both the YSQL and the YCQL API. Pick the API depending on your needs (app needs like joins, reading from nearest datacenter/region, latency you want, etc). This section will help you decide: FAQs about YugabyteDB API compatibility | YugabyteDB Docs. When in doubt, please pick YSQL.

Happy to talk more if you need help.


@karthik wow. Thank you for the answer!

I got it. Here’s a detailed diagram of the application:

I have a question.

The above will fetch a small number of rows (only the categories where the user has messages).

Why? Rows with value 0 will stay in the table, and we will get all the rows, including those with value 0.

I need to store the number of unread dialogs in the category, not the number of unread messages.

Suppose a new message comes and we make +1 in the category and +1 in the dialog. Now another new message comes. We make +1 in a dialog and get 2 unread messages. But we don’t need to make +1 in the category, because the category shows the number of unread dialogs instead of messages.

How can I write such logic?
In other words, we should do +1 in the category only if the dialog has 0 unread messages. If the dialog already has 1 or more unread messages, we should not do +1 in the category.

Can you tell me the schema for the (message, category, chat) tables? I’ll test some queries and get back to you.

Yes, and I updated my answer:
‘In other words, we should do +1 in the category only if the dialog has 0 unread messages. If the dialog already has 1 or more unread messages, we should not do +1 in the category.’

Here we store the number of unread dialogs in categories:

CREATE TABLE categories_unread (
    user_id UUID,
    category_id INT, 
    unread_count INT DEFAULT 0,
    PRIMARY KEY (user_id HASH, category_id ASC)
);

Here we store the number of unread messages in dialogs:

CREATE TABLE dialogs (
    last_message_id INT,
    last_message_text TEXT,
    ... and many other fields,
    unread_count INT DEFAULT 0,
);
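
To make the logic concrete, something like this is roughly what I mean (only a sketch; dialog_id here stands in for whatever key the dialogs table actually has):

BEGIN;
    -- +1 in the category only if this dialog currently has 0 unread messages
    UPDATE categories_unread
        SET unread_count = unread_count + 1
        WHERE user_id = '...' AND category_id = '...'
          AND (SELECT unread_count FROM dialogs WHERE dialog_id = '...') = 0;
    -- then +1 in the dialog itself
    UPDATE dialogs
        SET unread_count = unread_count + 1
        WHERE dialog_id = '...';
COMMIT;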