Yugabyte vs MySQL

Hello!

Now I use MySQL. In the next project I want to use another DB.

Does your DB replace MySQL?

Do you have a performance comparision?

Hi @ivan_s

Yes YugabyteDB is a great alternative to mysql. But performance metrics are different in a distributed db compared to a single node one. We have many benchmarks comparing to different dbs https://blog.yugabyte.com/category/performance-benchmarks/ including aws-aurora which has a similar architecture to mysql.

There may be some feature that MySQL has that PG/YugabyteDB doesn’t but I’m sure it can be expressed in another way.

You can explain your app more or the features that you need and I can guide you in the right direction!

This is the messenger. The bottleneck is message storage (100 million per day).
As far as I understand, Nosql (like Cassandra) is well suited for this.

We also need to query the columns. For example, there are users (first table), and they have contacts (second table). It is necessary to make a request on the user_id column in the table with contacts.
As far as I understand, only RDBMS can solve this.

Is your database suitable for this?
Now we are using Mysql. But it is slow and cannot be scaled easily.

How long will a new column be added to the table? Let’s say 300 million rows in a table.

Do I need to split your database into two independent databases on different clusters or can all this be combined into one solution?

But performance metrics are different in a distributed db compared to a single node one.

  • For better or worse?

Your usecase is perfect for YugabyteDB. With the right partitioning you can have efficient writes and queries with linear scaling.

Regarding contacts table, you mean select * from contatcts where user_id=x ? If not explain it in a sql query ?

Adding a new column is instant cause it doesn’t need to rewrite the whole table. I’m guessing a column that can be null.

  • re performance
    It will be better since you can use multiple nodes with more storage/memory/cpu.
    It will worse for global transactions that span multiple servers.
    It will be better for transactions that are inside 1 partition/server.

  • re Do I need to split your database into two independent databases on different clusters or can all this be combined into one solution?

This can all be in 1 cluster. Doing a cluster-per-feature is mostly needed when access-patterns,requirements,sla,hot/cold data change a lot between features . I think you can use 1 cluster at first and slowly migrate only when necessary.

You can explain more your app, write-path and read-path so I can help on how best to use a schema that horizontally scales.

Thanks!

It will worse for global transactions that span multiple servers.

  • Can you give an example?

You can explain more your app, write-path and read-path so I can help on how best to use a schema that horizontally scales.

  • I don’t quite understand what you want. What’s “write-path&read-path”?

Should I use YSQL or YCQL? How to make a choice? I have a large number of messages, which seems to be more suitable for YCQL. At the same time, I need to use the RDBMS for other tasks, which seems to be more suitable for YSQL.

Distributed transactions on every database that exists are slower compared to single node.
Suppose you have 2 channels. 1 channel is on node1, other channel on node2.
If you want to update a row on both channels, so on both nodes, in 1 transaction, it will be slower (because of network coordination) to do that compared to separated transactions for each node.

If your transaction is only targeting channel1, it will be as fast as mysql/postgresql (faster/slower depending on the query since the internals change a little).

You can use both. Or depends on what you need. Currently YCQL is a little more optimized and the client driver is cluster-aware.

While YSQL drivers need a little work to be more efficient on cluster mode (can be done manually by a custom connection pooler or using https://github.com/yugabyte/jdbc-yugabytedb in java).

But YSQL is getting optimized to be as efficient as YCQL. And in complex queries, it will be faster since it will pushdown data locally.

The whole thing is: partition your data as optimal as possible, and try to do most queries in single partitions. So you get best of both worlds. And I can help you with that.

Thank you for the answer.
“Node” is a server, I understand that.
What do you mean by “channel”?

An example if you decide to keep all messages of a channel in 1 server, so partitioning by message.channel_id.

Thank you!

Correctly I understand that if we separate transactions for each node, then when reading we can get an old record or nonexistent? Because we received from a node on which the value has not yet changed?

For example, before saving a message, we check if we have such a message or not (messages come from other messengers). Sometimes two scripts execute this at the same time and the first one can already save the message, and the second one does the check, does not receive the value and also saves the message.

write/delete/update are synchronously replicated

Do both messages share the same primary key ? When you say check, what’s the query ?

write/delete/update are synchronously replicated

If did we separate transactions for each node? Or why are we separating transactions?

When you say check , what’s the query ?

We get a message. The message has unique values ​​item_id and client_id.

Next, we need to check whether such a client exists in the table with clients:
SELECT * FROM clients WHERE service_id = $clientId;

If it does not exist, then we create entries:

  1. in the table with clients;

  2. in the table with dialogs (we create a dialog to which messages are attached).

  3. We save the message.

If the client exists, we check if the message exists:
SELECT * FROM messages WHERE service_id = $itemId;

If the message does not exist, we save it in the messages table.

What problems:

  1. The dialog does not have a unique item_id, and it can be saved twice. To save once, you need to check if the client exists.
  2. Messages have a unique item_id, but for message to be exactly unique, we need a composite index for the following columns: item_id, client_id, account_id (we did in mysql this way).

Can you read these and reply back if it’s not clear ?



This is easy: (assuming you do similar thing)

  1. select
  2. if it doesn’t exist, insert
  3. if unique-index-error, ignore
    Or can be done in 1 command with an upsert.

There is 1 dialog per client ? I’m not understanding the columns of the dialog.

Explain schema with queries. And simple if/else where things may break. And put in code blocks.

Thanks. I will study the documents and answer later. You are very loyal.