Yugabyte vs MySQL

ivan_s · October 2, 2019, 7:49am

Hello!

Now I use MySQL. In the next project I want to use another DB.

Does your DB replace MySQL?

Do you have a performance comparison?

dorian_yugabyte · October 2, 2019, 11:26am

Yes, YugabyteDB is a great alternative to MySQL. But performance metrics are different in a distributed db compared to a single node one. We have many benchmarks against different databases (see Performance Benchmarks Library | Yugabyte), including AWS Aurora, which has a similar architecture to MySQL.

There may be some feature that MySQL has that PG/YugabyteDB doesn’t but I’m sure it can be expressed in another way.

You can explain your app more or the features that you need and I can guide you in the right direction!

ivan_s · October 2, 2019, 12:34pm

It is a messenger. The bottleneck is message storage (100 million messages per day).
As far as I understand, NoSQL (like Cassandra) is well suited for this.

We also need to query by columns. For example, there are users (first table), and they have contacts (second table). We need to query the contacts table by the user_id column.
As far as I understand, only RDBMS can solve this.

Is your database suitable for this?
Now we are using MySQL. But it is slow and cannot be scaled easily.

How long does it take to add a new column to a table? Let’s say the table has 300 million rows.

Do I need to split your database into two independent databases on different clusters or can all this be combined into one solution?

ivan_s · October 2, 2019, 12:37pm

But performance metrics are different in a distributed db compared to a single node one.

For better or worse?

dorian_yugabyte · October 2, 2019, 1:17pm

Your usecase is perfect for YugabyteDB. With the right partitioning you can have efficient writes and queries with linear scaling.

Regarding the contacts table, do you mean select * from contacts where user_id=x? If not, please express it as a SQL query.

Adding a new column is instant because it doesn’t need to rewrite the whole table. I’m assuming it is a column that can be null.

Re performance:

- It will be better since you can use multiple nodes with more storage/memory/CPU.
- It will be worse for global transactions that span multiple servers.
- It will be better for transactions that stay inside 1 partition/server.

Re “Do I need to split your database into two independent databases on different clusters or can all this be combined into one solution?”:

This can all be in 1 cluster. A cluster per feature is mostly needed when access patterns, requirements, SLAs, or hot/cold data differ a lot between features. I think you can use 1 cluster at first and migrate gradually, only when necessary.

You can explain more about your app, its write-path and read-path, so I can help you design a schema that scales horizontally.

ivan_s · October 2, 2019, 3:24pm

Thanks!

It will be worse for global transactions that span multiple servers.

Can you give an example?

You can explain more about your app, its write-path and read-path, so I can help you design a schema that scales horizontally.

I don’t quite understand what you want. What’s “write-path&read-path”?

Should I use YSQL or YCQL? How to make a choice? I have a large number of messages, which seems to be more suitable for YCQL. At the same time, I need to use the RDBMS for other tasks, which seems to be more suitable for YSQL.

dorian_yugabyte · October 2, 2019, 3:48pm

Distributed transactions are slower than single-node transactions in every database.
Suppose you have 2 channels: one channel is on node1, the other on node2.
If you want to update a row in both channels (so on both nodes) in 1 transaction, it will be slower (because of network coordination) than doing a separate transaction for each node.

If your transaction only targets channel1, it will be as fast as MySQL/PostgreSQL (faster or slower depending on the query, since the internals differ a little).

You can use both; it depends on what you need. Currently YCQL is a little more optimized and its client driver is cluster-aware.

The YSQL drivers need a little work to be more efficient in cluster mode (this can be done manually with a custom connection pooler, or in Java by using the yugabyte/jdbc-yugabytedb JDBC driver for YSQL on GitHub).

But YSQL is being optimized to be as efficient as YCQL. And for complex queries it will be faster, since it will push work down to where the data is stored.

The whole point is: partition your data as optimally as possible, and try to keep most queries within a single partition. That way you get the best of both worlds. And I can help you with that.

ivan_s · October 2, 2019, 4:03pm

Thank you for the answer.
“Node” is a server, I understand that.
What do you mean by “channel”?

dorian_yugabyte · October 2, 2019, 4:08pm

It was an example: suppose you decide to keep all messages of a channel on 1 server, i.e. you partition by message.channel_id.
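To make the channel example concrete, here is a toy sketch of partitioning by channel_id. It is only an illustration, not how YugabyteDB assigns data to servers internally: the node names, the SHA-256-based hash and all function names are assumptions made up for this example. It shows that every message of one channel maps to the same node, and that a transaction is "single partition" when all the channels it touches map to one node.

```python
import hashlib

# Hypothetical node names, invented for this illustration.
NODES = ["node1", "node2", "node3"]


def node_for(channel_id: int, nodes=NODES) -> str:
    """Map a channel_id to a node using a stable hash of the id."""
    digest = hashlib.sha256(str(channel_id).encode()).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


def nodes_touched(channel_ids) -> set:
    """Return the set of nodes a transaction over these channels would hit."""
    return {node_for(c) for c in channel_ids}


def is_single_partition(channel_ids) -> bool:
    """True when every channel in the transaction lives on the same node."""
    return len(nodes_touched(channel_ids)) == 1


if __name__ == "__main__":
    # All messages of channel 7 land on one node, so this stays local.
    print(node_for(7), is_single_partition([7, 7, 7]))
    # A transaction over many channels will usually span several nodes.
    print(nodes_touched(range(10)))
```

A transaction that only touches one channel never needs cross-node coordination in this model, which is the case described above as being as fast as a single-node database.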

ivan_s · October 2, 2019, 4:35pm

Thank you!

Do I understand correctly that if we use separate transactions for each node, then a read can return an old record or no record at all, because we read from a node where the value has not changed yet?

For example, before saving a message, we check if we have such a message or not (messages come from other messengers). Sometimes two scripts execute this at the same time and the first one can already save the message, and the second one does the check, does not receive the value and also saves the message.

dorian_yugabyte · October 2, 2019, 4:50pm

write/delete/update are synchronously replicated

Do both messages share the same primary key? When you say check, what’s the query?

ivan_s · October 2, 2019, 5:17pm

write/delete/update are synchronously replicated

Even if we separate transactions for each node? And why are we separating transactions?

When you say check, what’s the query?

We get a message. The message has unique values item_id and client_id.

Next, we need to check whether such a client exists in the table with clients:
SELECT * FROM clients WHERE service_id = $clientId;

If it does not exist, then we create entries:

- in the table with clients;
- in the table with dialogs (we create a dialog to which messages are attached).

We save the message.

If the client exists, we check if the message exists:
SELECT * FROM messages WHERE service_id = $itemId;

If the message does not exist, we save it in the messages table.

The problems:

- The dialog does not have a unique item_id, so it can be saved twice. To save it only once, we need to check whether the client exists.
- Messages have a unique item_id, but for a message to be truly unique we need a composite index on the following columns: item_id, client_id, account_id (this is how we did it in MySQL).

dorian_yugabyte · October 2, 2019, 5:47pm

Can you read these and reply back if anything is not clear?

This is easy (assuming you do something similar):

1. select
2. if it doesn’t exist, insert
3. if unique-index-error, ignore

Or it can be done in 1 command with an upsert.
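The steps above can be sketched as follows. This is a minimal sketch, not YugabyteDB-specific code: it uses Python's built-in sqlite3 module with an in-memory database as a stand-in so that it runs anywhere, and the table layout (a messages table with a unique constraint on item_id, client_id, account_id, as described earlier in the thread, plus a made-up body column) and all function names are assumptions. The single-command variant uses the PostgreSQL-style ON CONFLICT ... DO NOTHING clause as accepted by SQLite; check the YSQL documentation before relying on the same syntax there.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    """Create an in-memory stand-in database with a composite unique key."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE messages (
            item_id    INTEGER NOT NULL,
            client_id  INTEGER NOT NULL,
            account_id INTEGER NOT NULL,
            body       TEXT,
            UNIQUE (item_id, client_id, account_id)
        )
        """
    )
    return conn


def insert_ignoring_duplicate(conn, item_id, client_id, account_id, body) -> bool:
    """Steps 2 and 3: insert, and ignore the unique-index error if we lost a race."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO messages (item_id, client_id, account_id, body) "
                "VALUES (?, ?, ?, ?)",
                (item_id, client_id, account_id, body),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # another writer already saved this message


def save_with_check(conn, item_id, client_id, account_id, body) -> bool:
    """Step 1: select first; fall through to the guarded insert if not found."""
    found = conn.execute(
        "SELECT 1 FROM messages "
        "WHERE item_id = ? AND client_id = ? AND account_id = ?",
        (item_id, client_id, account_id),
    ).fetchone()
    if found:
        return False
    return insert_ignoring_duplicate(conn, item_id, client_id, account_id, body)


def save_with_upsert(conn, item_id, client_id, account_id, body) -> None:
    """The same outcome in one command: a duplicate insert becomes a no-op."""
    with conn:
        conn.execute(
            "INSERT INTO messages (item_id, client_id, account_id, body) "
            "VALUES (?, ?, ?, ?) "
            "ON CONFLICT (item_id, client_id, account_id) DO NOTHING",
            (item_id, client_id, account_id, body),
        )


def count_messages(conn) -> int:
    """Count stored messages (helper for checking the result)."""
    return conn.execute("SELECT COUNT(*) FROM messages").fetchone()[0]
```

The point of the design is that the unique constraint, not the preceding select, is what guarantees a message is stored once: if two scripts both pass the check, the second insert fails on the constraint and is ignored.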

Is there 1 dialog per client? I don’t understand the columns of the dialog.

Please explain the schema with queries, plus a simple if/else showing where things may break. And put them in code blocks.

ivan_s · October 2, 2019, 7:13pm

Thanks. I will study the documents and answer later. You are very loyal.
