Database schema in YugabyteDB for storing statistics collected periodically

Hello everyone,

We are porting our application from PostgreSQL to YugabyteDB, and I have a database schema design question:

Our original application, running on PostgreSQL, collects statistics every 15 minutes and stores them in a “stats” database table. That table is partitioned (via inheritance partitioning) such that, through database triggers we have written, a partition is created for each day. At the end of each month, we aggregate the data from the oldest partitions that span a month and store it in a separate “old_stats” table, which is also partitioned (one partition per month), and we drop the oldest, now-aggregated partitions of the “stats” table (one reason being to avoid database bloat in PostgreSQL).
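
To make the scheme concrete, here is a heavily simplified sketch of what our triggers set up (table and trigger names are illustrative only, and the real tables have many more columns):

-- Parent table (simplified)
CREATE TABLE stats (
    name        varchar NOT NULL,
    "timestamp" timestamp(3) with time zone NOT NULL,
    statistic_1 numeric(20,0)
);

-- One child partition per day, created via inheritance
CREATE TABLE stats_2022_06_01 (
    CHECK ("timestamp" >= '2022-06-01' AND "timestamp" < '2022-06-02')
) INHERITS (stats);

-- A trigger on the parent redirects each INSERT to the child for that day
-- (the real trigger computes the child table name from NEW."timestamp")
CREATE OR REPLACE FUNCTION stats_insert_trigger() RETURNS trigger AS $$
BEGIN
    INSERT INTO stats_2022_06_01 VALUES (NEW.*);
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER stats_partition_insert
    BEFORE INSERT ON stats
    FOR EACH ROW EXECUTE FUNCTION stats_insert_trigger();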

In YugabyteDB, since database bloat does not seem to apply and table inheritance is not supported, we have so far ported the application by keeping the two aforementioned statistics tables, but without partitions. When we aggregate the oldest month of data, we simply delete the corresponding rows.

In a prototype of this port that we are running, we are seeing a large increase in memory use over time, even though we are collecting data for a very small number of objects (500, instead of around 50000 in our original PostgreSQL application). Although we started out with a single tablet per cluster node per table, after a week of running the prototype we have reached 5 tablets per cluster node per table.

So, I am wondering if we are using the appropriate database schema when running with YugabyteDB, or if we need to make some changes or use a different approach to get this running efficiently under YugabyteDB.

Your input is greatly appreciated.

Thank you

Hi @vlst

Can you paste your table schema? And your most popular queries (or the ones that are slow)?

So you just have 2 tables?

Hello @dorian_yugabyte,
Here is our table schema (with the names changed for the sake of simplicity):

# \d stats
                         Table "public.stats"
            Column            |            Type             | Collation | Nullable | Default 
------------------------------+-----------------------------+-----------+----------+---------
 name                         | character varying           |           | not null | 
 timestamp                    | timestamp(3) with time zone |           | not null | 
 sampleperiod                 | numeric(20,0)               |           |          | 
 unknownperiod                | numeric(20,0)               |           |          | 
 actualtimestamp              | timestamp(3) with time zone |           |          | 
 actualperiod                 | numeric(20,0)               |           |          | 
 statistic_1                  | numeric(20,0)               |           |          | 
 statistic_2                  | numeric(20,0)               |           |          | 
...
 statistic_N                  | numeric(20,0)               |           |          | 
Indexes:
    "stats_pkey" PRIMARY KEY, lsm (name ASC, "timestamp" DESC)
    "stats_timestamp_idx" lsm ("timestamp" DESC, name ASC)

# \d old_stats
                         Table "public.old_stats"
            Column            |            Type             | Collation | Nullable | Default 
------------------------------+-----------------------------+-----------+----------+---------
 name                         | character varying           |           | not null | 
 timestamp                    | timestamp(3) with time zone |           | not null | 
 sampleperiod                 | numeric(20,0)               |           |          | 
 unknownperiod                | numeric(20,0)               |           |          | 
 actualtimestamp              | timestamp(3) with time zone |           |          | 
 actualperiod                 | numeric(20,0)               |           |          | 
 statistic_1                  | numeric(20,0)               |           |          | 
 statistic_2                  | numeric(20,0)               |           |          | 
...
 statistic_N                  | numeric(20,0)               |           |          | 
Indexes:
    "old_stats_pkey" PRIMARY KEY, lsm (name ASC, "timestamp" DESC)
    "old_stats_timestamp_idx" lsm ("timestamp" DESC, name ASC)

We actually have 4 such pairs of “stats” and “old_stats” tables altogether. The value of N ranges from 9 in the pair with the fewest columns to 118 in the pair with the most.

Right now our test is focused on performing INSERTs into these tables every 15 minutes; once a day a data aggregation occurs that involves SELECTs/DELETEs/INSERTs. The problem we are seeing so far is not that these database operations are slow, but that memory usage is quite high for the amount of data we are processing.

Thank you

How many tables, indexes, tablets do you have per server? Screenshot from http://master-ip:7000/tablet-servers.

And what’s the hardware of the servers?

If you’re not in production, can you check with packed columns (see the “1 million rows insert” post on DEV Community) to see if it has lower memory usage?
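
For reference, packed rows are controlled by a yb-tserver gflag, something like the following (treat this as an example only; availability and the default depend on the release you are running):

yb-tserver ... --ysql_enable_packed_row=true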

So you keep a table per day, and at the end of the month you drop all the single-day tables and aggregate them into 1 table for the whole month, correct?

And how do you translate this to YugabyteDB? You keep just 1 table and do deletes instead of dropping? Can you be more specific?

Hello @dorian_yugabyte ,

For some unknown reason I cannot copy-paste my screenshot of the tablet-servers page into this reply, so I manually converted it to text and formatted it as best I could (leaving out the server UUIDs for readability):

Tablet Servers
==============
                   Time                               User Tablet-            Num    Total     Uncompressed                                              System      Active
Server             since          Status &               Peers/      RAM      SST   SST Files     SST        Read    Write                               Tablet-     Tablet-
                  heartbeat       Uptime                 Leaders     Used    Files    Size      Files Size  ops/sec ops/sec  Cloud   Region      Zone  Peers/Leaders  Peers
----------------- ---------  -----------------------  ------------  -------  -----  --------   -----------  ------- -------  ------  ----------- ----- ------------- -------
10.127.5.242:9000  0.5s      ALIVE: 15days, 23:52:35    1459/486    9.90 GB   1053   4.33 GB    48.59 GB      0       0      cloud1  datacenter1 rack1   27/9         1486
10.127.5.241:9000  0.7s      ALIVE: 15days, 23:52:39    1459/486    9.62 GB   1071   4.33 GB    48.59 GB      0       0      cloud1  datacenter1 rack1   27/9         1486
10.127.5.243:9000  0.3s      ALIVE: 15days, 23:52:37    1459/487    10.10 GB  1044   4.33 GB    48.62 GB      0       0      cloud1  datacenter1 rack1   27/9         1486

*Placement policy, Preferred zones, and Node Blacklist will affect the Peer and Leader distribution.
Cluster Load is Balanced

Tablet-Peers by Availability Zone
=================================

Cloud   Region       Zone  Total Nodes  User Tablet-Peers/Leaders  System Tablet-Peers/Leaders  Active Tablet-Peers
------  -----------  ----- -----------  -------------------------  ---------------------------  -------------------
cloud1  datacenter1  rack1 3            4377/1459                  81/27                        4458

I should clarify that our database does not contain only the 4 pairs of tables whose definitions I presented; I mentioned only those because they are the ones into which data is inserted during our tests. Our database actually has a total of 318 tables and 624 indices, of which very few are used in our test.

Each server is a VM with 8 CPUs and 16 GB RAM.

Our application will ultimately run in production, so we are using the latest stable YugabyteDB version (2.14) for our tests. I assume that the packed columns feature is available in 2.13 and 2.15, but not in 2.14, correct?

Thank you

The approach we used in PostgreSQL was to aggregate data from the appropriate partitions of the “stats” table, insert the aggregated results into the “old_stats” table, and drop the “stats” partitions that had been aggregated.

In YugabyteDB, since we do not have table partitions, at the end of each month we aggregate the appropriate rows from the “stats” table, insert the aggregated results into the “old_stats” table, and delete the rows from the “stats” table that were aggregated.
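
For illustration, the monthly roll-up is essentially of the following form (column list shortened, the date range is just an example, and which aggregate function applies depends on the statistic):

INSERT INTO old_stats (name, "timestamp", statistic_1)
SELECT name, date_trunc('month', "timestamp"), sum(statistic_1)
FROM stats
WHERE "timestamp" >= '2022-06-01' AND "timestamp" < '2022-07-01'
GROUP BY name, date_trunc('month', "timestamp");

DELETE FROM stats
WHERE "timestamp" >= '2022-06-01' AND "timestamp" < '2022-07-01';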

It’s in 2.15, should be in 2.16 stable.

You mean it auto-split, correct? In this case, can you start with just 1 tablet (not 1 tablet per node)?
That should be enough because, even with splitting, you have just 3 servers for now. The same applies to the index.

The DELETE here is very expensive. Can you do the same with YugabyteDB and drop the tables instead?
Or you can have, say, 40 tables pre-created and write to 30 of them. At the end of the month, you start writing to table 31 (9 are just a buffer), run the aggregation, and TRUNCATE the tables whose data you want to remove. TRUNCATE should be as fast as DROP TABLE.
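
A sketch of what I mean (table names and the number of sub-tables are just examples; each sub-table gets the same definition as your current stats table):

CREATE TABLE stats_d01 (
    name        varchar NOT NULL,
    "timestamp" timestamp(3) with time zone NOT NULL,
    statistic_1 numeric(20,0),
    PRIMARY KEY (name ASC, "timestamp" DESC)
);
-- ... repeat for stats_d02 through stats_d40, and rotate writes through them

-- At month end: aggregate the sub-tables covering the finished month into old_stats,
-- then reclaim each of them with a cheap TRUNCATE instead of row-by-row DELETEs
TRUNCATE stats_d01;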

Thank you for your reply.

We bring up the yb-master and yb-tserver processes with the --enable_automatic_tablet_splitting=true command-line option (per advice from this forum), which according to the YugabyteDB documentation (yb-master configuration reference | YugabyteDB Docs) will cause tables to begin with 1 tablet per node. If there is a way to start with only 1 tablet for the entire cluster (not per node), please let me know.

When you mention that the DELETE is “very expensive”, are you referring to the memory usage, disk space usage, or the time that is taken to perform this operation?

When you TRUNCATE a table, I assume that all its data is removed both from disk and from memory, correct?

If instead of performing TRUNCATE on a table we were to DROP it and re-CREATE it, is the concern that we could run into transaction/data inconsistency issues?

Your input is greatly appreciated.
Thank you

See CREATE TABLE [YSQL] | YugabyteDB Docs and CREATE INDEX [YSQL] | YugabyteDB Docs.

Note that range-sharded tables/indexes will have only 1 tablet by default.
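
For hash-sharded tables you can also request the initial tablet count explicitly in the DDL, e.g. (illustrative definition):

CREATE TABLE stats_d01 (
    name        varchar NOT NULL,
    "timestamp" timestamp(3) with time zone NOT NULL,
    PRIMARY KEY (name HASH, "timestamp" DESC)
) SPLIT INTO 1 TABLETS;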

CPU & memory, because deletes are actually inserts underneath. This is also true on PostgreSQL.

Yes. But it’s a file drop, like DROP TABLE.

No. It’s just that DROP+CREATE is a heavier operation compared to TRUNCATE.

Thank you very much for your advice. I will look into implementing the schema with the multiple tables you suggest, and hopefully we will see a substantial improvement in memory usage.

Those create tablets, which may allocate memory and use CPU (at least to send heartbeats). If they are not going to be large tables, it is better to colocate them.
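
Colocation is opted into at the database level; as a sketch (the exact clause depends on the release: older versions use colocated = true, newer ones use COLOCATION = true):

CREATE DATABASE small_tables_db WITH colocated = true;
-- Tables created inside this database then share a single tablet by default,
-- unless a table explicitly opts out at CREATE TABLE time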

Indeed, most of the tables we have in our database are not large tables. I had thought about using colocated tables for our small tables quite a while ago, but I had seen in the YugabyteDB documentation that “Colocated tables are not currently recommended for production, as backup-restore is not yet fully supported for colocated tables.” We are running our tests with YugabyteDB version 2.14 (i.e., latest stable), since our application is targeted to run in a production environment. Does the statement in the documentation still hold true?

When you TRUNCATE a table, what happens to its tablets, do they still remain in existence?

If they do, then since our stats tables are range sharded on name and timestamp, over the course of time, as we aggregate data from one of the stats “sub-tables” (i.e., one of the 40 pre-created tables you suggest), insert it into the old_stats table, and TRUNCATE that “sub-table”, I would expect the number of tablets for the “sub-table” to increase, with many of them being empty but nevertheless taking up CPU and memory resources. Or is it the case that YugabyteDB can perform compaction of such empty tablets to keep the number of tablets as low as possible (and if so, are there any configuration parameters we need to set for this)?

Your input is greatly appreciated.
Thank you

Colocated tables are not currently recommended for production

About colocated tables, I’ll check the current status. You may not need backups (which are used for snapshot-based point-in-time recovery) because the cluster is protected by replication, and for tables with large historical data that doesn’t change, a better strategy for point-in-time recovery may be a pg_dump of the partitions.

When you TRUNCATE a table, what happens to its tablets
TRUNCATE leaves the tablets in place, but the memory is released. In your case, though, you will drop the partitions, right? That will remove the tablets.

Back to understanding the issue. In your output, I see 1459 tablets (with RF=3 and 3 nodes, each node has a tablet peer). You said 318 tables and 624 indexes, which is at least 942 tablets; that matches the many (non-colocated) empty tables with one tablet each, plus the partitioned ones. I see only 10 GB RAM used per server, which may be OK for a 16 GB RAM server. However, that number of tablets is probably too much for 8 vCPUs.

But those are just guesses; there is not enough information here. If we want to be sure, you can collect statistics with ybstats.

About general ideas on partitioning and whether you need it on YugabyteDB, this depends on two things:

  • the size of the tables. If the database is in terabytes, then yes, partitioning, in addition to the automatic sharding, may help, especially with old partitions being read-only so that they can easily be backed up and indexed per partition. If it is in the hundreds of gigabytes, you may not need it, and sharding will be sufficient.
  • the queries. Sharding allows global secondary indexes. With partitioning, each partition has its own index. So if you have queries reading over months of data with selective filtering, using index access will mean reading all index partitions.

It may also be a combination of those, for example using partitioning but with yearly partitions instead of monthly ones. We need more detail about the sizes and access patterns.

I see that you have a primary key on (name ASC, "timestamp" DESC) and a secondary index on ("timestamp" DESC, name ASC). I guess you have queries with name= rather than a range of names, so it may be better to have the primary key as (name HASH, "timestamp" DESC). Or, on the other hand, if you stay with ASC you may not need the secondary index thanks to skip scan; it depends on how many distinct name values you have.
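
That is, something along these lines (a sketch with the same column names; only the primary-key clause changes):

CREATE TABLE stats (
    name        varchar NOT NULL,
    "timestamp" timestamp(3) with time zone NOT NULL,
    -- ... the statistic columns ...
    PRIMARY KEY (name HASH, "timestamp" DESC)
);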

Only SSTables & memtables are cleared. The tablets will still be there.

Table merging is still not available.

Can you paste your top popular/heavy queries? Do you need the extra index on (timestamp, name)? Or are you querying on (name, timestamp)?

Thank you for your replies.

I should also mention that the memory usage issue we are trying to overcome includes increased use of swap (up to 6.8 GB). From what I have seen, CPU usage on the YugabyteDB nodes is not an issue (1.4 out of 8 CPUs). However, these figures are all from the data collection tests with YugabyteDB for only 500 objects, while our target is to run our tests with 50000 objects, as in our original PostgreSQL application.

I will need to investigate why we have both indices, (name ASC, “timestamp” DESC) and (“timestamp” DESC, name ASC), on the stats tables. They are based on corresponding indices that have been in our PostgreSQL database schema for years. In our user interface we allow the user to filter and/or specify ranges on both the name and timestamp fields. I will need to find out whether users actually specify ranges for the name field, to determine whether we could use “HASH” for that field instead. We currently have 500 distinct values for the name field, and our target is 50000.

When you refer to “pg_dump of partitions”, what exactly do you mean with the term “partitions”?

Your input is greatly appreciated.

Thank you

He means you can run pg_dump / ysql_dump to dump a single table/partition or a group of them, because on colocated databases snapshots are still not supported.
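
For example (ysql_dump ships with YugabyteDB and accepts the usual pg_dump options; the host, database, and table names here are placeholders):

ysql_dump -h <node-ip> -d yugabyte -t old_stats > old_stats.sql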

The memory usage comes from the big number of tablets, and from using DELETE instead of TRUNCATE/DROP TABLE.

Please do. How much data do you expect to have per day when you store all the “names”?

Thank you for the clarifications.

Regarding how much data we expect to have per day: For our target of 50000 objects, if we take as an example our table that has 118 statistics columns, for each object there are 4 distinct values for “name” for this table (per our application business logic). Since we collect data every 15 minutes (= 4 times within each hour), at the end of the day we expect:
(24 * 4) * (4 * 50000) rows = 19,200,000 rows, where each row has a total of 121 fields of type NUMERIC(20, 0), 3 fields of type TIMESTAMP, and 1 CHARACTER VARYING field (the “name”).

According to this amount of data, should we use the “partitioning” scheme with the multiple tables you suggested, or would the sharding provided by YugabyteDB suffice in this case?

Thank you

How much is that in MB? And how many inserts/second at peak usage?

Were you able to check whether, for the index on (name, timestamp), name needs to be sortable?

The best way is with dump/truncate, as that will have lower memory usage.

Partitions are tables underneath.