Database schema in YugabyteDB for storing statistics collected periodically

I need to investigate the indices further; I will let you know when I have found the reason for their existence.

With respect to the amount of data inserted at peak times:

I have taken as an example the table that has 118 statistics columns. Currently we are running our initial test, modified such that in each 15-minute interval a total of 4080 rows are inserted into that table (we have also upgraded our VMs from 16 GB to 32 GB). From the PostgreSQL logs on each of our 3 YugabyteDB server nodes I found the following:

  • Node 1: 1441 rows in 18.963 sec = 75.99 rows/sec (90 MB, see below)
  • Node 2: 1257 rows in 18.876 sec = 66.59 rows/sec (30 MB)
  • Node 3: 1382 rows in 18.869 sec = 73.24 rows/sec (30 MB)

Which is the most appropriate method for measuring the amount of data in bytes that is INSERTed? What I used was the http://NODE_IP:9000/tables URL: I extracted the “SST Files Uncompressed” info for our table just before the INSERTs and just after them, performed the subtractions, and came up with the numbers in parentheses above. Please let me know what we should use for this purpose.

Thank you

You should flush the table before inserting and after inserting (see Is there a manual way to force a flush and a major (i.e. full) compaction of a table in YugaByte DB? - #3 by kannan) to count the size on disk.

@vlst with hundreds of columns, using packed rows (--ysql_enable_packed_row=true) can make a huge difference. You can set it in 2.14; we will confirm that it is supported for production. If not, the workaround may be to store all those values in one JSONB column.
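
Just to illustrate that last workaround (the table and column names below are only placeholders, not a recommendation of a specific schema), it could look something like this:

CREATE TABLE stats_jsonb (
    name        text,
    "timestamp" timestamptz,
    stats       jsonb,                 -- all 118 statistics stored as key/value pairs
    PRIMARY KEY (name, "timestamp")
);

INSERT INTO stats_jsonb
VALUES ('node-1', now(), '{"cpu_user": 12.3, "cpu_sys": 4.5}');

-- individual statistics can still be read back with the ->> operator:
SELECT "timestamp", (stats->>'cpu_user')::numeric AS cpu_user
FROM stats_jsonb
WHERE name = 'node-1';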

I checked; packed rows are unfortunately still beta in 2.14. They will probably be GA in 2.16.

Thank you for providing the tip for flushing tables. Based on this, here is the amount of data for each of our 3 nodes during the 18+ seconds when the data insertion is taking place:

  • Node 1: 40 MB
  • Node 2: 10 MB
  • Node 3: 30 MB

As far as the name/timestamp indices are concerned: our original database schema in PostgreSQL had two indices, (name, timestamp) and (timestamp). However, back in March of this year when I started working on porting to YugabyteDB, I added the name column to the second index, based on the “Query performance on table with indices” topic of this YugabyteDB forum. So according to our current application functionality it looks like we need both indices. If we were to change the way our application works such that queries based on name ranges or wildcards are not allowed, then we could revisit the indices; but this is a decision that, as a developer alone, I obviously cannot make.
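
For reference, the two indices as they stand now look roughly like this (table and index names are just placeholders):

CREATE TABLE stats (
    name        text,
    "timestamp" timestamptz
    -- ... plus the 118 statistics columns
);

CREATE INDEX stats_name_ts_idx ON stats (name, "timestamp");
CREATE INDEX stats_ts_name_idx ON stats ("timestamp", name);  -- name added during the YugabyteDB port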

Can name be HASH here? Do you need range queries on name?

There are big differences in the way indexes are used in PostgreSQL and YugabyteDB:

  • In PostgreSQL all indexes are secondary. In YugabyteDB, the table is stored in the primary index, so choosing the primary index is critical for distribution as well as for fast access. (name, timestamp) makes sense there.
  • In PostgreSQL there’s no loose index scan, also known as skip scan (using the index even with no predicate on its first column). YugabyteDB does that if the first column is ASC, not HASH. Then the primary key on (name ASC, "timestamp" DESC) can still be used as an Index Scan for queries without a predicate on name, and this can be faster than using a secondary index on (timestamp) because that one will have to go to the table for each row.

This depends on the number of rows retrieved by the queries, and the number of names to skip when using skip scan. So the decision of having this secondary index or not should be taken after some tests. You can use hints to select a specific index and compare.
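
For example, assuming pg_hint_plan hints are enabled (SET pg_hint_plan.enable_hint = on if they are not) and using made-up table and index names, the comparison could look like this:

-- force the secondary index:
/*+ IndexScan(s stats_ts_name_idx) */
EXPLAIN (ANALYZE, COSTS OFF)
SELECT * FROM stats s WHERE "timestamp" >= now() - interval '1 hour';

-- force the primary key, i.e. the skip scan, and compare the timings:
/*+ IndexScan(s stats_pkey) */
EXPLAIN (ANALYZE, COSTS OFF)
SELECT * FROM stats s WHERE "timestamp" >= now() - interval '1 hour';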

I tried the approach of keeping only the primary key on (name ASC, “timestamp” DESC) and not defining the secondary index that includes “timestamp”, but unfortunately this significantly slows down a query that our application performs by default when running a report: it selects from the stats table without any predicate on name, with “timestamp” greater than that of 24 hours ago, ordered by “timestamp” DESC with LIMIT 50. I do not know if there is anything else I need to do to achieve the “skip scan” that you mentioned; if so, please let me know.

When I put back the secondary index the query is fast. I observed that if I keep only the “timestamp” column in the secondary index the query is as fast as when the “name” column was also part of the secondary index (as 2nd column), so it looks like I do not need the “name” column in the secondary index.

Yes, with an ORDER BY “timestamp” DESC, nothing will beat the index on (“timestamp” DESC), because the Skip Scan doesn’t return the rows in this order, so it has to return all rows to the SQL layer, sort them, and only then apply the LIMIT.

So, if I understand correctly, Skip Scan uses the primary key index (name ASC, “timestamp” DESC) to find the rows when there is no predicate on the name column, instead of performing a sequential scan on the table, but it will not return them sorted by “timestamp” DESC even though that is what is specified in the primary key index? If that is the case, in what order does Skip Scan return the rows? I guess it is not clear to me how Skip Scan works; your input on this would be greatly appreciated.
Thank you

Let’s take an example with primary key (X asc, Y asc):

X Y
---
A 1
A 2
A 3
A 4
B 1
B 2
B 3
B 4
C 1
C 2
C 3
C 4
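
If you want to try this yourself, a minimal sketch to build exactly this table:

CREATE TABLE t (x text, y int, PRIMARY KEY (x ASC, y ASC));

-- 3 values of X times 4 values of Y = the 12 rows above
INSERT INTO t
SELECT x, y
FROM (VALUES ('A'), ('B'), ('C')) AS v(x), generate_series(1, 4) AS y;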

When you select where X='B' and Y<=3 order by Y, it is a range scan: it goes to the beginning of the range (B,1) and stops when past (B,3). This returns:

B 1
B 2
B 3

which is in the right order. So if you add a limit 2, it just reads up to (B,2)

But when you select where Y<=3 order by Y without a predicate on X, this is where skip scan happens. It goes to the start (A,1) and stops at (A,3). Then it skips to (B,1), reads and stops at (B,3), then skips to the next… This returns:

A 1
A 2
A 3
B 1
B 2
B 3
C 1
C 2
C 3

then it has to sort them on Y:

A 1
B 1
C 1
A 2
B 2
C 2
A 3
B 3
C 3

and only then it can apply limit 2 to get:

A 1
B 1

For a Top-N query you need an ASC or DESC index that starts with the column that defines the range (if there’s a filter), and then the column you want the ordering on. This will be physically ordered (in the LSM tree where the tables and indexes are stored) and return rows ordered without additional sorting. Then the LIMIT can apply before reading everything.
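
In your case, since the report query has no other filter, a sketch of that index and query (hypothetical names) would be:

CREATE INDEX stats_ts_desc_idx ON stats ("timestamp" DESC);

-- rows come back already ordered, so the LIMIT stops the scan after 50 rows:
SELECT name, "timestamp"
FROM stats
WHERE "timestamp" > now() - interval '24 hours'
ORDER BY "timestamp" DESC
LIMIT 50;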

Thank you for the excellent clarification with the example you provided.
So, based on this example, a case where the skip scan would return the rows in the desired order without the need for any additional sort would be a query with a WHERE clause of “WHERE y <= 1”. In other words, relying on the skip scan alone without a secondary index may be optimal, depending on the data in the table and the query’s WHERE clause, correct?

I am now performing a test with database defined with COLOCATED = TRUE, where out of our 318 tables only 11 tables have been defined with COLOCATED = FALSE (they are 10 “stats” tables plus 1 table we have for logging various messages). What I have noticed is that the CPU usage has dropped significantly compared to the previous tests where none of our tables were colocated, but the memory usage is now only slightly lower. Does this seem in line with what you would expect, or would you also anticipate a larger drop in memory usage (and if so, is there anything else we should specify/configure to achieve this apart from just defining the database as colocated = true)?
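
For reference, the setup is roughly the following (hypothetical names, using the COLOCATED syntax available on our version):

CREATE DATABASE app_db WITH COLOCATED = true;

-- after connecting to app_db:
CREATE TABLE small_lookup (        -- colocated by default, like most of the 318 tables
    id   int PRIMARY KEY,
    data text
);

CREATE TABLE stats (               -- one of the 11 large tables that opts out
    name        text,
    "timestamp" timestamptz,
    -- ... statistics columns ...
    PRIMARY KEY (name ASC, "timestamp" DESC)
) WITH (COLOCATED = false);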

Thank you

Yes, I would expect a drop in memory usage, but that also depends on how we measure it.
In the tserver port 9000 endpoint there is a memory breakdown: tserver:9000/mem-trackers

Thank you for pointing out the URL for tserver memory breakdown.

I looked at the “root” summary row that this URL reports on each of our 3 YugabyteDB server nodes, and I see that it is about 1 GB less than what the Linux “top” command reports in the “RES” column for the yb-tserver process. I do not know what extra memory “top” is detecting.

However, we are concerned about the total memory usage on each YugabyteDB server node, and besides the yb-tserver process we have a yb-master process running, the postgres server (that was brought up by the yb-tserver), as well as a postgres process for each connection to the database. In our case, we have about 70 connections to the database on each YugabyteDB server node. For example, on one of our YugabyteDB server nodes the total memory used by the postgres server and all the postgres processes is over 4 GB.

Is there any URL that reports the total memory usage by the above mentioned postgres processes?

Thank you

Ok, so 1 GB for tablets seems good. Then for connections, the best is to use a connection pool to limit them. Pinging @Frits_Hoogland for memory usage analysis.

I would just like to note that we are indeed using a connection pool. We are performing a stress test with a large number of threads, where each thread is using a connection from the pool.

What language are you using in your client?
How many client servers?
How many vcpus on a client-server?
Looking at the table in this pgbouncer link, would you need Session Pooling or Transaction Pooling?

To answer your first question: there currently is no view or URL that specifies the YSQL/postgres process memory size.

For memory usage, make sure the tablet server and master server (if running on the same node) do not take too much memory. To assess this, look at the parameter default_memory_limit_to_ram_ratio.
Then take an assumed amount of memory that YSQL/postgres is allowed to take, and add approximately 20-25% of memory for “linux”, which covers the Linux kernel, background processes, and the Linux page cache.
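
As a purely illustrative calculation for a 32 GB node (the ratios below are placeholders, not defaults; substitute your actual settings):

  total RAM                             32 GB
  yb-tserver (ratio 0.60, made up)     ~19 GB
  yb-master  (ratio 0.10, made up)      ~3 GB
  linux (20-25%)                     ~6.5-8 GB
  remaining for YSQL/postgres        ~1.5-3 GB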

The good news is that you can do some assessment of postgres usage on a running system by using the smem utility, which is available in EPEL. Using the utility, you can determine the actual amount of memory in use, such as (in my tiny test VM):

$ sudo smem -P postgres -t -k
  PID User     Command                         Swap      USS      PSS      RSS
 1477 yugabyte postgres: logger                7.1M     2.1M     2.8M     7.4M
 7884 root     sudo smem -P postgres -t -k        0     1.9M     2.8M     8.7M
 1540 yugabyte postgres: stats collector       6.8M     2.3M     3.2M     8.2M
 1539 yugabyte postgres: checkpointer          7.1M     2.3M     3.4M     8.7M
 1482 yugabyte postgres: YSQL webserver        7.0M     2.8M     4.6M    12.8M
 7887 root     /usr/libexec/platform-pytho        0     9.2M    10.7M    15.1M
 1309 yugabyte /opt/yugabyte/yugabyte-2.15     7.2M    18.6M    26.2M    41.8M
 7851 yugabyte postgres: yugabyte yugabyte     6.6M    34.4M    40.5M    54.6M
-------------------------------------------------------------------------------
    8 2                                       41.8M    73.7M    94.2M   157.2M

You have to look at the PSS (proportional set size) figure total.

The tablet server and master server do not take their memory at startup, but rather take memory when they need it, which makes memory assessment not easy. Because of how the master and tablet server work (with threads), the RSS size of these can be used (and is very close to PSS).

A last caveat: in YugabyteDB versions before 2.15.3.0, the postgres backends do not release memory that has been allocated and freed by postgres. This is because we use the tcmalloc memory allocator, which is optimised with the idea that memory is quickly allocated again by a thread. So postgres operations that take a lot of memory for processing will keep those allocations, which are then not available for other processes.

Our application is written in Java and runs within the Wildfly framework (i.e., the descendant of JBoss). We are using a Wildfly datasource for the connection pool, with the max-pool-size configuration parameter set to 750.

Thank you for the tip with smem; I will definitely use that as a guide for PostgreSQL memory usage.