YugaByte DB Redis API now supports a native TIME SERIES data type

Introduction to time series data

We have spoken to several developers who need to model time-series-like data (for use cases such as metrics, stock quote feeds, or user activity) in Redis. Time series data has the following characteristics:

  1. There are a number of metric “time series”; for example, cpu_usage and memory_usage are two separate time series.
  2. Each data point in one time series is a (timestamp, value) pair which denotes the observation recorded at a point in time.
  3. Generally, data arrives in increasing timestamp order and just needs to be appended to the end of the time series, but sometimes data points may arrive out of order and need insertion in the middle.
  4. Data is usually read by specifying a range of timestamps.
  5. There is often a time-to-live (TTL) associated with the data points, and it depends on the timestamp of each data point. For example, for a given series, we may want to purge each data point once 2 days have elapsed since the timestamp of that data point.
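To make these characteristics concrete, here is a minimal in-memory sketch in plain Python. It is only an illustration of the access pattern, not how Redis or YugaByte stores the data; the function names, the use of seconds as timestamps, and the numeric values are all assumptions made for the example.

```python
import bisect

TWO_DAYS = 2 * 24 * 60 * 60  # example TTL, in seconds (assumed unit)

# One sorted list of (timestamp, value) pairs per named series.
series = {"cpu_usage": [], "memory_usage": []}

def record(name, timestamp, value):
    # insort appends when the timestamp is the newest one seen so far,
    # and inserts in the middle when a point arrives out of order.
    bisect.insort(series[name], (timestamp, value))

def read_range(name, start, end):
    # Reads are expressed as an inclusive range of timestamps.
    return [(t, v) for t, v in series[name] if start <= t <= end]

def purge_expired(name, now, ttl=TWO_DAYS):
    # A point expires once `ttl` seconds have elapsed since its own timestamp.
    kept = [(t, v) for t, v in series[name] if now - t < ttl]
    removed = len(series[name]) - len(kept)
    series[name] = kept
    return removed

record("cpu_usage", 1000, 70)
record("cpu_usage", 1120, 90)
record("cpu_usage", 1060, 60)      # late arrival, lands between the other two
record("memory_usage", 1000, 512)  # a separate series
print(read_range("cpu_usage", 1000, 1060))
```

Calling purge_expired("cpu_usage", now=1000 + TWO_DAYS) would then drop only the oldest point, because expiry is judged per data point rather than per series.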

Modeling time series data with vanilla Redis

There are predominantly two ways to do this in Redis today: Sorted Sets and plain key-values. Both are complicated to model and neither performs well for this workload. Sorted Sets have the following drawbacks:

  1. One way to model the data in Sorted Sets would be to use the score as the timestamp for the event and the value as the measurement at that time. For example:

    ZADD cpu_usage 201708110501 "70%"

    The problem with this approach is that Sorted Sets keep unique values, so if a later measurement also has the value 70%, the older entry is overwritten. A common workaround is to add the timestamp to the value field:

    ZADD cpu_usage 201708110501 "201708110501:70%"

    However, this approach uses extra space for the value and adds custom serialization and deserialization overhead.

  2. Sorted Sets have the overhead of keeping two maps and updating both of them as the data is mutated:

    a. We need a map from the score to all associated values to support operations like ZRANGEBYSCORE and ZREMRANGEBYSCORE, since Sorted Sets are sorted based on the score.
    b. We need a map from the value to the score to support operations like ZSCORE, ZREM and ZRANK.

    For a time-series-like workload, we would use ZADD to add new data points (which writes to both maps), ZRANGEBYSCORE to get data for a range of timestamps that we are interested in (which incurs a lookup in a single map) and ZREMRANGEBYSCORE to remove old timestamps (which has to delete the entry from both maps).

    As you can see, modeling time series data on Sorted Sets adds overhead to every write and delete. Ideally, for a write-heavy workload like time series, we would like these operations to be cheap.
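Both drawbacks can be reproduced with a toy model in plain Python. This is a sketch under stated assumptions, not Redis's actual implementation: ToySortedSet, its method names, and the index_writes counter are hypothetical, and scores are assumed to be integers.

```python
import bisect

class ToySortedSet:
    """Toy sorted set that maintains the two maps described above."""

    def __init__(self):
        self.score_of = {}     # value -> score, for ZSCORE/ZREM-style lookups
        self.by_score = []     # sorted (score, value) pairs, for range queries
        self.index_writes = 0  # how many map updates we have paid for

    def _bounds(self, lo, hi):
        # (lo,) sorts before every (lo, value); hi + 1 assumes integer scores.
        start = bisect.bisect_left(self.by_score, (lo,))
        end = bisect.bisect_left(self.by_score, (hi + 1,))
        return start, end

    def zadd(self, score, value):
        old = self.score_of.get(value)
        if old is not None:
            # Values are unique: re-adding one moves it and loses the old point.
            self.by_score.remove((old, value))
            self.index_writes += 1
        self.score_of[value] = score
        bisect.insort(self.by_score, (score, value))
        self.index_writes += 2  # one write per map

    def zrangebyscore(self, lo, hi):
        start, end = self._bounds(lo, hi)  # touches only the score-ordered map
        return self.by_score[start:end]

    def zremrangebyscore(self, lo, hi):
        start, end = self._bounds(lo, hi)
        for _, value in self.by_score[start:end]:
            del self.score_of[value]
        del self.by_score[start:end]
        self.index_writes += 2 * (end - start)  # both maps, per removed entry
        return end - start

plain = ToySortedSet()
plain.zadd(201708110501, "70%")
plain.zadd(201708110502, "70%")  # same value: the 0501 point disappears
print(plain.zrangebyscore(201708110501, 201708110502))

prefixed = ToySortedSet()
prefixed.zadd(201708110501, "201708110501:70%")
prefixed.zadd(201708110502, "201708110502:70%")
# Both points survive, but every read has to strip the timestamp prefix.
print([v.split(":", 1)[1] for _, v in prefixed.zrangebyscore(201708110501, 201708110502)])
```

In this toy, the first print shows a single remaining point, while the prefixed variant keeps both at the cost of longer values and a parsing step on every read. The index_writes counter grows by two for each add and each removed entry, which is the per-operation overhead described in the second drawback.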

Additionally, the lack of finer-grained TTL in Sorted Sets makes purging data tedious. Application-driven purging logic to delete expired entries adds further load to the system.

Recognizing these issues in modeling time series data on Redis, we have worked closely with various customers to address this use-case in Redis based on their requirements.

Time Series datatype support in YugaByte

We are happy to announce that YugaByte Redis now supports a native time series (TS) data type that handles all of the above requirements with simple data modeling and very high performance. Also, since YugaByte’s Redis offering is “Redis as an elastic database”, you do not need to worry about persisting your data in a separate data store!

TS is essentially a sorted map from int64 to a string/value, and can be implemented as a single timestamp_to_value_map that maps each “timestamp” to a single object/value.
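As a rough mental model, the sorted map can be sketched in plain Python as a single sorted list of (timestamp, value) pairs. This is not YugaByte's implementation: the ToyTS class and its method names are hypothetical, and overwriting on a repeated timestamp, the inclusive range bounds, and returning None for a missing timestamp are assumptions of the sketch.

```python
import bisect

class ToyTS:
    """Toy model of a single sorted map from int timestamp to string value."""

    def __init__(self):
        self._points = []  # sorted list of (timestamp, value) pairs

    def _index(self, timestamp):
        # (timestamp,) sorts just before any (timestamp, value) pair.
        return bisect.bisect_left(self._points, (timestamp,))

    def _has(self, i, timestamp):
        return i < len(self._points) and self._points[i][0] == timestamp

    def add(self, *args):
        # Accepts alternating timestamp, value, timestamp, value, ...
        if len(args) % 2:
            raise ValueError("expected timestamp/value pairs")
        for timestamp, value in zip(args[0::2], args[1::2]):
            i = self._index(timestamp)
            if self._has(i, timestamp):
                self._points[i] = (timestamp, value)  # assumed: overwrite
            else:
                self._points.insert(i, (timestamp, value))

    def get(self, timestamp):
        i = self._index(timestamp)
        return self._points[i][1] if self._has(i, timestamp) else None

    def range_by_time(self, lo, hi):
        # Inclusive on both ends; returns a flat [ts, value, ts, value, ...] list.
        flat = []
        for timestamp, value in self._points[self._index(lo):self._index(hi + 1)]:
            flat += [timestamp, value]
        return flat

    def rem(self, timestamp):
        i = self._index(timestamp)
        if self._has(i, timestamp):
            del self._points[i]

cpu = ToyTS()
cpu.add(201708110501, "80%", 201708110502, "60%", 201708110503, "90%")
print(cpu.get(201708110501))
print(cpu.range_by_time(201708110501, 201708110503))
cpu.rem(201708110501)
print(cpu.get(201708110501))
```

Because the timestamp is the key, two measurements with the same value never collide, and there is only one structure to update on every add or remove.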

Sample commands:

// In this example, the timestamp is encoded as an integer in yyyymmddhhmm format.
> TSAdd cpu_usage 201708110501 "80%" 201708110502 "60%" 201708110503 "90%"
> TSGet cpu_usage 201708110501
> TSRangeByTime cpu_usage 201708110501 201708110503
1) 201708110501
2) "80%"
3) 201708110502
4) "60%"
5) 201708110503
6) "90%"
> TSRem cpu_usage 201708110501
> TSGet cpu_usage 201708110501

We’ve recently finished implementing these commands, and details can be found in our open source GitHub repository:

Stay tuned for a more detailed blog post and documentation on this new type!
