Advanced Redis Data Types

Advanced Redis Data Types

May 27, 2023 · 17 min read

This chapter introduces the Set, Sorted Set, Bitmap, and HyperLogLog data types.

Sets

A Set in Redis is an unordered collection of distinct Strings - it is not possible to add repeated elements to a Set. Internally, a Set is implemented as a hash table, which is the reason that some operations are optimized.

The Set memory footprint will be reduced if all the members are integers, and the total number of elements can be as high as the value of set-max-intset-entries configuration.

The maximum number of elements that a Set can hold is 2^32-1.

Some use cases for Sets are:

  • Data Filtering: For example, filtering all flights that depart from a given city and arrive on another.
  • Data Grouping: Grouping al users who viewed similar products(for example, recommendations on Amazon.com).
  • Membership Checking: Checking whether a user is on a blacklist.

Basics

The command SADD is responsible for adding one or many members to a Set. SADD ignores members that already exist in a Set and returns the member of added members.

1jedis.sadd("user:max:favorite", "Arcade Fire", "Arctic Monkeys", "Belle & Sebastian", "Lenine");
2jedis.sadd("user:hugo:favorite", "Daft Punk", "The Kooks", "Arctic Monkeys");

The command SINTER expects one or many Sets and returns an array with the members that belong to every Set.

1Set<String> sinter = jedis.sinter("user:max:favorite", "user:hugo:favorite"); // [ "Arctic Monkeys" ]

The command SDIFF expects one or many Sets. It returns an array with all members that belong to the first Set that do not exist in the Sets that follow it.

In this command, the key name order matters. Any key that does not exist is considered to be an empty Set.

1Set<String> sdiff1 = jedis.sdiff("user:max:favorite", "user:hugo:favorite"); // [ "Belle & Sebastian", "Lenine", "Arcade Fire" ]
2Set<String> sdiff2 = jedis.sdiff("user:hugo:favorite", "user:max:favorite"); // [ "The Kooks", "Daft Punk" ]

The SUNION command expects one or many Sets. It returns any array with all members of all Sets. The results has not repeated members.

1Set<String> sunion = jedis.sunion("user:max:favorite", "user:hugo:favorite"); // [ "The Kooks", "Daft Punk", "Belle & Sebastian", "Lenine", "Arcade Fire", "Arctic Monkeys" ]

The command SRANDMEMBER returns random member from a Set. Because Sets are unordered, it is not possible to retrieve elements from a given position.

1String srandmember = jedis.srandmember("user:max:favorite"); // "Lenine"

The command SISMEMBER checks whether a member exists in a Set.

1boolean sismember = jedis.sismember("user:max:favorite", "Arctic Monkeys"); // true

The command SREM removes and returns members from a Set. The command SCARD returns the number of members in a Set(also known as cardinality.

1long srem = jedis.srem("user:max:favorite", "Arctic Monkeys"); // 1
2long sismember = jedis.sismember("user:max:favorite", "Arctic Monkeys"); // false
3long cardinality = jedis.scard("user:max:favorite"); // 3

The command SMEMBERS returns an array with all members of a Set.

1Set<String> smembers = jedis.smembers("user:max:favorite"); // [ "Belle & Sebastian", "Lenine", "Arcade Fire" ]

Deal Tracking System

Yipit, Groupon, and LivingSocial are examples of websites that send daily e-mails to users. These e-mails contain a Set of deals(coupons and discounts) that users are interested in.These deals are based on the area in which they live, as well their preferences.

This section will show how to create functions to mimic the features of these websites:

  • Mark a deal as sent to user
  • Check whether a user received a group of deals
  • Gather metrics from the sent deals Define the functions as follows:
 1/**
 2 * Add a userId to a deal Set
 3 */
 4private void markDealAsSent(Jedis client, String dealId, String userId) {
 5    client.sadd(dealId, userId);
 6}
 7
 8/**
 9 * Check whether a user id belongs to a deal Set. And sends a deal to a user only if it was not already sent
10 */
11private void sendDealIfNotSent(Jedis client, String dealId, String userId) {
12    boolean isSent = client.sismember(dealId, userId);
13    if (isSent) {
14        logger.info("Deal {} was already sent to user {}", dealId, userId);
15    } else {
16        logger.info("Sending {} to user {}", dealId, userId);
17        // code to send the deal to user
18        markDealAsSent(client, dealId, userId);
19    }
20}
21
22/**
23 * @return all userIds that exist in all the deal Sets specified
24 */
25private Set<String> showUsersThatReceivedAllDeals(Jedis client, String... dealIds) {
26    return client.sinter(dealIds);
27}
28
29/**
30 * @return all userIds that exist in any of the deal Sets specified
31 */
32private Set<String> showUsersThatReceivedAtLeastOneDeal(Jedis client, String... dealIds) {
33    return client.sunion(dealIds);
34}

Use these functions to handle deal metrics.

 1markDealAsSent(jedis, "deal:1", "user:1");
 2markDealAsSent(jedis, "deal:1", "user:2");
 3markDealAsSent(jedis, "deal:2", "user:1");
 4markDealAsSent(jedis, "deal:2", "user:3");
 5
 6sendDealIfNotSent(jedis, "deal:1", "user:1");
 7sendDealIfNotSent(jedis, "deal:1", "user:2");
 8sendDealIfNotSent(jedis, "deal:1", "user:3");
 9
10Set<String> receivedAll = showUsersThatReceivedAllDeals(jedis, "deal:1", "deal:2"); // [ "user:1", "user:3" ]
11Set<String> receivedAny = showUsersThatReceivedAtLeastOneDeal(jedis, "deal:1", "deal:2"); // [ "user:2", "user:1", "user:3" ]

Sorted Sets

A Sorted Set is very similar to a Set, but each element of a Sorted Set has an associated score. In other words, a Sorted Set is a collection of non-repeating Strings sorted by score. It is possible to have elements with repeated scores. In this case, the repeated elements are ordered lexicographically.

Sorted Set operations are fast, but not as fast as Sets operations, because scores need to be compared. Adding, removing, and updating an item in a Sorted Set runs in logarithmic time, O(log(N)), where N is the number of elements in a Sorted Set.

Internally, Sorted Sets are implemented as two separated data structure:

  • A skip list with a hash table. A skip list is a data structure that allows fast search within a ordered sequence of elements.
  • A ziplist, based on the zset-max-ziplist-entries and zset-max-ziplist-value configurations.

Sorted Sets could be used to:

  • Build a real time waiting list for customer service
  • Show a leaderboard of a massive online game that displays the top players, user with similar scores, or the scores of your friends
  • Build an autocomplete system using millions of words

Basics

The ZADD command adds one or many members to a Sorted Set. ZADD ignores members that already exist in a Sorted Set. It returns the number of added members.

1long resultAlice = jedis.zadd("leader", 100, "Alice"); // 1
2long resultZed = jedis.zadd("leader", 100, "Zed"); // 1
3long resultHugo = jedis.zadd("leader", 102, "Hugo"); // 1
4long resultMax = jedis.zadd("leader", 101, "Max"); // 1

There is a family of commands that can fetch ranges in a Sorted Set: ZRANGE, ZRANGEBYLEX, ZRANGEBYSORE, ZREVRANGE, ZREVRANGEBYLEX, ZREVRANGEBYSORE. The only different is in how their result are sorted:

  • ZRANGE returns elements from the lowest to the highest score, and it uses ascending lexicographical order if a score tie exists
  • ZREVRANGE returns elements from the highest to the lowest score, and it uses descending lexicographical order if a score tie exists

Both of these commands expect a key name, a start index, and an end index. The indices are zero-based and can be positive or negative values.

1List<String> zrange = jedis.zrange("leader",0, 2); // [ "Alice", "Zed", "Max" ]
2List<Tuple> zrangeWithScores = jedis.zrangeWithScores("leader", 0, -1); // [ "Alice:100.0", "Zed:100.0", "Max:101.0", "Hugo:102.0" ]
3List<String> zrangeByScore = jedis.zrangeByScore("leader", 100, 101); // [ "Alice", "Zed", "Max" ]

When all the elements in a sorted set are inserted with the same score, in order to force lexicographical ordering, the ZRANGEBYLEX command returns all the elements in the sorted set at key with a value between min and max.

1List<String> zrangeByLex = jedis.zrangeByLex("myzset", "[aaa", "(g"); // [ "b", "c", "d", "e", "f" ]

If the elements in the sorted set have different scores, the returned elements are unspecified. See more details. {: .prompt-warning }

It is possible to retrieve a score or rank of a specific member in a Sorted Set using the commands ZSCORE and ZRANK/ZREVRANK.

1List<Tuple> allWithScores = jedis.zrangeWithScores("leader", 0, -1); // [[Alice,100.0], [Zed,100.0], [Max,101.0], [Hugo,102.0]]
2Double zscore = jedis.zscore("leader", "Hugo"); // 102.0
3Long zrank = jedis.zrank("leader", "Hugo"); // 3

The ZREM command removes a member from a Sorted Set.

1long zrem = jedis.zrem("leader", "Hugo"); // 1

Leaderboard System

In this section, we are going to build a leaderboard application that can be used in an online game. The application has the following features:

  • Add and remove users
  • Display the details of a user
  • Show the top x users
  • Show the users who are directly ranked about and below a given user

We will create a class Leaderboard and implement methods to add & remove user from the leaderboard and fulfill the rest 3 requirements.

 1@RequiredArgsConstructor
 2public static class Leaderboard {
 3    private final Jedis client;
 4    private final String key;
 5
 6    /**
 7     * Add user to leaderboard
 8     */
 9    public void addUser(String username, double score) {
10        this.client.zadd(this.key, score, username);
11    }
12
13    /**
14     * Remove user from leaderboard
15     */
16    public void removeUser(String username) {
17        this.client.zrem(this.key, username);
18    }
19
20    /**
21     * @return score and rank for given user
22     */
23    public List<Number> getUserScoreAndRank(String username) {
24        double score = this.client.zscore(this.key, username);
25        long rank = this.client.zrevrank(this.key, username);
26        return List.of(score, rank);
27    }
28
29    /**
30     * @return top {@code n} users in the leaderboard
31     */
32    public List<Tuple> showTopUsers(int n) {
33        return this.client.zrevrangeWithScores(this.key, 0, n - 1);
34    }
35
36    /**
37     * @return {@code n} users around given user
38     */
39    public List<Tuple> getUsersAroundUser(String username, int n) {
40        long rank = this.client.zrevrank(this.key, username);
41        var startOffset = Math.max(rank - n / 2 + 1, 0);
42        var endOffSet = startOffset + n - 1;
43        return this.client.zrevrangeWithScores(this.key, startOffset, endOffSet);
44    }
45}

Let’s try out these functions.

 1Leaderboard leaderboard = new Leaderboard(jedis, "game-score");
 2leaderboard.addUser("Arthur", 70);
 3leaderboard.addUser("KC", 20);
 4leaderboard.addUser("Maxwell", 10);
 5leaderboard.addUser("Patrik", 30);
 6leaderboard.addUser("Ana", 60);
 7leaderboard.addUser("Felipe", 40);
 8leaderboard.addUser("Renata", 50);
 9leaderboard.addUser("Hugo", 80);
10
11leaderboard.removeUser("Arthur");
12
13List<Number> scoreAndRank = leaderboard.getUserScoreAndRank("Maxwell"); // [ "10.0", "6" ]
14List<Tuple> top3Users = leaderboard.showTopUsers(3); // [ "Hugo:80.0", "Ana:60.0", "Renata:50.0" ]
15List<Tuple> usersAround = leaderboard.getUsersAroundUser("Felipe", 5); // [ "Ana:60.0", "Renata:50.0", "Felipe:40.0", "Patrik:30.0", "KC:20.0" ]

Bitmaps

A Bitmap is not a real data type in Redis. Under the hood, a Bitmap is a String. We can also say that a Bitmap is a set of bit operations on a String. However, we are going to consider them as data types because Redis provides commands to manipulate Strings as Bitmaps. Bitmaps are also known as bit arrays or bitsets.

A Bitmap is a sequence of bits where each bit can store 0 or 1. The Redis documentation refers to Bitmap indices are offset.

Bitmaps are memory efficient, support fast data lookups and can store up to 2^32 bits.

Memory Efficiency

In order to see how a Bitmap can be memory efficient, we are going to compare a Bitmap to a Set. The comparison scenario is an application that needs to store all user IDs that visited a website on a given day( The Bitmap offset represents a user ID). We assume that our application has 5 million users in total, but only 2 million users visited the website on that day, and that each user Id can be represented by 4 bytes(32 bits, which is the size of an integer in a 32-bit computer).

The following table compare how much memory, in theory, a Bitmap and a Set implementation would take to store 2 million user IDs.

Redis KeyData typeAmount of bits per userStored usersTotal memory
visits:2015-01-01:bitmapBitmap1 bit5 million1 * 5,000,000 bits = 625 KB
visits:2015-01-01:setSet32 bit2 million32 * 2,000,000 bits = 8 MB

The worst-case scenario for Bitmap implementation is represented in the preceding table. It had to allocate memory for the entire user base even though only 2 million users visited the page.

Updating 5 million records might take a few minutes. {: .prompt-warning }

In reality, the difference is even more significant.

 1for (var i = 0; i < 5_000_000; i++) {
 2    jedis.setbit("visits:2015-01-01:bitmap", i, i % 2 == 0);
 3}
 4
 5for (var i = 0; i < 2_000_000; i++) {
 6    jedis.sadd("visits:2015-01-01:set", String.valueOf(i));
 7}
 8
 9double bitmapMemInMB = jedis.memoryUsage("visits:2015-01-01:bitmap") / 1024.0 / 1024; // 1.0000686645507812 MB
10double setMemInMB = jedis.memoryUsage("visits:2015-01-01:set") / 1024.0 / 1024; // 92.2940673828125 MB

However, Bitmaps are not always memory efficient. If we change the previous example to consider only 100 visits instead of 2 million, assuming the worst-case scenario again, the Bitmap implementation would not be memory-efficient.

Redis KeyData typeAmount of bits per userStored usersTotal memory
visits:2015-01-01:bitmapBitmap1 bit5 million1 * 5,000,000 bits = 625 KB
visits:2015-01-01:setSet32 bit10032 * 100 bits = 3.125 KB

Let’s try with a real example.

 1for (var i = 0; i < 10; i++) {
 2    jedis.setbit("visits:2015-01-01:bitmap:sparse", 5_000_000 - i, true);
 3}
 4
 5for (var i = 0; i < 100; i++) {
 6    jedis.sadd("visits:2015-01-01:set:sparse", String.valueOf(i));
 7}
 8
 9double bitmapMemInBytes = jedis.memoryUsage("visits:2015-01-01:bitmap:sparse") / 1024.0; // 1024.0859375 Bytes
10double setMemInBytes = jedis.memoryUsage("visits:2015-01-01:set:sparse") / 1024.0; // 0.2734375 Bytes

Bitmaps are a great match for application that involve real-time analytics, because they can tell whether a user performed an action(that is, “Did user X pperform action Y today?”), or how many times an event occurred(that is, “How many users performed action Y this week?”).

Basics

The SETBIT command is used to give a value to Bitmap offset and it accepts only 1 or 0. If the Bitmap does not exist, it creates it.

1jedis.setbit("visits:2015-01-01", 10, true);
2jedis.setbit("visits:2015-01-01", 15, true);
3
4jedis.setbit("visits:2015-01-02", 10, true);
5jedis.setbit("visits:2015-01-02", 11, true);
6
7boolean value10 = jedis.getbit("visits:2015-01-01", 10); // true
8boolean value15 = jedis.getbit("visits:2015-01-02", 15); // false

The BITCOUNT command returns the number of bits marked as 1 in a Bitmap.

1long count20150101 = jedis.bitcount("visits:2015-01-01"); // 2
2long count20150102 = jedis.bitcount("visits:2015-01-02"); // 2

The BITOP command requires a bitwise operation, a destination key, and a list of keys to apply to that operation and store the result in the destination key.

1long bitOp = jedis.bitop(BitOP.OR, "totalUser", "visits:2015-01-01", "visits:2015-01-02"); // 2
2long totalUserCount = jedis.bitcount("totalUser"); // 3

Web Analytics

This section creates a simple web analytics system to save and count daily user visits to a website and then retrieve user IDs from the visits on a given date.

Define functions for analysis.

 1/**
 2 * Mark user with given {@code userId} as visited for given date, {@code date}(in format {@code yyyy-MM-dd})
 3 */
 4public void storeDailyVisit(Jedis client, String date, long userId) {
 5    var key = getKey("visits:daily:" + date);
 6    client.setbit(key, userId, true);
 7}
 8
 9/**
10 * @return number of visits on given date
11 */
12public long countVisits(Jedis client, String date) {
13    var key = getKey("visits:daily:" + date);
14    return client.bitcount(key);
15}
16
17/**
18 * @return a list of user IDs that visited on given date
19 */
20public List<String> showUserIdsFromVisit(Jedis client, String date) {
21    var key = getKey("visits:daily:" + date);
22    String value = client.get(key);
23    byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
24    List<String> userIds = new ArrayList<>();
25    for (var byteIdx = 0; byteIdx < bytes.length; byteIdx++) {
26        byte b = bytes[byteIdx];
27        for (var bitIdx = 7; bitIdx >= 0; bitIdx--) {
28            int visited = b >> bitIdx & 1;
29            if (visited == 1) {
30                userIds.add(String.valueOf((byteIdx * 8) + (7 - bitIdx)));
31            }
32        }
33    }
34    return userIds;
35}

Demonstrate how the functions be used.

1storeDailyVisit(jedis, "2015-01-01", 1);
2storeDailyVisit(jedis, "2015-01-01", 2);
3storeDailyVisit(jedis, "2015-01-01", 10);
4storeDailyVisit(jedis, "2015-01-01", 55);
5
6long nVisits = countVisits(jedis, "2015-01-01"); // 4
7List<Integer> userIds = showUserIdsFromVisit(jedis, "2015-01-01"); // [ "1", "2", "10", "55" ]

HyperLogLogs

A HyperLogLog is not actually a real data type in Redis. Conceptually, a HyperLogLog is an algorithm that uses randomization in order to provide a very good approvimation of the number of unique elements that exist in a Set.

It is fascinating because it only runs in O(1), constant time, and uses a very small of memory - up to 12kB of memory per key.

Although technically a HyperLogLog is not a real data typ, we are going to consider it as one because Redis provides specific commands to manipulate Strings in order to calculate the cardinality of a set using the HyperLogLog algorithm.

The HyperLogLog algorithm is probabilistic, which means that it does not ensure 100 percent accuracy. The Redis implementation of the HyperLogLog has a standard error of 0.81 percent. In theory, there is no practical limit for the cardinality of the sets that can be counted.

The HyperLogLog algorithm was described originally in the paper HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm in 2007. {: .prompt-info }

Counting Unique Users

This section compares how much memory a HyperLogLog and a Set would need to count the unique visits to a given website per hour.

Let’s look at the following scenario: a website has an average of 100,000 unique visits per hour. Each user who visits the page is identified by a UUID, which is represented by a 32-byte string.

In order to store all unique visitors, a Redis key is created for every hour of a day. This means that in a day, there are 24 keys, and int a month there are 720(24*30) keys.

The following table shows how much memory, in theory, each data type would need to store 100,000 unique user visits in an hour, a day, and a month.

Data typeMemory in an hourMemory in a dayMemory in a month
HyperLogLog12 KB12 KB * 24 = 288 KB288 KB * 30 = 8.4 MB
Set32 bytes * 100000 = 3.2 MB3.2 MB * 24 = 76.8 MB76.8 MB * 30 = 2.25 GB

Let’s how it actually works in reality per hour. Let’s how it actually works in reality.

1for (var count = 0; count < 100_000; count++) {
2    String user = "userId:" + count;
3    jedis.pfadd("HyperLogLogKey", user);
4    jedis.sadd("SetLogLogKey", user);
5}
6
7double hyperLogLogMemoryInKB = jedis.memoryUsage("HyperLogLogKey") / 1024.0; // 12.5546875 KB
8double setMemoryInKB = jedis.memoryUsage("SetLogLogKey") / 1024.0; // 5442.359375 KB

Basics

The command PFDD adds one or many strings to a HyperLogLog and returns 1 if cardinality was changed and 0 if it remains the same.

1long pfadd1 = jedis.pfadd("visits:2015-01-01", "carl", "max", "hugo", "arthur"); // 1
2long pfadd2 = jedis.pfadd("visits:2015-01-01", "max", "hugo"); // 0
3long pfadd3 = jedis.pfadd("visits:2015-01-02", "max", "kc", "hugo", "renata"); // 1

The command PFCOUNT accepts one or many keys as arguments. When a single argument is specified, it returns the approximate cardinality. When multiple keys are specified, it returns the approximate cardinality of the union of all unique elements.

1long count20150101 = jedis.pfcount("visits:2015-01-01"); // 4
2long count20150102 = jedis.pfcount("visits:2015-01-02"); // 4
3long countAll = jedis.pfcount("visits:2015-01-01", "visits:2015-01-02"); // 6

The command PFMERGE requires a destination key and one or many HyperLogLog keys are arguments. It merges all the specified HyperLogLogs and stores the result in the destination key.

1String pfmerge = jedis.pfmerge("visits:total", "visits:2015-01-01", "visits:2015-01-02"); // "OK"
2long countMerge = jedis.pfcount("visits:total");  // 6

If the destination variable exists, it is treated as one of the source sets and its cardinality will be included in the cardinality of the computed HyperLogLog. {: .prompt-warning }

Counting and Retrieving Unique Website Visits

This section extends the previous example and adds an hour as granularity. Later, it merges the 24 keys that represent each hour of a day into a single key. Define functions to register visits, count visits and merge multiple visits for a specific date.

 1/**
 2 * Register a unique visit
 3 *
 4 * @param date the date can be in {@code yyyy-MM-dd} or {@code yyyy-MM-ddTh} format.(for example, 2015-01-01, or 2015-01-01T2)
 5 */
 6public void addVisit(Jedis client, String date, String user) {
 7    String key = getKey("visits:" + date);
 8    client.pfadd(key, user);
 9}
10
11/**
12 * @return the number of unique visits on specific dates
13 */
14public long count(Jedis client, String... dates) {
15    List<String> keys = Arrays.stream(dates).map(date -> getKey("visits:" + date)).toList();
16    return client.pfcount(keys.toArray(new String[0]));
17}
18
19/**
20 * Merge the visits on a given data
21 */
22public void aggregateDate(Jedis client, String date) {
23    String dateKey = getKey("visits:" + date);
24    List<String> hourKeys = new ArrayList<>();
25    for (var i = 0; i < 24; i++) {
26        hourKeys.add(getKey("visits:" + date + "T" + i));
27    }
28    client.pfmerge(dateKey, hourKeys.toArray(new String[0]));
29}

Now let’s simulate 200 users visiting the page 1,000 times in a period of 24 hours.

 1for (var i = 0; i < 1_000; i++) {
 2    String username = "user_" + Math.floor(1 + Math.random() * 200);
 3    var hour = Math.floor(Math.random() * 24);
 4    addVisit(jedis, "2015-01-01T" + hour, username);
 5}
 6
 7long count1 = count(jedis, "2015-01-01T0"); // 32
 8long count2 = count(jedis, "2015-01-01T5", "2015-01-01T6", "2015-01-01T7"); // 88
 9aggregateDate(jedis, "2015-01-01");
10
11long count3 = count(jedis, "2015-01-01"); // 198