Postgres ID Generator with Safe JSON Numbers
Before reading this post, I recommend you to read A Better ID Generator For PostgreSQL written by Rob Conery. To recap briefly, their post had discussed:
- The problem you will face in the rapid grow of your systems if you used GUID as the primary key.
- How Twitter generating auto-incrementing keys with Snowflake.
- Create a functional Snowflake equivalent for PostgresSQL.
And the third point above will be the focus of discussion in this post.
The Algorithm
create schema shard_1;
create sequence shard_1.global_id_sequence;
CREATE OR REPLACE FUNCTION shard_1.id_generator(OUT result bigint) AS $$
DECLARE
our_epoch bigint := 1314220021721;
seq_id bigint;
now_millis bigint;
-- the id of this DB shard, must be set for each
-- schema shard you have - you could pass this as a parameter too
shard_id int := 1;
BEGIN
SELECT nextval('shard_1.global_id_sequence') % 1024 INTO seq_id;
SELECT FLOOR(EXTRACT(EPOCH FROM clock_timestamp()) * 1000) INTO now_millis;
result := (now_millis - our_epoch) << 23;
result := result | (shard_id << 10);
result := result | (seq_id);
END;
$$ LANGUAGE PLPGSQL;
select shard_1.id_generator();This algorithm will generate an integer in 64 bits consisting of:
+----------------+---------------+-------------+
| Timestamp (41) | Shard ID (13) | Seq ID (10) |
+----------------+---------------+-------------+
|<-------------- Unique ID (64) -------------->|| Part | Bits | Desc. |
|---|---|---|
| Timestamp | 41 | Max valid date should be timestamp_to_date((2^41 - 1 + 1314220021721)/1000), i.e. 2081-04-30 |
| Shard ID | 13 | Support having 2^13 = 8192 shards |
| Seq ID | 10 | Max 2^10 = 1024 ops/ms |
Here we can expand Seq ID part to gain the ablility of supporting more operations per ms. But as a trade off, the Shard ID will be shrunk.
For example, if we used 10 bits for Shard ID and 13 bits for Seq ID, we will allow 8192 write operations per ms. Which should be 8M ops/s. However, as a reminder, this 8M ops/s is evenly distributed over 1s. Which means when your system had a writing spike with over 8192 writes in the same millisecond, it can fail.
Safe Number Problem in JSON
The above algorithm generates a 64-bit integer. In practice, it's not "JSON comaptible". Because in JSON world, number is a "double-like" type. And there's in fact a limitation at JavaScript/ECMAScript level of precision to 53-bit for integers. The maximum safe integer in JavaScript is 2^53 - 1. You can check it by running the following JS code in your browser:
Number.MAX_SAFE_INTEGER; // 9007199254740991
Number.isSafeInteger(9007199254740992); // falseSo, How we work with 64-bit integers in our API and JavaScript?
I have two solutions here:
- Encode a 64-bit integer to text by using binary-to-text encoding methods, e.g. Base58, etc.
- Generate "safe" integers, which only occupies the lower order 53-bits.
Let's talk about the second solution by tweaking the algorithm above to generate "safe" unique IDs.
Generate "JSON-Safe" 53-bit Unique IDs for PostgresSQL
| Opt. | Bits (time,shard,seq) | Epoch Offset | Timestamp | Shard | Seq / TPS |
|---|---|---|---|---|---|
| 1 | (41, 3, 9) | 946656000000 | Max 2069-09-06 |
8 | 512 ops/ms |
| 2 | (41, 4, 8) | 946656000000 | Max 2069-09-06 |
16 | 256 ops/ms |
| 3 | (41, 5, 7) | 946656000000 | Max 2069-09-06 |
32 | 128 ops/ms |
| 4 | (31, 5, 17) | 946656000 | Max 2068-01-19 |
32 | 131072 ops/s |
| 5 | (31, 6, 16) | 946656000 | Max 2068-01-19 |
64 | 65536 ops/s |
| 6 | (31, 7, 15) | 946656000 | Max 2068-01-19 |
128 | 32768 ops/s |
| 7 | (32, 5, 16) | 0 | Max 2106-02-07 |
32 | 65536 ops/s |
| 8 | (32, 6, 15) | 0 | Max 2106-02-07 |
64 | 32768 ops/s |
Here are some points you need to consider:
- Use
sormsfor the timestamp?- Quick Answer:
msfor tighter distributions of write operations, andsis more flexible to handle write spikes.
- Quick Answer:
- Set epoch offset or not?
- Quick Answer: It's a must if using
msfor timestamp. Otherwise optional. Better not.
- Quick Answer: It's a must if using
- How many shards should I have?
- Quick Answer: 16 is sufficient for small and medium applications.
- TPS considerations?
- Quick Answer: think of how many write operations your application needs, and the performance (TPS) of your database/shard.
Here's the code for the 7th option:
CREATE SEQUENCE public.global_id_sequence;
CREATE OR REPLACE FUNCTION public.id_generator(OUT result bigint) AS $$
DECLARE
now_seconds bigint;
shard_id int := 1;
seq_id bigint;
BEGIN
SELECT nextval('public.global_id_sequence') % 65536 INTO seq_id;
SELECT FLOOR(EXTRACT(EPOCH FROM CLOCK_TIMESTAMP())) INTO now_seconds;
result := now_seconds << 21; -- Shard(5) + Seq(16)
result := result | (shard_id << 16); -- Seq(16)
result := result | (seq_id);
END;
$$ LANGUAGE PLPGSQL;
SELECT public.id_generator();