Postgres ID Generator with Safe JSON Numbers
Before reading this post, I recommend you to read A Better ID Generator For PostgreSQL written by Rob Conery. To recap briefly, their post had discussed:
- The problem you will face in the rapid grow of your systems if you used GUID as the primary key.
- How Twitter generating auto-incrementing keys with Snowflake.
- Create a functional Snowflake equivalent for PostgresSQL.
And the third point above will be the focus of discussion in this post.
The Algorithm
create schema shard_1;
create sequence shard_1.global_id_sequence;
CREATE OR REPLACE FUNCTION shard_1.id_generator(OUT result bigint) AS $$
DECLARE
our_epoch bigint := 1314220021721;
seq_id bigint;
now_millis bigint;
-- the id of this DB shard, must be set for each
-- schema shard you have - you could pass this as a parameter too
shard_id int := 1;
BEGIN
SELECT nextval('shard_1.global_id_sequence') % 1024 INTO seq_id;
SELECT FLOOR(EXTRACT(EPOCH FROM clock_timestamp()) * 1000) INTO now_millis;
result := (now_millis - our_epoch) << 23;
result := result | (shard_id << 10);
result := result | (seq_id);
END;
$$ LANGUAGE PLPGSQL;
select shard_1.id_generator();
This algorithm will generate an integer in 64
bits consisting of:
+----------------+---------------+-------------+
| Timestamp (41) | Shard ID (13) | Seq ID (10) |
+----------------+---------------+-------------+
|<-------------- Unique ID (64) -------------->|
Part | Bits | Desc. |
---|---|---|
Timestamp | 41 | Max valid date should be timestamp_to_date((2^41 - 1 + 1314220021721)/1000) , i.e. 2081-04-30 |
Shard ID | 13 | Support having 2^13 = 8192 shards |
Seq ID | 10 | Max 2^10 = 1024 ops/ms |
Here we can expand Seq ID
part to gain the ablility of supporting more operations per ms. But as a trade off, the Shard ID
will be shrunk.
For example, if we used 10
bits for Shard ID
and 13
bits for Seq ID
, we will allow 8192
write operations per ms. Which should be 8M ops/s
. However, as a reminder, this 8M ops/s
is evenly distributed over 1s. Which means when your system had a writing spike with over 8192 writes in the same millisecond, it can fail.
Safe Number Problem in JSON
The above algorithm generates a 64-bit integer. In practice, it's not "JSON comaptible". Because in JSON world, number is a "double-like" type. And there's in fact a limitation at JavaScript/ECMAScript level of precision to 53-bit for integers. The maximum safe integer in JavaScript is 2^53 - 1
. You can check it by running the following JS code in your browser:
Number.MAX_SAFE_INTEGER; // 9007199254740991
Number.isSafeInteger(9007199254740992); // false
So, How we work with 64-bit integers in our API and JavaScript?
I have two solutions here:
- Encode a 64-bit integer to text by using binary-to-text encoding methods, e.g. Base58, etc.
- Generate "safe" integers, which only occupies the lower order 53-bits.
Let's talk about the second solution by tweaking the algorithm above to generate "safe" unique IDs.
Generate "JSON-Safe" 53-bit Unique IDs for PostgresSQL
Opt. | Bits (time,shard,seq) | Epoch Offset | Timestamp | Shard | Seq / TPS |
---|---|---|---|---|---|
1 | (41, 3, 9) | 946656000000 | Max 2069-09-06 |
8 | 512 ops/ms |
2 | (41, 4, 8) | 946656000000 | Max 2069-09-06 |
16 | 256 ops/ms |
3 | (41, 5, 7) | 946656000000 | Max 2069-09-06 |
32 | 128 ops/ms |
4 | (31, 5, 17) | 946656000 | Max 2068-01-19 |
32 | 131072 ops/s |
5 | (31, 6, 16) | 946656000 | Max 2068-01-19 |
64 | 65536 ops/s |
6 | (31, 7, 15) | 946656000 | Max 2068-01-19 |
128 | 32768 ops/s |
7 | (32, 5, 16) | 0 | Max 2106-02-07 |
32 | 65536 ops/s |
8 | (32, 6, 15) | 0 | Max 2106-02-07 |
64 | 32768 ops/s |
Here are some points you need to consider:
- Use
s
orms
for the timestamp?- Quick Answer:
ms
for tighter distributions of write operations, ands
is more flexible to handle write spikes.
- Quick Answer:
- Set epoch offset or not?
- Quick Answer: It's a must if using
ms
for timestamp. Otherwise optional. Better not.
- Quick Answer: It's a must if using
- How many shards should I have?
- Quick Answer: 16 is sufficient for small and medium applications.
- TPS considerations?
- Quick Answer: think of how many write operations your application needs, and the performance (TPS) of your database/shard.
Here's the code for the 7th option:
CREATE SEQUENCE public.global_id_sequence;
CREATE OR REPLACE FUNCTION public.id_generator(OUT result bigint) AS $$
DECLARE
now_seconds bigint;
shard_id int := 1;
seq_id bigint;
BEGIN
SELECT nextval('public.global_id_sequence') % 65536 INTO seq_id;
SELECT FLOOR(EXTRACT(EPOCH FROM CLOCK_TIMESTAMP())) INTO now_seconds;
result := now_seconds << 21; -- Shard(5) + Seq(16)
result := result | (shard_id << 16); -- Seq(16)
result := result | (seq_id);
END;
$$ LANGUAGE PLPGSQL;
SELECT public.id_generator();