
Consider supporting FIFO + message keys #294

Open
fggrtech opened this issue Aug 9, 2024 · 17 comments

Labels
enhancement New feature or request

Comments

@fggrtech

fggrtech commented Aug 9, 2024

Firstly, this project is really neat.

Any thoughts on supporting FIFO queues with message key values, similar to SQS FIFO + MessageGroupId ?

Riffing off your existing work, here are the prototype functions which illustrate the idea:

CREATE TYPE message_record AS (
    msg_id BIGINT,
    read_ct INTEGER,
    enqueued_at TIMESTAMP WITH TIME ZONE,
    vt TIMESTAMP WITH TIME ZONE,
    group_id UUID,
    message JSONB
);

CREATE FUNCTION create_queue(queue_name TEXT)
RETURNS void AS $$
BEGIN

  EXECUTE FORMAT(
    $QUERY$
    CREATE TABLE IF NOT EXISTS %s (
        msg_id BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
        read_ct INT DEFAULT 0 NOT NULL,
        enqueued_at TIMESTAMP WITH TIME ZONE DEFAULT now() NOT NULL,
        vt TIMESTAMP WITH TIME ZONE NOT NULL,
        group_id UUID NOT NULL,
        message JSONB NOT NULL
    )
    $QUERY$,
    queue_name
  );

  EXECUTE FORMAT(
    $QUERY$
    CREATE INDEX IF NOT EXISTS %s_vt_idx ON %s (vt ASC);
    $QUERY$,
    queue_name, queue_name
  );

  EXECUTE FORMAT(
    $QUERY$
    CREATE INDEX IF NOT EXISTS %s_group_id_order ON %s (group_id, msg_id);
    $QUERY$,
    queue_name, queue_name
  );

END;
$$ LANGUAGE plpgsql;

CREATE FUNCTION enqueue(
    queue_name TEXT,
    group_id UUID,
    msg JSONB,
    delay INTEGER DEFAULT 0
) RETURNS SETOF BIGINT AS $$
DECLARE
    sql TEXT;
BEGIN
    sql := FORMAT(
        $QUERY$
        INSERT INTO %s (vt, group_id, message)
        VALUES ((clock_timestamp() + interval '%s seconds'), '%s'::uuid, $1)
        RETURNING msg_id;
        $QUERY$,
        queue_name, delay, group_id
    );
    RETURN QUERY EXECUTE sql USING msg;
END;
$$ LANGUAGE plpgsql;

CREATE FUNCTION dequeue(
    queue_name TEXT,
    vt INTEGER,
    qty INTEGER
)
RETURNS SETOF message_record AS $$
DECLARE
    sql TEXT;
BEGIN
    sql := FORMAT(
        $QUERY$

        -- cte0: the head (lowest msg_id) of every group, visible or not,
        -- so an in-flight head still blocks the rest of its group.
        WITH cte0 AS
        (
          SELECT group_id, MIN(msg_id) AS msg_id
          FROM %s
          GROUP BY group_id
        ),
        -- cte1: the currently visible heads, claimed for this consumer.
        cte1 AS
        (
          SELECT t1.msg_id AS msg_id
          FROM %s AS t1
          JOIN cte0 AS t2
            ON t1.group_id = t2.group_id
            AND t1.msg_id = t2.msg_id
          WHERE vt <= clock_timestamp()
          ORDER BY msg_id ASC
          LIMIT $1
          FOR UPDATE SKIP LOCKED
        )

        UPDATE %s m
        SET
            vt = clock_timestamp() + interval '%s seconds',
            read_ct = read_ct + 1
        FROM cte1
        WHERE m.msg_id = cte1.msg_id
        RETURNING m.msg_id, m.read_ct, m.enqueued_at, m.vt, m.group_id, m.message;
        $QUERY$,
        queue_name, queue_name, queue_name, vt
    );
    RETURN QUERY EXECUTE sql USING qty;
END;
$$ LANGUAGE plpgsql;
SELECT * FROM create_queue('test');

-- Two enqueued messages with the same group id. These should respect FIFO rules.
SELECT * FROM enqueue('test', '0470da73-c6a4-4273-aa31-f48b05555539', '{"email-address": "[email protected]"}');
SELECT * FROM enqueue('test', '0470da73-c6a4-4273-aa31-f48b05555539', '{"email-address": "[email protected]", "phone-number": 5551112222}');

-- Third enqueued message with different group id.
SELECT * FROM enqueue('test', gen_random_uuid(), '{"email-address": "[email protected]"}');

-- This should return message ids 1 and 3. Message id 2 will NOT be available until 1 has been deleted.
SELECT * FROM dequeue('test', 5, 5);
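
-- Editor's sketch of a possible continuation: the prototype defines no delete/archive helper,
-- so a plain DELETE stands in for "completing" message 1 and unblocking its group.
DELETE FROM test WHERE msg_id = 1;

-- Message 2 is now the head of its group and already visible, so this returns it.
-- Message 3 stays hidden until the 5-second visibility timeout from the previous dequeue expires.
SELECT * FROM dequeue('test', 5, 5);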

Thoughts?

@ChuckHend
Member

Hi @fggrtech , thank you. I'd like to see features like this make it into the project too.

I'll have to think a bit about how this would impact the existing queue APIs and tables. I don't think we'd want to break the simple queue structure that currently exists, and ideally we could find a way to add this functionality without adding overhead or requiring use of the new keys; worst case, we add a new API specifically for FIFO.

@tazmaniax

tazmaniax commented Sep 20, 2024

I'm looking for exactly this functionality. Does the group ID need to be a UUID or can it be just some arbitrary text?

@fggrtech
Author

I'm looking for exactly this functionality. Does the group ID need to be a UUID or can it be just some arbitrary text?

I think that is one of the difficult questions about this feature request. A message key means different things depending on the use context. My personal case is a UUID, but I would assume it should be more generic, like a Kafka message key: bytes or a string field.

@nimrodkor
Contributor

nimrodkor commented Oct 31, 2024

Hey @ChuckHend !

We've run into this too. I like the approach of not breaking existing queues, so I'd like to suggest new create_fifo_queue, send_fifo_message, and read_fifo_message methods that will support the message group ID and ordering requirements.
WDYT?

Agreeing with the above that the group ID should be TEXT, allowing the implementor to choose whether to stringify a UUID or pass in some other value.

Up for a contribution?

CC: @gruebel

@v0idpwn
Collaborator

v0idpwn commented Oct 31, 2024

Hi.

That's me (and not PGMQ), but I feel like we should keep the same interface for send. The main reason is that we'd like to support exchanges in the future, and exchanges should be able to push to all queue types. Currently, all queue types have the same interface to push, so there's no problem doing that.

A way to achieve that would be:

  1. Adding headers support (in our list for v1.5.0)
  2. Using a special header for FIFO, e.g.: "pgmq_fifo_key"
  3. Having a custom read function which respects header-based FIFO (we can also make the key header parameterizable here, and let people use any header for that in step 2)

If we really wanted, we could also keep the same interface for read. But this would imply an additional check (for queue type) on every read. This is a bit hard to write in a performant way.
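
To make step 3 concrete, here is a rough, editor-added sketch of the kind of query a header-aware read could run; it is not the project's implementation. The pgmq.q_myqueue table name follows pgmq's existing per-queue naming, while the headers column and the pgmq_fifo_key header are the proposed additions from steps 1 and 2 and do not exist yet:

WITH heads AS (
    -- Oldest remaining message per FIFO key; messages without the header share the
    -- NULL key, i.e. they behave as a single default ordering group.
    SELECT DISTINCT ON (headers->>'pgmq_fifo_key') msg_id
    FROM pgmq.q_myqueue
    ORDER BY headers->>'pgmq_fifo_key', msg_id
),
eligible AS (
    SELECT q.msg_id
    FROM pgmq.q_myqueue q
    JOIN heads USING (msg_id)
    WHERE q.vt <= clock_timestamp()
    ORDER BY q.msg_id
    LIMIT 10                       -- batch size
    FOR UPDATE SKIP LOCKED
)
UPDATE pgmq.q_myqueue q
SET vt = clock_timestamp() + interval '10 seconds',  -- new visibility timeout
    read_ct = read_ct + 1
FROM eligible
WHERE q.msg_id = eligible.msg_id
RETURNING q.msg_id, q.read_ct, q.enqueued_at, q.vt, q.headers, q.message;

An in-flight head still blocks its key: it fails the vt filter but still wins the DISTINCT ON, so nothing newer from that key is handed out until it is deleted or its visibility timeout expires.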

@ChuckHend
Member

But this would imply an additional check (for queue type) on every read. This is a bit hard to write in a performant way.

@v0idpwn , @nimrodkor - yes, this would require a subsequent read of pgmq.meta on every pgmq.read() call. Are there ANY scenarios where someone might want to do a regular pgmq.read() from a FIFO queue? If not, then I don't mind if we add a function for pgmq.read_fifo().

We already have pgmq.create_unlogged(), so adding a pgmq.create_fifo() doesn't seem so bad, but we may want to refactor both of those to pgmq.create() in v2.0.
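
For reference, a sketch of how that could sit next to the current creation functions; pgmq.create() and pgmq.create_unlogged() exist today, while pgmq.create_fifo() is purely hypothetical here:

-- existing
select pgmq.create('my_queue');
select pgmq.create_unlogged('my_scratch_queue');

-- hypothetical FIFO variant following the same naming pattern (not implemented)
-- select pgmq.create_fifo('my_fifo_queue');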

@v0idpwn
Collaborator

v0idpwn commented Nov 1, 2024

Proposed API:

-- We create the queue in the same way. It's a regular queue
select pgmq.create('myqueue');

-- When sending messages, we can add a special header
select pgmq.send('myqueue', '"hello"'::jsonb, '{"x-pgmq-fifo": "ordering-key"}'::jsonb);
select pgmq.send('myqueue', '"fifo"'::jsonb, '{"x-pgmq-fifo": "ordering-key"}'::jsonb);

-- We can get multiple messages with the same ordering key in a single query
select msg_id, msg, headers from pgmq.read_fifo('myqueue', 10, 10);
| msg_id | msg       | headers                           |
+--------+-----------+-----------------------------------+
|      1 | '"hello"' | '{"x-pgmq-fifo": "ordering-key"}' |
|      2 | '"fifo"'  | '{"x-pgmq-fifo": "ordering-key"}' |

 
select pgmq.send('myqueue', '"bye"'::jsonb, '{"x-pgmq-fifo": "ordering-key"}'::jsonb);
select pgmq.send('myqueue', '"world"'::jsonb, '{"x-pgmq-fifo": "another-ordering-key"}'::jsonb);

-- On the second select, we don't get the `"bye"` message because it's in the FIFO group with the `"ordering-key"` key, which still
-- has undeleted messages, even if they are not currently visible
select msg_id, msg, headers from pgmq.read_fifo('myqueue', 10, 10);
| msg_id | msg       | headers                                   |
+--------+-----------+-------------------------------------------+
|      4 | '"world"' | '{"x-pgmq-fifo": "another-ordering-key"}' |

SELECT PG_SLEEP(10);

-- After the VT has expired or messages are deleted, we can read the messages further back in the queue
select msg_id, msg, headers from pgmq.read_fifo('myqueue', 10, 10);
| msg_id | msg       | headers                                   |
+--------+-----------+-------------------------------------------+
|      1 | '"hello"' | '{"x-pgmq-fifo": "ordering-key"}'         |
|      2 | '"fifo"'  | '{"x-pgmq-fifo": "ordering-key"}'         |
|      3 | '"bye"'   | '{"x-pgmq-fifo": "ordering-key"}'         |
|      4 | '"world"' | '{"x-pgmq-fifo": "another-ordering-key"}' |

@ChuckHend
Member

@fggrtech , @tazmaniax , @nimrodkor - what are your thoughts on the API proposed above? Do you think this will meet your requirements?

@v0idpwn
Collaborator

v0idpwn commented Nov 1, 2024

Important notes:

  • if you pgmq.read instead of pgmq.read_fifo, you may get out-of-order messages
  • messages without a fifo key should be treated as if they had a default ordering key, so if you have messages 1, 2, 3 and 4 without a header and read 1 and 2, you can't read 3 and 4 until 1 and 2 are "completed" (see the sketch below)
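
A minimal, editor-added sketch of how that default-key rule could fall out of the grouping expression, assuming the proposed headers column; the '_default' literal is illustrative only:

-- Messages with no x-pgmq-fifo header all collapse into one shared ordering group.
select coalesce(headers->>'x-pgmq-fifo', '_default') as fifo_key,
       min(msg_id) as head_msg_id
from pgmq.q_myqueue
group by 1;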

@fgasperino

@ChuckHend Works for me.

@kevbook

kevbook commented Nov 10, 2024

This makes sense. Correct me if I'm wrong below:

If I have a user-A (fifo-group-id = user-A), and the user has 10 messages, and I'm reading them 1 by 1 because ordering matters:

  • On any read, I'm only going to get the 1st message (or no message from this fifo-group-id = user-A if its visibility timeout has not yet passed), hence maintaining FIFO
  • If message 1 processing fails (i.e. the visibility timeout has passed, or I've set it to 0 via set_vt when the code throws an error) and we re-read, we'll get the 1st message again? (see the sketch below)
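
For the failure path in the second bullet, a small sketch of the release step using the existing pgmq.set_vt; the queue name and message id are illustrative, and how a FIFO-aware read then re-serves the message is per the proposal above:

-- On a processing error, make message 1 visible again immediately so its group can be retried.
select * from pgmq.set_vt('myqueue', 1, 0);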

@v0idpwn
Collaborator

v0idpwn commented Nov 10, 2024

That's right, @kevbook!

@tazmaniax

tazmaniax commented Nov 23, 2024

@ChuckHend , the API looks good. Will it work similarly to SQS FIFO in that if fewer than the read quantity of messages are available for the same message group ID, a read may include messages from other message group IDs in the same batch, but each group retains FIFO order?

Do you have some idea when this might be available and published to dbdev for easy deployment to Supabase? FIFO with group id is critical for my use case. Let me know if there is something I can do to help.

@ChuckHend
Member

Will it work similarly to SQS FIFO

I think the intention is to make it similar to SQS, though that is still TBD. What would be your preference?

Regarding dbdev, perhaps @olirice or someone from the Supabase team could chime in. I am not sure who is responsible for keeping https://database.dev/plpgsql/pgmq up to date.

@olirice
Contributor

olirice commented Dec 2, 2024

@tazmaniax if you upgrade your instance you'll find that Supabase quietly shipped pgmq as of last week
if you keep an eye on launch week 13 (this week), who knows, maybe it'll come up!

if you have previously installed pgmq via dbdev you'll want to remove/drop it before upgrading, or the schema names will clash when you try to enable it on your new instance with create extension pgmq

@olirice
Contributor

olirice commented Dec 2, 2024

https://database.dev/plpgsql/pgmq

the user plpgsql isn't anyone on the supabase team, that is a public account. anyone can upload packages, so if you're not happy with the frequency that it's being updated you could maintain one

though in this case it would probably be most reliable to copy/paste the SQL as dbdev is still in flux

@tazmaniax

tazmaniax commented Dec 5, 2024

@olirice I've just done the upgrade, and yes, pgmq is there; that's great and thanks for the heads up.
@ChuckHend receiving multiple group-id messages in a batch read, while respecting any FIFO restrictions, would be ideal, and keeping parity with SQS is a plus for portability. This project is amazing and a big thanks to all of the contributors!
