Compiling multiple functions together #374

Open
ccleve opened this issue Aug 11, 2023 · 7 comments
Labels
enhancement New feature or request

Comments

@ccleve

ccleve commented Aug 11, 2023

I have a fairly large pgrx extension that implements an index access method. I'd love to run it on AWS RDS or Aurora, but can't because they don't support untrusted extensions.

Theoretically, I could get rid of the unsafe code, publish it as a crate, and include it as a dependency in one or more PL/Rust functions. The difficulty is that I have a large number of pg_extern functions, and I expect that each would have to be added as a PL/Rust function that just calls a function in the crate. How would this work, though, if each PL/Rust function gets compiled independently? Is there some provision for sharing dependencies among multiple functions, and compiling them all together into a single large extension?

@workingjubilee workingjubilee changed the title from "Big ask: convert pgrx extension to pl/rust" to "Compiling multiple functions together" Aug 11, 2023
@workingjubilee
Contributor

@workingjubilee workingjubilee added the enhancement New feature or request label Aug 11, 2023
@eeeebbbbrrrr
Contributor

Yeah. I have a big extension that implements the IAM API too. I can relate.

The tl;dr version is that this isn’t going to work as you’re not going to be able to eliminate 100% of the unsafe blocks. Just defining the amhandler function would need to be unsafe. So it’s kinda DOA.

What might be interesting is for plrust the extension to somehow expose an IAM wrapper that could be implemented safely with “LANGUAGE plrust” functions.

I haven’t put any thought into what that’d look like or how practical it would be, but it seems like a tractable idea. The IAM API is pretty simple.

How/where does your index store data?

Maybe this is an idea we can discuss more. I cannot predict what AWS might allow but that shouldn’t stop us from thinking about this more.

@ccleve
Author

ccleve commented Aug 11, 2023

I write pages directly into the indexrel using some C code that calls ReadBuffer / BufferGetPage. Yes, that would be another problem to solve. AWS shouldn't object to it, though, because it doesn't write to disk directly. I posted this a while ago, but never followed up: pgcentralfoundation/pgrx#294

I note that pg_tle does allow for hooks: https://github.com/aws/pg_tle/blob/main/docs/04_hooks.md, which presumably means that AWS does not object to callbacks in general. An amhandler is just another callback.

I do wonder what Aurora has done under the hood, and how much of the underlying data access code they have swapped out for something else. That may limit how deep they'll let us go.

@eeeebbbbrrrr
Contributor

> AWS shouldn't object to it, though, because it doesn't write to disk directly

I can't speak for them but I think they'd object to anything that can't be declared through plain-text DDL. No way they're gonna allow "some C code" on an RDS instance.

I haven't spent any time with pg_tle, so I'm completely unfamiliar with what it does and how it works.


I had a thought while trying to get to sleep last night. Are you familiar with (what used to be) multicorn (https://multicorn.org)? I wonder if we could build something similar as a pgrx-based extension to allow implementing the IAM API using any "CREATE FUNCTION" function, regardless of language. That's kinda an evolution of my idea above to build something into plrust.

There'd be hurdles around sharing the IAM argument/return type definitions in a cross-function/cross-language way, but it seems doable. And we'd definitely need to get safe wrappers in pgrx around all of Postgres' internal Buffer/Page management stuff.
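
For illustration only, here is a minimal sketch of what a shared, safe IAM surface might look like. Nothing here is an existing pgrx or plrust API: `Tid`, `SafeIndexAm`, and `ToyIndex` are hypothetical names, and the in-memory `BTreeMap` merely stands in for real index storage. The assumption is that the unsafe glue (the amhandler and its callbacks) would live inside the wrapper extension, while an implementation only ever sees a trait like this one.

```rust
use std::collections::BTreeMap;

/// Hypothetical tuple identifier (block number plus offset within the block).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Tid {
    block: u32,
    offset: u16,
}

/// Hypothetical safe surface an index implementation would provide.
/// The wrapper extension, not the implementer, would own all unsafe code.
trait SafeIndexAm {
    type Key;

    /// Record that `key` points at the heap tuple `tid`.
    fn insert(&mut self, key: Self::Key, tid: Tid);

    /// Return every tuple identifier stored under `key`.
    fn scan(&self, key: &Self::Key) -> Vec<Tid>;
}

/// Toy implementation backed by an in-memory map instead of index pages.
#[derive(Default)]
struct ToyIndex {
    entries: BTreeMap<i64, Vec<Tid>>,
}

impl SafeIndexAm for ToyIndex {
    type Key = i64;

    fn insert(&mut self, key: i64, tid: Tid) {
        self.entries.entry(key).or_default().push(tid);
    }

    fn scan(&self, key: &i64) -> Vec<Tid> {
        self.entries.get(key).cloned().unwrap_or_default()
    }
}
```

An implementer would call `insert` once per indexed tuple and `scan` per lookup. How such a trait would map onto functions declared with "CREATE FUNCTION" in other languages is exactly the open question raised above.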

@ccleve
Author

ccleve commented Aug 11, 2023

> I can't speak for them but I think they'd object to anything that can't be declared through plain-text DDL. No way they're gonna allow "some C code" on an RDS instance.

No, I wouldn't think so. Pgrx would have to wrap it, as you suggest. The tricky thing is that we might have to add a higher-level function around the Buffer/Page stuff, because the pg_guard overhead of calling from Rust to C is significant. Here is one of the functions that I use:

/*
 * Append a page to the end of the file and write the data to it.
 */
uint32_t rdb_append_page(Relation rel, char *data) {
    /* Serialize relation extension across backends. */
    LockRelationForExtension(rel, ExclusiveLock);

    /* P_NEW asks the buffer manager to extend the relation by one block. */
    Buffer buf = ReadBuffer(rel, P_NEW);
    BlockNumber newblk = BufferGetBlockNumber(buf);

    LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);

    START_CRIT_SECTION();
    Page page = BufferGetPage(buf);
    memcpy(page, data, BLCKSZ); // overwrite page with data (caller must supply BLCKSZ bytes)
    MarkBufferDirty(buf);
    UnlockReleaseBuffer(buf);
    END_CRIT_SECTION();

    /* another method:
    // number of first block past end
    BlockNumber blocknum = smgrnblocks(rel->rd_smgr, MAIN_FORKNUM);
    smgrwrite(rel->rd_smgr, MAIN_FORKNUM, blocknum, data, false);
    */

    UnlockRelationForExtension(rel, ExclusiveLock);
    return newblk;
}

I'm happy to write a bunch of wrapper functions and do a PR, if you like.
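
As a rough sketch of the shape such a wrapper could take on the Rust side: the C function above trusts that `data` points at a full `BLCKSZ` bytes, whereas a safe signature can put that requirement in the type. The code below is a self-contained mock, not pgrx code. `MockRelation` is a hypothetical stand-in for a relation, its `Vec` of pages replaces the buffer manager, and `BLCKSZ` is assumed to be the default 8192. Locking, critical sections, and WAL are deliberately left out.

```rust
/// Assumed page size; 8192 bytes is PostgreSQL's default BLCKSZ.
const BLCKSZ: usize = 8192;

/// Hypothetical stand-in for an index relation: an in-memory list of pages.
/// A real wrapper would hold pinned, locked PostgreSQL buffers instead.
struct MockRelation {
    pages: Vec<Box<[u8; BLCKSZ]>>,
}

impl MockRelation {
    fn new() -> Self {
        Self { pages: Vec::new() }
    }

    /// Append one full page and return its block number.
    /// Taking `&[u8; BLCKSZ]` rather than a raw pointer means a caller
    /// cannot hand over a buffer that is shorter than one page.
    fn append_page(&mut self, data: &[u8; BLCKSZ]) -> u32 {
        let block = self.pages.len() as u32;
        self.pages.push(Box::new(*data));
        block
    }

    /// Read a page back by block number, or `None` if it does not exist.
    fn read_page(&self, block: u32) -> Option<&[u8; BLCKSZ]> {
        self.pages.get(block as usize).map(|page| &**page)
    }
}
```

Calling `append_page` twice on a fresh `MockRelation` yields block numbers 0 and 1. Doing the whole append in one call also mirrors the point about a higher-level function: a real wrapper would cross from Rust into C once per page rather than once per buffer operation.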

> I haven't spent any time with pg_tle, so I'm completely unfamiliar with what it does and how it works.

I'm confused -- I thought that was what PL/Rust was based on?
https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/PostgreSQL_trusted_language_extension.html
https://github.com/aws/pg_tle

I'm not familiar with multicorn. What does it do that is different?

@ccleve
Author

ccleve commented Aug 11, 2023

> I'm not familiar with multicorn. What does it do that is different?

Never mind. I get it. My only hesitation with this approach is the performance overhead. If we can do a zero-cost abstraction, fine, but if it slows down any function that needs to get called thousands of times per query then it will be a problem. amhandler->amgettuple would be such a function.

@eeeebbbbrrrr
Contributor

> I'm happy to write a bunch of wrapper functions and do a PR, if you like.

This is probably something large enough to warrant sketching out a bit first before writing a bunch of code that we might not end up merging. I haven't put a lot of thought into safe wrappers around buffers and pages and such, so I don't even have a clue as to what it might look like.

> I'm confused -- I thought that was what PL/Rust was based on?

pg_tle is an AWS thing for packaging pure SQL as an "extension". I suppose the technical reason it exists is that RDS users don't have access to write their extension.sql file to the $(pg_config --sharedir)/extension/ directory. And I guess it provides some SQL-level hooks for certain other things.

For plrust, RDS installs the actual plrust compiled extension -- it's unrelated to pg_tle. Tho I suppose you can use pg_tle to package and manage a SQL extension that uses LANGUAGE plrust functions.

We do have some contacts there and some of them hang out on our discord (I think you're a member, yeah?). So it might be beneficial to bring this idea up in one of the #plrust channels. They've given us a few PRs here for plrust.

> if it slows down any function that needs to get called thousands of times per query then it will be a problem

Sure. I mean, initially, that could conflict with "I'd love to run it on AWS RDS or Aurora" tho. Getting something working is probably step one. Then we can sort out bottlenecks like per-call overhead.

I feel like if someone were to invent a solid pgrx-based extension that provides an abstract (and safe) API for implementing an IAM through external functions (regardless of language), then the cloud providers would take notice. At that point they'd either adopt it as-is or start offering help to make it ready for their hosted environments.


3 participants