Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nixos/postgresql: deprecate ensurePermissions, fix ensureUsers for postgresql15 #266270

Merged
merged 8 commits into from
Nov 17, 2023

Conversation

Ma27
Copy link
Member

@Ma27 Ma27 commented Nov 8, 2023

The following things need to be done:

  • Resolve merge conflict
  • Ensure commits don't introduce unexpected default database name change and gate them behind stateVersion
  • Write release notes
  • Ensure all tests running on master are fine
  • Discuss the overall approach: technically it's a breaking change, not only because of ensurePermissions being removed, but also because of the behavioral changes.
  • Discuss alternatives
    • rolling back the default postgres - it was apparently merged in without checking the impact properly

Closes #216989

First of all, a bit of context: in PostgreSQL, newly created users don't have the CREATE privilege on the public schema of a database even with ALL PRIVILEGES granted via ensurePermissions which is how most of the DB users are currently set up "declaratively"[1]. This means e.g. a freshly deployed Nextcloud service will break early because Nextcloud itself cannot CREATE any tables in the public schema anymore.

The other issue here is that ensurePermissions is a mere hack. It's effectively a mixture of SQL code (e.g. DATABASE foo is relying on how a value is substituted in a query. You'd have to parse a subset of SQL to actually know which object are permissions granted to for a user).

After analyzing the existing modules I realized that in every case with a single exception[2] the UNIX system user is equal to the db user is equal to the db name and I don't see a compelling reason why people would change that in 99% of the cases. In fact, some modules would even break if you'd change that because the declarations of the system user & the db user are mixed up[3].

So I decided to go with something new which restricts the ways to use ensure* options rather than expanding those[4]. Effectively this means that

  • The DB user must be equal to the DB name.
  • Permissions are granted via ensureDBOwnerhip for an attribute-set in ensureUsers. That way, the user is actually the owner and can perform CREATE.
  • For such a postgres user, a database must be declared in ensureDatabases.

For anything else, a custom state management should be implemented. This can either be initialScript, doing it manual, outside of the module or by implementing proper state management for postgresql[5], but the current state of ensure* isn't even declarative, but a convergent tool which is what Nix actually claims to not do.

Regarding existing setups: there are effectively two options:

  • Leave everything as-is (assuming that system user == db user == db name): then the DB user will automatically become the DB owner and everything else stays the same.

  • Drop the createDatabase = true; declarations: nothing will change because a removal of ensure* statements is ignored, so it doesn't matter at all whether this option is kept after the first deploy (and later on you'd usually restore from backups anyways).

    The DB user isn't the owner of the DB then, but for an existing setup this is irrelevant because CREATE on the public schema isn't revoked from existing users (only not granted for new users).

[1] not really declarative though because removals of these statements
are simply ignored for instance: #206467
[2] services.invidious: I removed the ensure* part temporarily
because it IMHO falls into the category "manage the state on your
own" (see the commit message). See also
#265857
[3] e.g. roundcube had "DATABASE ${cfg.database.username}" = "ALL PRIVILEGES"; [4] As opposed to other changes that are considered a potential fix, but
also add more things like collation for DBs or passwords that are
never touched again when changing those.
[5] As suggested in e.g. #206467

@github-actions github-actions bot added 6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS 8.has: module (update) This PR changes an existing module in `nixos/` labels Nov 8, 2023
@ofborg ofborg bot added the 2.status: merge conflict This PR has merge conflicts with the target branch label Nov 8, 2023
@Ma27 Ma27 marked this pull request as draft November 8, 2023 15:40
@ambroisie
Copy link
Contributor

What about making ensureDBOwnerhip a list, and asserting that there is no overlap across users?

This would make the migration easier for users that do have a different DB name and DB owner (one example I've seen multiple times, including in my own config, is having gitea use the git user to make the SSH address "prettier").

@Ma27 Ma27 force-pushed the postgresql-ownership-15 branch from 65fe7b2 to 02ae527 Compare November 9, 2023 22:35
@ofborg ofborg bot added 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 1-10 and removed 2.status: merge conflict This PR has merge conflicts with the target branch labels Nov 9, 2023
@Ma27
Copy link
Member Author

Ma27 commented Nov 10, 2023

You don't need to migrate existing systems, just remove ensureUsers and ensurePermissions. The removal of those options is ignored (part of why they're broken in the first place).

Anyways, I'll be gone for two weeks now, so the call on what to do has to make someone else anyways.

Copy link
Contributor

@bendlas bendlas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking this on, it has also been bugging me!

So, if this were to go in as-is, I'd request that we do it after release, to give the ecosystem time to stablilize.

However, I'd prefer to stay backwards-compatible (with out-of-tree users of services.postgres) anyway, and if we turn usage of ensurePermissions into a warning, I'd be more than OK with trying to squeeze this into the release ...

@ambroisie
Copy link
Contributor

You don't need to migrate existing systems, just remove ensureUsers and ensurePermissions. The removal of those options is ignored (part of why they're broken in the first place).

I kind of disagree, if possible we should make it so people are able to actually migrate their current configuration to one that has the same semantics.

Otherwise we could just remove the ensure* options and tell everybody that this is state that has to be managed by hand.

Having it be in the configuration makes it easier to e.g: redeploy from a backup. And it isn't really any harder to make it so the new option can accommodate existing setups for migration.

@bendlas
Copy link
Contributor

bendlas commented Nov 10, 2023

What about making ensureDBOwnerhip a list, and asserting that there is no overlap across users?

Since a postgres DB can have only one owner, this would also be an opportunity to instead have ensureDatabases.ensureOwner = "user";. (Except ensureDatabases is a list of strings.)

Which leads to the next point: Is it worth (re-)thinking how it should look like to ensure preconditions for postgres in nixos? Ensure preconditions for databases in nixos? Ensure preconditions in nixos?

This would make the migration easier for users that do have a different DB name and DB owner (one example I've seen multiple times, including in my own config, is having gitea use the git user to make the SSH address "prettier").

IMO having a system user be the owner of the database with same name is standardized enough, that I'd consider it default. Even to the point that having a different configuration could be considered a security yellow-flag, because that's what GRANT is for. Thinking about it, that actually validates ensureDBOwnership being a boolean.

But to address the elephant (haha, postgres, get it?), I think we should:

  • put most of the existing ensure* infrastructure on the legacy track
  • start with the "escape hatch" (except it's the DSL foundation)
  • continue with exploring a good DSL design (potentially even out-of-tree until stabilized)
    • potentially also look at the broader use case for ensure - semantics at NixOS level:
      • systemd.tmpfiles.rules, StateDirectory, ...
      • services.mysql and other schema-bearing DBs
      • Is this a case for extensibility through a lens type (read current state -> decide necessary updates -> write current state)?
  • ask users to inform the new DSL design, whenever they need to use the low-level extension point

@RaitoBezarius
Copy link
Member

Still reading here and thinking.

So, if this were to go in as-is, I'd request that we do it after release, to give the ecosystem time to stablilize.

Just FWIW, if we don't merge something, we will merge a broken set of NixOS modules w.r.t. to the new Postgres release, so doing nothing is not an option, so if we block this, someone has to propose a proper solution (so far, only Ma27 worked on the matter as one of the maintainers around…)

@RaitoBezarius
Copy link
Member

You don't need to migrate existing systems, just remove ensureUsers and ensurePermissions. The removal of those options is ignored (part of why they're broken in the first place).

I kind of disagree, if possible we should make it so people are able to actually migrate their current configuration to one that has the same semantics.

Otherwise we could just remove the ensure* options and tell everybody that this is state that has to be managed by hand.

Having it be in the configuration makes it easier to e.g: redeploy from a backup. And it isn't really any harder to make it so the new option can accommodate existing setups for migration.

Well, in practice, anyone saying this should probably look into fixing properly ensure options but I didn't find a lot of folks wanting to work on that yet and most people tend to do very partial proposals, see #206467, NixOS/rfcs#155.

I would be willing to hand hold someone who is willing to make a dent to this complicated class of problems but there is a long term and a short term. Short term is that people didn't maintain ensure options enough and we are in the situation where they are too broken for us to offer sane guarantees and reinvent them on a short notice. Long term is that we should reintroduce them while thinking at the big picture and by bringing proper engineering to those problems.

It is unfortunate, I like them too, but we cannot play around with data safety of databases, our users are the type of not doing backups and pushing the risk on "well you were supposed to have backups" doesn't seem an acceptable tradeoff IMHO.

The goal here is to drop sufficient options to have a workable ecosystem, if you believe you can re-introduce ensurePermissions with enough testing and proper stuff, please send such a PR, but be mindful of the engineering we need to test it properly.

@RaitoBezarius
Copy link
Member

You don't need to migrate existing systems, just remove ensureUsers and ensurePermissions. The removal of those options is ignored (part of why they're broken in the first place).

I kind of disagree, if possible we should make it so people are able to actually migrate their current configuration to one that has the same semantics.

Also, the right way to do this in such situation is to provide the equivalent snippet of code to do it, people who are expert enough with PostgreSQL very much knows how to use initialScript to perform the equivalent of an ensurePermissions, leaving users who doesn't know how to use PostgreSQL vulnerable.

We don't have silver bullets for those users, normally, the community would have worked towards solution in time to avoid this scenario.

From a release management perspective, what I see is the following:

  • Users who knows PostgreSQL are fine albeit maybe a bit annoyed they have to do manual things now
  • Users who doesn't know PostgreSQL either:
    • are installing a new machine and pick up a tutorial mentioning those directives → they don't exist anymore, they are confused and ragequit/are not happy about NixOS
    • are upgrading to this new version of NixOS → they don't exist anymore but we tell them to just remove the directives and it will work because they already installed the database and we warn them of migrating if they rely on those expressions.

Can we make this better in time for the branch-off?

From my POV, documentation, release notes and snippets to migrate away simple cases can be provided, but I don't believe we should offer an automatic solution because we cannot apply engineering practices to stabilize it in time for the branch-off, it is too late.

So I am in favor of:

  • using mkRemovedOption on it, pointing at some documentation explaining how to migrate
  • write a blurb in unstable
  • write a blurb in release notes for 23.11
  • write documentation in the manual on how to perform permission bestowing using initialScript
  • offer to anyone who is unhappy about that to do better for 24.05 and re-introduce it

Keeping it as a deprecated option does not seem useful to me because it is broken at the moment and no one offered a fix for it. Keeping a broken deprecated option seems weird to me, you can just remove it and fix it this way. Also, it forces people to take ownership of their convergence logic for their databases via initialScript.

In an ideal world, someone would have written tool I asked for multiple years that is to offer automatic ways to migrate deprecation / removals by new snippets, but it doesn't exist so users will have to do it manually, that's unfortunate but I hope I won't do it by myself someday :P.

@bendlas
Copy link
Contributor

bendlas commented Nov 10, 2023

Just FWIW, if we don't merge something, we will merge a broken set of NixOS modules w.r.t. to the new Postgres release

Oh, right, there was that (been a long week, sorry) .... for the record, I'm still in favor of just downgrading to 14 for this release, as per #262741 (review), to buy us more time to figure it out

  • merge this (to not block anyone wishing to update to postgres 15), but leave the existing options alone for now, for anyone stuck on 14. To me that sounds like it could minimize the collateral damage ..

do you find this agreeable if we revert #254268? cc @nbdd0121 @thoughtpolice @yu-re-ka

@nbdd0121
Copy link
Contributor

nbdd0121 commented Nov 10, 2023

do you find this agreeable if we revert #254268?

A full revert (including downgrade all packages depending on libpq), or just revert the nixos module part? Would

-base = if versionAtLeast config.system.stateVersion "23.11" then pkgs.postgresql_15
+base = if versionAtLeast config.system.stateVersion "24.05" then pkgs.postgresql_15

be sufficient?

(In an ideal world where we have #61580 then we can just revert the postgresql version without mass rebuild)

@ambroisie
Copy link
Contributor

Well, in practice, anyone saying this should probably look into fixing properly ensure options but I didn't find a lot of folks wanting to work on that yet and most people tend to do very partial proposals, see #206467, NixOS/rfcs#155.

Sorry, I didn't mean to imply that the proper fix is easy. I agree with you that there is a lot of work to be done in this area if we want a full-featured system.

Short term is that people didn't maintain ensure options enough and we are in the situation where they are too broken for us to offer sane guarantees and reinvent them on a short notice.

I don't think we need to re-invent them, approaches like this MR are good enough IMO to tide us over with the v15 update.

pushing the risk on "well you were supposed to have backups" doesn't seem an acceptable tradeoff IMHO.

👍

The goal here is to drop sufficient options to have a workable ecosystem

👍

if you believe you can re-introduce ensurePermissions with enough testing and proper stuff, please send such a PR, but be mindful of the engineering we need to test it properly.

That's not what I meant to imply, sorry if that's how I came across. I totally respect the fact that this is a hard problem and needs more engineering for a proper solution.

I simply wanted to high-light a part of the analysis in this MR which led to a design that was more restrictive than seemed necessary IMO, when making it more open does not add much complexity to the module.

If this were to be merged in as-is, I think it would indeed be good enough for 99% of cases.

I also agree with your analysis for the upcoming release, I think this is probably the best we can do in that timeframe.

Once again, I apologize if I appeared dismissive, that was not my intention in commenting on this MR. Thank you everybody for trying to solve the issue.

@bendlas
Copy link
Contributor

bendlas commented Nov 10, 2023

do you find this agreeable if we revert #254268?

A full revert (including downgrade all packages depending on libpq), or just revert the nixos module part? Would

-base = if versionAtLeast config.system.stateVersion "23.11" then pkgs.postgresql_15
+base = if versionAtLeast config.system.stateVersion "24.05" then pkgs.postgresql_15

be sufficient?

I believe that the issue only affects postgresql server, so a revert of the module default should be sufficient, since libpq is backwards compatible, right?

erikarvstedt added a commit to erikarvstedt/nix-bitcoin that referenced this pull request Dec 1, 2023
Since PostgreSQL 15, DB users need to be DB owners to be able to create tables.

We can't use the new `ensureDBOwnerhip` NixOS option [1] to set this up,
because it requires the PostgreSQL user name and the database name to be
identical, which is not the case for btcpayserver.

Instead, we manually issue a PostgreSQL admin statement similar to the one
used by `ensureDBOwnerhip`.

This method of setting up the user is also compatible with older
PostgreSQL versions that come with older NixOS `system.stateVersion`s.

[1] NixOS/nixpkgs#266270
erikarvstedt added a commit to erikarvstedt/nix-bitcoin that referenced this pull request Dec 1, 2023
Since PostgreSQL 15, DB users need to be DB owners to be able to create tables.

We can't use the new `ensureDBOwnerhip` NixOS option [1] to set this up,
because it requires the PostgreSQL user name and the database name to be
identical, which is not the case for btcpayserver.

Instead, we manually issue a PostgreSQL admin statement similar to the one
used by `ensureDBOwnerhip`.

This method of setting up the user is also compatible with older
PostgreSQL versions that come with older NixOS `system.stateVersion`s.

[1] NixOS/nixpkgs#266270
erikarvstedt added a commit to erikarvstedt/nix-bitcoin that referenced this pull request Dec 2, 2023
Since PostgreSQL 15, DB users need to be DB owners to be able to create tables.

We can't use the new `ensureDBOwnerhip` NixOS option [1] to set this up,
because it requires the PostgreSQL user name and the database name to be
identical, which is not the case for btcpayserver.

Instead, we manually issue a PostgreSQL admin statement similar to the one
used by `ensureDBOwnerhip`.

This method of setting up the user is also compatible with older
PostgreSQL versions that come with older NixOS `system.stateVersion`s.

[1] NixOS/nixpkgs#266270
wentasah added a commit to wentasah/nixpkgs that referenced this pull request Dec 3, 2023
Recent PR 266270[1] modified an assertion related to database settings
of the redmine service. There are two problems with that change:

1. Assert message was not updated to reflect the change in the assert
   condition.

2. The new condition applies only to postgresql, not the default
   mysql. Therefore, the assertion breaks existing mysql-based
   installations without any reason.

This commit fixes these by 1) reverting the modified assertion to the
previous value, making the message match the condition and 2) adding a
new assertion that applies only to postgresql.

[1]: NixOS#266270
github-actions bot pushed a commit that referenced this pull request Dec 4, 2023
Recent PR 266270[1] modified an assertion related to database settings
of the redmine service. There are two problems with that change:

1. Assert message was not updated to reflect the change in the assert
   condition.

2. The new condition applies only to postgresql, not the default
   mysql. Therefore, the assertion breaks existing mysql-based
   installations without any reason.

This commit fixes these by 1) reverting the modified assertion to the
previous value, making the message match the condition and 2) adding a
new assertion that applies only to postgresql.

[1]: #266270

(cherry picked from commit 8667baf)
WilliButz added a commit to nix-community/authentik-nix that referenced this pull request Dec 10, 2023
erikarvstedt added a commit to erikarvstedt/nix-bitcoin that referenced this pull request Dec 12, 2023
Since PostgreSQL 15, DB users need to be DB owners to be able to create tables.

We can't use the new `ensureDBOwnerhip` NixOS option [1] to set this up,
because it requires the PostgreSQL user name and the database name to be
identical, which is not the case for btcpayserver.

Instead, we manually issue a PostgreSQL admin statement similar to the one
used by `ensureDBOwnerhip`.

This method of setting up the user is also compatible with older
PostgreSQL versions that come with older NixOS `system.stateVersion`s.

[1] NixOS/nixpkgs#266270
999eagle added a commit to 999eagle/nixpkgs that referenced this pull request Dec 17, 2023
This makes sure we don't need any workarounds for running Invidious with a local
PostgreSQL database.
Changing the default user should be fine as the new init script for PostgreSQL automatically
creates the new user and changes the existing database's owner to the new user. The old user
will still linger and must be removed manually.
See also: NixOS#266270
zhaofengli added a commit to zhaofengli/attic that referenced this pull request Dec 18, 2023
Lainera pushed a commit to Lainera/nixpkgs that referenced this pull request Dec 20, 2023
This makes sure we don't need any workarounds for running Invidious with a local
PostgreSQL database.
Changing the default user should be fine as the new init script for PostgreSQL automatically
creates the new user and changes the existing database's owner to the new user. The old user
will still linger and must be removed manually.
See also: NixOS#266270
@icewind1991
Copy link
Contributor

#287788 extends ensureDBOwnership to allow it to cover some more of the cases that could previously be handled with ensurePermissions.

Ma27 added a commit to Ma27/nixpkgs that referenced this pull request Feb 12, 2024
...effectively what was planned already in NixOS#266270, but it was too late
because the branches were restricted and didn't allow any breaking
changes anymore.

It also suffers from the same issue that we already had when discussing
this the last time[1] when `ensureDBOwnership` was ultimately introduced
as band-aid fix: newly created users don't get CREATE permission on
the `public` schema anymore (since psql 15), even with `ALL PRIVILEGES`.

If one's use-case is more sophisticated than having a single owner, it's
questionable anyways if this module is the correct tool since
permissions aren't dropped on a change to this option or a removal which
is pretty surprising in the context of NixOS.

[1] NixOS#266270
uninsane added a commit to uninsane/nixpkgs that referenced this pull request Dec 22, 2024
fixes fallout from <NixOS#266270>.

a common idiom is to run the git server as user `git`, instead of
`gitea`, with configuration like this:

```nix
services.gitea.user = "git";
services.gitea.database.user = "git";
```

after NixOS#266270, this requires setting
`services.gitea.database.createDatabase = false` (as recommended by the
assertion). however, this causes a few other fields relevant to database
connection to no longer be set, and so that upgrade path would lead to a
failed connection:

```
gitea-pre-start: cmd/migrate.go:40:runMigrate() [F] Failed to initialize
    ORM engine: pq: password authentication failed for user "git"
```

instead, preserve the old connection settings (socket path) to make this
upgrade path work.
uninsane added a commit to uninsane/nixpkgs that referenced this pull request Dec 22, 2024
fixes fallout from <NixOS#266270>.

a common idiom is to run the git server as user `git`, instead of
`gitea`, with configuration like this:

```nix
services.gitea.user = "git";
services.gitea.database.user = "git";
```

after NixOS#266270, this requires setting
`services.gitea.database.createDatabase = false` (as recommended by the
assertion). however, the module then plumbs defaults which no longer
make sense into the gitea config causing a failed connection at runtime:

```
gitea-pre-start: cmd/migrate.go:40:runMigrate() [F] Failed to initialize
    ORM engine: pq: password authentication failed for user "git"
```

instead, don't default any of the connection settings when
`createDatabase == false`: error at eval time (instead of runtime) if
the user hasn't explicitly configured the remaining connection settings.
uninsane added a commit to uninsane/nixpkgs that referenced this pull request Jan 5, 2025
fixes fallout from <NixOS#266270>.

a common idiom is to run the git server as user `git`, instead of
`gitea`, with configuration like this:

```nix
services.gitea.user = "git";
services.gitea.database.user = "git";
```

after NixOS#266270, this requires setting
`services.gitea.database.createDatabase = false` (as recommended by the
assertion). however, the module then plumbs defaults which no longer
make sense into the gitea config causing a failed connection at runtime:

```
gitea-pre-start: cmd/migrate.go:40:runMigrate() [F] Failed to initialize
    ORM engine: pq: password authentication failed for user "git"
```

instead, don't default any of the connection settings when
`createDatabase == false`: error at eval time (instead of runtime) if
the user hasn't explicitly configured the remaining connection settings.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
6.topic: nixos Issues or PRs affecting NixOS modules, or package usability issues specific to NixOS 8.has: changelog 8.has: documentation This PR adds or changes documentation 8.has: module (update) This PR changes an existing module in `nixos/` 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin 10.rebuild-linux: 1-10
Projects
None yet
Development

Successfully merging this pull request may close these issues.

postgresql_15 requires granting permissions on schema public, ensureUsers insufficient