Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replacing selda #201

Closed
dhess opened this issue Nov 19, 2021 · 10 comments
Closed

Replacing selda #201

dhess opened this issue Nov 19, 2021 · 10 comments
Assignees
Labels
blocked/need-info ❌ Blocked, need more information database Database/persistence issue priority: high This issue has high priority

Comments

@dhess
Copy link
Member

dhess commented Nov 19, 2021

I'm having second thoughts about my choice of Selda. (This is squarely on me as I'm the one who wrote the database code in Primer):

  1. It hasn't been updated since March 2021
  2. It's not clear that it handles migrations very well, nor are there any particularly useful examples: How to handle migrations selda valderman/selda#171
  3. Judging from the GitHub tracker, it doesn't appear to get much use, nor is its maintainer particularly responsive to questions. For example, the question asked in the migration example issue cited above has gone unaddressed since 1 October. Here's another example of a question that has gone unanswered for over a year: fromSql for invalid SQL column value valderman/selda#155

If we're going to switch, and I am very much leaning in that direction, now's the time to do it. Database schema changes are imminent, and we don't need to worry about preserving compatibility with our existing DBs, so we could effectively just toss what we have now and rewrite the (currently) simple, small DB engine from scratch with a new library.

So what are the viable alternatives? I think we should be pragmatic and choose something that, ideally, has the following qualities, in rough order of importance:

  1. PostgreSQL support, including transactions and migrations. As far as I'm aware, PostgreSQL is the clear choice over MariaDB/MySQL for serious production use.
  2. It's actively maintained (as in, the maintainer fixes issues, keeps it working with recent GHCs, and responds to questions).
  3. It is used in production at at least one or two other Haskell shops.
  4. It is actively developed (as in, the maintainer is adding new features, improving performance, etc.).
  5. It's reported to have good performance, or at least, nobody .
  6. It has a healthy community, for some reasonable Haskell-level definition of "community."

(Note that 3. and 5. might be difficult to verify. Performance, especially, will not be something we can really measure until we have a realistic workload. The good news is that our asynchronous design shields us a bit from performance concerns, or at least makes us more burst-tolerant.)

I purposely didn't mention "nice Haskell-y API" in that list because, as I said earlier, I think we should choose pragmatism here over elegance. However, it would be nice to have. Perhaps we could use it as a tie-breaker.

Many (most?) Haskell SQL libraries use TH. I've previously expressed my preference to avoid TH where possible, but I think that it should be a low-priority selection criterion. I don't think the rest of the team share my concerns about TH, in any case.

I am no expert in this area, and there are probably at least a few Haskell SQL bindings I've not heard of, but these seem to me to be the leading contenders:

Opaleye

In the past, I've tried Opaleye and found it pretty difficult to use, and this was the main reason I reached for Selda when I wrote our initial DB engine. However, I last tried Opaleye quite a few years ago, so maybe things have changed. I have seen the maintainer responding to questions of ergonomics in various Haskell sub-Reddit threads over the years, though I can't cite any specific examples offhand. In any case, the author does seem quite keen to make users of the package happy.

Speaking of the Haskell sub-Reddit, Opaleye is often cited there by different people as the best-in-class SQL library for Haskell. It is actively maintained, the issue tracker is pretty lively, the maintainer appears to be responsive, it is claimed to have commercial support, and, unlike any other GitHub project I've encountered, it even has a documented handoff plan in case the maintainer goes missing: https://github.com/tomjaguarpaw/haskell-opaleye#backup-maintainers

Opaleye definitely has a high-level API and is not just a wrapper around a low-level connector, although this API was my source of frustration with it previously. (I believe, at the time, that you had to write quite a lot of boilerplate and similar-but-not-quite-the-same types for your SQL and Haskell data representations.)

One drawback to switching to Opaleye is the lack of support for SQLite: tomjaguarpaw/haskell-opaleye#528. This would probably mean we'd need to drop SQLite ourselves, as I don't want to maintain 2 engines with different APIs. This, in turn, would have the following impacts on the project:

  1. We might need to spin up a local PostgreSQL instance for testing on our local machines.
  2. It raises the bar of difficulty/hassle for contributors who want to hack on Primer, or at least the server-y aspects of it.

Re: point 1, I've been thinking about this a lot recently, anyway, because I think we eventually need to be testing, even "locally," in an environment that's as close to production as is practical. So I think we can address this point reasonably well, and in fact will probably need to do so regardless of whether we keep SQLite support.

Re: point 2, this is unfortunate, because it's likely that whatever system we put together for our own use to solve point 1 will not be available to contributors, or will at least be impractical for them to implement on their own. One solution to this problem might be to encourage early contributors who want to help to write a SQLite adapter for primer-db, or whatever name we end up choosing for the post-Selda Primer database package.

Hasql

Hasql is more of an ecosystem than a package. The base package is pretty close to the PostgreSQL metal, but there are many other packages that build upon it that provide higher-level APIs in what appears to be a kind of pick 'n mix approach to writing a database connector.

I have no experience using Hasql directly, but I've heard generally good things about it, and it seems to be well-regarded. It's also supposedly pretty fast, for a Haskell database driver, anyway. The author is highly regarded in the Haskell community. I believe we use some of his packages already.

In an ideal world, Hasql's flexible approach might be the most appealing of all our options, but I do worry that it'll take us too much time to figure out how to glue the pieces together into the right abstraction for our needs. As we're not database experts, we'd probably be better off with a more opinionated library like Opaleye, where the authors have made these choices for us based on their expertise. On the other hand, if we started off conservatively and tried to stay relatively low-level and close to native SQL, it would probably be easier to switch from Hasql to something higher level once we're in a better position to measure performance, we understand how to write efficient queries, etc.

Hasql appears to be reasonably active, though not as much as Opaleye. It's not clear whether it has any commercial users, at least not directly. (Hasql is a dependency of Rel8; see below.)

Hasql does not support SQLite: nikita-volkov/hasql#29

Persistent & Esqueletto

These are typically used together, I believe. I get the impression they're used pretty widely, perhaps more than any other Haskell SQL solution. They support both PostgreSQL and SQLite, which is a plus. They both appear to be actively developed, especially Persistent. They're maintained by FP Complete, so presumably are used in their own offerings and production-proven.

It's said that Esqueletto is pretty close to "SQL in Haskell" and that it's relatively easy to translate back-and-forth. That sounds like it could be helpful if we want to understand exactly what a query is doing and how to make it efficient.

On the other hand, apparently it's possible to write queries in it that crash at runtime, or at least, it used to be: https://www.reddit.com/r/haskell/comments/8qxvir/a_comparison_among_various_database_edsls_selda/e0o2wad/. And we don't currently have the expertise to make SQL go fast, anyway, so it might make more sense for us to choose something where the authors have tried to optimize common use cases for us by presenting us with higher-level abstractions.

I don't know much about it other than that, partly because I've previously been a bit spooked by the complaints and have never given it a try.

Beam

I don't know much about it, but have previously read mostly positive things about it. It supports both PostgreSQL and SQLite, seems pretty active, and the maintainer responds to most GitHub issues.

There do appear to be a number of fairly long-standing bugs with no apparent progress, however:

(This is a bit unfair, as I haven't done a similar analysis on the other projects I've mentioned.)

It's not clear who is using Beam, nor whether it's being used in production.

Others

I don't really know anything about these options, but I'll try to state the obvious about each, at least.

Note that I have not considered here any very low-level libraries that simply wrap a C library in Haskell. I don't want to use something like that as we don't have the skillset to pull it off, IMO, and need something more opinionated.

Squeal

  • Seems very active (1650 commits).
  • By the maintainer's own admission, has a learning curve. (To be fair, I think the same could be said of at least Opaleye, as well.)
  • Supports SQLite.
  • Unknown whether it's being used in production.

Rel8

  • Love child of Hasql and Opaleye
  • Therefore, only PostgreSQL support — no SQLite.
  • Possibly being used by CircuitHub? (edit: yes, it is.)
  • Actively maintained, though only had its first release in June of this year, so obviously still early days.

Here is a rather gushing thread about Rel8 on the Haskell sub-Reddit, though this is only in response to the ZuriHac 2021 presentation, not a reaction from users of the package: https://www.reddit.com/r/haskell/comments/o3cep5/video_ollie_charles_rel8_a_new_database_access/

Ollie Charles, the maintainer, says that Rel8 and PostgreSQL generate faster queries than CircuitHub can write by hand, which is encouraging.

References

@dhess dhess added blocked/need-info ❌ Blocked, need more information database Database/persistence issue primer-selda priority: high This issue has high priority labels Nov 19, 2021
@dhess dhess changed the title Concerns about selda Replacing selda Nov 19, 2021
@dhess
Copy link
Member Author

dhess commented Nov 20, 2021

Having written all that, at the moment I'm leaning towards Opaleye, which seems to tick most of the boxes except for SQLite support. Rel8 and Hasql are intriguing, but I'm concerned about the newness of the former and, ironically, the flexibility of the latter.

(Upon further reading, it sounds like maybe Rel8 adds the "opinionated" to the flexibility and performance of Hasql: circuithub/rel8#82)

@dhess dhess self-assigned this Jan 10, 2022
@dhess
Copy link
Member Author

dhess commented Jan 12, 2022

I'm finally getting around to this now.

Upon review, Rel8 and Opaleye seem like the obvious choices to replace Selda. Neither supports SQLite, which means that, for testing, even locally, we'll always need access to a PostgreSQL server.

Given the changes I am planning to make to how we deploy & test Primer (a containerized solution, probably running on Kubernetes, or possibly Fly.io if that proves too complicated), I don't think the lack of SQLite support will be an issue for us, because it isn't really suited for a container environment, anyway. But before I proceed, I want to get some input from @brprice and @georgefst on how put out you would be if we dropped SQLite support.

(I am currently imagining that our local testing setup would require a Docker instance running on our local machines, where the local PostgreSQL and Primer containers would run.)

@brprice
Copy link
Contributor

brprice commented Jan 13, 2022

I would be ok with this. It would be nice to not require spinning up a Postgresql server for "small" tests. I.e. it would be nice to not take the performance hit of containers just to run a cabal test since (almost?) all of those tests are short-lived and don't care about the DB. I wonder if our NullDB backend is adequate for this?

@dhess
Copy link
Member Author

dhess commented Jan 13, 2022

Agreed, and if our in-memory DB isn't sufficient already for this use case (I can't imagine why it wouldn't be), we can make sure that it is.

@dhess
Copy link
Member Author

dhess commented Jan 25, 2022

Between our null DB implementation and #230, I’m no longer concerned about losing SQLite support in the backend, so I believe this removes any obstacles to switching to Opaleye or Rel8.

@dhess
Copy link
Member Author

dhess commented Jan 25, 2022

Ironically, I discovered this evening while doing a bit more research that neither Opaleye nor Rel8 supports database migrations, which was one of the reasons my eye starting wandering from Selda. Oh well — I am still confident that either of them will be a better choice in the long run than Selda.

On the subject of migrations, Dennis Gosnell created this interesting Haskell DB comparison and addressed the migration issue by pointing towards a few services/utilities that help with this process:

https://gist.github.com/cdepillabout/a570d673b2341f109d30a91d243125dd

None of them are Haskell-specific, but perhaps we'd be better off using a well-tested migration service than trying to roll our own in Haskell code, anyway.

Sqitch looks particularly interesting as it makes it possible to sync code changes in Git with migrations:

https://github.com/sqitchers/sqitch

The author is also heavily involved in PostgreSQL extension development, so we'd probably be in good hands.

@dhess
Copy link
Member Author

dhess commented Jan 31, 2022

There is a Haskell PostgreSQL migration library for Hasql, which Rel8 uses under the hood. It doesn't appear to be maintained, however: https://github.com/tvh/hasql-migration

@dhess
Copy link
Member Author

dhess commented Feb 2, 2022

Will be closed by #240 Actually, no, I think I'll do these PRs one at a time, incrementally, and close this issue once primer-service has completely switched over.

@dhess
Copy link
Member Author

dhess commented Feb 6, 2022

Closed by #240 and #241.

@dhess dhess closed this as completed Feb 6, 2022
@dhess
Copy link
Member Author

dhess commented Jun 7, 2022

Given the changes I am planning to make to how we deploy & test Primer (a containerized solution, probably running on Kubernetes, or possibly Fly.io if that proves too complicated), I don't think the lack of SQLite support will be an issue for us, because it isn't really suited for a container environment, anyway.

Some recent events have changed my mind on this. See #499. Also, I said in the OP of this issue that I didn't want to maintain 2 different database backends, but the potential benefits of deploying containers with local (replicated) SQLite databases may be sufficient to change my mind on this.

Anyway, I'll continue this discussion in #499, but I wanted to make a note of this here for posterity.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
blocked/need-info ❌ Blocked, need more information database Database/persistence issue priority: high This issue has high priority
Projects
None yet
Development

No branches or pull requests

2 participants