RFC: About image building process #159

Open
marcbria opened this issue Mar 16, 2022 · 22 comments
Labels
community Community requirement or interaction

Comments

@marcbria
Collaborator

TL;DR: We need faster image builds if we want to use the images in the testing process.
We need unique names and frozen/immutable content for images.
This post opens the discussion to the community to decide how we are going to build and name images in the future.


Issue #155 is related to a discussion we had with Andrew and @asmecher long ago about image uniqueness and image naming/tagging.

Sorry in advance for the length, but it's an important subject that we need to discuss in depth.

We all agree that, if you run a 2.4.1-3 image today, you need to know for sure that it is exactly the same image you tested 5 years ago... but right now we can't guarantee this, because we are rebuilding our images via compose, so they change on every new build (in other words, the dockerhub images created years ago aren't the same as the ones built in gitLab this month).

Just in case somebody starts blaming, I need to say that this project was in the "exploring" phase and tagged as BETA... but if we want to help improve the PKP testing process, we need to move it to a more solid and reliable level.

And YES... we are refactoring all this to improve the images (multi-stage, logs in stdout, better autobuild...) but it's time to decide how to properly tag the images and how to keep them frozen.

How?

Yesterday Andrew proposed to use the official tarballs to build images based on the released packages.
Even though those distributions could sometimes change, this will be much more solid than the compose-building method.

Another important positive effect of this tarball-based image building will be a much faster process (about 30 seconds, vs 5 minutes with compose), which will help a lot when using those images for final product testing or for plugin development.

The negative side? We will only be able to generate images of packaged versions, so those images won't be useful for testing the pkp core on every dev commit.

The buildScript-based-image-building has other positive effects vs images created from tarball like:
a) if gitLab/Hub build images for us, we won't need to manually package it anymore (less effort, less potential human errors)
b) it will let us release the same version dev team used during the dev testing (more tested).

Yes, 5 minutes vs 30 seconds is too much, but we still have some options to speed up the buildScript process, like improving the docker image or playing with pipeline caches, etc.
Anyway, compose will never be as fast as downloading the tarball release package.

About the "frozen images" issue, we also have some solutions to take into consideration (most of them complementary):
a) Add a postfix to each image build to be sure you get the same version (gitLab is doing this on our behalf).
b) Guarantee that compose and the dockerfile refer to the exact version of each piece of software.
c) Avoid recreating old images.
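
A minimal sketch of options (a) and (c) above, assuming a hypothetical `-buildN` postfix and an in-memory set standing in for the registry's list of published tags. Neither the postfix format nor the function names come from the project; they only illustrate "one unique name per build, never recreated".

```python
from __future__ import annotations


def frozen_tag(version: str, build: int) -> str:
    """Return a unique tag, e.g. '3_3_0-10-build7' (hypothetical format)."""
    if build < 1:
        raise ValueError("build number must be a positive integer")
    return f"{version}-build{build}"


def publish(tag: str, published: frozenset[str]) -> frozenset[str]:
    """Record a new tag; refuse to recreate one that already exists."""
    if tag in published:
        raise ValueError(f"tag {tag!r} is frozen and must not be rebuilt")
    return published | {tag}


# Two builds of the same release get two distinct, immutable names.
registry = publish(frozen_tag("3_3_0-10", 1), frozenset())
registry = publish(frozen_tag("3_3_0-10", 2), registry)
print(sorted(registry))
```

Publishing `3_3_0-10-build1` a second time raises `ValueError`, which is the behaviour option (c) asks for.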

So take this post as a kind of RFC where you can comment or propose new solutions.

Take care,
m.

@jonasraoni

This is the context, right?

I'm going to use a new image provider, and I want the images to be exactly the same as the ones I built for Docker Hub some years ago.

If yes, given that dependencies might be removed, badly maintained (new code gets pushed using the same version), etc, I agree that copying the released package from somewhere safe, with everything already baked-in, sounds better.

Or are we rebuilding and overwriting stable tags (e.g. ojs-3_3_0-10) for some reason?!

@mgax

mgax commented Mar 16, 2022

Here's my two cents; I don't know much about how images are used in testing, but I'm running OJS in production from Docker images – #158 (comment).

It makes sense to have a fast build process. Installing from release tarballs is reasonable, that's the official version after all. And GitHub will generate a tarball on the fly from any branch in the repo, e.g. https://github.com/pkp/ojs/archive/refs/heads/main.tar.gz for the main branch, so maybe that's useful for dev versions.

Are you building images automatically using docker-compose? I don't think that makes sense. From what I've seen, projects that build multiple image flavours have some tooling that generates Dockerfiles for each flavour/version and runs docker build on each of them.

Or are we rebuilding and overwriting stable tags (e.g. ojs-3_3_0-10) for some reason?!

Sometimes you want to rebuild a stable image, to fix a security issue in the base image. It's a complex topic, but here's how Bitnami handles it: https://docs.bitnami.com/tutorials/understand-rolling-tags-containers/#rolling-tags.

@asmecher
Member

asmecher commented Mar 16, 2022

GitHub will generate a tarball on the fly from any branch in the repo, e.g. https://github.com/pkp/ojs/archive/refs/heads/main.tar.gz for the main branch, so maybe that's useful for dev versions.

We can't use that for dev versions as we make heavy use of submodules, unfortunately. Our distribution tarballs also include all dependencies (npm and composer) and the results of build steps (e.g. compiling vue.js), which don't come with direct downloads from github.

Hoping to review points of consensus rather than reopening an old, bigger conversation, we need two kinds of Docker images, I think:

  • Based on specific releases (from git tags or tarballs -- consensus seems to be that tarballs are better to work with)
  • Based on branches (main for dev/testing purposes; e.g. stable-3_3_0 for "latest stable"); these would need to come via git

If we pick tarballs over git tags, then unfortunately it'll mean coding two different mechanisms for getting an image built -- but that may turn out to be the best way.

@jonasraoni

jonasraoni commented Mar 16, 2022

I agree with tags above.
main = latest build
stable-3_3_0 = latest stable
stable-3_3_0-10 = based on static content [ed by @asmecher: this probably should be the name of the tag, e.g. 3_3_0-10]
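
The split discussed above (release tags built from tarballs, branches built from git) can be sketched as a small lookup. This is only an illustration: the function name and the regular expressions are assumptions based on the example names in this thread (`main`, `stable-3_3_0`, `3_3_0-10`).

```python
import re

# Patterns inferred from the examples in this thread (assumptions).
RELEASE_TAG = re.compile(r"^\d+_\d+_\d+-\d+$")       # e.g. 3_3_0-10
STABLE_BRANCH = re.compile(r"^stable-\d+_\d+_\d+$")  # e.g. stable-3_3_0


def build_source(ref: str) -> str:
    """Return which mechanism would build the image for a git ref."""
    if RELEASE_TAG.match(ref):
        return "tarball"  # static content from the released package
    if ref == "main" or STABLE_BRANCH.match(ref):
        return "git"      # moving target, rebuilt from the repository
    raise ValueError(f"unrecognised ref: {ref!r}")


for ref in ("main", "stable-3_3_0", "3_3_0-10"):
    print(ref, "->", build_source(ref))
```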

@marcbria
Collaborator Author

Ok... my original post was too broad, so different subjects emerged here. Thanks, Alec, for keeping the focus.

Yes, the main subject is the building process, and I missed saying that I also think that we need a mixed approach for old images (based on official tarballs) and dev ones (based on buildScript or see next point...).

@asmecher ~ If we pick tarballs over git tags, then unfortunately it'll mean coding two different mechanisms for getting an image built -- but that may turn out to be the best way.

Modifying the script that creates all the dockerfiles is not a big deal... but we have two options here:

  1. Remake the script to be smarter: use tarballs for old images, except for "latest" (main) and "latest-stable" (stable-3_3_0) that will be compose-generated...
  2. Create two independent scripts: the first to create production images from tarballs and the second for developers (based on compose).

I suggest going with 2 scripts... although once we have a more stable Dockerfile, I would encourage adding a Dockerfile in each OxS "main" repo root. As far as I can see from the CI/CD tools we tested, this will turn out to be the easiest way (to build the latest/last-stable, to facilitate the CI/CD and the auto packaging).

BTW, to ensure the tarballs are always reachable, safe and immutable, it probably makes sense to keep them somewhere apart from the pkp website (ie: gitLab or gitHub under "releases")?
It will also help to keep a backup of all the packages released and attach unique ids to each tarball.

To close, let me clarify this post is only about "rethink how we manage image builds".

Please open a new issue if you like to talk about "collateral" subjects such as:

  • About CI/CD services (aka. gitLab CI/CD vs gitHub Actions).
  • What to expect from our official docker images (production vs dev images? underlying LAMP stack? versions to be updated?)
  • Upgrade official docker images (dealing with plugins?)

@diegoabadan

I believe that for the main uses, the ideal would be ready-to-use images. I thought of something like:

  1. Provide ready-to-use images - eliminating the need for the user to build them

Example: latest stable of each supported version (3.2.1-4, 3.3.0-10...)

It could be nice to have a nightly build or, preferably, a build for every commit that passes the tests in:

  • supported branches (like stable-3_3_0)
  • Main branch (developer version)
  2. Keep the not-so-fast build for those who need something different (old version tag, a recent commit of a certain branch where there is no image available, a build from another git repository...)

So we have quick ready-to-use images for the main scenarios we map and a dynamic build for exceptional uses.

The build time for exceptional cases would probably be close to what developers find today without using docker, right?

@marcbria
Collaborator Author

Gracias Diego. You know that feeling of "not finding the word"?
Your summary is just what I had in mind but couldn't express as clearly and simply. Thank you.

The proposal is clear and feasible, but there is a point that still annoys me, that is "living images security updates".

And this is not a problem with OJS itself... it happens because our images include a LAMP stack (hopefully some day with different flavors such as alpine/debian, mysql/postgres or php7/php8...) and because we want to offer images for dev and production (both as close as possible, to avoid "it worked on my machine" errors). That means we need to periodically upgrade the underlying LAMP stack (at least) in the production images.

And I'm not worried about ancient images (2.x) with discontinued tools (php 5.x) that we can freeze (as long as we are only offering them for testing or upgrade)... my concern is with "active" images that need to be upgraded every time alpine, apache, mariadb or php report a security issue.

So, it looks like we will need to offer two kinds of "ready-to-use" images here: one for discontinued products (frozen) and another for active versions (with a changing LAMP)...

So first question here is "should we ensure the security for all the active version images"?
My answer is "no"... just keep an eye on the "production images", which will be limited to stable, LTS and last release.

And the second question is related to naming conventions for each new build (sorry to revisit this, but we need a decision), which is just what I was trying to avoid (because simpler names mean a simpler model).

Long ago, Alec suggested attaching a postfix with the commit to the end of the name (now it sounds better than before).
gitLab tools add a sequential name on every build (I like this best because it's easier to read).

Any thoughts on this?

Finally, we want to take advantage of refactoring to move to multi-stage.

Splitting the Dockerfile into parts will allow us to better separate the infrastructure part from the OJS build (and that will make maintenance easier), but the final image is going to be a whole and will still need a different name with each change.

The compile time for exceptional cases would probably be close to what developers encounter today without using docker, right?

With docker it will probably be faster, due to caching... and brainless (letting the devs focus on code instead of infrastructure). :-)

@diegoabadan

Nice, Marc. :)

I have ideas for these questions:

The proposal is clear and feasible, but there is a point that still annoys me, that is "living images security updates".
(...)
So first question here is "should we ensure the security for all the active version images"?

As for the LAMP* update, I suggest we leave it with the Linux distribution. We generate an image with the updates available at that time and let the system update itself periodically regarding the security of everything that is not the PKP product.

So we can just focus on OJS/OPS/OMP updates or major updates (new Alpine version, a Debian version, etc)

*LAMP or similar stack.

And the second question is related to naming conventions for each new build (sorry to revisit this, but we need a decision), which is just what I was trying to avoid (because simpler names mean a simpler model).

Maybe something like:

  • Versions from a tag use the same pattern as the official PKP package (like ojs-3.3.0-10)
  • Versions from a branch: use the branch name as a suffix plus a sequential number or ID of the last commit. (like stable-3_3_0-2, stable-3_3_0-b2* or stable-3_3_0-5c2082f**).

*Added a "b" prefix to differentiate from official PKP versions.
** I think it's good for developers, but it might not be as intuitive for sysadmins.
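
The naming patterns proposed above can be sketched as three small helpers. The function names are hypothetical; the output formats copy the examples in this comment (`ojs-3.3.0-10`, `stable-3_3_0-b2`, `stable-3_3_0-5c2082f`), and the 7-character short hash is an assumption taken from the length of `5c2082f`.

```python
def release_name(app: str, version: str) -> str:
    """Image built from a tag: same pattern as the official package."""
    return f"{app}-{version}"


def branch_build_name(branch: str, build: int) -> str:
    """Image built from a branch, with a 'b'-prefixed sequential number."""
    return f"{branch}-b{build}"


def branch_commit_name(branch: str, commit_sha: str) -> str:
    """Image built from a branch, suffixed with the short commit id."""
    return f"{branch}-{commit_sha[:7]}"


print(release_name("ojs", "3.3.0-10"))
print(branch_build_name("stable-3_3_0", 2))
print(branch_commit_name("stable-3_3_0", "5c2082f"))
```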

Finally, we want to take advantage of refactoring to move to multi-stage.

Perfect. We can help with that.

@diegoabadan

I talked to @pablovp86 about this name postfix pattern:

stable-3_3_0-5c2082f
It could be reserved for docker images created from a commit, by developers.

@marcbria
Collaborator Author

As for the LAMP* update, I suggest we leave it with the Linux distribution. We generate an image with the updates available at that time and let the system update itself periodically regarding the security of everything that is not the PKP product.

Sorry, I thought it was implicit in the proposal I made. Thanks again for making it clear.
I think a multi-stage approach will be really helpful here.

If you'd like to add your two cents, one question is still unanswered ;-)

So first question here is "should we ensure the security for all the active version images"?

As said, doing this for all active versions would be crazy (too much work and mostly useless).
I think a good middle ground would be to keep an eye on potential "production images"... and that means stable, LTS and last release (less than 10 images).

@diegoabadan

Ideally, yes for all supported versions.

On the other hand, because of the work that can involve keeping so many versions working well (e.g. supported version of PHP that may depend on old distribution), I would prefer to start with:

  • Development version (Main, future 3.4.x)
  • Current version (today 3.3.0.x)
  • Current LTS version (for now, it's the same as above)

A docker image would be updated automatically when:

  • a tag is created in the branch, following the same model used to create official packages (for stable versions, like ojs-3.3.0-10).
  • a commit lands in the Main branch, as long as it passes the automated tests (development version).
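
A rough sketch of the two update triggers listed above, assuming a hypothetical `Event` record that a CI pipeline would fill in. The field names and the `"tag"`/`"commit"` values are illustrative, not part of any existing tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str                  # "tag" or "commit" (hypothetical values)
    ref: str                   # tag name or branch name
    tests_passed: bool = False


def should_build(event: Event) -> bool:
    """Decide whether an event should trigger an image build."""
    if event.kind == "tag":
        # A new tag always produces a stable image.
        return True
    if event.kind == "commit" and event.ref == "main":
        # Development images only follow commits that pass the tests.
        return event.tests_passed
    return False


print(should_build(Event("tag", "ojs-3.3.0-10")))
print(should_build(Event("commit", "main", tests_passed=True)))
print(should_build(Event("commit", "main", tests_passed=False)))
```

Expanding the scope later (for example, building on every tested commit in a stable branch) would only need one more condition in `should_build`.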

Then we could expand the scope according to the demand and work capacity of those involved.

For example, I would be happy if, for stable branches, the images were updated for each additional commit that passes the tests. It would be something like ojs-3.3.0-10u3 or ojs-3.3.0-11-beta3. Here we have a version with additional fixes planned for ojs-3.3.0-11, but not yet officially released.

Usually the penultimate stable version is also heavily used, so it could be something to think about.

Images from previous versions would not be kept up to date, and interested parties could build them as they wish (from a branch or tag).

@diegoabadan

About this point:

We need faster image building for images if we like to use them in testing process.

Let's propose a "docker way" approach:

  • Use docker images for everything possible, one for PHP, another for Node, another for composer...
  • Multistage, generate at the end a light image for the application execution.

Initially the focus will be an image to test a PKP application from local code (for example, obtained from git). I believe that it will be easy to adapt it for different purposes. Do you agree?

We will try to bring you a quick result (a proof of concept) in the next few days.

@marcbria marcbria added the community Community requirement or interaction label Apr 10, 2022
@jonasraoni

In general a functional and simple to understand image, with a neat installation, is already a great deal for those who want to extend or just copy parts of it for their own needs :)

Given that it's tough to please everyone (development/production/testing environment, nginx/apache, ...), I agree with the statement above, about focusing on a specific type of image for now.

@marcbria
Collaborator Author

With @jonasraoni we have been talking about using debian or alpine, and it's an interesting dilemma: A "functional and simple to understand" image points to debian, while "secure and tiny" points to alpine.

We can say "ok, one image for development and another for production" but I feel in my bones this is a bad idea, because we want to test the image from the earliest moment. If there are no differences between prod and dev (ie: same base that we can extend with dev tools) we will be detecting errors that may otherwise go unnoticed.

In other words, if the dev team feels it's better to work on debian, I would sacrifice the benefits of alpine if it means we can all use the same image.

@jonasraoni

I've seen people complaining about issues with Alpine/Docker (then a heavier OS would be a safer choice for the long run), but if Alpine has been working fine, including with the extra tools that a journal might need (e.g. pdftotext), then why should it be discarded?

About adding extra features/image types, you're the best person to choose a good balance between features and maintenance for this repository :)

@marcbria
Collaborator Author

marcbria commented Apr 27, 2022

If I recall correctly, the only real issue with alpine was reported by Andrew (from PennState University), but I think he finally managed to make it work.

All the other issues are more related to changes in library names and the alpine package manager syntax (alpine calls it apk instead of apt, but it is more or less the same).

Let's see what Andrew has to say about this.

@marcbria
Collaborator Author

BTW, "pdftotext" is included in "poppler-utils":
https://pkgs.alpinelinux.org/contents?branch=edge&name=poppler-utils&arch=x86_64

@AndrewGearhart

AndrewGearhart commented Apr 27, 2022

Alpine and Version Pinning Complications

The complication with Alpine that we've had has been that they keep a very tight ship with respect to which packages are available for which versioned releases of Alpine. Unfortunately, this can result in builds that break unexpectedly. This week, for example, we had Alpine version 3.15 shipping a container with php that was pinned to php7.4.28. php7.4.29 has been released. Since Alpine integrated php7.4.29 into 3.15, they have dropped php7.4.28 from their package manager. As a result, the image fails to build as it attempts to integrate php7.4.28 when only php7.4.29 is available. The errors related to this are ... not exactly straightforward... leading to quite a bit of head scratching for some of the developers that hadn't experienced the error previously. This is not ideal in terms of how we want our team to be upgrading packages: "oh no! our build stopped working!? What happened? Oh... php updated a minor version" ... but at the same time, we don't want to be releasing a cached version of php because we generically installed php 7 or that we didn't know that php updated and then we find a problem in php7.4.62 that didn't exist in php7.4.61 (hypothetically). Version pinning is good, but it would be nice if the package managers had a bit more leeway. I believe other distributions do... but I might be wrong about that. It may be that in our many other projects, we haven't version pinned quite as aggressively as we did in these projects that were using Alpine.
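
To make the failure mode concrete, here is a toy model of the pinning problem described above. It is not how apk resolves packages; `resolve` is a hypothetical helper that only contrasts an exact pin (`7.4.28`) with a looser pin on the release series (`7.4`) against the versions a repository currently offers.

```python
def parse(version: str) -> tuple[int, ...]:
    """Turn '7.4.29' into (7, 4, 29) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def resolve(pin: str, available: list[str]) -> str:
    """Return the newest available version whose leading parts match the pin."""
    prefix = parse(pin)
    matches = [v for v in available if parse(v)[: len(prefix)] == prefix]
    if not matches:
        raise LookupError(f"no available version satisfies pin {pin!r}")
    return max(matches, key=parse)


# The repository dropped 7.4.28 once 7.4.29 was integrated.
repo = ["7.4.29"]
print(resolve("7.4", repo))   # series pin still resolves
try:
    resolve("7.4.28", repo)   # exact pin breaks the build
except LookupError as err:
    print("build fails:", err)
```

The trade-off is the one raised above: the series pin keeps builds working but lets the patch version move without anyone noticing, while the exact pin is reproducible only for as long as the repository keeps that exact package.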

So, my comment about Alpine to @marcbria had simply been that we should consider other alternatives.

Alpine is very frequently chosen as a result of its insanely small initial size. However, that initial benefit is often blown away when builds have to add 500-750 MB of packages to handle things such as git pulls or vim... or when something like Python has to be worked with and many of the tool's package management tools can't utilize common build/make/compilation tools. Most distributions now offer a "slim" version for use in Docker containers that offers the minimal package set that can then be built upon to ensure that you are only shipping what you need to run.

In terms of security, I would not consider Alpine as "more" or "less" secure than any other distribution. While it limits the attack surface by slimming down the number of base packages installed, its inherent method of doing so (with busybox) makes it difficult for many other container security tools (Snyk, Trivy, etc.) to determine what is actually running in the container... and what sort of CVEs might apply. Transparency for another security tool is far more important to our group than saving some bytes in the final image. If we were doubling/tripling the image size by going with a debian or whatever... I could completely understand the bend toward a less transparent Alpine. However, I don't think that we would see that sort of discrepancy. The only way to know for sure is for someone to try it... an adventure that I have yet to have the time to engage in... ;-)

@marcbria
Collaborator Author

marcbria commented Apr 28, 2022

Thank you Andrew for your feedback, accurate and precise as always.

Going back to the beginning of the conversation I think we are all clear that we have to maintain images for two different profiles ("production" and "development") so I would be tempted to offer images with different distros (ie: debian for DEVs and alpine for PRODs)... but I think it would be a mistake. Let me explain it...

One of the great advantages of docker is that testing can start in the development phase... and if we do it on different bases, we wouldn't be testing the same thing. Someone might argue that the DEV and PROD image won't be exactly the same (DEV will include a lot more tools/libraries) but same-distro images will always be closer.

Also, I hadn't taken into account that the use of alpine can complicate the work of Snyk, Trivy, etc. and those tools are becoming more and more important, so (unless someone argues otherwise) you have convinced me that we should use the same base for DEVs and PRODs.

To be clear here: let's forget about alpine and focus on debian.

@diegoabadan

Considering that using Alpine can be considered an image size optimization, I like the idea of going Debian now, both for development and production.

@marcbria
Collaborator Author

marcbria commented May 6, 2022

NOTICE
We started this thread on Mar 16th, so (if nobody is against it) May 16th sounds like a nice date to close it and start cooking...

Back to the topic... about the images:
@jonasraoni said a few days ago that "Given that it's tough to please everyone".
@AndrewGearhart pointed out today in slack that "we need to map out who our users are for the docker images".

It's kind of implicit from the comments we made BUT (as in the zen of python) "explicit is better than implicit", so let's try to map the different users (that are scenarios at the same time):

DEVELOPERS:

  • OJS core developers: Mainly from PKP team, they need the "main" branch version, but also the versions with active support.
    • Quote: "I need the cutting-edge version of the code with some example dataset and all of the levers in place"
  • OJS plugin developers: Mainly from outside PKP team, they will usually develop against the "last release", but also will need "main" and "active" versions.
    • Quote: "Give me a stable OJS that just works and that I don't have to worry about."

Both will develop locally (probably with Docker Desktop, docker-compose or similar tooling). They would need images with extra tooling (git, npm, xdebug, cypress...)

ADMINISTRATORS:

  • Admins of self-hosted servers: They manually manage a few servers, with docker and docker-compose (all behind a reverse proxy). The availability of resources is limited. They will upgrade from LTS to next LTS.
    • Quote: "I need it KISS, rock solid and secure".
  • Admins of cloud infrastructure (ie: K8s servers): They manage a complete infrastructure of K8s (or something similar and usually in the cloud) with dozens of tools to cover the entire software lifecycle. The availability of resources is enormous. They will prefer the latest stable code.
    • Quote: "We need scripts that allow us to automate all processes".

Both will need stable/optimized/secure PROD images with all the libraries required to run OJS and its modules.
For security reasons related to the underlying stack (LAMP), they require daily updated PROD images with the latest security patches.

Do you visualise the 4 actors (and scenarios) in a similar way? Comments? Nuances?

@marcbria
Collaborator Author

marcbria commented Feb 1, 2023

Hi all,

Sorry for the long silence. It's not easy to keep up the pace we'd like when working in our spare time. :-/

With Andrew we have written a first draft summarizing all the feedback we got from this post.
This draft was then pre-reviewed with Diego, Mathias and Alec, and we think the document is mature enough to open it to the community, so any feedback you have is welcome.

https://docs.google.com/document/d/1hl3c6PYQgOZWWtwHk2siBTUj3WC6fzrv9hCp7F1jDGQ/edit?usp=sharing

Thank you for your support!


6 participants