Integrity field for downloaded repositories #198
I'm sorry for being this obtuse, but after a night's sleep:
By this reasoning, the whitelist is the only thing that is useful. |
Here's a scenario I have in mind: someone hacks a GitHub account and changes the content of a repository used in a crate. If we don't have a hash of the content, no one will be able to detect that the sources have been tampered with. In this case the whitelist is not enough. Another scenario: I write a crate and host it in a git repo on my private server with my domain name. The hash is there to guarantee that the sources that have been checked out are really the ones specified by the author of the crate. This also makes me think that it should be forbidden to change the hash of a version already published in a crate. |
Thanks for these examples. They help me focus the argument.
It's my understanding that the commit hash we use as origin is enough to detect this during cloning.
Same as previous one; you cannot change/serve sources that do not match the commit hash, and the commit hash is coming from our trusted index.
Which is what the origin commit hash is doing (and that's why we added it for source archives that lack it). Unless I have some fundamental misunderstanding of how git works, I don't see these as feasible attacks. Now, my only doubt is whether we need to perform an additional verification step. At present we are only using commit hashes in origins (for this very concern); if we allowed branches or tags, that would be another matter.
Yes, that seems advisable. |
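The integrity property relied on above, that pinning a commit hash pins the content, follows from git object ids being hashes over content. As a rough illustration (not Alire code; the `"blob <size>\0" + data` layout is git's documented blob object format, and a commit id in turn covers the whole tree recursively):

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """Compute a git blob object id: SHA-1 over 'blob <size>\\0' + content."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()

original = git_blob_id(b"hello\n")
tampered = git_blob_id(b"hello!\n")

# Any change to the content yields a different object id, so a clone
# pinned to a commit hash cannot silently serve modified sources.
print(original)  # ce013625030ba8dba906f756967f9e9ca394464a
print(original != tampered)  # True
```

This matches `echo hello | git hash-object --stdin`, which is why git detects tampered objects during cloning.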
Just to be clear, we are talking about the origin field that is provided in the manifest:
Ok, I didn't know that branches and tags were forbidden. It would be convenient to have them, but if this solves the hash problem then it is a good deal. Then I would say that we have to check whether git's SHA-1 can be trusted; I remember reading about forged SHA-1 collisions. @senier do you mind giving us your opinion on the matter? |
Yes, that's the field. While we await senier's input, I read a bit about git and the SHA-1 attack, and the general consensus seemed to be that git is not readily vulnerable due to some extra fields in its internal structures. They're working on superseding SHA-1 too (not sure about the state of this). |
Since git 2.13.0, a hardened version of SHA-1 is also in use. So in theory, referencing a full SHA-1 commit ID should be safe. You never know which future attacks will be found that cannot be mitigated, though. There are also those statements that the additional metadata in git prevents an attack, but honestly, I'm skeptical and wouldn't base my security decisions on that. On the contrary, the initial … |
My concerns (probably minor, but I wouldn't want to spend time on a non-solution):
|
I was suspecting something like your first point, and I see the issues you bring up in the other points. Thinking about it, a git-specific solution sounds problematic: either we restrict upstream repos to a few blessed VCSs, or we have the same security discussion for every new VCS. Some VCSs may just be unsuitable. I'd try to avoid those issues. How about completely format-agnostic content hashing? The shell pseudo-code would look like this:

```
$ your_favorite_vcs pull http://some.where/source/reference/ output_dir
$ cd output_dir
$ tar -cp . | sha512sum
```

The only system I have closer insight into that does this kind of content security is the Genode OS Framework. They don't use tar and SHA-2, but it's similar:

```
$(_DST_HASH_FILE): $(HASH_INPUT) $(MAKEFILE_LIST)
	@$(MSG_GENERATE)$(notdir $@)
	$(VERBOSE)cat $(HASH_INPUT) | sha1sum | sed "s/ .*//" > $@
```

The benefit of this scheme is that it's transport/VCS agnostic and works with anything that gets the source onto your disk. Also, we separate fetching and validating the source, and have the security discussion only once. Caveat: I have no idea how well the … |
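One caveat with raw `tar -cp . | sha512sum` is that it is not reproducible across machines: archive order, timestamps and ownership all leak into the bytes. A hedged sketch of the format-agnostic idea that sidesteps this by hashing sorted relative paths and file contents directly; `dir_hash` is a hypothetical name, not an existing tool:

```python
import hashlib
import os

def dir_hash(root: str) -> str:
    """Hash a directory tree's relative paths and file contents in sorted
    order. Unlike a raw tar pipe, the result does not depend on timestamps,
    ownership, or filesystem traversal order."""
    h = hashlib.sha512()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # fix traversal order for reproducibility
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            h.update(rel.encode() + b"\x00")  # bind content to its path
            with open(full, "rb") as f:
                h.update(f.read())
            h.update(b"\x00")  # separator between entries
    return h.hexdigest()
```

As in the shell sketch above, this runs after fetching with whatever VCS or downloader obtained the sources.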
Is it possible to configure git locally or temporarily to disable the `autocrlf` conversion? For the … |
It should be possible to override it using `git -c core.autocrlf=false`. |
The directory hash function will be usable for all VCSs; it's just that git has a special case during checkout because of this `autocrlf` conversion. |
Yes, I agree that's simple enough to go that way. I'm exclusively thinking of the end-of-line issue. I actually don't know what … |
I like the idea of calculating the hash in an Ada function. Regarding the CR/LF issue: How about converting every occurrence of CR/LF to LF for the purpose of hashing (inside this Ada function)? This would avoid special treatment/configuration of the various VCS'. |
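A minimal sketch of the proposed normalization (in Python rather than Ada, purely for illustration): collapse every CR/LF to LF inside the hashing function, so checkouts on Windows and UNIX digest identically. `normalized_hash` is a hypothetical helper, not actual Alire code:

```python
import hashlib

def normalized_hash(data: bytes) -> str:
    """Hash content with every CR/LF pair collapsed to LF, so the same
    sources checked out with either line-ending convention produce the
    same digest."""
    return hashlib.sha512(data.replace(b"\r\n", b"\n")).hexdigest()

# Distinct on-disk bytes, identical digest after normalization:
print(normalized_hash(b"with Ada;\r\n") == normalized_hash(b"with Ada;\n"))  # True
```

The flip side is precisely this equality: inputs that differ only in CRs placed before LFs become indistinguishable to the hash.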
We feared that this might open the door to some attack... if this is safe, it would certainly be straightforward. For reference, I found this about svn/hg: http://svnbook.red-bean.com/en/1.7/svn.advanced.props.file-portability.html In short, SVN has something similar to git. The Mercurial website is down for me at the moment. |
We discussed this once more and I now feel uneasy about translating CR/LF to LF. While we see no specific attack, giving an attacker the capability to insert CRs arbitrarily before LFs sounds exactly like the kind of thing that's going to blow up in our faces later. The unproblematic but annoying solution is to provide two hashes, one for systems with CR/LF and one for systems with LF. You could either check both or select the one that is appropriate for the current platform. Of course, the tools should then enable the maintainer to easily create the hash for the other platform (e.g. I never use Windows, but my Ada software very likely also works there). This unfortunately brings us back to the point where we need to know exactly how to configure the VCS. @mosteo You've looked into other package managers very closely. I wonder, how do other systems that run on Windows and UNIX-like systems cope with this? Are there even any that are cross-platform and care about integrity? cargo? |
I still think that disabling the `autocrlf` conversion … |
From what I've seen, both opam and cargo rely on downloading archives instead of checking out commits (note that some sites like GitHub provide a way of downloading an archive for a particular commit; this is used extensively in opam). opam allows supplying several hashes, instead of only one as we have now. Still, it's common to see only MD5 (in old packages, I guess). Cargo uses a single SHA-256 from what I've been able to find. Here's a survey of other PMs' trust features (I wasn't aware of it until today, very interesting): package-community/discussions#5 This one is a must read too, I guess: rust-lang/crates.io#75 Which led me to: https://theupdateframework.github.io/ As a short summary from my cursory read of these issues, it seems the only entirely safe way is to go with GPG signatures, and not many packagers are doing that already... and sanely managing that is what the above project addresses. I still have to read through the website in detail, though. |
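opam's several-hashes approach mentioned above could be sketched as follows; `verify` and the field layout are hypothetical illustrations, not opam's or Alire's actual API:

```python
import hashlib

def verify(data: bytes, expected: dict) -> bool:
    """Check `data` against every supplied digest, e.g.
    {"sha256": "...", "sha512": "..."}. All listed algorithms must match,
    so a break in one hash function alone is not enough to sneak in
    modified sources."""
    return all(
        hashlib.new(algo, data).hexdigest() == digest
        for algo, digest in expected.items()
    )

payload = b"example sources\n"
ok = verify(payload, {
    "sha256": hashlib.sha256(payload).hexdigest(),
    "sha512": hashlib.sha512(payload).hexdigest(),
})
print(ok)  # True
```

Requiring all listed digests to match (rather than any one) is the conservative choice when multiple hashes are published.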
I just re-read the … Will there be any issue on Windows with UNIX-style line endings?
There might be if one opens source files. Depending on the editor you might see no line breaks but a single huge line. If we go this route, I'd propose to check out once for hash computation, reset, and check out normally; time should be negligible compared to download time. I'll experiment with this. |
And additionally I just saw that … |
Very interesting links, especially the first one. After skimming over it, it seems like everything is leading to the Update Framework. Coincidentally, I met someone at a conference last Sunday who ported a complex system (Genode) to TUF. I could get in touch with him if we have any practical questions. Regarding signatures versus central trusted index: IIRC we discussed that earlier and deliberately left out signatures for simplicity reasons. Eventually, a TUF-like approach where developers can opt-in to provide a signature would be nice. While some people don't bother or want to avoid the hassle, we would certainly sign our own software. |
Maybe this is a dumb question: is ALIRE meant to be a tool I use during development (then the above would be an issue), or do you consider pulling via git just a step in the software distribution process (and no human touches the code thereafter)? If the latter is true, wouldn't the only issue be that GNAT, gprbuild etc. can cope with the wrong line endings on Windows? |
Unfortunately this approach is susceptible to a TOCTOU attack. If an attacker can change the source location between hashing and the real checkout, we could check the correct version but then use a modified one. We must hash a downloaded copy that is under our control. I'm unsure whether the checkout/reset/checkout procedure is safe in that respect; it depends on whether we check out from a trusted local copy. This makes me think of a modification that should be safe:

```
$ git clone --bare https://remote.url/repo local_repo
$ check_hash local_repo/
$ git checkout local_repo local_workdir
```

The … |
Actually, if we disable the `autocrlf` conversion … |
I don't think it's the case; with …
So you mean to hash the git internal structures instead of the actual source files? I wonder if compression/optimizations that git does on blocks can affect that (I will experiment). We could even reuse the bare repository (you can simply do a checkout in it after hashing and we'd be done).
So it seems, and it seems a hard wheel to reinvent. I still have to read through that, but IIUC with TUF we would ensure decentralized index integrity, which in turn guarantees that hashes for the local check are also trustable, right?
When I started working on it, this was certainly one of the use cases: that's the reason that when you do an … Perhaps the … To conclude something from all this: I will experiment with bare cloning, overriding CRLF settings and the other ideas that have arisen, and will post back my findings. I guess the TUF stuff (haha) is a more long-term issue. |
I'm back with some results. It doesn't matter if it is done in a bare or a normal repo; doing this:

```
$ git archive | sha512sum
```

produces the same hash on both Linux and Windows (…).
Awesome! It's even simpler than I thought. |
Cool! Good to know the … |
See last conclusions in #66. The idea is to use

```
git archive | sha512sum
```

or equivalent. Open questions:

- How to disable end-of-line conversion in each VCS (e.g. `git -c core.autocrlf=false`, `hg --config extensions.eol=! checkout -C`).
- What the equivalent is for SVN (which keeps pristine copies under `.svn/pristine`).
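For reference, the reproducibility that makes `git archive | sha512sum` viable can be emulated for arbitrary source trees by fixing the entry order and zeroing volatile metadata before hashing the archive. A hypothetical sketch (`archive_hash` is not an existing tool), illustrating why a plain `tar -cp` pipe was not enough:

```python
import hashlib
import io
import os
import tarfile

def archive_hash(root: str) -> str:
    """SHA-512 of a deterministic tar of `root`: entries sorted, and
    mtime/uid/gid/owner names zeroed out, mirroring the reproducibility
    that `git archive` provides for git trees."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for dirpath, dirnames, filenames in os.walk(root):
            dirnames.sort()  # deterministic traversal
            for name in sorted(filenames):
                full = os.path.join(dirpath, name)
                info = tar.gettarinfo(full, arcname=os.path.relpath(full, root))
                info.mtime = 0          # drop timestamps
                info.uid = info.gid = 0  # drop ownership
                info.uname = info.gname = ""
                with open(full, "rb") as f:
                    tar.addfile(info, f)
    return hashlib.sha512(buf.getvalue()).hexdigest()
```

Two checkouts of the same sources then hash identically regardless of when or by whom they were fetched, while any content change alters the digest.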