Commiting files larger than 4 GB #1063

Closed
1 task done
elmorisor opened this issue Feb 15, 2017 · 39 comments
Comments

@elmorisor

  • I was not able to find an open or closed issue matching what I'm seeing

Setup

  • Which version of Git for Windows are you using? Is it 32-bit or 64-bit?
$ git --version --build-options

git version 2.11.1.windows.1
built from commit: 1c1842bcba45569a84112ec64f72b08eb2d57c68
sizeof-long: 4
machine: x86_64
  • Which version of Windows are you running? Vista, 7, 8, 10? Is it 32-bit or 64-bit?
$ cmd.exe /c ver

Microsoft Windows [Version 6.1.7601]
  • What options did you set as part of the installation? Or did you choose the
    defaults?
# One of the following:
> type "C:\Program Files\Git\etc\install-options.txt"
> type "C:\Program Files (x86)\Git\etc\install-options.txt"
> type "%USERPROFILE%\AppData\Local\Programs\Git\etc\install-options.txt"
$ cat /etc/install-options.txt

Path Option: Cmd
SSH Option: OpenSSH
CRLF Option: CRLFAlways
Bash Terminal Option: MinTTY
Performance Tweaks FSCache: Enabled
Use Credential Manager: Enabled
Enable Symlinks: Disabled
Enable Builtin Difftool: Enabled
  • Any other interesting things about your environment that might be related
    to the issue you're seeing?

no

Details

  • Which terminal/shell are you running Git from? e.g Bash/CMD/PowerShell/other

CMD

git commit
del filename
git checkout .

I was managing my large files with the git-lfs extension. Some of them were more than 4 GB in size. After deleting one of those files from my working tree and doing a normal git checkout, I ended up with a truncated file only 46 MB in size.

For testing purposes I tried to commit a 4.3 GB file to my git repository without the LFS extension.
After deleting that file from the working tree and checking out again, I expected the 4.3 GB file to
be present again. Instead I ended up with the same small file.
Seems like the file was never committed correctly. The .git directory is about 100 MB in size.
Reinstalling Git and changing machines did not change the issues.
Files smaller than 4GB are not affected.
After that I searched the gitconfig for settings related to 64-bit. I found core.packedGitLimit, which should default to 8 GB on 64-bit systems. I manually set it to 8g myself, but Git told me that the value is out of range. Only after setting it to a value smaller than 4 GB could I use git normally again.

  • If the problem was occurring with a specific repository, can you provide the
    URL to that repository to help us with testing?

Issue is not repository-specific

@mfriedrich74

mfriedrich74 commented Feb 16, 2017 via email

@elmorisor
Author

elmorisor commented Feb 16, 2017

I used the 64-bit installer from git-for-windows... thought I'd get a 64-bit git with it:
see my output: machine: x86_64

@whoisj

whoisj commented Feb 16, 2017

That was my mistake, I misread the output of the git --version --build-options. I realized it nearly immediately (hence the deletion of the post). Apologies about that. 😔

@dscho
Member

dscho commented Feb 17, 2017

@elmorisor @whoisj the red herring was sizeof-long: 4. It is perfectly legitimate for 64-bit compilers to define the long type as 32-bit, and that is the case for GCC on Windows (which Git for Windows uses to compile the source code).

The problem is the Git source code, which uses unsigned long in places where size_t would be correct. I think that that is the issue here.
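To make the point about unsigned long concrete, here is a minimal sketch of what happens when a file size above 4 GB is stored in a 32-bit variable. The names win_ulong, size_as_win_ulong and fits_in_win_ulong are made up for illustration and do not come from the Git sources; uint32_t merely stands in for unsigned long as it behaves on 64-bit Windows.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for `unsigned long` on 64-bit Windows, where it is 32 bits wide. */
typedef uint32_t win_ulong;

static_assert(sizeof(win_ulong) == 4, "win_ulong models a 32-bit unsigned long");

/* Store a 64-bit file size in the 32-bit type; the upper 32 bits are silently dropped. */
win_ulong size_as_win_ulong(uint64_t file_size)
{
	return (win_ulong)file_size;
}

/* Returns nonzero if the size survives the round trip through the 32-bit type. */
int fits_in_win_ulong(uint64_t file_size)
{
	return (uint64_t)size_as_win_ulong(file_size) == file_size;
}
```

For a size of 4 GiB plus 100 bytes, size_as_win_ulong() returns 100: only the remainder modulo 2^32 is kept, and nothing reports the loss.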

@obe1line

I have also hit this problem.
Originally I raised it as a BitBucket support issue, then Git-LFS (git-lfs/git-lfs#2434).
size_t may also be platform specific - __uint64 or longlong?

@dscho
Member

dscho commented Jul 25, 2017

size_t may also be platform specific

Yes, it most certainly is. On 32-bit platforms, for example, you simply cannot map 4GB files into memory via mmap() (or Windows' equivalent). On those platforms, you still can read and write such large files, of course, using off_t as data type.

@polygonica

Would really appreciate a fix for this.

@dscho
Member

dscho commented Nov 7, 2017

@polygonica I warmly welcome you to work on this. (While it may seem convenient to expect others, including myself, to fulfill your wishes, it rarely works.)

There is already code to stream large objects (so that they do not have to be mapped into memory), and it should be possible to at least fall back to that option in git add.

@isometric

Just to confirm, this is only reproducible on 32 bit builds correct?

@dscho I can take a stab at it this weekend

@dscho
Member

dscho commented Nov 7, 2017

Just to confirm, this is only reproducible on 32 bit builds correct?

@isometric I am not really sure, as I have not followed the recent developments on the size_t vs unsigned long issue closely. It used to be the case, and may still be the case, that Git internally handles memory buffers using unsigned long (which is 32-bit even in 64-bit Windows). If that has not changed, you will likely run into the same issues with 64-bit Git for Windows.

As a quick first glance, you may want to run git grep EXPENSIVE in git.git's t/ subdirectory (I usually do that with the -O option to see the files in the pager, so that I can scroll back and forth to see more context). Some of those expensive tests work on large files. Other prereqs related to large files are LONG_IS_64BIT and TAR_HUGE.

It is always worth a look to see whether Git's test suite has something related, because then it is relatively easy and quick to run a test to validate possible fixes (or prove that they don't fix the issue).

I also stumbled across the plug_bulk_checkin() function yesterday, which you may want to have a closer look at when you take that stab this weekend (for which I am very grateful!). I could imagine that it solves at least part of the problem reported in this ticket.

@isometric

Had a number of power issues in the neighbourhood this weekend so didn't get a chance to take a look. I'll try to find some time next weekend.

@congyiwu

congyiwu commented Dec 5, 2017

BTW, I was able to hash-object/cat-file a 5GB blob successfully w/ git in Ubuntu on Windows. It turns out that hash-object produces the same (correct?) result for both Linux and Windows, but cat-file fails on Windows only. I'm using 64-bit for both versions of Git.

I used this repro instead:

git hash-object -w --no-filters M:\tmp\5gb.bin
ecc0720b2a71b74c0980dbdf31556097355883ef

git cat-file -p ecc0720b2a71b74c0980dbdf31556097355883ef > m:\tmp\ignore2.bin
error: bad object header
fatal: Not a valid object name ecc0720b2a71b74c0980dbdf31556097355883ef

@derrickstolee

derrickstolee commented Dec 6, 2017

It looks like this error boils down to unpack_object_header_buffer() which uses unsigned longs everywhere instead of size_t's. In Windows, this is 32-bit but in Linux it is 64, hence the difference.

There are many places in Git that use unsigned long for a size.
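A simplified sketch of why the accumulator width matters when decoding a variable-length object header: the size is spread over several bytes (4 bits in the first byte, 7 bits in each following byte, high bit as continuation flag), and the decoder must give up once the shift reaches the width of its size type. The code below is only loosely modeled on that idea; encode_header, decode_header and the acc_bits parameter are illustrative names, not Git's actual API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode a 3-bit type and a size; returns the number of bytes written (at most 10). */
size_t encode_header(unsigned type, uint64_t size, unsigned char *out)
{
	size_t n = 0;
	unsigned char c = (unsigned char)(((type & 7) << 4) | (size & 15));

	size >>= 4;
	while (size) {
		out[n++] = c | 0x80;	/* more bytes follow */
		c = size & 0x7f;
		size >>= 7;
	}
	out[n++] = c;
	return n;
}

/*
 * Decode a header, pretending the size type is `acc_bits` wide.
 * Returns the number of bytes consumed, or 0 if the buffer is
 * too short or the size needs more than `acc_bits` bits.
 */
size_t decode_header(const unsigned char *buf, size_t len, unsigned acc_bits,
		     unsigned *type, uint64_t *sizep)
{
	size_t used = 0;
	unsigned shift = 4;
	unsigned char c;
	uint64_t size;

	assert(buf && len > 0 && acc_bits <= 64);
	c = buf[used++];
	*type = (c >> 4) & 7;
	size = c & 15;
	while (c & 0x80) {
		if (used >= len || shift >= acc_bits)
			return 0;	/* the "bad object header" case */
		c = buf[used++];
		size += (uint64_t)(c & 0x7f) << shift;
		shift += 7;
	}
	*sizep = size;
	return used;
}
```

With acc_bits set to 64 (like unsigned long on Linux), a 5 GB size decodes fine; with acc_bits set to 32 (like unsigned long on Windows), the same header is rejected, which would match the Windows-only failure reported above.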

@dscho
Member

dscho commented Jan 3, 2018

It looks like this error boils down to unpack_object_header_buffer() which uses unsigned longs everywhere instead of size_t's.

Right. And Git also uses unsigned long in other places where off_t would be appropriate. There is no excuse for that, really.

@ksulli

ksulli commented Apr 24, 2018

Can I do anything to help? Is this a bug in git for windows, or does it need to be fixed upstream? This bug prevents me not only committing 4GB files directly, but also trying to use LFS.

@dscho
Member

dscho commented Apr 25, 2018

Can I do anything to help?

@ksulli help is always welcome. Please note that there had been a couple patches flying about on the mailing list, to try to address the unsigned long vs off_t/size_t issue (which is mostly at play here).

However, for the concrete purpose of resolving this here issue, I think there is some sort of streaming mode available in the internal Git API. That would make it possible to, say, generate an object larger than 4GB via git hash-object -w --stdin and everything should work correctly. The trick will then be to activate that mode automatically when calling git add.
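The general idea behind such a streaming mode can be sketched as follows: read the input in fixed-size chunks, feed each chunk to the hash, and keep the running byte count in a 64-bit variable, so that the whole file never has to fit into one memory buffer. This is not Git's implementation. The function name stream_digest is hypothetical, and FNV-1a is used only as a short stand-in for the real object hash.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define CHUNK_SIZE (64 * 1024)

/*
 * Read `in` to the end in CHUNK_SIZE pieces. On success, stores the
 * number of bytes seen in *total and an FNV-1a digest in *digest and
 * returns 0; returns -1 on a read error.
 */
int stream_digest(FILE *in, uint64_t *total, uint64_t *digest)
{
	static unsigned char buf[CHUNK_SIZE];
	uint64_t hash = 0xcbf29ce484222325ULL;	/* FNV-1a 64-bit offset basis */
	uint64_t count = 0;			/* 64-bit, so sizes above 4 GB are fine */
	size_t n, i;

	assert(in && total && digest);
	while ((n = fread(buf, 1, sizeof(buf), in)) > 0) {
		for (i = 0; i < n; i++) {
			hash ^= buf[i];
			hash *= 0x100000001b3ULL;	/* FNV-1a 64-bit prime */
		}
		count += n;
	}
	if (ferror(in))
		return -1;
	*total = count;
	*digest = hash;
	return 0;
}
```

Memory use stays at one 64 KiB buffer regardless of the file size, and no individual length handled by the loop ever exceeds CHUNK_SIZE.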

How's your C fu?

I see that there is already something called big_file_threshold in index_fd():

git/sha1_file.c

Lines 1900 to 1922 in 918fa5c

int index_fd(struct object_id *oid, int fd, struct stat *st,
	     enum object_type type, const char *path, unsigned flags)
{
	int ret;

	/*
	 * Call xsize_t() only when needed to avoid potentially unnecessary
	 * die() for large files.
	 */
	if (type == OBJ_BLOB && path && would_convert_to_git_filter_fd(path))
		ret = index_stream_convert_blob(oid, fd, path, flags);
	else if (!S_ISREG(st->st_mode))
		ret = index_pipe(oid, fd, type, path, flags);
	else if (st->st_size <= big_file_threshold || type != OBJ_BLOB ||
		 (path && would_convert_to_git(&the_index, path)))
		ret = index_core(oid, fd, xsize_t(st->st_size), type, path,
				 flags);
	else
		ret = index_stream(oid, fd, xsize_t(st->st_size), type, path,
				   flags);
	close(fd);
	return ret;
}

So the trick would be to first test whether it works now, and if it does not, investigate in that code (possibly using a debugger and/or inserting debug statements) where things go south.
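When investigating, it may help to keep in mind what a checked size conversion of the kind xsize_t() performs looks like, and how the threshold test picks between the in-core and the streaming path. The sketch below is an assumption-laden illustration, not the Git code: checked_size_t and use_streaming are hypothetical helpers, int64_t stands in for off_t, and the sketch returns an error where Git would die().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Convert a file length to size_t only if it is representable.
 * Returns 0 and stores the result in *out on success, -1 otherwise.
 */
int checked_size_t(int64_t len, size_t *out)
{
	assert(out != NULL);
	if (len < 0 || (uint64_t)len > SIZE_MAX)
		return -1;	/* negative, or too large for this platform's size_t */
	*out = (size_t)len;
	return 0;
}

/*
 * Mirrors the threshold comparison in the excerpt above: files at or
 * below the threshold are read into memory, larger ones are streamed.
 */
int use_streaming(int64_t file_size, int64_t threshold)
{
	return file_size > threshold;
}
```

The same check has to be repeated at every narrowing step, though: a length that fits into size_t can still be truncated later if it is passed on through an unsigned long parameter on Windows.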

If you need to debug this, that is really easy: install the Git for Windows SDK (it'll clone about half a gig worth of Git objects, though), then call sdk cd git, edit the Makefile therein and delete the -O2 from the CFLAGS, then run make -j15 install. After that, you should be able to test this using gdb.

Please let me know when/where you get stuck.

@ksulli

ksulli commented Apr 27, 2018

Thanks for the pointers, I'm a bit rusty with compiled languages in general but this issue really irks me so I'll do my best.

@dscho
Member

dscho commented Apr 27, 2018

Thanks! As I said, any help is welcome. If you get stuck, just holler (and provide details ;-)).

@JohnFrampton

JohnFrampton commented Nov 7, 2018

I also have a problem on windows with files >4 GB via Git LFS.
From the Git LFS thread I learned that there is an unsolved problem in the Git engine on Windows which causes the big-file problem; see https://github.com/git-lfs/git-lfs/issues/2434

Can you please tell me, when we can expect a bugfix for that in git?

@PhilipOakley

PhilipOakley commented Nov 7, 2018

It really needs someone to help the upstream git with the migration to a streaming interface, if I understand dscho's well informed comment above

If you are able to help with coding that would be great. (many codez make all issues shallow ;-)

@aggieNick02

Note in the referenced git lfs issue there is a workaround for using >4GB files with git-lfs on windows. It is just a slight change in workflow for those that can't wait or don't have the time to fix directly.

@alegrigoriev

The long vs size_t difference doesn't matter, because neither of them should be used for file sizes or offsets. Instead, off_t must be used. Indeed, off_t is used throughout the code for this purpose. If there's a case where long or size_t is incorrectly used for a file size or offset, it must be changed to off_t.

@PhilipOakley

There is plenty of discussion on the upstream mailing list about the issue of the size of various types on different systems, and their incompatibilities. The archive https://public-inbox.org/git/?q= is probably the most useful one for searching.

@dscho
Member

dscho commented Feb 27, 2019

Can you please tell me, when we can expect a bugfix for that in git?

I think it might be this mindset that turned the discussion in this ticket away from a useful course: if you want something, you gotta put some effort behind it, not just wait for others to miraculously fulfill your wishes without getting anything in return.

So I'll close this ticket, and let those who are putting in more effort than mere words (you know who you are) be active elsewhere (you know where), being grateful for it (you know I am).

@dscho dscho closed this as completed Feb 27, 2019
@aggieNick02

I'm sorry some users either don't realize or don't appreciate that much of the work on git-for-windows is done by volunteers. I think we lose a lot by closing this issue though - it is still an issue, it contains information about the root cause, and it is linked to as the cause of a git-lfs bug (git-lfs/git-lfs#2434). Would you consider reopening? Perhaps someone will pick it up someday (perhaps even me); while closing it may send a message, I also think it will create a good bit of confusion from folks watching the issue or dealing with it.

@PhilipOakley

PhilipOakley commented Feb 27, 2019

extending on previous comment, try
https://public-inbox.org/git/[email protected]/

The current code for detecting zlib decode length errors is full of poorly defined behaviour because the up/down casting of the different variable types on different architectures produces different results (as opposed to undefined behaviour..).

I expect that some 'C language lawyer' action is needed to cast the zlib stream length to ptrdiff and then use that (ptr arithmetic) ubiquitously to get consistent results on all platforms.

I think the git_lfs link is a red herring because it fails to get to the bottom of the problem for systems where Windows can handle proper 64 bit addresses.

@dscho
Member

dscho commented Feb 28, 2019

I think we lose a lot by closing this issue

I disagree. The valuable technical discussion with people following up with patches was not happening here. There are people putting their money where their mouth is, making sure that their wishes come true by putting some energy and effort behind it. Just not here. So: Let's just draw the curtain of charity over the rest of this ticket, and let it rest in peace.

@aggieNick02

I understand the frustration, but following that logic means all the real issues that aren't seeing active investigation and/or fixing should be closed. Is that the plan moving forward?

To someone who experiences the bug and ends up here via google, etc, there will be confusion. They'll think "oh, this is a known issue, cool - wait, it's closed - why am I still encountering it"?

It would help users if something visible about the issue (perhaps title) could at least be updated to indicate that this issue is not fixed and users should not have any expectation that it ever will be.

@dscho
Member

dscho commented Mar 1, 2019

@aggieNick02 are you really trying your best to bind our time here? Is that what you want? To keep talking, talking, talking, and not get anything done?

It would help users if something visible about the issue (perhaps title) could at least be updated to indicate that this issue is not fixed and users should not have any expectation that it ever will be.

I am totally not on board with this idea. Why? Because it makes you feel that you are a strict user and not responsible for anything while others should do all that.

How about getting involved instead? How about you update this ticket with the progress? How about you pay attention to the discussion on the mailing list, summarizing where the progress is at?

That is easily something you can do. And something that takes away the burden from others. Rather than piling and piling even more responsibility on those few who take care of the issues you want to see resolved. Or better put: trying to pile, because really, it is not the responsibility of anyone to take care of your wishes, not if you do not give them money or time or anything in return.

So: while I see what you are saying about the confusion and about opaque progress, I have to point out that this is a community effort, and if you choose not to be part of that effort, you have no say in how it is run. If you choose to be part of the effort, your contributions will be appreciated. And even better: you can then have what you want, because you make it so.

@JohnFrampton

Dear dscho
It appears to me that you are frustrated about this. I can understand that. But open source does not only work by "if I want a bug to be fixed, I indulge in whatever project and do it myself". If some people with experience (like you) maintain a project others should not be put down because they politely ask for a bugfix. We all have our own projects to maintain and put effort in it for others should not be forced to do that work themselves. But you can run that, as you like and that includes closing a ticket that is still not fixed.
Thx for your comments

@dscho
Member

dscho commented Mar 1, 2019

Dear @JohnFrampton thank you for speaking up. However, your speaking up does not help getting the issue at hand resolved, does it? What can you do to help?

@JohnFrampton

Well, I downloaded the code and had a look. I have to find out what I understand and how I can deal with that. I will give it a try. But currently I'm paid to work on something else, so ... let's see ...
I will report as soon as I have anything achieved.

@dscho
Member

dscho commented Mar 1, 2019

Well I downloaded the code and have a look and have to find out what I understand and how I can deal with that.

That's good. Now let's also get you into the conversation with the people who are already working on this: please head over to gitgitgadget#115.

@srothery

srothery commented Dec 9, 2020

Hi just for others to know this issue still occurs in git for windows 2.29.2 - the linked gitgitgadget#115 is also closed but as far as I can tell not "complete" - it links to this which is still open however been quiet for over a year

@aggieNick02

Thanks for the update @srothery . If you want/need to work with larger files on windows, it is possible, but involves workarounds. There is a bit of discussion at git-lfs/git-lfs#2434, with the workaround explained in a post there by @technoweenie.

It isn't perfect, but it is workable. We run a self-hosted git-lfs server with >4GB files both committed from and pulled to windows machines.

@srothery

srothery commented Dec 9, 2020

Thanks @aggieNick02 - @technoweenie 's fix was to do with the smudge filter - should I also disable the clean filter too? If I do both of those does that mean for the whole repo all lfs files won't go into my working folder but the .git/lfs/objects right? I was looking to see if I could disable smudge/clean just for my files that are >4GB but can't spot examples or hints if this is possible.

@aggieNick02

So it's been a little while since I've configured this, but here's what I remember/have settled on:

  • I don't know of any way to only apply the workaround to files that are >4GB. So for me it applies to all files in git-lfs
  • You can't disable the clean filter. This means that after committing and pushing a >4GB file, your file will be locally malformed. You will need to then perform a git checkout of the file followed by a git-lfs pull to fix this. If you forget to do this, you'll be reminded, as git status will notice your local corrupt file is different from what it should be.
  • My .gitconfig has the following lfs section (filter-process has to be skipped too):
    [filter "lfs"]
        smudge = git-lfs smudge --skip -- %f
        process = git-lfs filter-process --skip
        required = true
        clean = git-lfs clean -- %f

@DaDummy

DaDummy commented Jun 16, 2022

For anyone else stumbling onto this discussion and not knowing what to make of all those closed issues and what the state of the issue actually is:

Even though this issue was closed prematurely, folks organized somewhere else (thank you, whoever you are!) and according to git-lfs/git-lfs#2434 (comment) the fix has been merged into Git for Windows for the 2.34.0 release. There also seems to be an effort to upstream the fix to mainline git for 3.10, as far as I understood, but don't quote me on that, I might have misunderstood that part.

And just to avoid any ambiguity: This was always a windows-only bug. Upstreaming to mainline really just means that Windows builds created from the mainline sources will behave correctly too then.

@dscho
Member

dscho commented Jun 17, 2022

this issue was closed prematurely

This issue was closed because the conversation was becoming counter-productive. You reminded me that I wanted to lock it, thank you.

@git-for-windows git-for-windows locked as too heated and limited conversation to collaborators Jun 17, 2022

16 participants