Full file needed for Documents #107
Comments
I'm having a similar issue, I have a valid
This is probably a duplicate of #83
This isn't really a duplicate: the header size of 262 bytes only applies to non-Microsoft files, so the docs are actually wrong. I've since determined that the only ways to accurately determine the content type of MSO files are to read in the entire buffer or to hard-code a map based on file extension. In case anyone else wants it, this may save you some time:

import (
	"path/filepath"
	"strings"

	"github.com/h2non/filetype" // assumed import path for the filetype package discussed here
)

// MicrosoftExtMap maps Microsoft Office file extensions (lowercase, with
// leading dot) to their MIME types.
var MicrosoftExtMap = map[string]string{
	".doc": "application/msword",
	".dot": "application/msword",
	".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
	".dotx": "application/vnd.openxmlformats-officedocument.wordprocessingml.template",
	".docm": "application/vnd.ms-word.document.macroEnabled.12",
	".dotm": "application/vnd.ms-word.template.macroEnabled.12",
	".xls": "application/vnd.ms-excel",
	".xlt": "application/vnd.ms-excel",
	".xla": "application/vnd.ms-excel",
	".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
	".xltx": "application/vnd.openxmlformats-officedocument.spreadsheetml.template",
	".xlsm": "application/vnd.ms-excel.sheet.macroEnabled.12",
	".xltm": "application/vnd.ms-excel.template.macroEnabled.12",
	".xlam": "application/vnd.ms-excel.addin.macroEnabled.12",
	".xlsb": "application/vnd.ms-excel.sheet.binary.macroEnabled.12",
	".ppt": "application/vnd.ms-powerpoint",
	".pot": "application/vnd.ms-powerpoint",
	".pps": "application/vnd.ms-powerpoint",
	".ppa": "application/vnd.ms-powerpoint",
	".pptx": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
	".potx": "application/vnd.openxmlformats-officedocument.presentationml.template",
	".ppsx": "application/vnd.openxmlformats-officedocument.presentationml.slideshow",
	".ppam": "application/vnd.ms-powerpoint.addin.macroEnabled.12",
	".pptm": "application/vnd.ms-powerpoint.presentation.macroEnabled.12",
	".potm": "application/vnd.ms-powerpoint.template.macroEnabled.12",
	".ppsm": "application/vnd.ms-powerpoint.slideshow.macroEnabled.12",
}

// MicrosoftContentType looks up the MIME type by file extension.
// The extension is lowercased so that names like "Report.DOCX" also match.
func MicrosoftContentType(filename string) (string, bool) {
	ext := strings.ToLower(filepath.Ext(filename))
	if contentType, ok := MicrosoftExtMap[ext]; ok {
		return contentType, true
	}
	return "", false
}

// ContentType prefers the extension map and falls back to sniffing the header.
func ContentType(filename string, header []byte) string {
	if contentType, ok := MicrosoftContentType(filename); ok {
		return contentType
	}
	// The error is ignored: an unmatched header yields an empty MIME value.
	kind, _ := filetype.Match(header)
	return kind.MIME.Value
}
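For anyone wanting to try the extension-first approach without pulling in the library, here is a minimal sketch of the same lookup order. It is not the code from the comment above: `extMap` is a trimmed subset of the full map, `contentType` and `sniffFunc` are hypothetical names, the header sniffer is passed in as a plain function (in real use it would presumably wrap `filetype.Match`), and lowercasing the extension is my own assumption about what callers want.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// extMap is a trimmed, illustrative subset of the full map above.
var extMap = map[string]string{
	".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
	".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
}

// sniffFunc stands in for header-based detection (hypothetical type).
type sniffFunc func(header []byte) string

// contentType prefers the extension map and falls back to sniffing.
func contentType(filename string, header []byte, sniff sniffFunc) string {
	ext := strings.ToLower(filepath.Ext(filename))
	if ct, ok := extMap[ext]; ok {
		return ct
	}
	return sniff(header)
}

func main() {
	// Hypothetical sniffer that only ever recognizes a generic ZIP.
	zipOnly := func([]byte) string { return "application/zip" }
	header := []byte("PK\x03\x04")

	fmt.Println(contentType("Report.DOCX", header, zipOnly))
	fmt.Println(contentType("archive.zip", header, zipOnly))
}
```

The trade-off is that the extension is trusted over the bytes, so a renamed file will be reported by its name rather than its content.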
The README specifically states:
I've tried this out and it works fine for all files except MS Office docs such as docx, xlsx, etc. These files have a kind of application/zip if given only the first 262 bytes, but if you give them the full file, either with MatchFile or MatchReader, they are detected correctly.

In fact, each file type seems to have a different minimum buffer length for filetype to report accurately: .docx only seems to require a minimum of 1750 bytes, while .xlsm requires a minimum of 1855 bytes. For each of these files, a buffer shorter than this will be inaccurately reported as application/zip. For my application, this is very important.

For now I'll have to do the work of determining the minimum buffer size for MSO files to report accurately, but if you know this already, please update the docs, or at least add a caveat around the 262 number.
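In case it helps with measuring those minimums, below is a rough sketch of how the smallest working buffer length could be found for a given file. Everything in it is an assumption for illustration: `minHeaderLen` and `matchFunc` are hypothetical names, and `fakeMatch` is a stand-in that does not reproduce the library's real detection logic. In practice `match` would wrap something like `filetype.Match` and `data` would be the file's bytes.

```go
package main

import (
	"bytes"
	"fmt"
)

// matchFunc reports a MIME type for a header buffer (hypothetical type).
// It is a parameter so the sketch has no third-party dependency.
type matchFunc func(header []byte) string

// minHeaderLen returns the smallest prefix length of data, up to limit
// bytes, for which match reports want. It returns -1 if none qualifies.
// A linear scan is used because detection is not assumed to be monotonic
// in the prefix length.
func minHeaderLen(data []byte, limit int, want string, match matchFunc) int {
	if limit > len(data) {
		limit = len(data)
	}
	for n := 1; n <= limit; n++ {
		if match(data[:n]) == want {
			return n
		}
	}
	return -1
}

const docxMIME = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"

// fakeMatch is a stand-in matcher: it reports a generic ZIP until the
// marker "word/" becomes visible in the buffer.
func fakeMatch(header []byte) string {
	if !bytes.HasPrefix(header, []byte("PK\x03\x04")) {
		return ""
	}
	if bytes.Contains(header, []byte("word/")) {
		return docxMIME
	}
	return "application/zip"
}

func main() {
	// Synthetic data: ZIP signature, filler, then a path-like marker.
	data := []byte("PK\x03\x04" + "0123456789" + "word/document.xml")

	fmt.Println("zip detected at:", minHeaderLen(data, 4096, "application/zip", fakeMatch))
	fmt.Println("docx detected at:", minHeaderLen(data, 4096, docxMIME, fakeMatch))
}
```

Running this over a handful of real sample files per extension would give an empirical lower bound, though there is no guarantee the number holds for every file of that type.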