Are we having duplicate files in the Stack Exchange repo? #1017
Comments
The Korean issue was already pointed out by a contributor a long time ago, but no one cared: #838 (a Portuguese file was also pointed out in that issue). English: this is weird.
OK, so I guess these old files need to be identified and deleted, @RavanJAltaie.
@benoit74 I checked in the library: there is only 1 file per language, with different descriptions.
@RavanJAltaie The correct search term would be The first one (on the left) is labeled as The second one is from 2020, so you need to find it on download.kiwix.org and ask for deletion. I have no idea why the older one would be bigger than the more recent one; maybe this should be investigated in case the recent one is broken or faulty.
There are at least these two issues:
I can't work out why there are 3 "English" ZIMs in the imager service catalog. This is probably a bug in the imager service; it will be easier to confirm once the rest is cleaned up.
I see we also have a problem with https://library.kiwix.org/#lang=eng&q=english&category=stack_exchange So please, @RavanJAltaie, try to identify how many of these duplicates we have and ask for deletions. Regarding Korean, and looking at download.kiwix.org/zim/stack_exchange, I see that Should probably be deleted. @benoit74 any idea why the library would choose to display the 2020 file rather than the 2023 one?
korean.stackexchange.com_en_all is not the same as korean.stackexchange.com_ko_all. For korean.stackexchange.com_en_all, the 2020-10 version is the latest one. For korean.stackexchange.com_ko_all, the 2023-10 version is the latest one.
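To illustrate the point above, here is a minimal sketch of how such cases could be spotted from a directory listing. It assumes filenames follow a `<name>_<YYYY-MM>.zim` layout and that a name is `<site>_<lang>_<flavour>`. The two filenames are built from the names and periods quoted above, and the helper names are hypothetical, not part of any Kiwix tool.

```python
import re
from collections import defaultdict

# Hypothetical listing: names and periods come from the comment above,
# the "<name>_<YYYY-MM>.zim" filename layout is an assumption.
FILENAMES = [
    "korean.stackexchange.com_en_all_2020-10.zim",
    "korean.stackexchange.com_ko_all_2023-10.zim",
]

PATTERN = re.compile(r"^(?P<name>.+)_(?P<period>\d{4}-\d{2})\.zim$")


def latest_per_name(filenames):
    """Return the most recent period seen for each ZIM name."""
    periods = defaultdict(list)
    for filename in filenames:
        match = PATTERN.match(filename)
        if match is None:
            continue  # not a dated ZIM filename, skip it
        periods[match["name"]].append(match["period"])
    # "YYYY-MM" strings sort chronologically, so max() gives the latest.
    return {name: max(values) for name, values in periods.items()}


def names_by_site(filenames):
    """Group ZIM names by site (assumed to be the part before the first '_')."""
    sites = defaultdict(set)
    for name in latest_per_name(filenames):
        sites[name.split("_", 1)[0]].add(name)
    return {site: sorted(names) for site, names in sites.items()}


if __name__ == "__main__":
    for site, names in names_by_site(FILENAMES).items():
        if len(names) > 1:
            print(f"{site} is published under several names: {names}")
```

A site that shows up under more than one name is a candidate for review: each name has its own "latest" file, so an old file can stay visible when the name changed between runs.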
Not possible with sotoki AFAIK, at least for now.
Good catch. But then why don't we have three of them (
Either the scraper has changed or the metadata sourced from StackExchange has changed. I think the second option is the most probable.
I've created the deletion request: #1103. Let's keep this open to see what happens with the triple English once the deletion is done.
No need to keep this open; I found the bugs: openzim/sotoki#321 and offspot/imager-service#436
I'm looking at our Stack Exchange ZIMs and I see a bunch of files with nearly identical descriptions yet very different sizes.
I haven't investigated yet but @RavanJAltaie can you please have a look?