How to enable "Purge Items" Feature ? #93
Comments
Hi @tsuyoshihamano. What version of the connectors-sdk and Fusion are you using?
Hi @mwmitchell,
@mwmitchell are you familiar with this issue? Christian at Raytion reported it's currently blocking them on the Yammer/MS Teams connector work. We have our update with Raytion tomorrow morning if it'd be helpful for you to join.
@tsuyoshihamano The "Purge Items" feature should work for connectors that do recrawls. Yes, it should delete items in the crawlDb as well as in the content collection if they are not modified. This also holds for AccessControlItem entries in the crawlDb, but it does not delete anything from the AccessControl collection. Purge items should be enabled by default in 5.3. Are you emitting a checkpoint, as in an incremental crawl?
Thanks @puneetkhanal,
@tsuyoshihamano I looked further into purging stray items. Right now, purging stray items works only in a special case: the connector needs to emit a checkpoint and emit candidates with We would like to understand more about your use case. If you are implementing a recrawl or incremental connector, you need to emit a delete for an item in order to remove it from the Solr collection. Purge items works only in the special case mentioned above, and because that behavior is subject to change, it is better not to rely on it. Ideally, the connector should figure out by itself which items need to be deleted and emit a delete for each of them. The same applies to AccessControlItem.
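One way a connector could work out for itself which items to delete is to compare the IDs it indexed previously with the IDs still present in the source. The following is a minimal sketch of that comparison only; the class and method names are hypothetical, and it does not use the connectors-sdk. A real connector would then emit a delete for each returned ID through its fetch context.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical helper, not part of the Fusion connectors-sdk.
class DeleteDiffSketch {

    // Returns the IDs that were indexed before but are no longer in the source.
    // These are the items the connector should emit deletes for.
    static SortedSet<String> idsToDelete(Set<String> previouslyIndexed,
                                         Set<String> currentlyInSource) {
        SortedSet<String> toDelete = new TreeSet<>(previouslyIndexed);
        toDelete.removeAll(currentlyInSource);
        return toDelete;
    }

    public static void main(String[] args) {
        Set<String> previous = new HashSet<>(Arrays.asList("doc-1", "doc-2", "doc-3"));
        Set<String> current = new HashSet<>(Arrays.asList("doc-1", "doc-3", "doc-4"));
        // doc-2 disappeared from the source, so it is the only delete candidate.
        System.out.println(idsToDelete(previous, current)); // prints [doc-2]
    }
}
```

Where the previous ID set comes from (for example, state carried in a checkpoint) is left open here, since the thread does not specify it.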
@puneetkhanal,
@tsuyoshihamano yeah
@puneetkhanal, would the next crawl then also emit a checkpoint after the crawl? Would the deletions then happen based on the diff of candidates between the checkpoints of different crawls?
@tsuyoshihamano Yeah, a subsequent crawl will update the checkpoint with any additional information that may be required for the next crawl. Whenever that next crawl ends, it will check the crawlDb for stray or obsolete items and delete them.
It checks for items that have not been modified in the current crawl and then deletes them from the crawlDb and the Solr collection (content collection).
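The end-of-crawl check described above can be pictured with a small in-memory model. This is only a sketch under assumptions: the class, its fields, and the idea of tagging each item with the ID of the last crawl that touched it are hypothetical, and none of it reflects how Fusion actually stores or purges crawlDb entries.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of the purge step; not the Fusion crawlDb.
class StrayItemPurgeSketch {

    // Maps item ID -> ID of the last crawl that emitted (touched) the item.
    private final Map<String, Long> crawlDb = new HashMap<>();

    // Record that an item was seen during the given crawl.
    void touch(String itemId, long crawlId) {
        crawlDb.put(itemId, crawlId);
    }

    // At the end of a crawl, remove every item not touched in that crawl.
    // The returned IDs are the ones that would also be deleted from the
    // content collection.
    List<String> purgeStrayItems(long currentCrawlId) {
        List<String> purged = new ArrayList<>();
        crawlDb.entrySet().removeIf(entry -> {
            boolean stray = entry.getValue() != currentCrawlId;
            if (stray) {
                purged.add(entry.getKey());
            }
            return stray;
        });
        Collections.sort(purged); // deterministic order for display
        return purged;
    }

    int size() {
        return crawlDb.size();
    }

    public static void main(String[] args) {
        StrayItemPurgeSketch db = new StrayItemPurgeSketch();
        db.touch("a", 1L);
        db.touch("b", 1L);
        db.touch("a", 2L); // only "a" is seen again in crawl 2
        System.out.println(db.purgeStrayItems(2L)); // prints [b]
    }
}
```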
Thanks @puneetkhanal,
Yeah, that is the correct way:
/**
 * Example Usage: {@code fetchContext.newDeleteAccessControlItem(id)
 *     .withQuery(Collections.singletonMap("name","xyz"), false)
 *     .emit();
 * }
 */
Hi, Tsuyoshi from Raytion GmbH here.
I read about the Purge Items feature, which is also partially mentioned in your documentation here.
I assume that this is a feature where items that are stored in the Crawl DB and were not fetched in the previous job will be cleaned up.
Will those documents also be deleted from the Solr collection (they are also registered as documents)?
If yes, does this feature need to be enabled explicitly? We are currently not able to observe our unvisited items being deleted.
Does the same mechanism also apply to Access Controls?
Thank you in advance.