Added handling of filename_as_id and file_extractor to SharePointReader #934
Conversation
…er (mimicking what is done, for example, for MinioReader
Hey @ferdinandosimonetti!
I added one comment; everything else looks great!
@@ -28,6 +28,8 @@ def __init__(
client_id: str,
client_secret: str,
tenant_id: str,
filename_as_id: bool = False,
file_extractor: Optional[Dict[str, Union[str, BaseReader]]] = None
I think there's a small mistake here:
file_extractor: Optional[Dict[str, BaseReader]] = None,
Hopefully my second commit solves the issue that prevented the first run from working: I had simply forgotten to add the appropriate imports from typing (Optional and Union).
You don't need Union here. The value will always be a BaseReader class.
Replace the file_extractor line with the one below:
file_extractor: Optional[Dict[str, BaseReader]] = None,
I wrote it that way because I was shamelessly copying line 25 of llama_hub/minio/minio-client/base.py
file_extractor: Optional[Dict[str, Union[str, BaseReader]]] = None,
however, I'll rewrite it that way
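For readers following the annotation discussion above, here is a minimal sketch of what the narrower type implies. It assumes a stand-in `BaseReader` class, and the `PdfReaderStub` class and `is_valid_file_extractor` helper are hypothetical names invented for illustration; none of this is the actual llama_hub code.

```python
from typing import Dict, Optional


class BaseReader:
    """Hypothetical stand-in for the library's BaseReader (assumption)."""


class PdfReaderStub(BaseReader):
    """Toy reader used only for this illustration."""


def is_valid_file_extractor(
    file_extractor: Optional[Dict[str, BaseReader]],
) -> bool:
    """Return True if every key is a str and every value is a BaseReader."""
    if file_extractor is None:
        # None means "use the default extractors".
        return True
    return all(
        isinstance(ext, str) and isinstance(reader, BaseReader)
        for ext, reader in file_extractor.items()
    )


# Accepted under Optional[Dict[str, BaseReader]]
ok_mapping = {".pdf": PdfReaderStub()}
# Would only be accepted under the wider Union[str, BaseReader] annotation
wide_mapping = {".pdf": "PdfReaderStub"}
```

With the narrower annotation, a type checker would flag `wide_mapping`, which is the behavior the reviewer appears to be asking for.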
You'll need to look at linting and test cases as well on this.
Solved the last lint complaint, which was about the unused Union import.
Thanks for the prompt action on the lint part @ferdinandosimonetti
Highly appreciated!
Description
I've reused MinioReader's handling of the file_extractor parameter for SimpleDirectoryReader.
This lets you choose a custom mapping between a file extension and its Reader/Decoder, and shouldn't wreak havoc on SharePointReader's functionality.
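A rough sketch of the idea described above, under stated assumptions: the reader stores `filename_as_id` and `file_extractor` and hands them on when downloaded files are parsed. The classes `BaseReader`, `UpperCaseTextReader`, and `SketchSharePointReader`, and the helpers `directory_reader_kwargs` and `select_reader`, are hypothetical stand-ins written for this illustration, not the real SharePointReader or SimpleDirectoryReader implementation.

```python
from pathlib import Path
from typing import Any, Dict, Optional


class BaseReader:
    """Hypothetical stand-in for the library's BaseReader (assumption)."""


class UpperCaseTextReader(BaseReader):
    """Toy reader used only for this illustration."""


class SketchSharePointReader:
    """Illustrative only: keeps the two new options next to the credentials."""

    def __init__(
        self,
        client_id: str,
        client_secret: str,
        tenant_id: str,
        filename_as_id: bool = False,
        file_extractor: Optional[Dict[str, BaseReader]] = None,
    ) -> None:
        self.client_id = client_id
        self.client_secret = client_secret
        self.tenant_id = tenant_id
        self.filename_as_id = filename_as_id
        self.file_extractor = file_extractor

    def directory_reader_kwargs(self) -> Dict[str, Any]:
        """Keyword arguments that would be forwarded to the directory reader."""
        return {
            "filename_as_id": self.filename_as_id,
            "file_extractor": self.file_extractor,
        }


def select_reader(
    file_extractor: Optional[Dict[str, BaseReader]], filename: str
) -> Optional[BaseReader]:
    """Look up a custom reader by file extension; None means use the default."""
    if not file_extractor:
        return None
    return file_extractor.get(Path(filename).suffix)


reader = SketchSharePointReader(
    client_id="<client-id>",
    client_secret="<client-secret>",
    tenant_id="<tenant-id>",
    filename_as_id=True,
    file_extractor={".txt": UpperCaseTextReader()},
)
kwargs = reader.directory_reader_kwargs()
```

Here `select_reader(kwargs["file_extractor"], "notes.txt")` would return the custom reader, while a file with an unmapped extension would fall back to whatever default the directory reader uses.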
Type of Change
Please delete options that are not relevant.
How Has This Been Tested?
Suggested Checklist:
make format; make lint
to appease the lint gods