Listing document titles #74

AstridDBJ · 2024-01-09T15:46:21Z

Hi!

I'm quite new to Python (so this might be an easy fix), but I found LitStudy really interesting to look into.

I have tried to load documents from different files (from different databases) and used litstudy.types.DocumentSet.union to get a DocumentSet without duplicates. However, I would like to know which papers are then in this new collection/dataset. Is it possible to get LitStudy to list (e.g. in a Pandas DataFrame?) the titles of the documents in a specific dataset? Or provide a list/table of the titles just at any stage in the process?

stijnh · 2024-01-09T16:17:12Z

Hi Astrid! Thanks for using litstudy and thanks for reporting this issue!

Unfortunately, at the moment there is no functionality to see which papers were removed when taking the union of multiple document sets.

Issue #68 discussed a similar problem, where there is no way to find the papers removed by unique(). An idea there was to add a duplicates() method that returns the papers removed by unique(), such that len(docset) == len(docset.unique()) + len(docset.duplicates()). Something similar could be implemented for union().

We are open to contributions and will accept relevant pull requests that add this functionality.
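
To make the duplicates() idea above concrete, here is a minimal sketch of the invariant it describes. It is not litstudy code: split_duplicates is a hypothetical helper, the stand-in documents are plain objects with only a title attribute, and matching on a lowercased title is an assumption made for illustration (litstudy's own duplicate detection may work differently).

```python
from types import SimpleNamespace


def split_duplicates(docs, key=lambda doc: doc.title.strip().lower()):
    """Split docs into (unique, duplicates) using a comparison key.

    Hypothetical helper, not part of litstudy. The default key
    (normalized title) is an assumption for this sketch.
    """
    seen = set()
    unique, duplicates = [], []
    for doc in docs:
        k = key(doc)
        if k in seen:
            # A document with this key was already kept.
            duplicates.append(doc)
        else:
            seen.add(k)
            unique.append(doc)
    return unique, duplicates


# Stand-in documents; real litstudy documents would be used instead.
docs = [
    SimpleNamespace(title="Deep Learning"),
    SimpleNamespace(title="deep learning "),
    SimpleNamespace(title="Graph Mining"),
]

unique, duplicates = split_duplicates(docs)

# The invariant from the discussion: nothing is lost, only partitioned.
assert len(docs) == len(unique) + len(duplicates)
```

Every input document ends up in exactly one of the two lists, which is what makes the length check hold.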

AstridDBJ · 2024-01-10T09:12:49Z

Good to know, thanks! However, I'm actually more interested in the documents that are kept after the union (so not the removed duplicates); e.g. to know which documents I should look into for my review, and thus also the titles of the documents that the different kinds of histograms are based on. Is that possible to do with LitStudy?

stijnh · 2024-01-18T21:25:58Z

You can always print the documents like this:

docs_csv = docs_ieee | docs_springer

for doc in docs_csv:
  print(doc.title)

Would that work? Each document has many attributes that you can access (such as the title, authors, publisher, etc.). See here: https://nlesc.github.io/litstudy/api/types.html#litstudy.types.Document
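
Since the question asked for a list or table of titles, the loop above can be extended to collect one row per document. This is only a sketch: document_rows is a hypothetical helper, and the stand-in objects below take the place of a real DocumentSet so the snippet runs on its own. It assumes each document exposes title and publisher, the attributes mentioned above.

```python
from types import SimpleNamespace


def document_rows(docs):
    """Collect one dict per document (hypothetical helper, not litstudy API)."""
    rows = []
    for index, doc in enumerate(docs):
        rows.append({
            "index": index,
            "title": doc.title,
            # Fall back to None if a document has no publisher attribute.
            "publisher": getattr(doc, "publisher", None),
        })
    return rows


# Stand-in for a document set such as docs_ieee | docs_springer.
docs_csv = [
    SimpleNamespace(title="Paper A", publisher="IEEE"),
    SimpleNamespace(title="Paper B", publisher="Springer"),
]

rows = document_rows(docs_csv)
for row in rows:
    print(f"{row['index']}: {row['title']} ({row['publisher']})")
```

If pandas is installed, passing the result to pandas.DataFrame(rows) should give a table with one column per key, which can then be sorted, filtered, or exported.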

NLeSC locked and limited conversation to collaborators Jan 25, 2024

isazi converted this issue into discussion #79 Jan 25, 2024
