I'm quite new to Python (so this might be an easy fix), but I found LitStudy really interesting to look into.
I have loaded documents from several files (exported from different databases) and used litstudy.types.DocumentSet.union to combine them into a single DocumentSet without duplicates. However, I would like to know which papers end up in this new collection. Is it possible to get LitStudy to list the titles of the documents in a specific DocumentSet (e.g. in a pandas DataFrame)? Or, more generally, to produce a list or table of titles at any stage in the process?
Hi Astrid! Thanks for using litstudy and thanks for reporting this issue!
Unfortunately, at the moment there is no functionality to see which papers were removed when taking the union of multiple document sets.
Issue #68 discussed a similar problem: there is currently no way to find the papers removed by unique(). One idea there was to add a duplicates() method that returns the papers removed by unique(), such that len(docset) == len(docset.unique()) + len(docset.duplicates()). Something similar could be implemented for union().
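A duplicates() method along these lines would split a collection into the documents that are kept and the ones that are dropped. The sketch below is hypothetical: split_duplicates, title_key and the Doc class are illustrative names, not part of litstudy, and matching on a normalized title is an assumed rule that is much simpler than whatever unique() actually uses to decide that two documents are the same.

```python
from dataclasses import dataclass
from typing import Callable, Hashable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def split_duplicates(
    docs: Iterable[T], key: Callable[[T], Hashable]
) -> Tuple[List[T], List[T]]:
    """Return (unique, duplicates).

    The first document seen for each key is kept; every later document
    with the same key is reported as a duplicate, so the invariant
    len(docs) == len(unique) + len(duplicates) always holds.
    """
    seen = set()
    unique: List[T] = []
    duplicates: List[T] = []
    for doc in docs:
        k = key(doc)
        if k in seen:
            duplicates.append(doc)
        else:
            seen.add(k)
            unique.append(doc)
    return unique, duplicates


def title_key(doc) -> str:
    # Hypothetical matching rule: titles compared case-insensitively,
    # with runs of whitespace collapsed to a single space.
    return " ".join(doc.title.lower().split())


@dataclass
class Doc:
    # Stand-in for a litstudy Document; only .title is used here.
    title: str


docs = [Doc("Deep Learning"), Doc("deep  learning"), Doc("Graph Mining")]
unique, duplicates = split_duplicates(docs, title_key)
assert len(docs) == len(unique) + len(duplicates)
print([d.title for d in unique])
print([d.title for d in duplicates])
```

The same split applied to the concatenation of two collections would show what a union() drops.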
We are open to contributions and will accept relevant pull requests that add this functionality.
Good to know, thanks! However, I'm actually more interested in the documents that are kept after the union (not the removed duplicates), e.g. to know which documents I should read for my review, and also the titles of the documents that the various histograms are based on. Is that possible with LitStudy?
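One way to get such a list, assuming that iterating over a DocumentSet yields Document objects that expose title and publication_year attributes, is to walk the set and collect one row per document. titles_table is a hypothetical helper, not a litstudy function, and the Doc class below is only a stand-in so the sketch runs without litstudy installed.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List, Optional


def titles_table(docs: Iterable) -> List[Dict[str, Any]]:
    """Return one row per document: its position, title and publication year."""
    rows = []
    for i, doc in enumerate(docs):
        rows.append(
            {
                "index": i,
                "title": doc.title,
                # getattr keeps the sketch working for objects without a year.
                "year": getattr(doc, "publication_year", None),
            }
        )
    return rows


@dataclass
class Doc:
    # Stand-in for a litstudy Document (assumed attributes).
    title: str
    publication_year: Optional[int] = None


merged = [Doc("Deep Learning", 2015), Doc("Graph Mining")]
for row in titles_table(merged):
    print(row["index"], row["year"], row["title"])
```

With real data, the argument would be the result of the union (e.g. docs_a.union(docs_b)), and if pandas is installed, pandas.DataFrame(titles_table(...)) turns the rows into a DataFrame that can be saved with to_csv(). Passing the same DocumentSet that goes into a histogram should list exactly the documents that plot is based on.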