Add more structured data when importing clippings & books #5

Open
mammuth opened this issue Jul 22, 2019 · 8 comments

Comments

@mammuth
Owner

mammuth commented Jul 22, 2019

We're currently not storing all data which can be found in My Clippings.txt.

  • More book-related information, if available (ISBN, separate title & author, language)
  • user added notes
  • Rethink the book-user mapping (currently a book is always owned by a user, but maybe that's dumb)
  • Add more clippings data (location, date, language, ...)
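
A minimal sketch of how the raw file could be split into entries with title and author separated. It assumes the layout seen in English-language files: entries separated by a line of ten "=" characters, a first line of the form "Title (Author)", then a metadata line, then the text. RawClipping, split_title_author and parse_clippings are hypothetical names, not the project's existing parser, and other devices or locales may deviate from this layout.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Assumption: entries are separated by a line of ten "=" characters.
SEPARATOR = "=========="

# Assumption: the first line of an entry is "Title (Author)". Because the
# pattern is anchored at the end, the author comes from the LAST
# parenthesised group, so titles containing parentheses still work.
TITLE_AUTHOR_RE = re.compile(r"^(?P<title>.*?)\s*\((?P<author>[^()]*)\)\s*$")


@dataclass
class RawClipping:
    title: str
    author: Optional[str]  # None if the first line has no "(Author)" suffix
    meta: str              # e.g. "- Your Highlight on page 119-119 | Added on ..."
    content: str           # highlight or note text; empty for bookmarks


def split_title_author(line: str) -> Tuple[str, Optional[str]]:
    match = TITLE_AUTHOR_RE.match(line)
    if match is None:
        return line.strip(), None
    return match.group("title").strip(), match.group("author").strip()


def parse_clippings(text: str) -> List[RawClipping]:
    clippings = []
    for block in text.split(SEPARATOR):
        # Strip a possible byte-order mark and surrounding whitespace, drop blank lines.
        lines = [line.lstrip("\ufeff").strip() for line in block.splitlines()]
        lines = [line for line in lines if line]
        if len(lines) < 2:  # not a complete entry (e.g. the empty tail after the last separator)
            continue
        title, author = split_title_author(lines[0])
        clippings.append(RawClipping(title, author, lines[1], "\n".join(lines[2:])))
    return clippings
```

Keeping the metadata line verbatim in meta leaves room for parsing location, type and date separately later, without losing anything if that parsing fails.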
@mammuth
Owner Author

mammuth commented Sep 22, 2019

Note that things like date are localized and seem to use different formats (maybe depending on the Kindle device?).

Examples

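  • Paperwhite in German: Hinzugefügt am Samstag, 13. April 2019 10:25:27 (“Added on Saturday, 13 April 2019 10:25:27”)
  • Unknown device, in Portuguese: Data de adição: terça-feira, 19 de março de 2019 23h32min06s GMT-03:00 (“Date added: Tuesday, 19 March 2019 23:32:06 GMT-03:00”)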
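
Because the formats differ per language, one option is a small table of per-language patterns, tried in order. The sketch below assumes only the two formats quoted above plus the English one from the Kindle lines quoted further down in this thread; PATTERNS, the month tables and parse_added_on are hypothetical names. Unknown formats return None, so the raw line can still be stored and inspected later.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical month tables; only German, Portuguese and English are covered.
GERMAN_MONTHS = {name: i for i, name in enumerate(
    ["januar", "februar", "märz", "april", "mai", "juni", "juli",
     "august", "september", "oktober", "november", "dezember"], start=1)}
PORTUGUESE_MONTHS = {name: i for i, name in enumerate(
    ["janeiro", "fevereiro", "março", "abril", "maio", "junho", "julho",
     "agosto", "setembro", "outubro", "novembro", "dezembro"], start=1)}
ENGLISH_MONTHS = {name: i for i, name in enumerate(
    ["january", "february", "march", "april", "may", "june", "july",
     "august", "september", "october", "november", "december"], start=1)}

TIME = r"(?P<hour>\d{1,2}):(?P<minute>\d{2}):(?P<second>\d{2})"

# (pattern, month table) pairs, tried in order. The weekday is ignored.
PATTERNS = [
    # "Hinzugefügt am Samstag, 13. April 2019 10:25:27"
    (re.compile(r"(?P<day>\d{1,2})\. (?P<month>\w+) (?P<year>\d{4}) " + TIME), GERMAN_MONTHS),
    # "Data de adição: terça-feira, 19 de março de 2019 23h32min06s GMT-03:00"
    (re.compile(r"(?P<day>\d{1,2}) de (?P<month>\w+) de (?P<year>\d{4}) "
                r"(?P<hour>\d{1,2})h(?P<minute>\d{2})min(?P<second>\d{2})s"
                r"(?: GMT(?P<sign>[+-])(?P<tzh>\d{2}):(?P<tzm>\d{2}))?"), PORTUGUESE_MONTHS),
    # "Added on Wednesday, 31 March 2021 18:57:00"
    (re.compile(r"(?P<day>\d{1,2}) (?P<month>\w+) (?P<year>\d{4}) " + TIME), ENGLISH_MONTHS),
]


def parse_added_on(meta: str) -> Optional[datetime]:
    """Return the 'added on' timestamp, or None for formats we don't know yet."""
    for pattern, months in PATTERNS:
        match = pattern.search(meta)
        if match is None:
            continue
        month = months.get(match.group("month").lower())
        if month is None:
            continue
        tz = None  # formats without an explicit offset produce a naive datetime
        if match.groupdict().get("sign"):
            sign = -1 if match.group("sign") == "-" else 1
            tz = timezone(sign * timedelta(hours=int(match.group("tzh")),
                                           minutes=int(match.group("tzm"))))
        return datetime(int(match.group("year")), month, int(match.group("day")),
                        int(match.group("hour")), int(match.group("minute")),
                        int(match.group("second")), tzinfo=tz)
    return None
```

Hand-written month tables avoid depending on the server's installed locales, which strptime's %B would require.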

@JSerwatka
Contributor

JSerwatka commented Jun 28, 2021

Why is there an author_name field in the Clipping model when the same field already exists in the Book model, which Clipping references via a foreign key?

@JSerwatka
Contributor

JSerwatka commented Jun 28, 2021

> Note that things like date are localized and seem to use different formats (maybe depending on the Kindle device?).

Do you know whether the part that differentiates between a highlight, a bookmark, and a note is also localized?
For example:

- Your Highlight on page 119-119 | Added on Wednesday, 31 March 2021 18:57:00
- Your Note on page 119 | Added on Wednesday, 31 March 2021 18:57:13
- Your Bookmark at location 1607 | Added on Wednesday, 2 June 2021 11:50:54
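
If only the English wording is reliable so far, the entry type and its position could be read with a single regular expression, returning None for anything else (for example a localized line) so that the raw text is still kept. This is a sketch assuming the three English line shapes above; META_RE, ClippingMeta and parse_meta are hypothetical names, and for lines that carry both a page and a location only the first position is captured.

```python
import re
from typing import NamedTuple, Optional

# Assumption: English-language metadata lines such as
# "- Your Highlight on page 119-119 | Added on ...". Localized lines are
# expected NOT to match and should fall back to storing the raw line.
META_RE = re.compile(
    r"Your (?P<kind>Highlight|Note|Bookmark) (?:on|at) "
    r"(?P<unit>page|location) (?P<start>\d+)(?:-(?P<end>\d+))?",
    re.IGNORECASE,
)


class ClippingMeta(NamedTuple):
    kind: str   # "highlight", "note" or "bookmark"
    unit: str   # "page" or "location"
    start: int
    end: int    # equal to start for single-position entries


def parse_meta(meta: str) -> Optional[ClippingMeta]:
    match = META_RE.search(meta)
    if match is None:
        return None
    start = int(match.group("start"))
    end = int(match.group("end")) if match.group("end") else start
    return ClippingMeta(match.group("kind").lower(), match.group("unit").lower(), start, end)
```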

@mammuth
Owner Author

mammuth commented Jul 11, 2021

> Why is there an author_name field in the Clipping model when the same field already exists in the Book model, which Clipping references via a foreign key?

I don't quite remember. I think books were introduced later, and there are plaintext (non-Kindle) clippings where we simply don't know the book, only the author.
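
A sketch of the fallback described here, using plain dataclasses rather than the project's actual models: the book link is optional, and the clipping's own author_name is only consulted when there is no book. Book, Clipping and the author property are illustrative names, not the existing schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Book:
    title: str
    author_name: str


@dataclass
class Clipping:
    content: str
    book: Optional[Book] = None        # unknown for plaintext / non-Kindle clippings
    author_name: Optional[str] = None  # only needed when there is no book

    @property
    def author(self) -> Optional[str]:
        # Prefer the book's author; fall back to the clipping's own field.
        if self.book is not None:
            return self.book.author_name
        return self.author_name
```

With this shape the duplicated field is only meaningful for book-less clippings, which keeps the two sources of truth from diverging for Kindle imports.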

@mammuth
Owner Author

mammuth commented Jul 11, 2021

> Do you know whether the part that differentiates between a highlight, a bookmark, and a note is also localized?

No, I don't know. I think I only noticed the localization difference above in error logs or something like that. This is certainly another reason to store the actual raw files: then we might learn more about how they work (I think they also differ by device model...).

@JSerwatka
Contributor

So do you want to store each My Clippings file as plain text in the DB?

@mammuth
Owner Author

mammuth commented Jul 22, 2021

Would there be a reason to prefer something else, like the filesystem, over the DB? 🤔
One reason would be DB size; I'm not sure whether the current hosting provider imposes any limits.

But I guess going with the DB makes most sense.

We could have a model for uploads which stores the timestamps and the file contents.

Going forward, created clippings could reference this model if we want to keep track of which upload they came from (not sure whether that would be a useful feature in the future, though).
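
As a rough sketch of such an upload model, here is the idea with Python's built-in sqlite3 module standing in for the project's ORM: one table holds the raw file contents with an upload timestamp, and clippings optionally point back to the upload they came from. The table names, columns and store_upload are hypothetical.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Iterable

SCHEMA = """
CREATE TABLE upload (
    id INTEGER PRIMARY KEY,
    uploaded_at TEXT NOT NULL,   -- ISO 8601 timestamp (UTC)
    content TEXT NOT NULL        -- raw "My Clippings.txt" contents
);
CREATE TABLE clipping (
    id INTEGER PRIMARY KEY,
    content TEXT NOT NULL,
    upload_id INTEGER REFERENCES upload(id)  -- NULL if not created from an upload
);
"""


def store_upload(conn: sqlite3.Connection, raw_text: str, clipping_texts: Iterable[str]) -> int:
    """Store the raw file and the clippings parsed from it; return the upload id."""
    uploaded_at = datetime.now(timezone.utc).isoformat()
    with conn:  # commit both inserts together, or roll back on error
        cur = conn.execute(
            "INSERT INTO upload (uploaded_at, content) VALUES (?, ?)",
            (uploaded_at, raw_text),
        )
        upload_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO clipping (content, upload_id) VALUES (?, ?)",
            [(text, upload_id) for text in clipping_texts],
        )
    return upload_id
```

Making upload_id nullable keeps older clippings (and plaintext imports) valid without a backfill.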

@JSerwatka
Contributor

> Would there be a reason to prefer something else, like the filesystem, over the DB? 🤔

I think not. Besides, exporting all data from one model to txt files can be done with just a few lines of code.
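
For instance, assuming the raw uploads can be read as (id, content) pairs, e.g. from a query like SELECT id, content FROM upload against a hypothetical upload table, writing them out as text files does take only a few lines. export_uploads and the upload-<id>.txt naming are illustrative choices; UTF-8 is used so that localized content survives the round trip.

```python
from pathlib import Path
from typing import Iterable, List, Tuple


def export_uploads(rows: Iterable[Tuple[int, str]], out_dir: Path) -> List[Path]:
    """Write each raw upload to its own UTF-8 text file and return the paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for upload_id, content in rows:
        path = out_dir / f"upload-{upload_id}.txt"
        path.write_text(content, encoding="utf-8")
        written.append(path)
    return written
```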

> We could have a model for uploads which stores the timestamps and the file contents.

Exactly: the MyClippings model would just have a TextField and a timestamp.

I would use the stored MyClippings files just to understand how the format works. For the additional fields (author, note, date) and the overall data update process, though, I would choose the solution mentioned here: #16 (comment)
