Parse info and download books from tululu.org

The script parses book information from the website and downloads the books and their covers.

Setup

Create venv

python3 -m venv venv

Activate venv

source venv/bin/activate

Install dependencies

pip install -r requirements.txt

Run the script

python3 main.py

Parameters

The script accepts two optional parameters, --start_id and --end_id, that set the range of book IDs to work through. Running it without them:

python3 main.py

uses the default values, 1 and 10, so the books with IDs 1 through 10 are checked and downloaded.
Similarly, if it is run like this:

python3 main.py --start_id 10 --end_id 20

the script will iterate through the IDs from 10 to 20, inclusive.
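
As an illustration of how such an inclusive ID range can be handled, here is a minimal sketch using argparse. The flag names and defaults come from the description above; the function names parse_args and book_ids are hypothetical, and main.py itself may be organised differently.

```python
import argparse


def parse_args(argv=None):
    """Parse the ID range; the defaults mirror the ones described above."""
    parser = argparse.ArgumentParser(
        description="Download books from tululu.org by ID range"
    )
    parser.add_argument("--start_id", type=int, default=1, help="first book ID")
    parser.add_argument("--end_id", type=int, default=10, help="last book ID, inclusive")
    return parser.parse_args(argv)


def book_ids(args):
    """Return the IDs to process; end_id itself is included."""
    return range(args.start_id, args.end_id + 1)


# Passing an explicit list stands in for the command line here.
args = parse_args(["--start_id", "10", "--end_id", "12"])
print(list(book_ids(args)))  # [10, 11, 12]
```

The "+ 1" in book_ids is what makes the upper bound inclusive, since Python's range stops before its end value.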

Parsing the sci-fi category

The repository has another script, parse_tululu_category.py, which downloads sci-fi books and their covers. It iterates through the pages of the category and parses each book on every page, retrieving the link to the book's text file and the associated cover image.

python3 parse_tululu_category.py

The following optional parameters customize the script's behavior:

--start_page - choose a value from 1 to 701. If no value is provided, the default starting page for iteration will be 1.
--end_page - choose a value from 1 to 701. If no value is provided, the default end page for iteration will be 701.
--dest_folder /Users/username - allows you to specify the directory where you want your results to be saved. By default, the script will use the current directory for saving the results.
--skip_imgs - if specified, the script will skip the image downloading process.
--skip_txt - if specified, the script will skip downloading the text files.
--json_path /Users/username/dev - allows you to set the path for the JSON description output independently of the destination folder.
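
The parameters above could be declared with argparse roughly as follows. This is a sketch under assumptions rather than the script's actual code: the flag names and the stated defaults come from the list above, while build_parser and the json_path default of None (meaning "not set") are hypothetical.

```python
import argparse
from pathlib import Path


def build_parser():
    """Declare the category-parsing options described above."""
    parser = argparse.ArgumentParser(description="Download sci-fi books from tululu.org")
    parser.add_argument("--start_page", type=int, default=1, help="first page, 1 to 701")
    parser.add_argument("--end_page", type=int, default=701, help="last page, 1 to 701")
    parser.add_argument("--dest_folder", type=Path, default=Path.cwd(),
                        help="where results are saved (current directory by default)")
    parser.add_argument("--skip_imgs", action="store_true", help="do not download covers")
    parser.add_argument("--skip_txt", action="store_true", help="do not download text files")
    # Assumption: None means no separate JSON path was requested.
    parser.add_argument("--json_path", type=Path, default=None,
                        help="separate location for the JSON description")
    return parser


# Passing an explicit list stands in for the command line here.
args = build_parser().parse_args(["--start_page", "5", "--end_page", "7", "--skip_imgs"])
print(args.start_page, args.end_page, args.skip_imgs, args.skip_txt)  # 5 7 True False
```

The two skip flags use store_true, so they take no value and are False unless present on the command line.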

Rendering a website

The repository also contains a script named render_website.py. It builds a website from the sources downloaded in the previous step.
Here are the steps:

Run the script. It creates the pages in the pages directory and then keeps serving them until it is stopped, so they are available at http://127.0.0.1:5500. To choose the description file the pages are built from, add the --description_path flag with the file's location; if the flag is omitted, books_description.json is used.
Open http://127.0.0.1:5500/pages/index1.html in a browser.

A deployed example of the finished site is available at https://frqhero.github.io/layout__part3/pages/index1.html
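
To illustrate how numbered pages such as index1.html can be produced from a list of book records, here is a minimal standard-library sketch. It is not the code of render_website.py: the helper names, the page size of 10, the "title" field, and the bare HTML list are all assumptions made for the example, and the real script presumably fills template.html instead.

```python
import html
import tempfile
from pathlib import Path


def chunk(items, size):
    """Split a list into consecutive pieces of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def render_pages(books, pages_dir, per_page=10):
    """Write index1.html, index2.html, ... and return the created paths."""
    pages_dir = Path(pages_dir)
    pages_dir.mkdir(parents=True, exist_ok=True)
    created = []
    for number, page_books in enumerate(chunk(books, per_page), start=1):
        items = "\n".join(
            f"<li>{html.escape(book['title'])}</li>" for book in page_books
        )
        path = pages_dir / f"index{number}.html"
        path.write_text(f"<ul>\n{items}\n</ul>\n", encoding="utf-8")
        created.append(path)
    return created


# Hypothetical records; the real books_description.json may use other fields.
books = [{"title": f"Book {n}"} for n in range(1, 26)]
with tempfile.TemporaryDirectory() as tmp:
    pages = render_pages(books, Path(tmp) / "pages", per_page=10)
    print([p.name for p in pages])  # ['index1.html', 'index2.html', 'index3.html']
```

Numbering the files from 1 matches the index1.html address used above; 25 records at 10 per page give three pages, the last one holding five books.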

Offline usage

It is worth noting that the project can be used offline in two ways. After the pages are created, you can either open the files directly from the 'pages' directory or, if the script is running and serving, access them at http://127.0.0.1:5500/pages/index1.html.
