This is a simple image scraper built with Django, using BeautifulSoup for scraping. It has a simple frontend for viewing the scraped images and downloading them.
Dependencies (see the Pipfile for details):
- beautifulsoup4
- lxml
- requests
- cssutils
- pillow
- Python 3.7
Clone the repository
$ git clone https://github.com/rachhek/imagescraper.git
$ cd imagescraper
Create a virtual environment and install the dependencies
$ pipenv shell
$ (imagescraper) pipenv install
Once pipenv has finished installing, run the Django migrations.
$ (imagescraper) python manage.py migrate
Run the server
$ (imagescraper) python manage.py runserver
Open the application at http://127.0.0.1:8000/
Example of scraping the homepage of http://unity.com
The URLs and images can be downloaded.
The images and the .txt file of URLs are stored on disk at:
<path_to_project>/imagescraper/media/
scraper_app/lib.py
scraper_app/templates/scraper_app/scraper/index.html
- Cannot download images that are embedded as base64 data
- Only scrapes "img" tags and "background-url" styles
- Does not automatically scroll pages
- Might not scrape images properly on highly dynamic websites
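The first two points above can be illustrated with a minimal sketch. It is not the code in scraper_app/lib.py: to stay dependency-free it uses Python's standard-library `html.parser` and a regular expression in place of BeautifulSoup and cssutils. The names `ImageURLCollector`, `collect_image_urls` and `BACKGROUND_URL` are hypothetical, and only inline `style` attributes are inspected.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Matches url(...) inside an inline background or background-image declaration.
BACKGROUND_URL = re.compile(
    r"background(?:-image)?\s*:[^;]*?url\(\s*['\"]?([^'\")]+)['\"]?\s*\)",
    re.IGNORECASE,
)


class ImageURLCollector(HTMLParser):
    """Collects image URLs from <img src> and inline background styles."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        candidates = []
        if tag == "img" and attrs.get("src"):
            candidates.append(attrs["src"])
        # Any tag may carry a background image in its style attribute.
        candidates.extend(BACKGROUND_URL.findall(attrs.get("style") or ""))
        for url in candidates:
            url = url.strip()
            # base64 data URIs are skipped, mirroring the limitation above.
            if url.lower().startswith("data:"):
                continue
            absolute = urljoin(self.base_url, url)
            if absolute not in self.urls:
                self.urls.append(absolute)


def collect_image_urls(html, base_url):
    """Return de-duplicated absolute image URLs found in an HTML string."""
    parser = ImageURLCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.urls
```

Because the sketch only parses the HTML it is given, images loaded later by JavaScript or on scroll never appear in the result, which is the reason for the last two points in the list.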
The logs are stored in /imgscraper/debug.log
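For reference, a file log like this is usually configured through Django's `LOGGING` setting, which uses the standard `logging.config.dictConfig` format. The sketch below is an assumption about how such a setup could look, not a copy of this project's settings: the helper `build_logging_config`, the formatter, and the use of "scraper_app" as the logger name are all hypothetical.

```python
import logging
import logging.config


def build_logging_config(log_path):
    """Return a dictConfig-style dict that writes DEBUG and above to log_path."""
    return {
        "version": 1,
        "disable_existing_loggers": False,
        "formatters": {
            "simple": {
                "format": "%(asctime)s %(levelname)s %(name)s: %(message)s",
            },
        },
        "handlers": {
            "file": {
                "level": "DEBUG",
                "class": "logging.FileHandler",
                "filename": log_path,
                "formatter": "simple",
            },
        },
        "loggers": {
            # Hypothetical logger name; the project may use a different one.
            "scraper_app": {
                "handlers": ["file"],
                "level": "DEBUG",
                "propagate": False,
            },
        },
    }


# In a Django settings module the result would be assigned to LOGGING, e.g.:
# LOGGING = build_logging_config(os.path.join(BASE_DIR, "debug.log"))
```

With a configuration along these lines, any module can call `logging.getLogger("scraper_app").debug(...)` and the message is appended to the log file.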