The ScrapPyJS class provides web scraping on top of Selenium, letting you scrape data by running a JavaScript script directly from Python.
pip install ScrapPyJS
from ScrapPyJS import ScrapPyJS
# initialize ScrapPyJS
scrappy = ScrapPyJS()
# set js script
JS_SCRIPT = "return 'ScrapPy scrapping!'"
scrappy.set_script(JS_SCRIPT)
# rest of the code goes here...
# close ScrapPyJS
scrappy.end()
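Scripts longer than one line are easier to maintain when a small helper builds them. The following is a minimal sketch, not part of ScrapPyJS: `build_extract_script` and the example selectors are hypothetical, and it assumes the value the script returns is handed back to Python, as the one-line example above suggests.

```python
import json


def build_extract_script(selectors):
    """Build a JS snippet that returns the text of each selected element.

    `selectors` maps a result key to a CSS selector. Hypothetical helper,
    not part of ScrapPyJS.
    """
    # json.dumps yields a valid JS object literal and escapes quotes safely.
    selector_map = json.dumps(selectors)
    return (
        f"const selectors = {selector_map};\n"
        "const data = {};\n"
        "for (const [key, css] of Object.entries(selectors)) {\n"
        "  const node = document.querySelector(css);\n"
        "  data[key] = node ? node.textContent.trim() : null;\n"
        "}\n"
        "return data;"
    )


# Example selectors are placeholders; adjust them to the target page.
JS_SCRIPT = build_extract_script({"title": "h1", "intro": "p.intro"})
# scrappy.set_script(JS_SCRIPT)  # same call as in the example above
```

Embedding the selectors through `json.dumps` avoids hand-escaping quotes inside the JavaScript string.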
- Use the scrap method to scrape a webpage:
  result = scrappy.scrap(url, wait=True, wait_for='id', wait_target='elementId')
- Retrieve the result of the scraping operation:
  print(result)
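The single-page steps above can be wrapped so that the scraper is closed even when scraping fails. This is a sketch under assumptions: `scrape_once` is a hypothetical helper, and it relies only on the `set_script`, `scrap` and `end` methods shown in this README.

```python
def scrape_once(scraper, script, url, **wait_options):
    """Run one script against one URL and always release the scraper.

    `scraper` is any object with set_script(), scrap() and end(), such as
    a ScrapPyJS instance. Hypothetical helper, not part of ScrapPyJS.
    """
    try:
        scraper.set_script(script)
        return scraper.scrap(url, **wait_options)
    finally:
        # Runs even if scrap() raises, so the scraper is always closed.
        scraper.end()


# Intended use with the real class (needs Selenium and a WebDriver):
# result = scrape_once(ScrapPyJS(), "return document.title",
#                      "https://url1.com/", wait=True,
#                      wait_for="id", wait_target="elementId")
```

Because the helper calls `end()` itself, the scraper instance should not be reused afterwards.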
- Set up a list of target URLs:
  URLS = [
      'https://url1.com/',
      'https://url2.com/homepage/',
      'https://url2.com/about',
  ]
- Use the loop_through method to scrape each of the target webpages:
  # The result value will be a list if save mode is on, else a JSON string
  result = scrappy.scrap(url, wait=True, wait_for='id', wait_target='elementId')
- Retrieve the result of the scraping operation:
  print(result)
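Since the result may arrive as either a list or a JSON string, downstream code is simpler if both shapes are normalized first. The sketch below assumes exactly the behaviour described in the code comment above; `normalize_result` is a hypothetical helper, not part of ScrapPyJS.

```python
import json


def normalize_result(result):
    """Return Python data whether `result` is a JSON string or a list.

    Assumes the behaviour described in the README comment: a list when
    save mode is on, otherwise a JSON string. Hypothetical helper.
    """
    if isinstance(result, str):
        # A JSON string is decoded into the equivalent Python objects.
        return json.loads(result)
    # Anything else (for example a list) is passed through unchanged.
    return result


# Both shapes end up as the same Python list.
from_string = normalize_result('[{"title": "Home"}]')
from_list = normalize_result([{"title": "Home"}])
```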
- Via toggle:
  scrappy.toggle_save_mode()
  Here, the save mode, which is False by default, is toggled to True. The save file settings keep their default values.
- Via set_save_info method:
  scrappy.set_save_info(save=True)
  Here, we set the save mode directly to True, leaving the other settings at their defaults.
- Via set_save_info method with custom file settings:
  FILE_NAME = "output"
  FILE_FORMAT = "json"
  SAVE_LOCATION = "path/to/file/"
  scrappy.set_save_info(save=True, file_name=FILE_NAME, file_format=FILE_FORMAT, location=SAVE_LOCATION)
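This README does not say whether ScrapPyJS creates a missing save directory, so creating it up front is a cheap precaution. The following sketch uses a hypothetical helper, `prepare_save_location`, and assumes the output is written as `<location>/<file_name>.<file_format>`; the actual naming should be checked against CLASS_STRUCTURE.md.

```python
from pathlib import Path


def prepare_save_location(location, file_name, file_format):
    """Create the save directory and return the expected output path.

    Assumes the file is written as <location>/<file_name>.<file_format>.
    Hypothetical helper, not part of ScrapPyJS.
    """
    directory = Path(location)
    # parents=True builds missing parent folders; exist_ok avoids an
    # error when the directory is already there.
    directory.mkdir(parents=True, exist_ok=True)
    return directory / f"{file_name}.{file_format}"


# Intended use before enabling save mode:
# expected_path = prepare_save_location(SAVE_LOCATION, FILE_NAME, FILE_FORMAT)
```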
Please note that you will need to have the necessary Selenium and WebDriver dependencies installed to use this code.
Details of the ScrapPyJS class are available in .\CLASS_STRUCTURE.md
This code is licensed under the MIT open source license.
NAME: Hind Sagar Biswas
Website: coderaptors.epizy.com