This example requires a few additional libraries. You can install them using pip:
pip install -r requirements.txt
Then run the workflow:
redun run workflow.py main
By default this will scrape web pages from https://www.python.org/ with a depth of 2 link traversals. All of the HTML files encountered will be stored in crawl/. Word frequency across all pages will be calculated, and a CSV of the word counts will be stored in computed/word_counts.txt.
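For readers curious what the word-count step involves, here is a minimal standard-library sketch. It is not the code in workflow.py: the helper names (TextExtractor, count_words, write_counts_csv), the choice to skip script/style text, and the lowercase [a-z]+ tokenization are all assumptions made for illustration.

```python
import csv
import re
from collections import Counter
from html.parser import HTMLParser
from pathlib import Path
from typing import Iterable, List


class TextExtractor(HTMLParser):
    """Hypothetical helper: collect visible text, skipping <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: List[str] = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def count_words(html_texts: Iterable[str]) -> Counter:
    """Tally lowercase word frequencies across several HTML documents."""
    counts: Counter = Counter()
    for html in html_texts:
        parser = TextExtractor()
        parser.feed(html)
        parser.close()
        # Join chunks with spaces so words in adjacent tags do not merge.
        text = " ".join(parser.chunks)
        counts.update(re.findall(r"[a-z]+", text.lower()))
    return counts


def write_counts_csv(counts: Counter, path: Path) -> None:
    """Write a header plus (word, count) rows, most frequent first."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["word", "count"])
        writer.writerows(counts.most_common())
```

Applied to this example's layout, a call might look like write_counts_csv(count_words(p.read_text(errors="ignore") for p in Path("crawl").rglob("*.html")), Path("computed/word_counts.txt")).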
Lastly, an HTML report that summarizes the scraping and analysis is generated in reports/report.html. The report is generated using a jinja2 template stored in templates/report.html.
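As a rough illustration of how a jinja2 template turns crawl results into a report, the sketch below renders an inline template with jinja2.Template. The template text, the render_report helper, and the variable names (url, depth, num_pages, top_words) are hypothetical; the actual variables used by templates/report.html are not described here.

```python
from collections import Counter

import jinja2

# Hypothetical stand-in for templates/report.html.
REPORT_TEMPLATE = """<html>
<body>
  <h1>Crawl report for {{ url }}</h1>
  <p>{{ num_pages }} pages scraped at depth {{ depth }}.</p>
  <table>
  {% for word, count in top_words %}
    <tr><td>{{ word }}</td><td>{{ count }}</td></tr>
  {% endfor %}
  </table>
</body>
</html>"""


def render_report(template_source: str, url: str, depth: int,
                  num_pages: int, counts: Counter, top_n: int = 10) -> str:
    """Fill the template with crawl statistics and the most common words."""
    # autoescape=True HTML-escapes values such as scraped words.
    template = jinja2.Template(template_source, autoescape=True)
    return template.render(
        url=url,
        depth=depth,
        num_pages=num_pages,
        top_words=counts.most_common(top_n),  # list of (word, count) pairs
    )
```

Reading the template from a file instead of a string (for example with jinja2.Environment and jinja2.FileSystemLoader("templates")) works the same way once the template object is loaded.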
Feel free to try other URLs and scraping depths using the task arguments:
redun run workflow.py main --url URL --depth DEPTH
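To make the depth argument concrete, here is one plausible reading of "depth of N link traversals" as a breadth-first crawl: depth 0 fetches only the starting URL, depth 1 also fetches the pages it links to, and so on. This is a sketch, not the workflow's implementation; the crawl and LinkExtractor names, the same-host restriction, and the injected fetch function (used so the logic can be exercised without network access) are all assumptions.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Dict, List
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Hypothetical helper: collect href targets of <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url: str, depth: int, fetch: Callable[[str], str]) -> Dict[str, str]:
    """Breadth-first crawl returning {url: html} for pages within `depth` hops."""
    host = urlparse(start_url).netloc
    pages: Dict[str, str] = {}
    queue = deque([(start_url, 0)])
    seen = {start_url}
    while queue:
        url, level = queue.popleft()
        html = fetch(url)
        pages[url] = html
        if level == depth:
            continue  # Do not follow links past the requested depth.
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments.
            link, _fragment = urldefrag(urljoin(url, href))
            # Assumption: stay on the starting site and visit each page once.
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append((link, level + 1))
    return pages
```

With a real fetcher such as lambda u: urllib.request.urlopen(u).read().decode("utf-8", "replace"), crawl("https://www.python.org/", 2, fetcher) would mirror the default settings described above.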
You can also alter the report template templates/report.html. It is passed to the task make_report() as a File argument, so changes to the template should be picked up automatically when you rerun the workflow.
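The reason a File argument gives this reactivity is that the file's identity feeds into the cache key, so editing it invalidates the cached result. The sketch below illustrates only the general idea with a content hash; it is not redun's mechanism, and redun's own File hashing may use different signals (for example file metadata). The MemoizedRender class and file_hash helper are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Callable, Dict, Tuple


def file_hash(path: Path) -> str:
    """Hash a file's bytes so any edit to it yields a new cache key."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


class MemoizedRender:
    """Cache results keyed by (template hash, data); recompute only when a key changes."""

    def __init__(self, render: Callable[[str, str], str]) -> None:
        self.render = render
        self.cache: Dict[Tuple[str, str], str] = {}
        self.calls = 0  # counts real (non-cached) executions

    def __call__(self, template: Path, data: str) -> str:
        key = (file_hash(template), data)
        if key not in self.cache:
            self.calls += 1
            self.cache[key] = self.render(template.read_text(), data)
        return self.cache[key]
```

Calling the wrapper twice with an unchanged template reuses the cached result; after the template file is edited, its hash changes and the render function runs again.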