
Appendix A: Classes Diagram

Martin Olveyra edited this page Apr 12, 2023 · 3 revisions

--

Script classes

Base classes

  • ArgumentParserScript - Provides a standard interface for any script that needs to parse arguments
  • BaseScript - Provides methods for any script that schedules other scripts and spiders on ScrapyCloud
  • WorkFlowManager - Provides a standard interface for scripts that manage workflows running on ScrapyCloud

Utility classes

  • DeliverScript - Provides base code and interface for scripts that read and deliver items from jobs that belong to the same workflow.
  • BaseClonner - Provides base code and interface for objects that clone jobs
  • CloneJobScript - Standalone script for cloning jobs with either identical or modified parameters.

Crawl Managers

  • CrawlManager - Provides base code and interface for scripts that manage scheduling of spiders. It is also a standalone script for basic spider scheduling.
  • PeriodicCrawlManager - Schedules a spider with the same arguments periodically and continuously, waiting for each job to finish before scheduling the next.
  • GeneratorCrawlManager - Schedules spider jobs according to a generator of parameters.
  • HCFCrawlManager - Schedules spiders that read from/write to a dynamic frontier. This crawl manager is implemented in a separate package, hcf-backend, which is a frontera backend for Hubstorage Crawl Frontier.
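
To illustrate what "a generator of parameters" means for GeneratorCrawlManager, here is a minimal, library-independent sketch. It does not use the actual class or its API: the names `spider_args` and `plan_jobs`, the spider name `myspider`, and the argument values are all hypothetical, and a real manager would schedule a job on ScrapyCloud where this sketch only collects tuples.

```python
from itertools import product
from typing import Dict, Iterator, List, Tuple


def spider_args() -> Iterator[Dict[str, str]]:
    """Yield one dict of spider arguments per job (hypothetical values)."""
    for country, category in product(["us", "uk"], ["books", "music"]):
        yield {"country": country, "category": category}


def plan_jobs(
    spider: str, args_iter: Iterator[Dict[str, str]]
) -> List[Tuple[str, Dict[str, str]]]:
    """Pair the spider name with each generated argument set.

    This is a stand-in for scheduling: each tuple represents one job.
    """
    return [(spider, args) for args in args_iter]


jobs = plan_jobs("myspider", spider_args())
for name, args in jobs:
    print(name, args)
```

Because the arguments come from a generator, the full list of jobs does not have to be known up front; each argument set is produced only when the next job is about to be planned.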

Graph Manager

  • GraphManager - Allows defining a complex workflow as a directed graph of tasks.
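
The idea of a workflow expressed as a directed graph of tasks can be sketched with Python's standard `graphlib` module (Python 3.9+). This is only a conceptual illustration and not the GraphManager API: the task names (`crawl`, `dedupe`, `deliver`, `report`) are hypothetical, and the sketch computes a valid execution order without scheduling anything.

```python
from graphlib import TopologicalSorter

# Each key is a task; its value is the set of tasks that must finish first.
# Task names are hypothetical and chosen only for illustration.
workflow = {
    "crawl": set(),
    "dedupe": {"crawl"},
    "deliver": {"dedupe"},
    "report": {"dedupe"},
}

# static_order() yields the tasks so that every task appears
# after all of the tasks it depends on.
order = list(TopologicalSorter(workflow).static_order())
print(order)
```

Here `deliver` and `report` both depend only on `dedupe`, so their relative order is not constrained; a graph-based manager could run such independent tasks in parallel.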

--