    Repositories list

    • creepjs

      Public
      Creepy device and browser fingerprinting
      TypeScript
      MIT License
      203000 Updated Jan 18, 2025
    • lexbor

      Public
      Lexbor is development of an open source HTML Renderer library. http://lexbor.com
      C
      Apache License 2.0
      106000 Updated Jan 18, 2025
    • hero

      Public
      The web browser that’s nearly impossible for bot blockers to block
      TypeScript
      MIT License
      47000 Updated Jan 18, 2025
    • Python
      27000 Updated Jan 17, 2025
    • Page Object pattern for Scrapy
      Python
      BSD 3-Clause "New" or "Revised" License
      28000 Updated Jan 17, 2025
    • A python based HTML to text conversion library, command line client and Web service.
      Python
      Apache License 2.0
      29000 Updated Jan 16, 2025
    • TLS implementation in pure python, focused on interoperability testing
      Python
      Other
      81000 Updated Jan 16, 2025
    • Web data extraction tool implemented as chrome extension
      JavaScript
      GNU Lesser General Public License v3.0
      70000 Updated Jan 16, 2025
    • Contains the common item definitions used in Zyte.
      Python
      BSD 3-Clause "New" or "Revised" License
      8000 Updated Jan 16, 2025
    • fast python port of arc90's readability tool, updated to match latest readability.js!
      Python
      Apache License 2.0
      459000 Updated Jan 16, 2025
    • TrackMe

      Public
      Go
      GNU General Public License v3.0
      40000 Updated Jan 15, 2025
    • Node.js implementation of a proxy server (think Squid) with support for SSL, authentication and upstream proxy chaining.
      JavaScript
      Apache License 2.0
      1450010 Updated Jan 15, 2025
    • Zyte Data API integration for Scrapy
      Python
      BSD 3-Clause "New" or "Revised" License
      20000 Updated Jan 15, 2025
    • More routines for operating on iterables, beyond itertools
      Python
      MIT License
      292000 Updated Jan 14, 2025
    • Python
      5000 Updated Jan 14, 2025
    • iplist

      Public
      IP Address Collection and Management Service with multiple output formats: mikrotik, json, text, ipset, nfset, clashx, keenetic, switchy, amnezia
      PHP
      MIT License
      13000 Updated Jan 14, 2025
    • Spider templates for automatic crawlers.
      Python
      BSD 3-Clause "New" or "Revised" License
      4000 Updated Jan 13, 2025
    • A list of the most common User Agents used on the Internet.
      JavaScript
      MIT License
      16000 Updated Jan 13, 2025
    • Luminati HTTP/HTTPS Proxy manager
      JavaScript
      194000 Updated Jan 12, 2025
    • normality

      Public
      A tiny library for Python text normalisation. Useful for ad-hoc text processing.
      Python
      MIT License
      18000 Updated Jan 9, 2025
    • calamus

      Public
      A JSON-LD Serialization Library for Python
      Python
      Apache License 2.0
      12000 Updated Jan 8, 2025
    • lol-html

      Public
      Low output latency streaming HTML parser/rewriter with CSS selector-based API
      Rust
      BSD 3-Clause "New" or "Revised" License
      85000 Updated Jan 6, 2025
    • Python binding to Modest engine (fast HTML5 parser with CSS selectors).
      Cython
      MIT License
      71000 Updated Jan 5, 2025
    • chompjs

      Public
      Parsing JavaScript objects into Python dictionaries
      C
      MIT License
      12000 Updated Jan 4, 2025
    • A browser driver on top of puppeteer, ready for production scenarios.
      JavaScript
      MIT License
      79000 Updated Jan 1, 2025
    • Python
      MIT License
      2000 Updated Dec 30, 2024
    • Crawlera middleware for Scrapy
      Python
      BSD 3-Clause "New" or "Revised" License
      88000 Updated Dec 30, 2024
    • courlan

      Public
      Clean, filter, normalize, and sample URLs to optimize crawls
      Python
      Apache License 2.0
      9002 Updated Dec 30, 2024
    • Python
      BSD 3-Clause "New" or "Revised" License
      9000 Updated Dec 27, 2024
    • List of libraries, tools and APIs for web scraping and data processing.
      Makefile
      Other
      796000 Updated Dec 27, 2024