Feature: Support flag to crawl only the root website. Do not hop to external links #11
Comments
I have a question, isn't this already achievable through
If we set max_link=0 it will crawl only the Say for example we are passing the root_url as What we want to achieve in this issue is that it should only crawl internal links. Every link that has This will be useful in creating a sitemap for a website. LMK if you have any more questions.
Alright, makes sense. What should I call the argument then, something like
Oh wait, there's already a PR open for this
Would you like to pick this up? This is very similar to what we discussed.
Sure!
@devavinothm Are you working on this? #14
@indrajithi I can complete his PR if you want
@Mews I have updated the description. Assigning to you. 🥇
Thanks, I'm going to sleep right now but I'll get to it tomorrow morning :)
A very straightforward feature: add a flag to crawl only the root website and not follow external links.
E.g., if the root URL provided is https://github.com, it should crawl pages in this domain only. It should not crawl https://example.com.
(Optional) Can we also support an option to crawl only external links and no internal links? There could be some use cases for that.
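A minimal sketch of how the requested filtering could work, using only the Python standard library. This is not the project's actual implementation: the helper names `is_internal` and `filter_links` and the flags `internal_only` and `external_only` are hypothetical, and the sketch assumes "internal" means "same hostname as the root URL", so subdomains count as external.

```python
from urllib.parse import urljoin, urlparse


def is_internal(root_url: str, link: str) -> bool:
    """Return True if `link` resolves to the same hostname as `root_url`.

    Assumption: subdomains (e.g. docs.github.com) are treated as external.
    """
    # Resolve relative links such as "/about" against the root URL first.
    absolute = urljoin(root_url, link)
    return urlparse(absolute).hostname == urlparse(root_url).hostname


def filter_links(root_url, links, internal_only=False, external_only=False):
    """Keep only internal or only external links (hypothetical flags)."""
    if internal_only and external_only:
        raise ValueError("internal_only and external_only are mutually exclusive")
    if internal_only:
        return [link for link in links if is_internal(root_url, link)]
    if external_only:
        return [link for link in links if not is_internal(root_url, link)]
    # Neither flag set: keep every link.
    return list(links)


found = ["/about", "https://github.com/features", "https://example.com"]
print(filter_links("https://github.com", found, internal_only=True))
print(filter_links("https://github.com", found, external_only=True))
```

With `internal_only=True` the crawler would stay on the root site, which suits the sitemap use case mentioned in the comments. `external_only=True` covers the optional request.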