RobotsSniffer

RobotsSniffer is a command-line tool written in C# for analyzing the robots.txt files of websites. The tool retrieves and parses robots.txt files to determine which paths are allowed or disallowed for web crawlers, helping users understand site restrictions and accessibility rules.

Features

  • Retrieve and analyze robots.txt files from a single URL or a list of URLs.
  • Parse the robots.txt file to display allowed and disallowed paths.
  • Optionally save the results to an output file.
  • Multi-threaded processing for improved performance when working with multiple URLs.
  • Configurable timeout for HTTP requests.
  • Extract sitemap URLs from robots.txt.

Requirements

  • .NET 9 (for compiling/debugging)
  • Brain (optional)

Usage

Syntax

RobotsSniffer -u <url> | -l <url-list> [-o <output-file>] [-timeout <ms>]

Arguments

Argument          Description
-u <url>          Analyze the robots.txt file of a single URL.
-l <url-list>     Provide a file containing multiple URLs (one per line) to analyze in batch.
-o <output-file>  Save the results to the specified file. Optional.
-timeout <ms>     Set the HTTP request timeout in milliseconds (default: 5000).

Examples

Analyze a Single URL

RobotsSniffer -u https://example.com

Output:

[>] Url: https://example.com
[+] Checking url...
[+] Robots.txt found.
[?] Robots.txt content:
[?] Allowed:
[+] /
[?] Disallowed:
[-] /admin
[-] /private

Analyze a List of URLs

RobotsSniffer -l urls.txt -o output.txt

Where urls.txt contains:

https://example.com
https://another-site.com

Output:

  • Results are printed to the console and saved in output.txt.
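
Set a Custom Timeout

The -timeout flag can be combined with either mode; the 10-second value below is just an example:

RobotsSniffer -l urls.txt -o output.txt -timeout 10000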

How It Works

  1. Argument Parsing: The tool validates and processes the command-line arguments to determine the mode of operation:

    • Single URL (-u).
    • Multiple URLs from a file (-l).
  2. Fetching robots.txt: For each URL, the tool attempts to fetch the robots.txt file by appending /robots.txt to the base URL.

  3. Parsing the Content: The robots.txt content is parsed to extract allowed (Allow) and disallowed (Disallow) paths.

  4. Output: Results are displayed in the console and optionally written to the specified output file.

  5. Parallel Processing: When analyzing multiple URLs, the tool uses multithreading (Parallel.ForEach) to process URLs concurrently for better performance. A minimal code sketch of steps 2-5 follows this list.
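
The following is a minimal, self-contained C# sketch of steps 2-5. It assumes HttpClient for the fetch and a simple line-by-line scan for Allow, Disallow, and Sitemap directives; all class and method names (RobotsSketch, FetchRobotsAsync, Parse) are illustrative and are not taken from the actual RobotsSniffer source.

// Illustrative sketch only; names and structure are assumptions, not the RobotsSniffer source.
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;

class RobotsSketch
{
    static readonly HttpClient Client = new HttpClient
    {
        Timeout = TimeSpan.FromMilliseconds(5000) // default timeout; -timeout overrides this
    };

    // Step 2: append /robots.txt to the base URL and fetch it.
    static async Task<string?> FetchRobotsAsync(string baseUrl)
    {
        try
        {
            var robotsUrl = baseUrl.TrimEnd('/') + "/robots.txt";
            return await Client.GetStringAsync(robotsUrl);
        }
        catch (Exception)
        {
            return null; // unreachable host, 404, timeout, ...
        }
    }

    // Step 3: scan the file line by line for Allow, Disallow, and Sitemap directives.
    static (List<string> Allowed, List<string> Disallowed, List<string> Sitemaps) Parse(string content)
    {
        var allowed = new List<string>();
        var disallowed = new List<string>();
        var sitemaps = new List<string>();

        foreach (var raw in content.Split('\n'))
        {
            var line = raw.Trim();
            if (line.StartsWith("Allow:", StringComparison.OrdinalIgnoreCase))
                allowed.Add(line.Substring("Allow:".Length).Trim());
            else if (line.StartsWith("Disallow:", StringComparison.OrdinalIgnoreCase))
                disallowed.Add(line.Substring("Disallow:".Length).Trim());
            else if (line.StartsWith("Sitemap:", StringComparison.OrdinalIgnoreCase))
                sitemaps.Add(line.Substring("Sitemap:".Length).Trim());
        }
        return (allowed, disallowed, sitemaps);
    }

    // Steps 4-5: process several URLs concurrently and collect the results.
    static void Main()
    {
        var urls = new[] { "https://example.com", "https://another-site.com" };
        var results = new ConcurrentBag<string>();

        Parallel.ForEach(urls, url =>
        {
            var content = FetchRobotsAsync(url).GetAwaiter().GetResult();
            if (content == null)
            {
                results.Add($"[-] {url}: robots.txt not found");
                return;
            }
            var (allowed, disallowed, sitemaps) = Parse(content);
            results.Add($"[+] {url}: {allowed.Count} allowed, {disallowed.Count} disallowed, {sitemaps.Count} sitemaps");
        });

        foreach (var line in results)
            Console.WriteLine(line); // optionally also write to the -o output file
    }
}

Note that this sketch blocks on each request inside Parallel.ForEach for simplicity; the real tool may structure its concurrency and output formatting differently.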

Future Improvements

  • Support for identifying and extracting Sitemap URLs from robots.txt.
  • Enhanced error reporting and logging.
  • Option to customize the number of concurrent threads for URL processing.
  • HTTP headers customization (e.g., user-agent string).

Contributing

Contributions are welcome! If you'd like to add features, improve performance, or fix issues, feel free to submit a pull request.


Author

RobotsSniffer was created as a utility tool for web analysis, helping users understand how websites interact with web crawlers. The author takes no responsibility for misuse of this tool.