Skip to content

Latest commit

 

History

History
106 lines (78 loc) · 4.96 KB

README.md

File metadata and controls

106 lines (78 loc) · 4.96 KB

Homebrewing Forum Recipe Scraper

This Python script allows you to download recipes posted on homebrewing forums. It uses the requests and BeautifulSoup libraries to send HTTP requests and parse the HTML content of web pages. The script is designed to be simple and easy to modify to target specific websites.

Installation

Before running the script, make sure to have Python installed and install the required libraries with the following command:

pip install requests beautifulsoup4

Usage

  1. Open the script in your favorite code editor.
  2. Replace the url variable value with the URL of the specific homebrewing forum you want to scrape.
  3. Update the class names in the extract_recipe function to match the actual classes used by the forum for recipe titles, authors, dates, ingredients, and instructions.
  4. If the forum requires pagination or authentication, you'll need to modify the script to handle those requirements.
  5. Run the script with the following command:

python homebrew_recipe_scraper.py

The script will download the recipes from the specified URL and save them in a JSON file called homebrew_recipes.json. The output file will contain an array of recipe objects with the following properties:

  • title: The recipe title
  • author: The user who posted the recipe
  • date: The date the recipe was posted
  • ingredients: The list of ingredients for the recipe
  • instructions: The steps to prepare the homebrew

Example

  {
    "title": "Awesome IPA",
    "author": "Brewmaster99",
    "date": "2023-03-01",
    "ingredients": "10 lbs Pale Malt, 2 lbs Munich Malt, 1 lb Caramel Malt, 2 oz Cascade Hops, 2 oz Citra Hops, 1 oz Amarillo Hops, 1 pkg American Ale Yeast",
    "instructions": "Mash grains at 152°F for 60 minutes. Sparge and collect wort. Boil for 60 minutes, adding hops according to the schedule. Cool wort, pitch yeast, and ferment at 68°F for 2 weeks. Bottle or keg and enjoy!"
  },
  ...
]

Important Note

Before scraping any website, ensure you comply with the site's terms of service and robots.txt file. Web scraping may be against the terms of service for some websites, and failure to comply with these terms could result in consequences such as being banned from the site.

Customization and Advanced Usage

The script provided is a basic template that can be customized to suit various homebrewing forums or even other types of websites. Some possible customizations and advanced usage tips are listed below.

Handling Pagination

Many forums have multiple pages of recipes. To scrape recipes from all pages, you can modify the script to loop through the pages and extract recipes. Find the pagination element on the website and adjust the loop accordingly. Here's an example:

total_pages = 10
base_url = "https://www.example-homebrewing-forum.com/recipes?page="

for page_num in range(1, total_pages + 1):
    page_url = base_url + str(page_num)
    response = requests.get(page_url)
    soup = BeautifulSoup(response.content, "html.parser")
    
    # ...rest of the scraping code as shown earlier...

Handling Authentication

Some forums may require user authentication to access the recipes. You can modify the script to handle authentication by using requests.Session() to maintain cookies between requests. This example demonstrates a basic login using a POST request:

from bs4 import BeautifulSoup
import json

# Add your login credentials here
username = "your_username"
password = "your_password"

# Replace these URLs with the actual login and recipe URLs
login_url = "https://www.example-homebrewing-forum.com/login"
url = "https://www.example-homebrewing-forum.com/recipes"

# Create a session to maintain cookies
session = requests.Session()

# Send a POST request to the login URL with the required credentials
login_data = {"username": username, "password": password}
session.post(login_url, data=login_data)

# Now use the session to send the GET request and parse the content with BeautifulSoup
response = session.get(url)
soup = BeautifulSoup(response.content, "html.parser")

# ...rest of the scraping code as shown earlier...

Rate Limiting

To avoid overloading the target website, it's a good practice to implement rate limiting or add delays between requests. You can use the time.sleep() function to add delays:


# Add a delay between requests (in seconds)
delay = 2

for page_num in range(1, total_pages + 1):
    # ...scraping code...
    
    # Add a delay before fetching the next page
    time.sleep(delay)

Conclusion

This script provides a basic starting point for web scraping recipes from homebrewing forums. Remember to always respect the target website's terms of service and robots.txt file and customize the script to suit the specific forum you're targeting. With these customizations, you can build a more advanced web scraper to download recipes or other data from various websites.