This project demonstrates how to scrape data from Reddit using the [PRAW](https://praw.readthedocs.io/) (Python Reddit API Wrapper) library. The script extracts structured information from subreddit posts, making it easy to analyze Reddit content programmatically.
## Features

- Scrape top posts from any subreddit
- Extract key post information:
  - Post title
  - Score
  - URL
  - Number of comments
  - Post body text
  - Creation date
  - Author information
  - Post ID
## Requirements

- Python 3.7+
- PRAW
- pandas
- python-dotenv
## Reddit API Setup

To use this script, you'll need to:

1. Create a Reddit account.
2. Set up a Reddit developer application:
   - Go to https://www.reddit.com/prefs/apps
   - Click "Create App" or "Create Another App"
   - Choose "script" as the application type
   - Fill in the necessary details
3. Note down the following credentials (you can sanity-check them with the sketch below):
   - Client ID
   - Client Secret
   - User Agent
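Before wiring the credentials into the scraper, you can sanity-check them directly with PRAW. This is a minimal sketch with placeholder values; the `user_agent` string is just a convention (a short descriptive name plus your Reddit username):

```python
import praw

# Read-only session built from the app credentials; replace the placeholders
# with the values shown at https://www.reddit.com/prefs/apps
reddit = praw.Reddit(
    client_id="your_client_id",
    client_secret="your_client_secret",
    user_agent="reddit-scraper/0.1 by u/your_username",
)

# Fetching anything forces a real request, so this fails fast on bad credentials
print(next(reddit.subreddit("python").hot(limit=1)).title)
```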
## Installation

1. Clone the repository:

   ```bash
   git clone https://github.com/koolgax99/reddit-scrapping-praw.git
   cd reddit-scrapping-praw
   ```

2. Create a virtual environment (optional but recommended):

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

3. Install the required packages:

   ```bash
   pip install praw pandas python-dotenv
   ```
## Configuration

Create a `.env` file in the project root with your Reddit API credentials:

```env
REDDIT_CLIENT_ID=your_client_id
REDDIT_CLIENT_SECRET=your_client_secret
REDDIT_USER_AGENT=your_user_agent
REDDIT_USERNAME=your_reddit_username
REDDIT_PASSWORD=your_reddit_password
```

**Security notes:**

- Never share your `.env` file publicly
- Add `.env` to your `.gitignore`
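For reference, this is roughly how the scraper can consume these variables with python-dotenv. It is a sketch only; `RedditScraper` may well do the equivalent internally:

```python
import os

import praw
from dotenv import load_dotenv

# Read the .env file in the project root into the process environment
load_dotenv()

# Build an authenticated PRAW client from the variables defined above
reddit = praw.Reddit(
    client_id=os.getenv("REDDIT_CLIENT_ID"),
    client_secret=os.getenv("REDDIT_CLIENT_SECRET"),
    user_agent=os.getenv("REDDIT_USER_AGENT"),
    username=os.getenv("REDDIT_USERNAME"),
    password=os.getenv("REDDIT_PASSWORD"),
)
```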
## Usage

```python
from reddit_scraper import RedditScraper

# Initialize the scraper (credentials come from the .env file)
scraper = RedditScraper()

# Scrape top posts from a subreddit
datascience_posts = scraper.scrape_subreddit(
    subreddit_name='datascience',
    sort_by='top',
    time_filter='all',
    limit=20
)

# Save the scraped data
scraper.save_to_file(datascience_posts)

# Scrape multiple subreddits in one call
multi_subreddit_data = scraper.scrape_multiple_subreddits(
    ['datascience', 'MachineLearning', 'learnpython'],
    limit=30
)
```
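If the scraper returns plain Python structures rather than a DataFrame, pandas (already a dependency) makes the results easy to inspect and export. The snippet below assumes `scrape_subreddit` returns a list of dicts with the fields listed under Features; the column names are hypothetical, so match them to the actual output:

```python
import pandas as pd

# Hypothetical field names; adjust to the keys the scraper actually emits
df = pd.DataFrame(datascience_posts)
print(df[["title", "score", "num_comments"]].head())
df.to_csv("datascience_top_posts.csv", index=False)
```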
## Customization

- Change `sort_by`: `'top'`, `'hot'`, `'new'`
- Modify `time_filter`: `'all'`, `'year'`, `'month'`, `'week'`, `'day'`
- Adjust `limit` to control the number of posts (see the example after this list)
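For example, to pull last week's top posts with a higher cap (a sketch reusing the `scrape_subreddit` call from Usage; note that in the underlying PRAW listings, `time_filter` only applies to the `'top'` and `'controversial'` sorts):

```python
# Last week's top posts from r/learnpython, capped at 50
weekly_posts = scraper.scrape_subreddit(
    subreddit_name='learnpython',
    sort_by='top',
    time_filter='week',
    limit=50
)
scraper.save_to_file(weekly_posts)
```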
## Responsible Use

- Respect Reddit's API Terms of Service
- Be mindful of rate limits (a pacing sketch follows this list)
- Use scraping responsibly
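PRAW already sleeps when it hits Reddit's rate limits, but when looping over many subreddits it is still polite to pace requests yourself. A minimal sketch:

```python
import time

# Scrape several subreddits with a small pause between requests
for name in ['datascience', 'MachineLearning', 'learnpython']:
    posts = scraper.scrape_subreddit(
        subreddit_name=name,
        sort_by='top',
        time_filter='all',
        limit=20
    )
    scraper.save_to_file(posts)
    time.sleep(2)  # brief pause to stay well under the rate limits
```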
## Troubleshooting

If the script fails:

- Ensure all environment variables are correctly set
- Check your internet connection
- Verify your Reddit API credentials (see the check below)
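A quick diagnostic for the credential-related items (a sketch; `reddit.user.me()` performs an authenticated request and raises a `prawcore` exception if Reddit rejects the credentials):

```python
import os

import praw
from dotenv import load_dotenv

load_dotenv()

# 1. Confirm every expected variable is present
required = ["REDDIT_CLIENT_ID", "REDDIT_CLIENT_SECRET", "REDDIT_USER_AGENT",
            "REDDIT_USERNAME", "REDDIT_PASSWORD"]
missing = [name for name in required if not os.getenv(name)]
print("Missing variables:", missing or "none")

# 2. Make one authenticated request; raises if the credentials are wrong
reddit = praw.Reddit(
    client_id=os.getenv("REDDIT_CLIENT_ID"),
    client_secret=os.getenv("REDDIT_CLIENT_SECRET"),
    user_agent=os.getenv("REDDIT_USER_AGENT"),
    username=os.getenv("REDDIT_USERNAME"),
    password=os.getenv("REDDIT_PASSWORD"),
)
print("Authenticated as:", reddit.user.me())
```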
## Contributing

1. Fork the repository
2. Create your feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request
## License

Distributed under the MIT License. See `LICENSE` for more information.
## Disclaimer

This project is for educational purposes. Always respect Reddit's terms of service and API usage guidelines.
## Contact

Your Name - [Your Email or LinkedIn]

Project Link: https://github.com/koolgax99/reddit-scrapping-praw