Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Showing 8 changed files with 307 additions and 188 deletions.
@@ -1,37 +1,66 @@
What is **e621dl**?
===============
**e621dl** is an automated downloader for e621.net, which enables you to keep your
favorite artists or tags up to date.
**e621dl** is an automated downloader for e621.net that keeps your favorite tags, artists, or searches up-to-date.

Each time **e621dl** is run, it will download all files that contain at least one
tracked tag and that have been uploaded since the last time **e621dl** was run.
How does **e621dl** work?
===============
The behavior of **e621dl** depends on two files that tell it two crucial things:

1. ***e621dl*** **has to know what tags, artists, or searches you'd like to track.** To determine this, it looks for a file called `tags.txt`. Don't worry about creating this file; **e621dl** will create a blank one with instructions the first time you run it.
2. ***e621dl*** **has to know the last time you ran it.** To find this, it looks in a file called `config.txt`. Again, don't create this file yourself; just run **e621dl** and it will create a config file with default settings for you. One of these settings, `"last_run"`, tells **e621dl** when it last ran.

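A minimal Python 3 sketch of the "create a default config file if none exists" step described above. The key names (`last_run`, `tag_file`, `downloads`, `cache_name`, `cache_size`) are taken from `e621dl.py` in this commit; the default values, the function names, and the `json.dump` formatting are assumptions, not the project's actual `support.make_default_configfile`.

```python
import datetime
import json
import os


def default_config():
    """Return a config dict. Key names come from e621dl.py; values are assumed."""
    yesterday = datetime.date.today() - datetime.timedelta(days=1)
    return {
        "last_run": yesterday.strftime("%Y-%m-%d"),  # new configs start at yesterday
        "tag_file": "tags.txt",
        "downloads": "downloads/",  # trailing slash: e621dl.py joins paths with +
        "cache_name": ".cache",     # assumed value
        "cache_size": 65536,        # assumed value
    }


def load_or_create_config(path):
    """Return (config, created).

    A missing file is written with defaults and created=True is returned,
    so the caller can stop and let the user review the new file.
    """
    if not os.path.isfile(path):
        config = default_config()
        with open(path, "w") as outfile:
            json.dump(config, outfile, indent=4, sort_keys=True)
        return config, True
    with open(path) as infile:
        return json.load(infile), False
```

Returning a `created` flag mirrors how `e621dl.py` sets `EARLY_TERMINATE` after creating a missing file and exits instead of running an update with unreviewed defaults.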

Getting Started
===============
**e621dl** requires Python 2.7, so download and install that first. Once you have Python installed:

1. Download or clone this project
2. In the same directory as `e621dl.py`, create a file called `tags.txt`
3. Add tags or artists you wish to download to `tags.txt`, one tag per line.

example `tags.txt`:
- [Download the latest release](https://github.com/wwyaiykycnf/e621dl/releases/latest) and unzip it.
- Run `e621dl.py`. You should see something like:
```
cat
dog
> ./e621dl.py
configfile ERROR new default file created: config.txt
configfile ERROR verify this new config file and re-run the program
tagfile ERROR new default file created: tags.txt
tagfile ERROR please add tags you wish to this file and re-run the program
e621dl ERROR error(s) encountered during initialization, see above
```
It's not as bad as it looks. **e621dl** is telling you that it couldn't find a config file or a tags file, so it created them. Most users will not need to modify `config.txt`, but feel free to look at it and see which settings you can change.

- Add tags or artists you wish to download to `tags.txt`. The `tags.txt` created for you already contains instructions. **e621dl** ignores every line starting with `#`, so you can leave the instructions in the file after adding your tags, if you wish.

Once you've added a few lines to `tags.txt` and reviewed `config.txt`, you're ready to run **e621dl**!

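The `tags.txt` rules above (one search per line, lines starting with `#` ignored) can be sketched in Python 3 as below. `parse_tag_lines` is a hypothetical helper, not the project's `support.read_tagfile`; skipping blank lines and trimming surrounding whitespace are assumptions beyond what the README states.

```python
def parse_tag_lines(text):
    """Return the search lines in tags.txt content, skipping comments.

    Lines whose first non-blank character is '#' are ignored, as the README
    describes; blank lines are also skipped (an assumption).
    """
    searches = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line and not line.startswith("#"):
            searches.append(line)
    return searches


example = """# add one tag, artist, or search per line
cat
dog

# lines starting with '#' are ignored
"""
print(parse_tag_lines(example))  # ['cat', 'dog']
```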

*Note: e621dl has only been tested with a single tag per line, but may work with more.*

Running **e621dl**
===============
**e621dl** requires Python 2.7, so make sure you have that.

Running `e621dl.py` will begin an 'update'. The tags/artists listed in
`tags.txt` will be checked, one at a time, to see if there are any files that
have been uploaded since the last time **e621dl** was run.
When you run **e621dl**, it will determine the time it was last run, and then:
- read a line from `tags.txt`
- perform a search on e621.net using that line
- download all new files matching that search (files uploaded AFTER the last time **e621dl** was run)

This process is repeated for each line in `tags.txt` until the whole file has been processed. The last run date is then set to yesterday's date, and **e621dl** reports the total number of downloads.

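The update loop described above can be sketched in Python 3 as follows. `search` and `download` are hypothetical callables standing in for the e621.net calls (the code in this commit uses `e621_api.get_posts` and `e621_api.download`), and the `max_results` default of 100 is an assumption. The paging rule, which keeps requesting pages while a page comes back full, mirrors the loop in `e621dl.py`.

```python
import datetime


def collect_posts(search, line, last_run, max_results):
    """Gather every post for one line of tags.txt, one results page at a time.

    A page shorter than max_results is treated as the last page.
    """
    posts = []
    page = 1
    while True:
        found = search(line, last_run, page, max_results)
        posts.extend(found)
        if len(found) < max_results:
            return posts
        page += 1


def run_update(lines, last_run, search, download, max_results=100):
    """Process each search line and return (total_downloads, new_last_run).

    download(line, post) should return True only when a file was saved,
    so skipped posts are not counted.
    """
    total = 0
    for line in lines:
        for post in collect_posts(search, line, last_run, max_results):
            if download(line, post):
                total += 1
    # last_run becomes yesterday, not today, as the README explains
    yesterday = datetime.date.today() - datetime.timedelta(days=1)
    return total, yesterday.strftime("%Y-%m-%d")
```

`date.today() - timedelta(days=1)` gives the same date as the `fromordinal(toordinal() - 1)` expression used in `e621dl.py`.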

### I did that and not much happened
The first time you run **e621dl**, it's possible that not much will happen. Remember when `config.txt` was created and you opened it and saw that the `"last_run"` date was set to yesterday? **e621dl** does that automatically, and it's set that way intentionally. **e621dl** might check some old uploads to make sure it didn't miss anything, but it will never re-download old files.

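Part of how `e621dl.py` avoids re-downloading files is a "recent downloads" cache, sized by `CONFIG['cache_size']`, that it checks (`filename in CACHE` in the old code, `item.md5 in CACHE` in the new) and updates with `CACHE.push(...)`. `support.get_cache` itself is not shown in this commit, so the class below is a hypothetical Python 3 stand-in that supports only those two operations and forgets the oldest entry once `maxlen` entries are stored.

```python
from collections import deque


class RecentDownloads(object):
    """Remember the most recent maxlen md5 hashes (hypothetical cache)."""

    def __init__(self, maxlen):
        if maxlen < 1:
            raise ValueError("maxlen must be at least 1")
        self._order = deque(maxlen=maxlen)  # oldest hash on the left
        self._members = set()               # fast membership tests

    def push(self, md5):
        if md5 in self._members:
            return
        if len(self._order) == self._order.maxlen:
            # deque.append will drop the leftmost item; forget it here too
            self._members.discard(self._order[0])
        self._order.append(md5)
        self._members.add(md5)

    def __contains__(self, md5):
        return md5 in self._members


cache = RecentDownloads(maxlen=2)
for md5 in ("aaa", "bbb", "ccc"):
    cache.push(md5)
print("aaa" in cache, "ccc" in cache)  # False True
```

An object like this can be pickled, which fits how `e621dl.py` writes `CACHE` to `.cache` with `pickle.dump` after each download.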

Anyway, if that's not what you wanted for your very first update, you'll need to:

### Change the last run date
You can set the last run date to any date that follows the format `YYYY-MM-DD`. So if you'd like to download EVERYTHING of a given tag from the beginning of time (OK, the beginning of e621.net) until today, change the `"last_run"` variable in `config.txt` to something old:

    "last_run": "1986-01-01",

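If you prefer to change the date from a script rather than by hand, a Python 3 sketch like the one below checks the `YYYY-MM-DD` format before saving. It assumes `config.txt` is JSON (the `e621dl.py` in this commit writes it with `json.dump`); `set_last_run_in_config` is a hypothetical helper, and its output indentation differs from the tab separators `e621dl.py` uses.

```python
import datetime
import json
import re


def set_last_run_in_config(config_path, date_string):
    """Store date_string as "last_run" in a JSON config file and return the config."""
    # Require exactly YYYY-MM-DD (strptime alone would also accept "1986-1-1")
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", date_string):
        raise ValueError("expected YYYY-MM-DD, got %r" % date_string)
    # Reject impossible dates such as 2014-02-30 (strptime raises ValueError)
    datetime.datetime.strptime(date_string, "%Y-%m-%d")
    with open(config_path) as infile:
        config = json.load(infile)
    config["last_run"] = date_string
    with open(config_path, "w") as outfile:
        json.dump(config, outfile, indent=4, sort_keys=True)
    return config
```

For example, `set_last_run_in_config("config.txt", "1986-01-01")` makes the next run fetch everything, while a value such as `"yesterday"` is rejected with a `ValueError` before the file is touched.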

Bugs, New Features, and Feedback
=================
If you find something broken or have any ideas about features you'd like to see in the future, please contact me at [[email protected]]. I read every single email, so even if you think your idea is off-the-wall, or your bug is super-rare, please let me know and I'll see what I can do.

Donations
===============
If you've benefitted from this *free* project, please consider donating something! Your support enables bug fixes, new features, and future development!

[![Wishlist browse button](http://img.shields.io/amazon/wishlist.png?color=blue)](http://amzn.com/w/2F4EC3BPU9JON "Support me by buying something for me on Amazon")
[![BitCoin donate button](http://img.shields.io/bitcoin/donate.png?color=brightgreen)](https://coinbase.com/checkouts/1FZR3iP9zHRqQZeG8zg8Tmx471jP1c8eYe "Make a donation to this project using BitCoin")
[![DogeCoin donate button](http://img.shields.io/dogecoin/donate.png?color=yellow)](README.md#note-dogecoin-donations-may-be-sent-to-dkfycmjxndgaqhq5wdyjlneoqd3xbdygdr "Many donate. So Project. Wow. Very DogeCoin.")

The first time you run **e621dl**, not much will happen. When **e621dl** cannot
determine the last time it was run (e.g., the first time it is run), the current
date is used.
###### Note: *[Dogecoin](http://dogecoin.com) donations may be sent to:* `DKfycmjXNDgaqhQ5wdyJLNEoqd3XBDyGdr`

The last run date may be altered by modifying `.lastrun.txt`, but be sure to
match the YYYY-MM-DD format present in `.lastrun.txt`
@@ -1,111 +1,160 @@
#!/usr/bin/env python
# pylint: disable=missing-docstring,line-too-long,too-many-public-methods,
import os.path
import datetime
import logging
import init_e621dl
import sys
import regex
import lib.support as support
import lib.default as default
import datetime
import cPickle as pickle
import json
import lib.e621_api as e621_api
import re

# IMPORTANT: Update this line to point to your tagfile
TAGFILE = 'tags.txt'

def get_last_run():
    try:
        with open(".lastrun.txt", 'r') as lastrun:
            return lastrun.read()
##############################################################################
# INITIALIZATION
# - parse command line arguments
# - create a logger to show runtime messages
# - open config file
# - open file containing tracked tags
# - populate the recent downloads cache
##############################################################################
CONFIG_FILE = 'config.txt' # modify to use different config file

# get args dictionary
ARGS = support.get_args_dict()

# set up logging
logging.basicConfig(level=ARGS['log_lvl'], format=default.LOGGER_FMT,
                    stream=sys.stderr)
LOG = logging.getLogger('e621dl')

    except IOError:
        return datetime.date.today().strftime("%Y-%m-%d")
# this flag will be set to true if a fatal error occurs in pre-update
EARLY_TERMINATE = False

# read the config file. if not found, create a new one
if not os.path.isfile(CONFIG_FILE):
    CONFIG = support.make_default_configfile(CONFIG_FILE)
    EARLY_TERMINATE = True
else:
    CONFIG = support.read_configfile(CONFIG_FILE)

# read the tags file. if not found, create a new one
TAGS = []
if not os.path.isfile(CONFIG['tag_file']):
    support.make_default_tagfile(CONFIG['tag_file'])
    EARLY_TERMINATE = True
else:
    TAGS = support.read_tagfile(CONFIG['tag_file'])

# tags was read but contained nothing
if len(TAGS) == 0 and not EARLY_TERMINATE:
    LOG.error('no tags found in ' + CONFIG['tag_file'])
    LOG.error('add tags (or groups of tags) to this file and re-run the program')
    EARLY_TERMINATE = True

# open the cache (this can't really fail; just creates a new blank one)
CACHE = support.get_cache(CONFIG['cache_name'], CONFIG['cache_size'])

# create the downloads directory if needed
if not os.path.exists(CONFIG['downloads']):
    os.makedirs(CONFIG['downloads'])

# keeps running total of files downloaded in this run
TOTAL_DOWNLOADS = 0

def set_last_run():
    with open(".lastrun.txt", 'w') as lastrun:
        yesterday = datetime.date.fromordinal(datetime.date.today().toordinal()-1)
        lastrun.write(yesterday.strftime("%Y-%m-%d"))
# exit before updating if any errors occurred in pre-update
if EARLY_TERMINATE:
    LOG.error('error(s) encountered during initialization, see above')
    exit()
else:
    LOG.debug('successfully initialized\n')

def download_image(link, tag, path):
    filename = regex.get_filename(link)
    tag = re.sub('[\<\>:"/\\\|\?\*\ ]', '_', tag)
    completepath = path + tag + "_" + filename
##############################################################################
# UPDATE
# - for each tag (or tag group) in the tagfile:
# - for each upload since the last time e621dl was run:
# - if the file has not previously been downloaded, download it
# - count number of downloads for reporting in post-update
##############################################################################

    if os.path.isfile(completepath) == True:
        return ' skipped (already exists)'
LOG.info("e621dl was last run on " + CONFIG['last_run'])

    elif filename in CACHE:
        return ' skipped (previously downloaded)'
for line in TAGS:
    LOG.info("Checking for new uploads tagged: " + line)

    else:
        with open(completepath, 'wb') as dest:
            source = SPOOF.open(link)
            dest.write(source.read())
    # prepare to start accumulating list of download links for line
    accumulating = True
    current_page = 1
    links_to_download = []

        CACHE.push(filename)
        pickle.dump(CACHE, open('.cache', 'wb'), pickle.HIGHEST_PROTOCOL)
    while accumulating:
        LOG.debug('getting page ' + str(current_page) + ' of ' + line)
        links_found = e621_api.get_posts(line, CONFIG['last_run'],
                                         current_page, default.MAX_RESULTS)

    return ' downloaded'
        if not links_found:
            accumulating = False

# parse arguments and set up logger
ARGS = init_e621dl.get_args()
logging.basicConfig(level=init_e621dl.get_log_level(ARGS),
                    format='%(name)-8s %(levelname)-8s %(message)s', stream=sys.stderr)
LOG = logging.getLogger('e621dl')
        else:
            # add links found to list to be downloaded
            links_to_download += links_found
            # continue accumulating if found == max, else stop accumulation
            accumulating = len(links_found) == default.MAX_RESULTS
            current_page += 1

# prepare to run
CACHE = init_e621dl.get_downloads_list('.cache')
TAGS = init_e621dl.read_tags_file(TAGFILE)
LASTRUN = get_last_run()
DOWNLOAD_DIR = '''./downloads/'''
TOTAL_DOWNLOADS = 0
SPOOF = init_e621dl.SpoofOpen()
    remaining = len(links_to_download)

if not os.path.exists(DOWNLOAD_DIR):
    os.makedirs(DOWNLOAD_DIR)
    if remaining == 0:
        LOG.info('no new uploads for: ' + line)

LOG.info("e621dl was last run on " + LASTRUN)
    else:
        LOG.info(str(remaining) + ' new uploads for: ' + line)

for tag in TAGS:
    LOG.info("Checking for new uploads tagged: " + tag)
    for item in links_to_download:

    accumulating = True
    page_number = 1
    dl_links = []
        LOG.debug('item md5 = ' + item.md5)
        # construct full filename
        filename = re.sub('[\<\>:"/\\\|\?\*\ ]', '_', line) + '--' + \
            item.md5 + '.' + item.ext

    # get all post pages in same list
    while accumulating:
        results_page = regex.get_results_page(tag, LASTRUN, page_number)
        # skip if already in cache
        if item.md5 in CACHE:
            LOG.info('(' + str(remaining) + ') skipped (previously downloaded)')

        if regex.results_exist(results_page):
            LOG.debug('page ' + str(page_number) + ' contained results')
            dl_links += regex.get_links(results_page)
            page_number += 1
        # skip if already in download directory
        elif os.path.isfile(CONFIG['downloads'] + filename):
            LOG.info('(' + str(remaining) + ') skipped (already in downloads directory)')

        else:
            LOG.debug('page ' + str(page_number) + ' contained nothing')
            accumulating = False
        # otherwise, download it
        else:
            LOG.info('(' + str(remaining) + ') downloading... ')
            e621_api.download(item.url, CONFIG['downloads'] + filename)

    LOG.info('number of uploads found: ' + str(len(dl_links)))
            # push to cache, write cache to disk
            CACHE.push(item.md5)
            pickle.dump(CACHE, open('.cache', 'wb'), pickle.HIGHEST_PROTOCOL)
            TOTAL_DOWNLOADS += 1

    # download image in each post page
    remaining = len(dl_links)
    for link in dl_links:
        status = download_image(link, tag, DOWNLOAD_DIR)
        img_string = '(%d) %s' % (remaining, regex.get_filename(link))
        # decrement remaining downloads
        remaining -= 1

        LOG.info(img_string + status)
        if status == ' downloaded':
            TOTAL_DOWNLOADS += 1
        remaining -= 1
    LOG.debug('update for ' + line + ' completed\n')
    print ''

    LOG.info('update for ' + tag + ' completed')
    LOG.info('')
##############################################################################
# WRAP-UP
# - report number of downloads in this session
# - set last run to yesterday (see FAQ for why it isn't today)
##############################################################################
LOG.info('total files downloaded: ' + str(TOTAL_DOWNLOADS))
YESTERDAY = datetime.date.fromordinal(datetime.date.today().toordinal()-1)
CONFIG['last_run'] = YESTERDAY.strftime(default.DATETIME_FMT)

#set_last_run()
with open(CONFIG_FILE, 'wb') as outfile:
    json.dump(CONFIG, outfile, indent=4, sort_keys=True,
              ensure_ascii=False, separators=(',', ':\t\t'))

LOG.info('total files downloaded: ' + str(TOTAL_DOWNLOADS))
set_last_run()
LOG.info('last run set to ' + get_last_run())
LOG.info('last run updated to ' + CONFIG['last_run'])

exit()

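For reference, the download filename built in the new update loop of `e621dl.py` (`<sanitized line>--<md5>.<ext>`) can be isolated in a small Python 3 function. The character class is the one from `e621dl.py`, rewritten as an equivalent raw string; `download_filename` is a hypothetical name, not a function in this commit.

```python
import re

# Characters e621dl.py replaces with '_': < > : " / \ | ? * and space
UNSAFE_CHARS = re.compile(r'[<>:"/\\|?* ]')


def download_filename(line, md5, ext):
    """Build '<line with unsafe characters replaced>--<md5>.<ext>'."""
    return UNSAFE_CHARS.sub("_", line) + "--" + md5 + "." + ext


print(download_filename("rating:s cat/dog", "0123abcd", "png"))
# rating_s_cat_dog--0123abcd.png
```

Prefixing each file with its sanitized search line keeps downloads from different lines of `tags.txt` distinguishable inside one downloads directory.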
This file was deleted.
Empty file.