diff --git a/README.md b/README.md index 402d32d..2b28460 100644 --- a/README.md +++ b/README.md @@ -1,16 +1,13 @@ -What is **e621dl**? -=============== +### What is **e621dl**? **e621dl** is an automated downloader for e621.net that keeps your favorite tags, artists, or searches up-to-date. -How does **e621dl** work? -=============== +### How does **e621dl** work? The behavior of e621dl is dependent on two files that tell it two crucial things: -1. ***e621dl*** **has to know what tags, artists, or searches you'd like to track.** To determine this, it will look for a file called `tags.txt`. Don't worry about creating this file, **e621dl** will create a blank one with instructions the first time you run it. +1. ***e621dl*** **has to know what tags, artists, or searches you'd like to track.** To determine this, it will look for a *tag file*. Don't worry about creating this file, **e621dl** will create a blank one (called `tags.txt`) with instructions the first time you run it. 2. ***e621dl*** **has to know the last time you ran it.** To find this, it will look in a file called `config.txt`. Again, don't create this file yourself, instead just run **e621dl** and it will create a config file for you with default settings. One of these settings, `"last_run"` tells **e621dl** when the last time it ran was. -Getting Started -=============== +### Getting Started **e621dl** requires Python 2.7, so download and install that first. Once you have Python installed: - [Download the latest release] (https://github.com/wwyaiykycnf/e621dl/releases/latest) and unzip it. @@ -25,42 +22,48 @@ e621dl ERROR error(s) encountered during initialization, see above ``` It's not as bad as it looks. **e621dl** is telling you that it couldn't find a config file or tags file, so it created these files. Most users will not need to modify `config.txt` but feel free to look at it and see what settings you can change. -- Add tags or artists you wish to download to `tags.txt`. 
There should already be instructions in the `tags.txt` that was created for you. All lines starting with a `#` are ignored by **e621dl**, so feel free to leave the instructions in the file after adding your tags, if you wish. +- Add tags or artists you wish to download to the newly-created tag file. There should already be instructions in the `tags.txt` that was created for you. All lines starting with a `#` are ignored by **e621dl**, so feel free to leave the instructions in the file after adding your tags, if you wish. -Once you've added a few lines to `tags.txt` and reviewed `config.txt`, you're ready to run **e621dl**! +Once you've added a few lines to the tag file and reviewed `config.txt`, you're ready to run **e621dl**! -Running **e621dl** -=============== - +### Running **e621dl** When you run **e621dl**, it will determine the time it was last run, and then: -- read a line from `tags.txt` +- read a line from the tag file - perform a search on e621.net using that line - download all new files matching that search (files uploaded AFTER the last time **e621dl** was run) -This process will be repeated for each line in `tags.txt` until the file has been completely processed. The last run date will then be set to yesterday's date, and **e621dl** will report the number of total downloads. +This process will be repeated for each line in the tag file, until every line has been checked. The last run date will then be set to yesterday's date, and **e621dl** will report the number of total downloads. -### I did that and not much happened -The first time you run **e621dl**, its possible that not too much will happen. Remember when `config.txt` was created and you opened it and saw that the `"last_run"` date was set to yesterday? **e621dl** does that automatically, and it's set that way intentionally. **e621dl** might check some old uploads to make sure it didn't miss anything, but it will never re-download old files. 
+**Example Output:** +``` +> ./e621dl.py +e621dl INFO e621dl was last run on 2014-06-18 +e621dl INFO Checking for new uploads tagged: cat +e621dl INFO 10 new uploads for: cat +e621dl INFO will download 10 (cached: 0, existing: 0) -Anyway, if that's not what you wanted for your very first update, you'll need to: +e621dl INFO starting download of 10 files -### Change the last run date -You can set the last run date to any date that follows the format `YYYY-MM-DD`. So if you'd like to download EVERYTHING of a given tag from the beginning of time (Ok, the beginning of e621.net) until today, change the `"last_run"` variable in `config.txt` to something old: +Downloading: [###################################] 100.00% Done... + +e621dl INFO successfully downloaded 10 files +e621dl INFO last run updated to 2014-06-18 +``` - "last_run": "1986-01-01", +### Configuring **e621dl** +Please see [How Do Config File](docs/config_readme.md) to learn more about **e621dl**'s settings and how to change them. -Bugs, New Features, and Feedback -================= -If you find something broken or have any ideas about features you'd like to see in the future, please contact me at [wwyaiykycnf@gmail.com]. I read every single email, so even if you think your idea is off-the-wall, or your bug is super-rare, please let me know and I'll see what I can do. +### Frequently Asked Questions -Donations -=============== -If you've benefitted from this *free* project, please consider donating something! Your support enables bug fixes, new features, and future development! +##### Very few or no downloads +The first time you run **e621dl**, it's possible that not too much will happen. **e621dl** picks yesterday as the `"last_run"` date when it is first run, so if that's not what you wanted, you'll need to manually change the last run date. Look in [How Do Config File](docs/config_readme.md) for `"last_run"`. 
-[![Wishlist browse button](http://img.shields.io/amazon/wishlist.png?color=blue)](http://amzn.com/w/2F4EC3BPU9JON "Support me by buying something for me on Amazon") -[![BitCoin donate button](http://img.shields.io/bitcoin/donate.png?color=brightgreen)](https://coinbase.com/checkouts/1FZR3iP9zHRqQZeG8zg8Tmx471jP1c8eYe "Make a donation to this project using BitCoin") -[![DogeCoin donate button](http://img.shields.io/dogecoin/donate.png?color=yellow)](README.md#note-dogecoin-donations-may-be-sent-to-dkfycmjxndgaqhq5wdyjlneoqd3xbdygdr "Many donate. So Project. Wow. Very DogeCoin.") +### Bugs +If you experience a crash or other unexpected results, please use [the reporting instructions](docs/reporting_bugs.md) for the quickest response. -#######Note: *[Dogecoin](http://dogecoin.com) donations may be sent to:* `DKfycmjXNDgaqhQ5wdyJLNEoqd3XBDyGdr` +## Feedback and Feature Requests +If you have any ideas for how things might work better, or about features you'd like to see in the future, please send an email to wwyaiykycnf+features@gmail.com. I read every single email, so even if you think your idea is off-the-wall, please let me know and I'll see what I can do. +### Donations +If you've benefitted from this *free* project, please consider [buying me something on Amazon](http://amzn.com/w/20RZIUHXLO6R4)! Your support enables bug fixes, new features, and future development. Thanks for thinking of me! diff --git a/docs/config_readme.md b/docs/config_readme.md index c361bf2..d2fb153 100644 --- a/docs/config_readme.md +++ b/docs/config_readme.md @@ -30,8 +30,8 @@ Note that for the `cache_size`, `create_subdirectories`, and `parallel_downloads | Option Name | Quotes? 
| Acceptable Range | Description | | --------------------- | ------- | --------------------------- |----------------------------------------------------------- | -| download_directory | Yes | anything | path where **e621dl** puts downloads (must end with `/`) | -| create_subdirectories | No | `true` or `false` | create a subfolder for each line in tag file if true | +| download_directory | Yes | anything | path where **e621dl** puts downloads (must end with `/`) | +| create_subdirectories | No | `true` or `false` | create a subfolder for each line in tag file if true | | last_run | Yes | date (format: `YYYY-MM-DD`) | the last day **e621dl** was last run | diff --git a/docs/reporting_bugs.md b/docs/reporting_bugs.md new file mode 100644 index 0000000..ae1780a --- /dev/null +++ b/docs/reporting_bugs.md @@ -0,0 +1,33 @@ +## So bug. Much Broken. Wow. Very crash. + +Well... shit. I'm sorry that you're here. Unfortunately it's impossible to test everything, but if you're willing to follow the steps below, I'll do my best to fix the problem for you and anyone else who has it. + +This system is a little clunky, but bear with me for now until I get something more formal in place. + + +### Windows Users +1. navigate to the folder where e621dl.py is located on your computer +2. [click on the address bar of windows explorer and type: cmd](http://lifehacker.com/5989434/quickly-open-a-command-prompt-from-the-windows-explorer-address-bar) +3. a command prompt should open in the folder where e621dl.py is located. +4. type/paste the following, one line at a time: + + ```Batchfile + echo %PROCESSOR_ARCHITECTURE% >> output.txt + systeminfo | findstr /C:"OS" >> output.txt + python.exe --version >> output.txt 2>&1 + python.exe e621dl.py -v >> output.txt 2>&1 + ``` +5. This will send your processor architecture, version of Windows, version of Python, and **debug output of e621dl** to a file called output.txt. + +6. 
Send an email to wwyaiykycnf+bugs@gmail.com, and attach output.txt, config.txt, and tags.txt to the email. + +### OSX/Linux/Unix +1. open a terminal and navigate to where e621dl.py is located on your computer +2. run the following: + + ```Shell + uname -a >> output.txt; python --version >> output.txt 2>&1; ./e621dl.py -v >> output.txt 2>&1 + ``` +3. Send an email to wwyaiykycnf+bugs@gmail.com, and attach output.txt, config.txt, and tags.txt to the email. + +##### Thanks for reporting bugs, so I can fix them! diff --git a/e621dl.py b/e621dl.py index 196afc6..97b2e67 100755 --- a/e621dl.py +++ b/e621dl.py @@ -10,6 +10,9 @@ import json import lib.e621_api as e621_api from lib.downloader import multi_download +from lib.version import VERSION + +if __name__ == '__main__': ############################################################################## # INITIALIZATION @@ -19,57 +22,47 @@ # - open file containing tracked tags # - populate the recent downloads cache ############################################################################## -CONFIG_FILE = 'config.txt' # modify to use different config file - -# get args dictionary - -# set up logging -logging.basicConfig( - level=support.get_verbosity_level(), - format=default.LOGGER_FMT, - stream=sys.stderr) -LOG = logging.getLogger('e621dl') - -# this flag will be set to true if a fatal error occurs in pre-update -EARLY_TERMINATE = False - -# read the config file. if not found, create a new one -if not os.path.isfile(CONFIG_FILE): - CONFIG = support.make_default_configfile(CONFIG_FILE) - EARLY_TERMINATE = True -else: - CONFIG = support.read_configfile(CONFIG_FILE) - -# read the tags file. 
if not found, create a new one -TAGS = [] -if not os.path.isfile(CONFIG['tag_file']): - support.make_default_tagfile(CONFIG['tag_file']) - EARLY_TERMINATE = True -else: - TAGS = support.read_tagfile(CONFIG['tag_file']) - -# tags was read but contained nothing -if len(TAGS) == 0 and not EARLY_TERMINATE: - LOG.error('no tags found in ' + CONFIG['tag_file']) - LOG.error('add tags (or groups of tags) to this file and re-run the program') - EARLY_TERMINATE = True - -# open the cache (this can't really fail; just creates a new blank one) -CACHE = support.get_cache(CONFIG['cache_name'], CONFIG['cache_size']) - -# create the downloads directory if needed -if not os.path.exists(CONFIG['download_directory']): - os.makedirs(CONFIG['download_directory']) - -# keeps running total of files downloaded in this run -TOTAL_DOWNLOADS = 0 - -# exit before updating if any errors occurred in pre-update -if EARLY_TERMINATE: - LOG.error('error(s) encountered during initialization, see above') - exit() -else: - LOG.debug('successfully initialized\n') + CONFIG_FILE = 'config.txt' # modify to use different config file + + # set up logging + logging.basicConfig( + level=support.get_verbosity_level(), + format=default.LOGGER_FMT, + stream=sys.stderr) + LOG = logging.getLogger('e621dl') + + # report current version + LOG.info('running e621dl version %s', VERSION) + + # this flag will be set to true if a fatal error occurs in pre-update + EARLY_TERMINATE = False + + # read the config file. if not found, create a new one + EARLY_TERMINATE |= not os.path.isfile(CONFIG_FILE) + CONFIG = support.get_configfile(CONFIG_FILE) + EARLY_TERMINATE |= not support.validate_config(CONFIG) + + # read the tags file. 
if not found, create a new one + EARLY_TERMINATE |= not os.path.isfile(CONFIG['tag_file']) + TAGS = support.get_tagfile(CONFIG['tag_file']) + + # tags was read but contained nothing + if len(TAGS) == 0 and not EARLY_TERMINATE: + LOG.error('no tags found in %s', CONFIG['tag_file']) + LOG.error('add lines to this file and re-run program') + EARLY_TERMINATE |= True + + # open the cache (this can't really fail; just creates a new blank one) + CACHE = support.get_cache(CONFIG['cache_name'], CONFIG['cache_size']) + + # create the downloads directory if needed + if not os.path.exists(CONFIG['download_directory']): + os.makedirs(CONFIG['download_directory']) + + # exit before updating if any errors occurred in pre-update + if EARLY_TERMINATE: + LOG.error('error(s) encountered during initialization, see above') + exit() ############################################################################## # UPDATE @@ -78,85 +71,84 @@ # - if the file has not previously been downloaded, download it # - count number of downloads for reporting in post-update ############################################################################## + LOG.info("e621dl was last run on %s", CONFIG['last_run']) -LOG.info("e621dl was last run on " + CONFIG['last_run']) + # keeps running total of files downloaded in this run + TOTAL_DOWNLOADS = 0 -URL_AND_NAME_LIST = [] + URL_AND_NAME_LIST = [] -for line in TAGS: - LOG.info("Checking for new uploads tagged: " + line) + for line in TAGS: + LOG.debug("Checking for new uploads tagged: %s", line) - # prepare to start accumulating list of download links for line - accumulating = True - current_page = 1 - links_in_cache = 0 - links_on_disk = 0 - potential_downloads = [] + # prepare to start accumulating list of download links for line + accumulating = True + current_page = 1 + links_in_cache = 0 + links_on_disk = 0 + potential_downloads = [] - while accumulating: - LOG.debug('getting page ' + str(current_page) + ' of ' + line) - links_found = 
e621_api.get_posts(line, CONFIG['last_run'], - current_page, default.MAX_RESULTS) + while accumulating: + LOG.debug('getting page %d', current_page) + links_found = e621_api.get_posts(line, CONFIG['last_run'], + current_page, default.MAX_RESULTS) - if not links_found: - accumulating = False + if not links_found: + accumulating = False - else: - # add links found to list to be downloaded - potential_downloads += links_found - # continue accumulating if found == max, else stop accumulation - accumulating = len(links_found) == default.MAX_RESULTS - current_page += 1 + else: + # add links found to list to be downloaded + potential_downloads += links_found + # continue accumulating if found == max, else stop accumulation + accumulating = len(links_found) == default.MAX_RESULTS + current_page += 1 - if len(potential_downloads) == 0: - LOG.debug('no new uploads for: ' + line) + LOG.info('%d new uploads tagged: %s', len(potential_downloads), line) - else: - will_download = 0 + if len(potential_downloads) > 0: + will_download = 0 -# there were uploads. determine should any be downloaded - LOG.info(str(len(potential_downloads)) + ' new uploads for: ' + line) - current = 0 - for idx, item in enumerate(potential_downloads): + # there were uploads. 
determine should any be downloaded + current = 0 + for idx, item in enumerate(potential_downloads): - LOG.debug('item md5 = ' + item.md5) - current = '\t(' + str(idx) + ') ' + LOG.debug('item md5 = %s', item.md5) + current = '\t(' + str(idx) + ') ' - # construct full filename - filename = support.safe_filename(line, item, CONFIG) + # construct full filename + filename = support.safe_filename(line, item, CONFIG) - # skip if already in cache - if item.md5 in CACHE: - links_in_cache += 1 - LOG.debug(current + 'skipped (previously downloaded)') + # skip if already in download directory + if os.path.isfile(CONFIG['download_directory'] + filename): + links_on_disk += 1 + LOG.debug('%s skipped (already in download dir)', current) - # skip if already in download directory - elif os.path.isfile(CONFIG['download_directory'] + filename): - links_on_disk += 1 - LOG.debug(current + 'skipped (already in downloads directory') + # skip if already in cache + elif item.md5 in CACHE: + links_in_cache += 1 + LOG.debug('%s skipped (previously downloaded)', current) - # otherwise, download it - else: - LOG.debug(current + 'will be downloaded') - URL_AND_NAME_LIST.append( + # otherwise, download it + else: + LOG.debug('%s will be downloaded', current) + URL_AND_NAME_LIST.append( (item.url, CONFIG['download_directory'] + filename)) - will_download += 1 - # push to cache, write cache to disk - CACHE.push(item.md5) - TOTAL_DOWNLOADS += 1 + will_download += 1 + # push to cache, write cache to disk + CACHE.push(item.md5) + TOTAL_DOWNLOADS += 1 - LOG.debug('update for ' + line + ' completed\n') - LOG.info('\twill download ' + str(will_download) + \ - '\t(cached: ' + str(links_in_cache) + \ - ', existing: ' + str(links_on_disk) + ')\n') + LOG.debug('update for %s completed\n', line) + LOG.info('%5d total (%d new, %d existing, %d cached)\n', + TOTAL_DOWNLOADS, will_download, links_on_disk, links_in_cache) -if URL_AND_NAME_LIST: - print '' - LOG.info('starting download of ' + 
str(len(URL_AND_NAME_LIST)) + ' files') - multi_download(URL_AND_NAME_LIST, CONFIG['parallel_downloads']) -else: - LOG.info('nothing to download') + if URL_AND_NAME_LIST: + print '' + LOG.info('starting download of %d files', len(URL_AND_NAME_LIST)) + multi_download(URL_AND_NAME_LIST, CONFIG['parallel_downloads']) + else: + LOG.info('nothing to download') ############################################################################## @@ -165,15 +157,16 @@ # - report number of downloads in this session # - set last run to yesterday (see FAQ for why it isn't today) ############################################################################## -pickle.dump(CACHE, open('.cache', 'wb'), pickle.HIGHEST_PROTOCOL) -if URL_AND_NAME_LIST: - LOG.info('successfully downloaded ' + str(TOTAL_DOWNLOADS) + ' files') -YESTERDAY = datetime.date.fromordinal(datetime.date.today().toordinal()-1) -CONFIG['last_run'] = YESTERDAY.strftime(default.DATETIME_FMT) + pickle.dump(CACHE, open('.cache', 'wb'), pickle.HIGHEST_PROTOCOL) + if URL_AND_NAME_LIST: + LOG.info('successfully downloaded %d files', TOTAL_DOWNLOADS) + YESTERDAY = datetime.date.fromordinal(datetime.date.today().toordinal()-1) + CONFIG['last_run'] = YESTERDAY.strftime(default.DATETIME_FMT) -with open(CONFIG_FILE, 'wb') as outfile: - json.dump(CONFIG, outfile, indent=4, sort_keys=True, ensure_ascii=False) + with open(CONFIG_FILE, 'wb') as outfile: + json.dump(CONFIG, outfile, indent=4, + sort_keys=True, ensure_ascii=False) -LOG.info('last run updated to ' + CONFIG['last_run']) + LOG.info('last run updated to %s', CONFIG['last_run']) -exit() + exit() diff --git a/lib/default.py b/lib/default.py index 6806724..6ab5675 100644 --- a/lib/default.py +++ b/lib/default.py @@ -12,7 +12,7 @@ 'last_run': datetime.now().strftime(DATETIME_FMT), 'tag_file': "tags.txt", 'parallel_downloads': 8, - 'create_subdirectories': True + 'create_subdirectories': False } LOGGER_FMT = "%(name)-11s %(levelname)-8s %(message)s" diff --git a/lib/support.py 
b/lib/support.py index 358877c..f5119ea 100644 --- a/lib/support.py +++ b/lib/support.py @@ -5,10 +5,15 @@ import json import FixedFifo import default +import re +from types import IntType, BooleanType from urllib import FancyURLopener import cPickle as pickle import os +class SpoofOpen(FancyURLopener): + version = 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.12) Gecko/20070731 Ubuntu/dapper-security Firefox/1.5.0.12' + def get_verbosity_level(): # build the parser @@ -41,11 +46,14 @@ def make_default_configfile(filename): json.dump(default.CONFIG_FILE, outfile, indent=4, sort_keys=True,) return default.CONFIG_FILE -def read_configfile(filename): +def get_configfile(filename): log = logging.getLogger('configfile') - with open(filename, 'r') as infile: - log.debug('opened ' + filename) - return json.load(infile) + if not os.path.isfile(filename): + return make_default_configfile(filename) + else: + with open(filename, 'r') as infile: + log.debug('opened ' + filename) + return json.load(infile) def make_default_tagfile(filename): log = logging.getLogger('tagfile') @@ -55,31 +63,34 @@ def make_default_tagfile(filename): log.error('new default file created: ' + filename) log.error('please add tags you wish to this file and re-run the program') -def read_tagfile(filename): - log = logging.getLogger('tagfile') - tag_list = [] - - # read out all lines not starting with # - for line in open(filename): - raw_line = line.strip() - if not raw_line.startswith("#"): - tag_list.append(raw_line) +def get_tagfile(filename): + log = logging.getLogger('tag_file') - log.debug('opened ' + filename + ' and read ' + str(len(tag_list)) + ' items') + if not os.path.isfile(filename): + return make_default_tagfile(filename) or [] # new default tag file holds no tags yet + else: + # read out all lines not starting with # + tag_list = [] + for line in open(filename): + raw_line = line.strip() + if not raw_line.startswith("#") and raw_line != '': + tag_list.append(raw_line) + + 
log.debug('opened %s and read %d items', filename, 
len(tag_list)) return tag_list def get_cache(filename, size): log = logging.getLogger('cache') try: cache = pickle.load(open(filename, 'rb')) - cache.resize(size) + cache.resize(int(size)) log.debug('loaded existing cache') - log.debug('capacity = ' + str(len(cache)) + ' of ' + str(cache.size())) - log.debug('size on disk = ' + str(os.path.getsize(filename)/1024) + 'kb') + log.debug('capacity = %d (of %d)', len(cache), cache.size()) + log.debug('size on disk = %f kb', os.path.getsize(filename)/1024) except IOError: cache = FixedFifo.FixedFifo(size) - log.debug('new blank cache created. size = ' + str(size)) + log.debug('new blank cache created. size = %d', size) return cache @@ -92,14 +103,37 @@ def safe_filename(tag_line, item, config_dict): safe_tagline = ''.join([sub_char(c) for c in tag_line]) if config_dict['create_subdirectories'] == True: - if not os.path.isdir(config_dict['download_directory'] + safe_tagline): + if not os.path.isdir(config_dict['download_directory'] + safe_tagline.decode('utf-8')): os.makedirs(config_dict['download_directory'] + safe_tagline) safe_filename = safe_tagline + '/' + item.md5 + '.' + item.ext else: - safe_filename = safe_tagline + '_' + item.md5 + '.' + item.ext + safe_filename = safe_tagline.decode('utf-8') + '_' + item.md5.decode('utf-8') + '.' 
+ item.ext.decode('utf-8') return safe_filename -class SpoofOpen(FancyURLopener): - version = 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.12) Gecko/20070731 Ubuntu/dapper-security Firefox/1.5.0.12' +def validate_config(c): + log = logging.getLogger('config_file') + try: + assert type(c['create_subdirectories']) is BooleanType, \ + "'create_subdirectories' must be set to true or false" + assert c['parallel_downloads'] in range(1, 17),\ + "'parallel_downloads' must be a number from 1 to 16 (no quotes)" + + assert type(c['cache_size']) is IntType and \ + c['cache_size'] > 0, \ + "'cache_size' must be a number greater than 0 (no quotes)" + + assert bool(re.match(r'\d{4}-\d{2}-\d{2}$', c['last_run'])) == True, \ + "'last_run' format must be: \"YYYY-MM-DD\" (quotes required)" + + if not os.path.exists(c['download_directory']): + log.info('empty download directory created') + os.makedirs(c['download_directory']) + + return True + + except AssertionError as ex_msg: + log.error("could not parse config file") + log.error(ex_msg) + return False diff --git a/lib/version.py b/lib/version.py new file mode 100644 index 0000000..2f000cd --- /dev/null +++ b/lib/version.py @@ -0,0 +1,2 @@ +#!/usr/bin/env python +VERSION = "2.3.5"
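
Review note on `validate_config` (a sketch, not part of this patch): Python strips `assert` statements when run with `-O`, so validation built on them can silently vanish. One possible alternative collects plain error strings instead. The names `config_problems`, `_is_int`, and `string_types` are hypothetical and exist only for illustration; the messages follow the ones in the patch. The integer checks here are deliberately stricter than a `range` membership test, since they also reject `true` and `1.0`.

```python
import re

# Python 2/3 shim: json.load returns unicode strings on Python 2.
try:
    string_types = basestring  # Python 2
except NameError:
    string_types = str  # Python 3


def _is_int(value):
    # bool is a subclass of int, so exclude it explicitly
    return isinstance(value, int) and not isinstance(value, bool)


def config_problems(c):
    """Hypothetical helper: return a list of error strings (empty if valid)."""
    problems = []

    if not isinstance(c.get('create_subdirectories'), bool):
        problems.append("'create_subdirectories' must be set to true or false")

    workers = c.get('parallel_downloads')
    if not (_is_int(workers) and 1 <= workers <= 16):
        problems.append(
            "'parallel_downloads' must be a number from 1 to 16 (no quotes)")

    size = c.get('cache_size')
    if not (_is_int(size) and size > 0):
        problems.append(
            "'cache_size' must be a number greater than 0 (no quotes)")

    last_run = c.get('last_run')
    # re.match anchors the start; '$' rejects trailing text like '2014-06-18x'
    if not (isinstance(last_run, string_types)
            and re.match(r'\d{4}-\d{2}-\d{2}$', last_run)):
        problems.append(
            "'last_run' format must be: \"YYYY-MM-DD\" (quotes required)")

    return problems


if __name__ == '__main__':
    # Report every problem at once instead of stopping at the first one.
    for problem in config_problems({'create_subdirectories': 'yes'}):
        print(problem)
```

A caller could log each returned string and treat a non-empty list as the early-terminate condition, which also reports all config mistakes in one run rather than one per run.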