Merge branch 'develop' for v2.0.0

wwyaiykycnf · Jun 17, 2014 · 93b1d56
2 parents 897a681 + 5b42fe5

Showing 8 changed files with 307 additions and 188 deletions.
diff --git a/README.md b/README.md
@@ -1,37 +1,66 @@
 What is **e621dl**?
 ===============
-**e621dl** is an automated downloader for e621.net, which enables you to keep your 
-favorite artists or tags up to date.  
+**e621dl** is an automated downloader for e621.net that keeps your favorite tags, artists, or searches up-to-date.
 
-Each time **e621dl** is run, it will download all files that contain at least one 
-tracked tag, that have been uploaded since the last time **e621dl** was run. 
+How does **e621dl** work?
+===============
+The behavior of **e621dl** depends on two files that tell it two crucial things:
+
+1. ***e621dl*** **has to know what tags, artists, or searches you'd like to track.**  To determine this, it will look for a file called `tags.txt`.  Don't worry about creating this file; **e621dl** will create one containing only instructions the first time you run it.
+2. ***e621dl*** **has to know the last time you ran it.**  To find this, it will look in a file called `config.txt`.  Again, don't create this file yourself; just run **e621dl** and it will create a config file with default settings for you.  One of these settings, `"last_run"`, tells **e621dl** when it last ran.
 
 Getting Started
 ===============
+**e621dl** requires Python 2.7, so download and install that first.  Once you have Python installed:
 
-1. Download or clone this project
-2. In the same directory as `e621dl.py`, create a file called `tags.txt`
-3. Add tags or artists you wish to download to `tags.txt`, one tag per line.
-
-example `tags.txt`:
+- [Download the latest release](https://github.com/wwyaiykycnf/e621dl/releases/latest) and unzip it.
+- Run `e621dl.py`.  You should see something like:
 ```
-    cat
-    dog
+> ./e621dl.py
+configfile  ERROR    new default file created: config.txt
+configfile  ERROR    verify this new config file and re-run the program
+tagfile     ERROR    new default file created: tags.txt
+tagfile     ERROR    please add tags you wish to this file and re-run the program
+e621dl      ERROR    error(s) encountered during initialization, see above
 ```
+It's not as bad as it looks.  **e621dl** is telling you that it couldn't find a config file or a tags file, so it created them.  Most users will not need to modify `config.txt`, but feel free to look at it and see what settings you can change.
+
+- Add tags or artists you wish to download to `tags.txt`.  There should already be instructions in the `tags.txt` that was created for you.  All lines starting with a `#` are ignored by **e621dl**, so feel free to leave the instructions in the file after adding your tags, if you wish. 
+
+Once you've added a few lines to `tags.txt` and reviewed `config.txt`, you're ready to run **e621dl**!
 
-*Note: e621dl has only been tested with a single per line, but may work with more.* 
 
 Running **e621dl**
 ===============
-**e621dl** requires Python 2.7, so make sure you have that. 
 
-Running `e621dl.py` will begin an 'update'.  The tags/artists listed in
-`tags.txt` will be checked, one at a time, to see if there are any files that
-have been uploaded since the last time **e621dl** was run.
+When you run **e621dl**, it will determine the time it was last run, and then:
+- read a line from `tags.txt` 
+- perform a search on e621.net using that line
+- download all new files matching that search (files uploaded AFTER the last time **e621dl** was run)
+
+This process will be repeated for each line in `tags.txt` until the file has been completely processed.  The last run date will then be set to yesterday's date, and **e621dl** will report the number of total downloads. 
+
+### I did that and not much happened 
+The first time you run **e621dl**, it's possible that not much will happen.  Remember when `config.txt` was created and you opened it and saw that the `"last_run"` date was set to yesterday?  **e621dl** does that automatically, and it's set that way intentionally.  **e621dl** might check some old uploads to make sure it didn't miss anything, but it will never re-download old files.
+
+Anyway, if that's not what you wanted for your very first update, you'll need to:  
+
+### Change the last run date
+You can set the last run date to any date in the format `YYYY-MM-DD`.  So if you'd like to download EVERYTHING with a given tag from the beginning of time (OK, the beginning of e621.net) until today, change the `"last_run"` setting in `config.txt` to an old date:
+
+    "last_run": "1986-01-01",
+
+Bugs, New Features, and Feedback
+=================
+If you find something broken or have any ideas about features you'd like to see in the future, please contact me at [[email protected]].  I read every single email, so even if you think your idea is off-the-wall, or your bug is super-rare, please let me know and I'll see what I can do. 
+
+Donations
+===============
+If you've benefitted from this *free* project, please consider donating something!  Your support enables bug fixes, new features, and future development!  
+
+[![Wishlist browse button](http://img.shields.io/amazon/wishlist.png?color=blue)](http://amzn.com/w/2F4EC3BPU9JON "Support me by buying something for me on Amazon")
+[![BitCoin donate button](http://img.shields.io/bitcoin/donate.png?color=brightgreen)](https://coinbase.com/checkouts/1FZR3iP9zHRqQZeG8zg8Tmx471jP1c8eYe "Make a donation to this project using BitCoin")
+[![DogeCoin donate button](http://img.shields.io/dogecoin/donate.png?color=yellow)](README.md#note-dogecoin-donations-may-be-sent-to-dkfycmjxndgaqhq5wdyjlneoqd3xbdygdr "Many donate.  So Project.  Wow.  Very DogeCoin.")
 
-The first time you run **e621dl**, not much will happen.  When **e621dl** cannot 
-determine the last time it was run (e.g., the first time it is run) the current
-date is used.
+###### Note: *[Dogecoin](http://dogecoin.com) donations may be sent to:* `DKfycmjXNDgaqhQ5wdyJLNEoqd3XBDyGdr`
 
-The last run date may be altered by modifying `.lastrun.txt`, but be sure to 
-match the YYYY-MM-DD format present in `.lastrun.txt`
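The README describes two file conventions: `tags.txt` holds one search per line with `#` lines ignored, and `"last_run"` is a `YYYY-MM-DD` date that gets set to yesterday after each run. A minimal sketch of those rules, assuming blank lines are also skipped and surrounding whitespace is trimmed; `parse_tag_lines`, `next_last_run`, and `is_valid_last_run` are hypothetical helpers, not the project's `lib.support` functions (whose bodies are not shown in this diff). It avoids Python 3-only syntax so it also runs on Python 2.7.

```python
import datetime

# The YYYY-MM-DD format the README asks for in "last_run".
DATE_FMT = '%Y-%m-%d'


def parse_tag_lines(text):
    """Return the searches in tags.txt-style text.

    Assumption: blank lines are skipped and each line is stripped, in addition
    to the documented rule that lines starting with '#' are ignored.
    """
    searches = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith('#'):
            continue  # instructions/comments and empty lines are not searches
        searches.append(line)
    return searches


def next_last_run(today=None):
    """Return yesterday's date as a YYYY-MM-DD string, like the wrap-up step."""
    today = today or datetime.date.today()
    return (today - datetime.timedelta(days=1)).strftime(DATE_FMT)


def is_valid_last_run(value):
    """True if value parses as a YYYY-MM-DD date."""
    try:
        datetime.datetime.strptime(value, DATE_FMT)
        return True
    except ValueError:
        return False
```

For example, `is_valid_last_run('1986-01-01')` accepts the value suggested above for downloading everything, while a date such as `01/01/1986` would be rejected.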
diff --git a/e621dl.py b/e621dl.py
@@ -1,111 +1,160 @@
 #!/usr/bin/env python
 # pylint: disable=missing-docstring,line-too-long,too-many-public-methods,
 import os.path
-import datetime
 import logging
-import init_e621dl
 import sys
-import regex
+import lib.support as support
+import lib.default as default
+import datetime
 import cPickle as pickle
+import json
+import lib.e621_api as e621_api
 import re
 
-# IMPORTANT: Update this line to point to your tagfile
-TAGFILE = 'tags.txt'
-
-def get_last_run():
-    try:
-        with open(".lastrun.txt", 'r') as lastrun:
-            return lastrun.read()
+##############################################################################
+# INITIALIZATION
+# - parse command line arguments
+# - create a logger to show runtime messages
+# - open config file
+# - open file containing tracked tags
+# - populate the recent downloads cache
+##############################################################################
+CONFIG_FILE = 'config.txt' # modify to use different config file
+
+# get args dictionary
+ARGS = support.get_args_dict()
+
+# set up logging
+logging.basicConfig(level=ARGS['log_lvl'], format=default.LOGGER_FMT,
+    stream=sys.stderr)
+LOG = logging.getLogger('e621dl')
 
-    except IOError:
-        return datetime.date.today().strftime("%Y-%m-%d")
+# this flag will be set to true if a fatal error occurs in pre-update
+EARLY_TERMINATE = False
+
+# read the config file.  if not found, create a new one
+if not os.path.isfile(CONFIG_FILE):
+    CONFIG = support.make_default_configfile(CONFIG_FILE)
+    EARLY_TERMINATE = True
+else:
+    CONFIG = support.read_configfile(CONFIG_FILE)
+
+# read the tags file.  if not found, create a new one
+TAGS = []
+if not os.path.isfile(CONFIG['tag_file']):
+    support.make_default_tagfile(CONFIG['tag_file'])
+    EARLY_TERMINATE = True
+else:
+    TAGS = support.read_tagfile(CONFIG['tag_file'])
+
+# tags was read but contained nothing
+if len(TAGS) == 0 and not EARLY_TERMINATE:
+    LOG.error('no tags found in ' + CONFIG['tag_file'])
+    LOG.error('add tags (or groups of tags) to this file and re-run the program')
+    EARLY_TERMINATE = True
+
+# open the cache (this can't really fail; just creates a new blank one)
+CACHE = support.get_cache(CONFIG['cache_name'], CONFIG['cache_size'])
+
+# create the downloads directory if needed
+if not os.path.exists(CONFIG['downloads']):
+    os.makedirs(CONFIG['downloads'])
+
+# keeps running total of files downloaded in this run
+TOTAL_DOWNLOADS = 0
 
-def set_last_run():
-    with open(".lastrun.txt", 'w') as lastrun:
-        yesterday = datetime.date.fromordinal(datetime.date.today().toordinal()-1)
-        lastrun.write(yesterday.strftime("%Y-%m-%d"))
+# exit before updating if any errors occurred in pre-update
+if EARLY_TERMINATE:
+    LOG.error('error(s) encountered during initialization, see above')
+    exit()
+else:
+    LOG.debug('successfully initialized\n')
 
-def download_image(link, tag, path):
-    filename = regex.get_filename(link)
-    tag = re.sub('[\<\>:"/\\\|\?\*\ ]', '_', tag)
-    completepath = path + tag + "_" + filename
+##############################################################################
+# UPDATE
+# - for each tag (or tag group) in the tagfile:
+#   - for each upload since the last time e621dl was run:
+#       - if the file has not previously been downloaded, download it
+# - count number of downloads for reporting in post-update
+##############################################################################
 
-    if os.path.isfile(completepath) == True:
-        return ' skipped (already exists)'
+LOG.info("e621dl was last run on " + CONFIG['last_run'])
 
-    elif filename in CACHE:
-        return ' skipped (previously downloaded)'
+for line in TAGS:
+    LOG.info("Checking for new uploads tagged: " + line)
 
-    else:
-        with open(completepath, 'wb') as dest:
-            source = SPOOF.open(link)
-            dest.write(source.read())
+    # prepare to start accumulating list of download links for line
+    accumulating = True
+    current_page = 1
+    links_to_download = []
 
-        CACHE.push(filename)
-        pickle.dump(CACHE, open('.cache', 'wb'), pickle.HIGHEST_PROTOCOL)
+    while accumulating:
+        LOG.debug('getting page ' + str(current_page) + ' of ' + line)
+        links_found = e621_api.get_posts(line, CONFIG['last_run'],
+                current_page, default.MAX_RESULTS)
 
-        return ' downloaded'
+        if not links_found:
+            accumulating = False
 
-# parse arguments and set up logger
-ARGS = init_e621dl.get_args()
-logging.basicConfig(level=init_e621dl.get_log_level(ARGS),
-    format='%(name)-8s %(levelname)-8s %(message)s', stream=sys.stderr)
-LOG = logging.getLogger('e621dl')
+        else:
+            # add links found to list to be downloaded
+            links_to_download += links_found
+            # continue accumulating if found == max, else stop accumulation
+            accumulating = len(links_found) == default.MAX_RESULTS
+            current_page += 1
 
-# prepare to run
-CACHE = init_e621dl.get_downloads_list('.cache')
-TAGS = init_e621dl.read_tags_file(TAGFILE)
-LASTRUN = get_last_run()
-DOWNLOAD_DIR = '''./downloads/'''
-TOTAL_DOWNLOADS = 0
-SPOOF = init_e621dl.SpoofOpen()
+    remaining = len(links_to_download)
 
-if not os.path.exists(DOWNLOAD_DIR):
-    os.makedirs(DOWNLOAD_DIR)
+    if remaining == 0:
+        LOG.info('no new uploads for: ' + line)
 
-LOG.info("e621dl was last run on " + LASTRUN)
+    else:
+        LOG.info(str(remaining) + ' new uploads for: ' + line)
 
-for tag in TAGS:
-    LOG.info("Checking for new uploads tagged: " + tag)
+        for item in links_to_download:
 
-    accumulating = True
-    page_number   = 1
-    dl_links = []
+            LOG.debug('item md5 = ' + item.md5)
+            # construct full filename
+            filename = re.sub('[\<\>:"/\\\|\?\*\ ]', '_', line) + '--' + \
+                item.md5 + '.' + item.ext
 
-    # get all post pages in same list
-    while accumulating:
-        results_page = regex.get_results_page(tag, LASTRUN, page_number)
+            # skip if already in cache
+            if item.md5 in CACHE:
+                LOG.info('(' + str(remaining) + ') skipped (previously downloaded)')
 
-        if regex.results_exist(results_page):
-            LOG.debug('page ' + str(page_number) + ' contained results')
-            dl_links += regex.get_links(results_page)
-            page_number += 1
+            # skip if already in download directory
+            elif os.path.isfile(CONFIG['downloads'] + filename):
+                LOG.info('(' + str(remaining) + ') skipped (already in downloads directory)')
 
-        else:
-            LOG.debug('page ' + str(page_number) + ' contained nothing')
-            accumulating = False
+            # otherwise, download it
+            else:
+                LOG.info('(' + str(remaining) + ') downloading... ')
+                e621_api.download(item.url, CONFIG['downloads'] + filename)
 
-    LOG.info('number of uploads found: ' + str(len(dl_links)))
+                # push to cache, write cache to disk
+                CACHE.push(item.md5)
+                with open(CONFIG['cache_name'], 'wb') as cache_file:
+                    pickle.dump(CACHE, cache_file, pickle.HIGHEST_PROTOCOL)
+                TOTAL_DOWNLOADS += 1
 
-    # download image in each post page
-    remaining = len(dl_links)
-    for link in dl_links:
-        status = download_image(link, tag, DOWNLOAD_DIR)
-        img_string = '(%d) %s' % (remaining, regex.get_filename(link))
+            # decrement remaining downloads
+            remaining -= 1
 
-        LOG.info(img_string + status)
-        if status == ' downloaded':
-            TOTAL_DOWNLOADS += 1
-        remaining -= 1
+        LOG.debug('update for ' + line + ' completed\n')
+    print ''
 
-    LOG.info('update for ' + tag + ' completed')
-    LOG.info('')
+##############################################################################
+# WRAP-UP
+# - report number of downloads in this session
+# - set last run to yesterday (see FAQ for why it isn't today)
+##############################################################################
+LOG.info('total files downloaded: ' + str(TOTAL_DOWNLOADS))
+YESTERDAY = datetime.date.fromordinal(datetime.date.today().toordinal()-1)
+CONFIG['last_run'] = YESTERDAY.strftime(default.DATETIME_FMT)
 
-#set_last_run()
+with open(CONFIG_FILE, 'wb') as outfile:
+    json.dump(CONFIG, outfile, indent=4, sort_keys=True,
+        ensure_ascii=False, separators=(',', ':\t\t'))
 
-LOG.info('total files downloaded: ' + str(TOTAL_DOWNLOADS))
-set_last_run()
-LOG.info('last run set to ' + get_last_run())
+LOG.info('last run updated to ' + CONFIG['last_run'])
 
 exit()
-
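The new update loop in `e621dl.py` keeps requesting result pages while each page comes back full (`default.MAX_RESULTS` items), then names each file `<search>--<md5>.<ext>` after replacing characters that are unsafe in filenames. A standalone sketch of that logic, where `fetch_page` is a hypothetical stand-in for `e621_api.get_posts` (its implementation is not part of this diff) and `accumulate`/`build_filename` are illustrative names. The character class matches the one in the diff but is written as a raw string so the backslashes are unambiguous.

```python
import re

# Same characters the diff replaces with '_': < > : " / \ | ? * and space.
UNSAFE_CHARS = re.compile(r'[<>:"/\\|?* ]')


def accumulate(fetch_page, max_results):
    """Collect results page by page, starting at page 1.

    fetch_page(page) is a hypothetical callable returning a list of at most
    max_results items. A short or empty page means there is nothing more.
    """
    found = []
    page = 1
    while True:
        batch = fetch_page(page)
        found.extend(batch)
        if len(batch) < max_results:
            return found  # last (partial or empty) page reached
        page += 1


def build_filename(search, md5, ext):
    """Mirror the '<search>--<md5>.<ext>' naming used in the update loop."""
    return UNSAFE_CHARS.sub('_', search) + '--' + md5 + '.' + ext
```

As in the diff, when the final page is exactly full, one extra request returns an empty page and ends the loop.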
diff --git a/init_e621dl.py b/init_e621dl.py
diff --git a/FixedFifo.py b/lib/FixedFifo.py
@@ -51,6 +51,9 @@ def push(self, key):
         self.contents.insert(0, key)
         return self.pop() if len(self.contents) > self.max_size else None
 
+    def size(self):
+        return self.max_size
+
     def resize(self, newsize):
         self.contents = self.contents[:newsize]
         self.max_size = newsize

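The hunk above shows only part of `FixedFifo`: `push` inserts the newest key at the front and evicts through `pop()` once `max_size` is exceeded, and the new `size()` returns the capacity rather than the number of stored keys. A minimal sketch of a comparable bounded cache, using `collections.deque` in place of the list (a swapped-in choice, not the project's code). `BoundedCache` and its `__contains__`/`__len__` methods are hypothetical, although `e621dl.py` does rely on `item.md5 in CACHE` working.

```python
from collections import deque


class BoundedCache(object):
    """Remember the most recent max_size keys, evicting the oldest first."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.contents = deque()

    def push(self, key):
        # Newest keys go on the left, like contents.insert(0, key) in the diff.
        self.contents.appendleft(key)
        if len(self.contents) > self.max_size:
            return self.contents.pop()  # evict and return the oldest key
        return None

    def size(self):
        # Capacity, matching the new FixedFifo.size(); len() gives the fill level.
        return self.max_size

    def resize(self, newsize):
        # Keep only the newest newsize keys, like the list slice in the diff.
        while len(self.contents) > newsize:
            self.contents.pop()
        self.max_size = newsize

    def __len__(self):
        return len(self.contents)

    def __contains__(self, key):
        return key in self.contents
```

Membership checks scan the deque, so they are linear in `max_size`; pairing it with a set would make lookups constant time at the cost of keeping the two in sync.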
diff --git a/lib/__init__.py b/lib/__init__.py