Merge branch 'develop' for v2.0.0
wwyaiykycnf committed Jun 17, 2014
2 parents 897a681 + 5b42fe5 commit 93b1d56
Showing 8 changed files with 307 additions and 188 deletions.
71 changes: 50 additions & 21 deletions README.md
@@ -1,37 +1,66 @@
What is **e621dl**?
===============
**e621dl** is an automated downloader for e621.net, which enables you to keep your
favorite artists or tags up to date.
**e621dl** is an automated downloader for e621.net that keeps your favorite tags, artists, or searches up-to-date.

Each time **e621dl** is run, it will download all files that contain at least one
tracked tag, that have been uploaded since the last time **e621dl** was run.
How does **e621dl** work?
===============
The behavior of e621dl is dependent on two files that tell it two crucial things:

1. ***e621dl*** **has to know what tags, artists, or searches you'd like to track.** To determine this, it will look for a file called `tags.txt`. Don't worry about creating this file, **e621dl** will create a blank one with instructions the first time you run it.
2. ***e621dl*** **has to know the last time you ran it.** To find this, it will look in a file called `config.txt`. Again, don't create this file yourself; just run **e621dl** and it will create a config file for you with default settings. One of these settings, `"last_run"`, tells **e621dl** when it was last run.
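
The create-if-missing behavior described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual `lib/support` code: the key names (`last_run`, `tag_file`, `downloads`, `cache_name`, `cache_size`) appear in `e621dl.py`, but the default values and the helper names `default_config` and `load_or_create_config` are assumptions made for this example.

```python
import datetime
import json
import os


def default_config():
    """Build a default config dict. The values are illustrative assumptions."""
    yesterday = datetime.date.today() - datetime.timedelta(days=1)
    return {
        "last_run": yesterday.strftime("%Y-%m-%d"),
        "tag_file": "tags.txt",
        "downloads": "downloads/",
        "cache_name": ".cache",
        "cache_size": 65536,  # assumed value, not taken from the project
    }


def load_or_create_config(path):
    """Return (config, created); created is True when a new file was written."""
    if os.path.isfile(path):
        with open(path, "r") as infile:
            return json.load(infile), False
    config = default_config()
    with open(path, "w") as outfile:
        json.dump(config, outfile, indent=4, sort_keys=True)
    return config, True
```

Returning a `created` flag lets the caller stop after the first run so the user can review the new file, which mirrors the "verify this new config file and re-run the program" message.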

Getting Started
===============
**e621dl** requires Python 2.7, so download and install that first. Once you have Python installed:

1. Download or clone this project
2. In the same directory as `e621dl.py`, create a file called `tags.txt`
3. Add tags or artists you wish to download to `tags.txt`, one tag per line.

example `tags.txt`:
- [Download the latest release](https://github.com/wwyaiykycnf/e621dl/releases/latest) and unzip it.
- Run `e621dl.py`. You should see something like:
```
cat
dog
> ./e621dl.py
configfile ERROR new default file created: config.txt
configfile ERROR verify this new config file and re-run the program
tagfile ERROR new default file created: tags.txt
tagfile ERROR please add tags you wish to this file and re-run the program
e621dl ERROR error(s) encountered during initialization, see above
```
It's not as bad as it looks. **e621dl** is telling you that it couldn't find a config file or tags file, so it created these files. Most users will not need to modify `config.txt` but feel free to look at it and see what settings you can change.

- Add tags or artists you wish to download to `tags.txt`. There should already be instructions in the `tags.txt` that was created for you. All lines starting with a `#` are ignored by **e621dl**, so feel free to leave the instructions in the file after adding your tags, if you wish.

Once you've added a few lines to `tags.txt` and reviewed `config.txt`, you're ready to run **e621dl**!

*Note: e621dl has only been tested with a single tag per line, but may work with more.*
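
As a rough sketch of how a tags file like this could be read, the function below keeps one search per line and skips blank lines and lines starting with `#`. The name `parse_tag_lines` is hypothetical, and ignoring leading whitespace before the `#` is an assumption; the real `support.read_tagfile` may behave differently.

```python
def parse_tag_lines(lines):
    """Return the searches in a tags file, skipping comments and blank lines."""
    searches = []
    for raw in lines:
        line = raw.strip()
        # Assumption: whitespace before '#' is ignored when detecting comments.
        if not line or line.startswith("#"):
            continue
        searches.append(line)
    return searches


example = [
    "# lines starting with '#' are ignored",
    "cat",
    "",
    "dog",
]
print(parse_tag_lines(example))  # prints ['cat', 'dog']
```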

Running **e621dl**
===============
**e621dl** requires Python 2.7, so make sure you have that.

Running `e621dl.py` will begin an 'update'. The tags/artists listed in
`tags.txt` will be checked, one at a time, to see if there are any files that
have been uploaded since the last time **e621dl** was run.
When you run **e621dl**, it will determine the time it was last run, and then:
- read a line from `tags.txt`
- perform a search on e621.net using that line
- download all new files matching that search (files uploaded AFTER the last time **e621dl** was run)

This process will be repeated for each line in `tags.txt` until the file has been completely processed. The last run date will then be set to yesterday's date, and **e621dl** will report the number of total downloads.
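
The per-line search can be pictured as a paging loop: keep requesting result pages until one comes back with fewer results than the page size. The sketch below is an illustration under assumptions, not the project's API: `collect_new_uploads` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for a real request to e621.net.

```python
def collect_new_uploads(fetch_page, search, last_run, page_size):
    """Gather results page by page until a page comes back short or empty."""
    found = []
    page = 1
    while True:
        results = fetch_page(search, last_run, page, page_size)
        found.extend(results)
        # A short (or empty) page means there is nothing further to fetch.
        if len(results) < page_size:
            break
        page += 1
    return found


def fake_fetch(search, last_run, page, page_size):
    """Stand-in for a network call; serves slices of a fixed list."""
    uploads = ["a", "b", "c", "d", "e"]
    start = (page - 1) * page_size
    return uploads[start:start + page_size]


print(collect_new_uploads(fake_fetch, "cat", "2014-06-16", 2))
```

Passing the fetch function in as a parameter keeps the loop testable without touching the network.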

### I did that and not much happened
The first time you run **e621dl**, it's possible that not too much will happen. Remember when `config.txt` was created and you opened it and saw that the `"last_run"` date was set to yesterday? **e621dl** does that automatically, and it's set that way intentionally. **e621dl** might check some old uploads to make sure it didn't miss anything, but it will never re-download old files.

Anyway, if that's not what you wanted for your very first update, you'll need to:

### Change the last run date
You can set the last run date to any date that follows the format `YYYY-MM-DD`. So if you'd like to download EVERYTHING of a given tag from the beginning of time (Ok, the beginning of e621.net) until today, change the `"last_run"` variable in `config.txt` to something old:

"last_run": "1986-01-01",

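Before editing `config.txt` by hand, it can help to check that a date string parses in the `YYYY-MM-DD` format. The snippet below is a small sketch using only the standard library; `DATE_FMT`, `parse_last_run`, and `yesterday` are names made up for this example, and the format string is assumed to match what **e621dl** expects.

```python
import datetime

DATE_FMT = "%Y-%m-%d"  # assumed equivalent of the README's YYYY-MM-DD


def parse_last_run(value):
    """Return a date for a YYYY-MM-DD string, or raise ValueError."""
    return datetime.datetime.strptime(value, DATE_FMT).date()


def yesterday(today=None):
    """Return the day before `today` (defaults to the current date)."""
    today = today or datetime.date.today()
    return today - datetime.timedelta(days=1)


print(parse_last_run("1986-01-01"))
print(yesterday(datetime.date(2014, 6, 17)).strftime(DATE_FMT))
```

A string in another layout, such as `01/01/1986`, raises `ValueError` instead of being silently accepted.
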
Bugs, New Features, and Feedback
=================
If you find something broken or have any ideas about features you'd like to see in the future, please contact me at [[email protected]]. I read every single email, so even if you think your idea is off-the-wall, or your bug is super-rare, please let me know and I'll see what I can do.

Donations
===============
If you've benefitted from this *free* project, please consider donating something! Your support enables bug fixes, new features, and future development!

[![Wishlist browse button](http://img.shields.io/amazon/wishlist.png?color=blue)](http://amzn.com/w/2F4EC3BPU9JON "Support me by buying something for me on Amazon")
[![BitCoin donate button](http://img.shields.io/bitcoin/donate.png?color=brightgreen)](https://coinbase.com/checkouts/1FZR3iP9zHRqQZeG8zg8Tmx471jP1c8eYe "Make a donation to this project using BitCoin")
[![DogeCoin donate button](http://img.shields.io/dogecoin/donate.png?color=yellow)](README.md#note-dogecoin-donations-may-be-sent-to-dkfycmjxndgaqhq5wdyjlneoqd3xbdygdr "Many donate. So Project. Wow. Very DogeCoin.")

The first time you run **e621dl**, not much will happen. When **e621dl** cannot
determine the last time it was run (e.g., the first time it is run) the current
date is used.
#######Note: *[Dogecoin](http://dogecoin.com) donations may be sent to:* `DKfycmjXNDgaqhQ5wdyJLNEoqd3XBDyGdr`

The last run date may be altered by modifying `.lastrun.txt`, but be sure to
match the YYYY-MM-DD format present in `.lastrun.txt`
205 changes: 127 additions & 78 deletions e621dl.py
@@ -1,111 +1,160 @@
#!/usr/bin/env python
# pylint: disable=missing-docstring,line-too-long,too-many-public-methods,
import os.path
import datetime
import logging
import init_e621dl
import sys
import regex
import lib.support as support
import lib.default as default
import datetime
import cPickle as pickle
import json
import lib.e621_api as e621_api
import re

# IMPORTANT: Update this line to point to your tagfile
TAGFILE = 'tags.txt'

def get_last_run():
try:
with open(".lastrun.txt", 'r') as lastrun:
return lastrun.read()
##############################################################################
# INITIALIZATION
# - parse command line arguments
# - create a logger to show runtime messages
# - open config file
# - open file containing tracked tags
# - populate the recent downloads cache
##############################################################################
CONFIG_FILE = 'config.txt' # modify to use different config file

# get args dictionary
ARGS = support.get_args_dict()

# set up logging
logging.basicConfig(level=ARGS['log_lvl'], format=default.LOGGER_FMT,
stream=sys.stderr)
LOG = logging.getLogger('e621dl')

except IOError:
return datetime.date.today().strftime("%Y-%m-%d")
# this flag will be set to true if a fatal error occurs in pre-update
EARLY_TERMINATE = False

# read the config file. if not found, create a new one
if not os.path.isfile(CONFIG_FILE):
CONFIG = support.make_default_configfile(CONFIG_FILE)
EARLY_TERMINATE = True
else:
CONFIG = support.read_configfile(CONFIG_FILE)

# read the tags file. if not found, create a new one
TAGS = []
if not os.path.isfile(CONFIG['tag_file']):
support.make_default_tagfile(CONFIG['tag_file'])
EARLY_TERMINATE = True
else:
TAGS = support.read_tagfile(CONFIG['tag_file'])

# tags was read but contained nothing
if len(TAGS) == 0 and not EARLY_TERMINATE:
LOG.error('no tags found in ' + CONFIG['tag_file'])
LOG.error('add tags (or groups of tags) to this file and re-run the program')
EARLY_TERMINATE = True

# open the cache (this can't really fail; just creates a new blank one)
CACHE = support.get_cache(CONFIG['cache_name'], CONFIG['cache_size'])

# create the downloads directory if needed
if not os.path.exists(CONFIG['downloads']):
os.makedirs(CONFIG['downloads'])

# keeps running total of files downloaded in this run
TOTAL_DOWNLOADS = 0

def set_last_run():
with open(".lastrun.txt", 'w') as lastrun:
yesterday = datetime.date.fromordinal(datetime.date.today().toordinal()-1)
lastrun.write(yesterday.strftime("%Y-%m-%d"))
# exit before updating if any errors occurred in pre-update
if EARLY_TERMINATE:
LOG.error('error(s) encountered during initialization, see above')
exit()
else:
LOG.debug('successfully initialized\n')

def download_image(link, tag, path):
filename = regex.get_filename(link)
tag = re.sub('[\<\>:"/\\\|\?\*\ ]', '_', tag)
completepath = path + tag + "_" + filename
##############################################################################
# UPDATE
# - for each tag (or tag group) in the tagfile:
# - for each upload since the last time e621dl was run:
# - if the file has not previously been downloaded, download it
# - count number of downloads for reporting in post-update
##############################################################################

if os.path.isfile(completepath) == True:
return ' skipped (already exists)'
LOG.info("e621dl was last run on " + CONFIG['last_run'])

elif filename in CACHE:
return ' skipped (previously downloaded)'
for line in TAGS:
LOG.info("Checking for new uploads tagged: " + line)

else:
with open(completepath, 'wb') as dest:
source = SPOOF.open(link)
dest.write(source.read())
# prepare to start accumulating list of download links for line
accumulating = True
current_page = 1
links_to_download = []

CACHE.push(filename)
pickle.dump(CACHE, open('.cache', 'wb'), pickle.HIGHEST_PROTOCOL)
while accumulating:
LOG.debug('getting page ' + str(current_page) + ' of ' + line)
links_found = e621_api.get_posts(line, CONFIG['last_run'],
current_page, default.MAX_RESULTS)

return ' downloaded'
if not links_found:
accumulating = False

# parse arguments and set up logger
ARGS = init_e621dl.get_args()
logging.basicConfig(level=init_e621dl.get_log_level(ARGS),
format='%(name)-8s %(levelname)-8s %(message)s', stream=sys.stderr)
LOG = logging.getLogger('e621dl')
else:
# add links found to list to be downloaded
links_to_download += links_found
# continue accumulating if found == max, else stop accumulation
accumulating = len(links_found) == default.MAX_RESULTS
current_page += 1

# prepare to run
CACHE = init_e621dl.get_downloads_list('.cache')
TAGS = init_e621dl.read_tags_file(TAGFILE)
LASTRUN = get_last_run()
DOWNLOAD_DIR = '''./downloads/'''
TOTAL_DOWNLOADS = 0
SPOOF = init_e621dl.SpoofOpen()
remaining = len(links_to_download)

if not os.path.exists(DOWNLOAD_DIR):
os.makedirs(DOWNLOAD_DIR)
if remaining == 0:
LOG.info('no new uploads for: ' + line)

LOG.info("e621dl was last run on " + LASTRUN)
else:
LOG.info(str(remaining) + ' new uploads for: ' + line)

for tag in TAGS:
LOG.info("Checking for new uploads tagged: " + tag)
for item in links_to_download:

accumulating = True
page_number = 1
dl_links = []
LOG.debug('item md5 = ' + item.md5)
# construct full filename
filename = re.sub('[\<\>:"/\\\|\?\*\ ]', '_', line) + '--' + \
item.md5 + '.' + item.ext

# get all post pages in same list
while accumulating:
results_page = regex.get_results_page(tag, LASTRUN, page_number)
# skip if already in cache
if item.md5 in CACHE:
LOG.info('(' + str(remaining) + ') skipped (previously downloaded)')

if regex.results_exist(results_page):
LOG.debug('page ' + str(page_number) + ' contained results')
dl_links += regex.get_links(results_page)
page_number += 1
# skip if already in download directory
elif os.path.isfile(CONFIG['downloads'] + filename):
LOG.info('(' + str(remaining) + ') skipped (already in downloads directory)')

else:
LOG.debug('page ' + str(page_number) + ' contained nothing')
accumulating = False
# otherwise, download it
else:
LOG.info('(' + str(remaining) + ') downloading... ')
e621_api.download(item.url, CONFIG['downloads'] + filename)

LOG.info('number of uploads found: ' + str(len(dl_links)))
# push to cache, write cache to disk
CACHE.push(item.md5)
pickle.dump(CACHE, open('.cache', 'wb'), pickle.HIGHEST_PROTOCOL)
TOTAL_DOWNLOADS += 1

# download image in each post page
remaining = len(dl_links)
for link in dl_links:
status = download_image(link, tag, DOWNLOAD_DIR)
img_string = '(%d) %s' % (remaining, regex.get_filename(link))
# decrement remaining downloads
remaining -= 1

LOG.info(img_string + status)
if status == ' downloaded':
TOTAL_DOWNLOADS += 1
remaining -= 1
LOG.debug('update for ' + line + ' completed\n')
print ''

LOG.info('update for ' + tag + ' completed')
LOG.info('')
##############################################################################
# WRAP-UP
# - report number of downloads in this session
# - set last run to yesterday (see FAQ for why it isn't today)
##############################################################################
LOG.info('total files downloaded: ' + str(TOTAL_DOWNLOADS))
YESTERDAY = datetime.date.fromordinal(datetime.date.today().toordinal()-1)
CONFIG['last_run'] = YESTERDAY.strftime(default.DATETIME_FMT)

#set_last_run()
with open(CONFIG_FILE, 'wb') as outfile:
json.dump(CONFIG, outfile, indent=4, sort_keys=True,
ensure_ascii=False, separators=(',', ':\t\t'))

LOG.info('total files downloaded: ' + str(TOTAL_DOWNLOADS))
set_last_run()
LOG.info('last run set to ' + get_last_run())
LOG.info('last run updated to ' + CONFIG['last_run'])

exit()

55 changes: 0 additions & 55 deletions init_e621dl.py

This file was deleted.

3 changes: 3 additions & 0 deletions FixedFifo.py → lib/FixedFifo.py
100755 → 100644
@@ -51,6 +51,9 @@ def push(self, key):
self.contents.insert(0, key)
return self.pop() if len(self.contents) > self.max_size else None

def size(self):
return self.max_size

def resize(self, newsize):
self.contents = self.contents[:newsize]
self.max_size = newsize
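
For context, the hunk above can be read against a minimal stand-in for the cache class. This is a sketch under assumptions, not the contents of `lib/FixedFifo.py`: `push`, `size`, and `resize` follow the lines shown in the diff, while the class name `FixedFifoSketch`, the constructor, `pop`, and `__contains__` are guesses added so the example runs on its own.

```python
class FixedFifoSketch(object):
    """Fixed-capacity FIFO: newest keys at the front, oldest evicted first."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.contents = []

    def __contains__(self, key):
        # Assumed: supports the `item.md5 in CACHE` check used in e621dl.py.
        return key in self.contents

    def pop(self):
        # Assumed: removes and returns the oldest key (the last element).
        return self.contents.pop()

    def push(self, key):
        self.contents.insert(0, key)
        return self.pop() if len(self.contents) > self.max_size else None

    def size(self):
        # As in the diff, this reports capacity, not the number of stored keys.
        return self.max_size

    def resize(self, newsize):
        self.contents = self.contents[:newsize]
        self.max_size = newsize
```

Note that `size()` returns the configured capacity, so it stays the same whether the cache is empty or full.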
Empty file added lib/__init__.py
Empty file.