Self-hatred

[Bash] Download ALL of an Imgur user’s images

I’ve consulted Imgur’s API documentation and asked around: there do not seem to be any download limits in place. That said, my script does play nice, and you can throttle requests if you wish. Imgur’s API does include a way to fetch a list of a user’s albums and their respective images, but I’ve spent a decade procrastinating over learning awk and sed. I needed the practice. :)

UPDATE: I picked up a few comments on this script elsewhere. To answer the prevailing question: this does not support halt/resume or any other error checking; it is meant to grab an entire user archive in one sweep. It is also not threaded, although it very easily could be. This is out of consideration for Imgur, who see enough traffic without getting slammed by 20 or 30 wget requests a second. If you want to thread it, you thread it.
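
The throttling idea can be sketched as a plain sequential loop that sleeps between requests. This is a minimal sketch, not part of the script: `fetch_throttled` and the `FETCH_CMD` override are hypothetical names I made up so the loop can be dry-run without touching the network.

```shell
#!/bin/bash
# Hypothetical helper: fetch every URL listed in a file, one at a time,
# pausing for a number of seconds (default 5) after each request.
# FETCH_CMD is an assumed override (not a wget feature) used for dry runs.
fetch_throttled() {
	local list="$1" delay="${2:-5}" url
	while IFS= read -r url; do
		[ -n "$url" ] || continue        # skip blank lines
		"${FETCH_CMD:-wget}" "$url"      # one request at a time, no threading
		sleep "$delay"
	done < "$list"
}

# Dry run: print the URLs instead of downloading them.
# FETCH_CMD=echo fetch_throttled images.list 0
```

wget can also do the pausing itself: `wget --wait=5 -i images.list` waits five seconds between retrievals.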

Use:

  1. Make a new, empty directory and cd into it.
  2. Run the script (saved here as fetch) with the URL of the user's album page: fetch $url
  3. Wait for it to finish.

#!/bin/bash

INDEX=index.html
ALIST=albums.list
ILIST=images.list
DELAY=5

# Fetch the user's album page.
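# -O forces the saved name, so the later steps do not depend on wget's default naming.
wget -O "$INDEX" "$1"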
# Strip the user's page down to a list of album hyperlinks.
# Do it in one pipeline: redirecting back into the file being read truncates it first.
grep 'imgur.com/a/' "$INDEX" | sed 's/[\t]//g' | sed 's/\/\//http:\/\//g' \
	| sed 's/<a href="//g' | sed 's/">//g' > "$ALIST"

# Count how many albums we are to fetch.
COUNT=$(wc -l $ALIST | awk '{print $1}')

# Descend into a loop to fetch all of the images.
for i in $(seq 1 $COUNT)
do
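	# Make a directory. I prefer a straight numerical name. Your choice.
	mkdir "$i"
	# Bail out if we cannot enter it, so the rm/wget below never run in the wrong place.
	cd "$i" || exit 1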
	# Fetch this particular album index.
	# Save it straight to $ILIST instead of renaming whatever wget produced.
	wget -O "$ILIST" "$(sed -n "${i}p" "../$ALIST")"
	# Strip the file down to the actual image links.
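	# Filter into a temporary file first; writing straight back to $ILIST would truncate it before it is read.
	grep href "$ILIST" | grep jpg | grep -v rel \
		| sed 's/<a href="//g' | sed 's/">//g' > "$ILIST.tmp"
	mv "$ILIST.tmp" "$ILIST"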
	# Fetch the album's images.
	# sleep $DELAY
	wget -i $ILIST
	# Cleanup.
	rm $ILIST
	# Repeat.
	cd ..
done

# Cleanup.
rm $INDEX
rm $ALIST
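
The album-link stripping step can be tried offline before pointing the script at a real page. A minimal sketch, assuming album links appear one per line as protocol-relative `<a href="//imgur.com/a/ID">` anchors, which is what the sed expressions above imply; the real page markup may differ. `extract_album_links` is a hypothetical helper name and the album IDs are made up.

```shell
#!/bin/bash
# Hypothetical helper: reads HTML on stdin, prints one album URL per line.
# Assumes each album link sits on its own line as <a href="//imgur.com/a/ID">.
extract_album_links() {
	grep 'imgur.com/a/' \
		| tr -d '\t' \
		| sed -e 's|//|http://|g' -e 's|<a href="||g' -e 's|">||g'
}

# Made-up sample input, for illustration only.
printf '\t<a href="//imgur.com/a/abc12">\n<p>not a link</p>\n\t<a href="//imgur.com/a/xyz89">\n' \
	| extract_album_links
```

Keeping the filter in a function that reads stdin means it never writes to the file it is reading, and it can be checked against a saved copy of the page.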

Categorised as: linux, programming


5 Comments

  1. Supernatendo says:

    How would you add the ability to get .gif as well as .jpg?

  2. Mark says:

    Probably add another command line argument? Let’s say $2. By default make it .jpg, but you can override with a different extension, or descend into a switch.

  3. kenorb says:

    I’ve improved your script and it’s available here:
    http://pastebin.com/tmCW0X2y
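
On the .gif question above: a minimal sketch of the extra-argument idea, assuming image links look like `<a href="http://i.imgur.com/NAME.EXT">` on their own lines. `extract_image_links` is a hypothetical helper name and the sample file names are made up; in the script itself the extension would come from something like `EXT=${2:-jpg}`.

```shell
#!/bin/bash
# Hypothetical helper: reads an album page on stdin and prints the links on
# lines that mention the given extension (default: jpg).
extract_image_links() {
	local ext="${1:-jpg}"
	grep href \
		| grep -F -- ".$ext" \
		| grep -v rel \
		| sed -e 's/<a href="//g' -e 's/">//g'
}

# Made-up sample input, for illustration only.
printf '<a href="http://i.imgur.com/aaa111.gif">\n<a href="http://i.imgur.com/bbb222.jpg">\n' \
	| extract_image_links gif
```

Calling it with no argument keeps the original jpg-only behaviour.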
