Hey everyone, I just joined this board and look forward to being part of the community here. Last night I was bored and decided to learn some Python, so I wrote my first program, and here it is. Don't be too hard on me, as this is my first program (I'm sure it could be optimized and should handle exceptions and input errors). Anyway, the program downloads sequentially numbered images from websites. Here's a sample of some input:
If you have some ideas as to how to better automate the process, or maybe how to check whether the image actually exists on the server before the program tries to download it, or any other suggestions or tips for me please let me know!
Thanks!
Last edited by Cimperiali; January 28th, 2012 at 04:02 PM.
Reason: adding Code tags
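On the question of checking whether a file actually exists on the server before downloading it: one common approach is to send an HTTP HEAD request first, which asks for the headers only. Below is a minimal sketch, assuming Python 3 (where `urllib.request` replaces the Python 2 `urllib` used in this thread). The helper name `url_exists` and the 10-second timeout are made up for illustration, and some servers refuse HEAD requests, so treat a False result as "probably not there" rather than proof.

```python
import urllib.request


def url_exists(url, timeout=10):
    """Return True if a HEAD request for url succeeds (hypothetical helper)."""
    try:
        # HEAD asks for the headers only, so the file body is not downloaded.
        request = urllib.request.Request(url, method='HEAD')
        with urllib.request.urlopen(request, timeout=timeout):
            return True
    except (OSError, ValueError):
        # URLError and HTTPError (e.g. a 404) are OSError subclasses;
        # ValueError covers malformed URLs.
        return False
```

In the download loop you could then wrap the download call in `if url_exists(src):` and skip or report the missing numbers instead of saving an error page.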
Once again, here's the updated code with a couple of changes: the starting image number actually works now, I added comments, and there can be any number of leading 0s with no limit on the number of files to be downloaded. **EDIT** - there is still a problem where you can only download files numbered under 100...any ideas how to fix this? - **EDIT**
**EDIT** - code updated: no more problems with leading 0s, file numbers, or the number of files. I also changed the code so it can download any type of sequenced file. - **EDIT**
Quote:
Example:
Save directory: C:\Downloads\
URL of file folder on server: http://blob.perl.org/books/beginning-perl/
File prefix: 3145_Chap
Starting file number: 1
Ending file number: 14
Leading 0s: 1
File extension: pdf
Code:
#Python 2 script (uses raw_input and urllib.urlretrieve)
import urllib
import os
import random
#get input - local save directory
savdir = raw_input('\nEnter the directory to save to: ')
#if local save folder doesn't exist, create it - change directory to local save directory
if not os.path.exists(savdir):
    os.mkdir(savdir)
os.chdir(savdir)
#get input - URL, file prefix, first file number, last file number, number of leading 0s, file extension
file_folder = raw_input('Enter URL of containing folder: ')
fprefix = raw_input('Enter file prefix: ')
ffnum = raw_input('Enter starting file number: ')
flnum = raw_input('Enter ending file number: ')
num0 = raw_input('Number of leading 0s (on ones column ex: 001 would be 2 0s, 01 would be 1 0): ')
ext = raw_input('Enter file extension: ')
#create list of file numbers - add leading 0s (field width = leading 0s + 1)
fmt = '{0:0' + str(int(num0)+1) + 'd}'
a = [fmt.format(x) for x in range(int(ffnum), int(flnum) + 1)]
#create random folder in local save directory to save files to
rfolder = str(int(random.random()*7384))
os.mkdir(rfolder)
#output file source and file destination - download files from URL and save to local save folder - output number of files downloaded / number of files
print('\n----Beginning download of ' + str(int(flnum) - int(ffnum) + 1) + ' files----\n')
for x in a:
    print('Src: ' + file_folder + fprefix + str(a[int(x) - int(ffnum)]) + '.' + ext)
    print('Dst: ' + os.getcwd() + '\\' + rfolder + '\\' + fprefix + str(a[int(x) - int(ffnum)]) + '.' + ext)
    #source and destination use the same index, so the printed Src is the file that gets downloaded
    urllib.urlretrieve(file_folder + fprefix + str(a[int(x) - int(ffnum)]) + '.' + ext, os.getcwd() + '\\' + rfolder + '\\' + fprefix + str(a[int(x) - int(ffnum)]) + '.' + ext)
    print('----Download ' + str(int(x) + 1 - int(ffnum)) + '/' + str(int(flnum) - int(ffnum) + 1) + ' complete----\n')
Actually, the whole business of computing the index for a looks unnecessary: isn't x already what you want? In fact, you don't even need a: you can loop through the range directly and then make use of string formatting within the loop.
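To make that suggestion concrete, here is a rough sketch of looping over the range directly and formatting each name inside the loop. The function name `build_file_names` and its parameters are invented for this example; the width rule (leading 0s plus one digit) is the one the posted script uses, and the values come from the sample input above.

```python
def build_file_names(prefix, first, last, leading_zeros, ext):
    """Return the zero-padded file names for first..last, inclusive."""
    # Same width rule as the posted script: leading 0s on the ones column, plus one digit.
    width = leading_zeros + 1
    names = []
    for n in range(first, last + 1):
        # n is the file number itself, so no index arithmetic is needed.
        names.append('{0}{1:0{2}d}.{3}'.format(prefix, n, width, ext))
    return names


names = build_file_names('3145_Chap', 1, 14, 1, 'pdf')
print(names[0])   # 3145_Chap01.pdf
print(names[-1])  # 3145_Chap14.pdf
```

Because the width is only a minimum, numbers that outgrow it (100 with a width of 2, say) are printed in full rather than truncated.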
Thanks for the suggestions; implementing new stuff is the best way for me to learn. I'm just getting used to the 'for' loop style in Python and keep treating x as an index instead of the actual data. The link looks interesting, I'll check it out.
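For anyone else who trips over the same thing, a tiny illustration of the difference (the list contents here are just sample values): a Python `for` loop hands you each item, and `enumerate()` is the usual way to get the position as well.

```python
numbers = ['01', '02', '03']

# A for loop yields each item directly: x is the data, not a position.
items = []
for x in numbers:
    items.append(x)

# When the position is needed too, enumerate() supplies it alongside the item.
pairs = []
for i, x in enumerate(numbers):
    pairs.append((i, x))

print(items)  # ['01', '02', '03']
print(pairs)  # [(0, '01'), (1, '02'), (2, '03')]
```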