How to download a file over HTTP?


I have a small utility that I use to download an MP3 file from a website on a schedule; it then builds/updates a podcast XML file which I've added to iTunes.

The text processing that creates/updates the XML file is written in Python. However, I use wget inside a Windows .bat file to download the actual MP3 file. I would prefer to have the entire utility written in Python.

I struggled to find a way to actually download the file in Python, which is why I resorted to using wget.

So, how do I download the file using Python?

This question is tagged with python http urllib

~ Asked on 2008-08-22 15:34:13

25 Answers


Use urllib.request.urlopen():

import urllib.request
with urllib.request.urlopen('http://example.com/') as f:  # placeholder URL
    html = f.read().decode('utf-8')

This is the most basic way to use the library, minus any error handling. You can also do more complex stuff such as changing headers.
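For instance, a custom header can be sent by building a Request object first. A minimal sketch (the URL and User-Agent value here are placeholders, not from the original answer):

```python
import urllib.request

# Placeholder URL and User-Agent value, for illustration only.
url = 'http://example.com/'
req = urllib.request.Request(url, headers={'User-Agent': 'my-podcast-tool/1.0'})

# Passing the Request to urlopen sends it with the custom header:
# with urllib.request.urlopen(req) as f:
#     html = f.read().decode('utf-8')
```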

On Python 2, the method is in urllib2:

import urllib2
response = urllib2.urlopen('http://example.com/')  # placeholder URL
html = response.read()

~ Answered on 2008-08-22 15:38:22


One more, using urlretrieve:

import urllib
urllib.urlretrieve("http://example.com/song.mp3", "mp3.mp3")  # placeholder URL

(for Python 3+ use import urllib.request and urllib.request.urlretrieve)

Yet another one, with a progress bar:

import urllib2

url = "http://example.com/10MB.zip"  # placeholder URL

file_name = url.split('/')[-1]
u = urllib2.urlopen(url)
f = open(file_name, 'wb')
meta = u.info()
file_size = int(meta.getheaders("Content-Length")[0])
print "Downloading: %s Bytes: %s" % (file_name, file_size)

file_size_dl = 0
block_sz = 8192
while True:
    buffer = u.read(block_sz)
    if not buffer:
        break

    file_size_dl += len(buffer)
    f.write(buffer)
    status = r"%10d  [%3.2f%%]" % (file_size_dl, file_size_dl * 100. / file_size)
    status = status + chr(8)*(len(status)+1)
    print status,

f.close()

~ Answered on 2008-08-22 16:19:09


In 2012, use the Python requests library:

>>> import requests
>>> url = "http://example.com/10MB.zip"  # placeholder URL
>>> r = requests.get(url)
>>> print len(r.content)

You can run pip install requests to get it.

Requests has many advantages over the alternatives because the API is much simpler. This is especially true if you have to do authentication. urllib and urllib2 are pretty unintuitive and painful in this case.
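As a sketch of the difference (placeholder URL and credentials): with requests, basic auth is a single argument, whereas with the standard library you end up building the Authorization header yourself:

```python
import base64
import urllib.request

# With requests, basic auth is one argument:
#   r = requests.get(url, auth=('user', 'passwd'))

# Roughly equivalent stdlib version, building the header by hand.
url = 'http://example.com/protected.mp3'   # placeholder URL
credentials = base64.b64encode(b'user:passwd').decode('ascii')
req = urllib.request.Request(url, headers={'Authorization': 'Basic ' + credentials})
# urllib.request.urlopen(req) would then perform the authenticated request.
```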


People have expressed admiration for the progress bar. It's cool, sure. There are several off-the-shelf solutions now, including tqdm:

from tqdm import tqdm
import requests

url = "http://example.com/10MB.zip"  # placeholder URL
response = requests.get(url, stream=True)

with open("10MB", "wb") as handle:
    for data in tqdm(response.iter_content()):
        handle.write(data)
This is essentially the implementation @kvance described 30 months ago.

~ Answered on 2012-05-24 20:08:29


import urllib2
mp3file = urllib2.urlopen("http://example.com/song.mp3")  # placeholder URL
with open('test.mp3','wb') as output:
    output.write(mp3file.read())
The wb in open('test.mp3','wb') opens a file (and erases any existing file) in binary mode so you can save data with it instead of just text.
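A quick way to see why binary mode matters (this writes a small temporary file): 'wb' accepts raw bytes unchanged, while text mode would reject them with a TypeError:

```python
import os
import tempfile

data = b'\x00\x01 not valid text'

# Round-trip raw bytes through a temporary file in binary mode.
fd, path = tempfile.mkstemp()
os.close(fd)
try:
    with open(path, 'wb') as f:
        f.write(data)          # works: binary mode takes bytes
    with open(path, 'rb') as f:
        round_tripped = f.read()
finally:
    os.remove(path)
```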

~ Answered on 2008-08-22 15:58:17


Python 3

  • urllib.request.urlopen

    import urllib.request
    response = urllib.request.urlopen('http://example.com/')  # placeholder URL
    html = response.read()
  • urllib.request.urlretrieve

    import urllib.request
    urllib.request.urlretrieve('http://example.com/song.mp3', 'mp3.mp3')  # placeholder URL

    Note: According to the documentation, urllib.request.urlretrieve is a "legacy interface" and "might become deprecated in the future" (thanks gerrit)

Python 2

  • urllib2.urlopen (thanks Corey)

    import urllib2
    response = urllib2.urlopen('http://example.com/')  # placeholder URL
    html = response.read()
  • urllib.urlretrieve (thanks PabloG)

    import urllib
    urllib.urlretrieve('http://example.com/song.mp3', 'mp3.mp3')  # placeholder URL

~ Answered on 2015-08-06 13:30:31


Use the wget module:

import wget
filename = wget.download('url')

~ Answered on 2015-03-25 12:59:25


import os,requests
def download(url):
    get_response = requests.get(url,stream=True)
    file_name  = url.split("/")[-1]
    with open(file_name, 'wb') as f:
        for chunk in get_response.iter_content(chunk_size=1024):
            if chunk: # filter out keep-alive new chunks
                f.write(chunk)

~ Answered on 2018-11-05 11:28:18


An improved version of the PabloG code for Python 2/3:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
from __future__ import ( division, absolute_import, print_function, unicode_literals )

import sys, os, tempfile, logging

if sys.version_info >= (3,):
    import urllib.request as urllib2
    import urllib.parse as urlparse
else:
    import urllib2
    import urlparse

def download_file(url, dest=None):
    """
    Download and save a file specified by url to dest directory.
    """
    u = urllib2.urlopen(url)

    scheme, netloc, path, query, fragment = urlparse.urlsplit(url)
    filename = os.path.basename(path)
    if not filename:
        filename = 'downloaded.file'
    if dest:
        filename = os.path.join(dest, filename)

    with open(filename, 'wb') as f:
        meta = u.info()
        meta_func = meta.getheaders if hasattr(meta, 'getheaders') else meta.get_all
        meta_length = meta_func("Content-Length")
        file_size = None
        if meta_length:
            file_size = int(meta_length[0])
        print("Downloading: {0} Bytes: {1}".format(url, file_size))

        file_size_dl = 0
        block_sz = 8192
        while True:
            buffer = u.read(block_sz)
            if not buffer:
                break

            file_size_dl += len(buffer)
            f.write(buffer)

            status = "{0:16}".format(file_size_dl)
            if file_size:
                status += "   [{0:6.2f}%]".format(file_size_dl * 100 / file_size)
            status += chr(13)
            print(status, end="")

    return filename

if __name__ == "__main__":  # Only run if this file is called directly
    print("Testing with 10MB download")
    url = "http://example.com/10MB.zip"  # placeholder URL
    filename = download_file(url)
    print(filename)

~ Answered on 2013-05-13 08:59:44


A simple yet Python 2 & Python 3 compatible way comes with the six library:

from six.moves import urllib
urllib.request.urlretrieve("http://example.com/song.mp3", "mp3.mp3")  # placeholder URL

~ Answered on 2017-06-22 07:59:35


I wrote the wget library in pure Python just for this purpose. As of version 2.0, it is urlretrieve pumped up with extra features.

~ Answered on 2013-09-25 17:55:16


Following are the most commonly used calls for downloading files in Python:

  1. urllib.urlretrieve('url_to_file', file_name)

  2. urllib2.urlopen('url_to_file')

  3. requests.get(url)

  4.'url', file_name)

Note: urlopen and urlretrieve were found to perform relatively badly when downloading large files (size > 500 MB). requests.get stores the whole file in memory until the download is complete (unless you pass stream=True and iterate over the response in chunks).
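The in-memory problem is avoided by copying in fixed-size chunks, which is essentially what stream=True plus iter_content does. A minimal sketch of that pattern, demonstrated on an in-memory file object (a urlopen() response works the same way):

```python
import io

def copy_in_chunks(src, dst, chunk_size=8192):
    """Copy from a file-like source to a destination in fixed-size
    chunks, so the whole payload is never held in memory at once."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        total += len(chunk)
    return total

# A urlopen() response could be passed as src; a BytesIO stands in here.
src = io.BytesIO(b'x' * 20000)
dst = io.BytesIO()
copied = copy_in_chunks(src, dst)
```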

~ Answered on 2016-09-19 12:45:10


I agree with Corey: urllib2 is more complete than urllib and is likely the module to use if you want to do more complex things. But to make the answers more complete, urllib is the simpler module if you want just the basics:

import urllib
response = urllib.urlopen('http://example.com/mp3.mp3')  # placeholder URL
mp3 = response.read()

Will work fine. Or, if you don't want to deal with the "response" object you can call read() directly:

import urllib
mp3 = urllib.urlopen('http://example.com/mp3.mp3').read()  # placeholder URL

~ Answered on 2008-08-22 15:58:52


In Python 3 you can use the urllib.request and shutil modules. Both ship with the standard library, so there is nothing to install:

import urllib.request
import shutil

url = "http://example.com/report.pdf"  # placeholder URL
output_file = "save_this_name.pdf"
with urllib.request.urlopen(url) as response, open(output_file, 'wb') as out_file:
    shutil.copyfileobj(response, out_file)

Note that despite the similar name, the standard-library urllib used here is a different package from the third-party urllib3.

~ Answered on 2018-02-08 17:30:45


If you have wget installed, you can use parallel_sync.

pip install parallel_sync

from parallel_sync import wget
urls = ['http://something.png', 'http://somthing.tar.gz']
wget.download('/tmp', urls)
# or a single file:
wget.download('/tmp', urls[1], filenames='something.tar.gz', extract=True)


This is pretty powerful. It can download files in parallel, retry upon failure, and it can even download files onto a remote machine.
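If you'd rather not add a dependency, the parallel part can be sketched with the standard library's thread pool. The download_one body below is a placeholder; a real version would call urlretrieve or requests.get on each URL:

```python
from concurrent.futures import ThreadPoolExecutor

def download_one(url):
    # Placeholder: a real implementation would fetch the URL here,
    # e.g. urllib.request.urlretrieve(url, local_name).
    local_name = url.split('/')[-1]
    return local_name

urls = ['http://example.com/a.mp3', 'http://example.com/b.mp3']
with ThreadPoolExecutor(max_workers=4) as pool:
    # map() runs the downloads concurrently and preserves input order.
    names = list(pool.map(download_one, urls))
```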

~ Answered on 2015-11-19 23:48:06


You can get the progress feedback with urlretrieve as well:

import sys
import urllib

def report(blocknr, blocksize, size):
    current = blocknr * blocksize
    # print a simple percentage on one line
    sys.stdout.write("\r{0:.2f}%".format(100.0 * current / size))

def downloadFile(url):
    print "\n", url
    fname = url.split('/')[-1]
    print fname
    urllib.urlretrieve(url, fname, report)

~ Answered on 2014-01-26 13:12:54


If speed matters to you, I made a small performance test of the urllib and wget modules; for wget I tried once with a status bar and once without. I used three different 500 MB files for the test (different files, to eliminate the chance that there is some caching going on under the hood). Tested on a Debian machine, with Python 2.

First, these are the results (they are similar in different runs):

$ python 
urlretrive_test : starting
urlretrive_test : 6.56
wget_no_bar_test : starting
wget_no_bar_test : 7.20
wget_with_bar_test : starting
100% [......................................................................] 541335552 / 541335552
wget_with_bar_test : 50.49

The way I performed the test is using "profile" decorator. This is the full code:

import wget
import urllib
import time
from functools import wraps

def profile(func):
    @wraps(func)
    def inner(*args):
        print func.__name__, ": starting"
        start = time.time()
        ret = func(*args)
        end = time.time()
        print func.__name__, ": {:.2f}".format(end - start)
        return ret
    return inner

url1 = 'http://example.com/500MB-file-1.zip'  # placeholder URLs
url2 = 'http://example.com/500MB-file-2.zip'
url3 = 'http://example.com/500MB-file-3.zip'

def do_nothing(*args):
    pass

@profile
def urlretrive_test(url):
    return urllib.urlretrieve(url)

@profile
def wget_no_bar_test(url):
    return wget.download(url, out='/tmp/', bar=do_nothing)

@profile
def wget_with_bar_test(url):
    return wget.download(url, out='/tmp/')

urlretrive_test(url1)
print '=============='

wget_no_bar_test(url2)
print '=============='

wget_with_bar_test(url3)
print '=============='

urllib seems to be the fastest

~ Answered on 2017-11-03 14:25:38


Just for the sake of completeness, it is also possible to call any program for retrieving files using the subprocess package. Programs dedicated to retrieving files are more powerful than Python functions like urlretrieve. For example, wget can download directories recursively (-r), can deal with FTP, redirects, and HTTP proxies, can avoid re-downloading existing files (-nc), and aria2 can do multi-connection downloads, which can potentially speed things up.

import subprocess
subprocess.check_output(['wget', '-O', 'example_output_file.html', 'http://example.com/'])  # placeholder URL

In Jupyter Notebook, one can also call programs directly with the ! syntax:

!wget -O example_output_file.html http://example.com/

~ Answered on 2018-08-29 12:24:49


I wrote the following, which works in vanilla Python 2 or Python 3.

import sys
try:
    import urllib.request
    python3 = True
except ImportError:
    import urllib2
    python3 = False

def progress_callback_simple(downloaded, total):
    sys.stdout.write(
        "\r" +
        (len(str(total))-len(str(downloaded)))*" " + str(downloaded) + "/%d"%total +
        " [%3.2f%%]"%(100.0*float(downloaded)/float(total))
    )
    sys.stdout.flush()

def download(srcurl, dstfilepath, progress_callback=None, block_size=8192):
    def _download_helper(response, out_file, file_size):
        if progress_callback!=None: progress_callback(0,file_size)
        if block_size == None:
            buffer = response.read()
            out_file.write(buffer)

            if progress_callback!=None: progress_callback(file_size,file_size)
        else:
            file_size_dl = 0
            while True:
                buffer = response.read(block_size)
                if not buffer: break

                out_file.write(buffer)
                file_size_dl += len(buffer)

                if progress_callback!=None: progress_callback(file_size_dl,file_size)
    with open(dstfilepath,"wb") as out_file:
        if python3:
            with urllib.request.urlopen(srcurl) as response:
                file_size = int(response.getheader("Content-Length"))
                _download_helper(response,out_file,file_size)
        else:
            response = urllib2.urlopen(srcurl)
            meta = response.info()
            file_size = int(meta.getheaders("Content-Length")[0])
            _download_helper(response,out_file,file_size)

import traceback
try:
    download(
        "http://example.com/test.zip",  # placeholder URL
        "output.zip",
        progress_callback_simple
    )
except:
    traceback.print_exc()


  • Supports a "progress bar" callback.
  • Download is a 4 MB test .zip from my website.

~ Answered on 2017-05-13 21:33:30


You can use PycURL on Python 2 and 3.

import pycurl

FILE_SRC = 'http://example.com/index.html'  # placeholder URL
FILE_DEST = 'pycurl.html'

with open(FILE_DEST, 'wb') as f:
    c = pycurl.Curl()
    c.setopt(c.URL, FILE_SRC)
    c.setopt(c.WRITEDATA, f)
    c.perform()
    c.close()

~ Answered on 2018-08-08 03:51:42


Source code can be:

import urllib
sock = urllib.urlopen("http://example.com/")  # placeholder URL
htmlSource = sock.read()
sock.close()
print htmlSource

~ Answered on 2013-11-26 13:21:01


This may be a little late, but I saw PabloG's code and couldn't help adding an os.system('cls') to make it look AWESOME! Check it out:

import urllib2,os

url = "http://example.com/10MB.zip"  # placeholder URL

file_name = url.split('/')[-1]
u = urllib2.urlopen(url)
f = open(file_name, 'wb')
meta = u.info()
file_size = int(meta.getheaders("Content-Length")[0])
print "Downloading: %s Bytes: %s" % (file_name, file_size)
os.system('cls')

file_size_dl = 0
block_sz = 8192
while True:
    buffer = u.read(block_sz)
    if not buffer:
        break

    file_size_dl += len(buffer)
    f.write(buffer)
    status = r"%10d  [%3.2f%%]" % (file_size_dl, file_size_dl * 100. / file_size)
    status = status + chr(8)*(len(status)+1)
    print status,

f.close()

If running in an environment other than Windows, you will have to use something other than 'cls'. On Mac OS X and Linux it should be 'clear'.
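The platform check can be automated with os.name, so the same script works on both systems:

```python
import os

# 'nt' means Windows; POSIX systems (Linux, macOS) report 'posix'.
clear_command = 'cls' if os.name == 'nt' else 'clear'
# os.system(clear_command)  # uncomment to actually clear the terminal
```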

~ Answered on 2013-10-14 02:54:01


urlretrieve and requests.get are simple, but the reality is not. I have fetched data from a couple of sites, including text and images, and the two above probably solve most tasks. But for a more universal solution I suggest urlopen: as it is included in the Python 3 standard library, your code can run on any machine with Python 3, without pre-installing any site packages.

import urllib.request

# url, headers, filename, and buffer_size are placeholders to fill in;
# a browser-like User-Agent header is what typically avoids HTTP 403.
url = 'http://example.com/file.bin'
headers = {'User-Agent': 'Mozilla/5.0'}
filename = 'file.bin'
buffer_size = 8192

url_request = urllib.request.Request(url, headers=headers)
url_connect = urllib.request.urlopen(url_request)

#remember to open file in bytes mode
with open(filename, 'wb') as f:
    while True:
        buffer = url_connect.read(buffer_size)
        if not buffer: break

        #an integer value of size of written data
        data_wrote = f.write(buffer)

#you could probably use with-open-as manner
url_connect.close()

This answer provides a solution to HTTP 403 Forbidden when downloading a file over HTTP with Python. I have tried only the requests and urllib modules; another module may provide something better, but this is the one I used to solve most problems.

~ Answered on 2017-03-13 13:12:19


Late answer, but for python>=3.6 you can use:

import dload
dload.save("http://example.com/file.zip")  # placeholder URL

Install dload with:

pip3 install dload

~ Answered on 2020-02-24 07:12:14


I wanted to download all the files from a webpage. I tried wget, but it was failing, so I decided on the Python route and found this thread.

After reading it, I have made a little command line application, soupget, expanding on the excellent answers of PabloG and Stan and adding some useful options.

It uses BeautifulSoup to collect all the URLs of the page and then downloads the ones with the desired extension(s). Finally, it can download multiple files in parallel.

Here it is:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
from __future__ import (division, absolute_import, print_function, unicode_literals)
import sys, os, argparse
from bs4 import BeautifulSoup

# --- insert Stan's script here ---
# if sys.version_info >= (3,): 
# def download_file(url, dest=None): 

# --- new stuff ---
def collect_all_url(page_url, extensions):
    """
    Recovers all links in page_url checking for all the desired extensions
    """
    conn = urllib2.urlopen(page_url)
    html = conn.read()
    soup = BeautifulSoup(html, 'lxml')
    links = soup.find_all('a')

    results = []    
    for tag in links:
        link = tag.get('href', None)
        if link is not None: 
            for e in extensions:
                if e in link:
                    # Fallback for badly defined links
                    # checks for missing scheme or netloc
                    if bool(urlparse.urlparse(link).scheme) and bool(urlparse.urlparse(link).netloc):
                        results.append(link)
                    else:
                        # relative link: resolve it against the page URL
                        new_link = urlparse.urljoin(page_url, link)
                        results.append(new_link)
    return results

if __name__ == "__main__":  # Only run if this file is called directly
    # Command line arguments
    parser = argparse.ArgumentParser(
        description='Download all files from a webpage.')
    parser.add_argument(
        '-u', '--url',
        help='Page url to request')
    parser.add_argument(
        '-e', '--ext', nargs='+',
        help='Extension(s) to find')
    parser.add_argument(
        '-d', '--dest',
        help='Destination where to save the files')
    parser.add_argument(
        '-p', '--par',
        action='store_true', default=False,
        help="Turns on parallel download")
    args = parser.parse_args()

    # Recover files to download
    all_links = collect_all_url(args.url, args.ext)

    # Download
    if not args.par:
        for l in all_links:
            try:
                filename = download_file(l, args.dest)
                print(l)
            except Exception as e:
                print("Error while downloading: {}".format(e))
    else:
        from multiprocessing.pool import ThreadPool
        results = ThreadPool(10).imap_unordered(
            lambda x: download_file(x, args.dest), all_links)
        for p in results:
            print(p)

An example of its usage is:

python3 soupget.py -p -e <list of extensions> -d <destination_folder> -u <target_webpage>

And an actual example if you want to see it in action:

python3 soupget.py -p -e .xlsx .pdf .csv -u http://example.com/some-page

~ Answered on 2020-03-06 00:17:19


Another way is to call an external process such as curl.exe. By default, curl displays a progress bar, average download speed, time left, and more, all neatly formatted in a table. Put curl.exe in the same directory as your script.

from subprocess import call
url = "http://example.com/song.mp3"  # placeholder URL
call(["curl", url, '--output', "song.mp3"])

Note: curl saves the file relative to the current working directory; to put it somewhere else, pass a full path to --output or move it with os.rename afterwards.

~ Answered on 2020-09-14 00:20:55
