urllib.request - Extensible library for opening URLs
Source code: Lib/urllib/request.py
The urllib.request module defines functions and classes which help in opening URLs (mostly HTTP) in a complex world: basic and digest authentication, redirections, cookies and more.
{tip}The Requests package is recommended for a higher-level HTTP client interface.
The urllib.request module defines the following functions:
urllib.request.urlopen(url, data=None, [timeout, ]*, cafile=None, capath=None, cadefault=False, context=None)
Open the URL url, which can be either a string or a Request object.
data must be an object specifying additional data to be sent to the server, or None if no such data is needed. See Request for details.
The urllib.request module uses HTTP/1.1 and includes a Connection: close header in its HTTP requests.
The optional timeout parameter specifies a timeout in seconds for blocking operations like the connection attempt (if not specified, the global default timeout setting will be used). This only works for HTTP, HTTPS and FTP connections.
If context is specified, it must be a ssl.SSLContext instance describing the various SSL options. See HTTPSConnection for more details.
The optional cafile and capath parameters specify a set of trusted CA certificates for HTTPS requests. cafile should point to a single file containing a bundle of CA certificates, whereas capath should point to a directory of hashed certificate files. More information can be found in ssl.SSLContext.load_verify_locations().
The cadefault parameter is ignored.
This function always returns an object which can work as a context manager and has the properties url, headers, and status. See urllib.response.addinfourl for more detail on these properties.
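As a minimal sketch of the context-manager usage described above, the following opens a data: URL so that it runs without network access. The URL and its payload are illustrative assumptions, not part of the original text.

```python
import urllib.request

# A data: URL keeps the example self-contained; an http(s) URL is opened
# the same way but needs network access.
URL = "data:text/plain;charset=utf-8,hello%20world"

with urllib.request.urlopen(URL) as response:
    body = response.read()                           # raw bytes of the payload
    final_url = response.url                         # URL of the resource retrieved
    content_type = response.headers["Content-Type"]  # header lookup is case-insensitive

print(body.decode("utf-8"), final_url, content_type)
```

The response is closed automatically when the with block exits.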
For HTTP and HTTPS URLs, this function returns a slightly modified http.client.HTTPResponse object. In addition to the three properties above, the msg attribute contains the same information as the reason attribute (the reason phrase returned by the server) rather than the response headers that the documentation for HTTPResponse specifies.
For FTP, file, and data URLs and requests explicitly handled by legacy URLopener and FancyURLopener classes, this function returns a urllib.response.addinfourl object.
Raises URLError on protocol errors.
Note that None may be returned if no handler handles the request (though the default installed global OpenerDirector uses UnknownHandler to ensure this never happens).
In addition, if proxy settings are detected (for example, when a *_proxy environment variable like http_proxy is set), ProxyHandler is installed by default and makes sure the requests are handled through the proxy.
The legacy urllib.urlopen function from Python 2.6 and earlier has been discontinued; urllib.request.urlopen() corresponds to the old urllib2.urlopen. Proxy handling, which was done by passing a dictionary parameter to urllib.urlopen, can be obtained by using ProxyHandler objects.
The default opener raises an auditing event urllib.Request with arguments fullurl, data, headers, method taken from the request object.
Changed in version 3.2: cafile and capath were added.
Changed in version 3.2: HTTPS virtual hosts are now supported if possible (that is, if ssl.HAS_SNI is true).
New in version 3.2: data can be an iterable object.
Changed in version 3.3: cadefault was added.
Changed in version 3.4.3: context was added.
Deprecated since version 3.6: cafile, capath and cadefault are deprecated in favor of context. Please use ssl.SSLContext.load_cert_chain() instead, or let ssl.create_default_context() select the system’s trusted CA certificates for you.
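The context-based approach recommended above might look like the following sketch. The helper name fetch_https and its default timeout are assumptions made for illustration; the function is defined but not called here, since calling it requires network access.

```python
import ssl
import urllib.request


def fetch_https(url, timeout=10.0):
    """Hypothetical helper: fetch url, verifying the server against system CAs."""
    # create_default_context() loads the system's trusted CA certificates
    # and enables certificate and hostname verification.
    context = ssl.create_default_context()
    with urllib.request.urlopen(url, timeout=timeout, context=context) as response:
        return response.status, response.read()


# Inspect the defaults without touching the network.
default_context = ssl.create_default_context()
print(default_context.verify_mode, default_context.check_hostname)
```

A call such as fetch_https("https://www.python.org/") would return the status code and body on success, and raise URLError on protocol errors.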
urllib.request.install_opener(opener)
Install an OpenerDirector instance as the default global opener. Installing an opener is only necessary if you want urlopen to use that opener; otherwise, simply call OpenerDirector.open() instead of urlopen(). The code does not check for a real OpenerDirector, and any class with the appropriate interface will work.
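The difference between installing an opener and calling it directly can be sketched as follows. The data: URLs and the User-Agent string are illustrative assumptions chosen so the example runs offline.

```python
import urllib.request

# Build an opener with the default handler set.
opener = urllib.request.build_opener()
opener.addheaders = [("User-Agent", "example-client/0.1")]  # hypothetical UA string

# Option 1: install it globally so that urlopen() uses it.
urllib.request.install_opener(opener)
with urllib.request.urlopen("data:,installed") as response:
    via_global = response.read()

# Option 2: skip installation and call the opener directly.
with opener.open("data:,direct") as response:
    via_direct = response.read()

print(via_global, via_direct)
```

Installing an opener changes process-wide state, so calling opener.open() directly is often the safer choice in library code.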
urllib.request.build_opener([handler, ...])
Return an OpenerDirector instance, which chains the handlers in the order given. handlers can be either instances of BaseHandler, or subclasses of BaseHandler (in which case it must be possible to call the constructor without any parameters). Instances of the following classes will be in front of the handlers, unless the handlers contain them, instances of them or subclasses of them: ProxyHandler (if proxy settings are detected), UnknownHandler, HTTPHandler, HTTPDefaultErrorHandler, HTTPRedirectHandler, FTPHandler, FileHandler, HTTPErrorProcessor.
If the Python installation has SSL support (i.e., if the ssl module can be imported), HTTPSHandler will also be added.
A BaseHandler subclass may also change its handler_order attribute to modify its position in the handlers list.
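The sketch below passes a handler class (rather than an instance) to build_opener() and sets handler_order on it. LoggingHandler is a hypothetical class written for this example, and the data: URL is an assumption that keeps it runnable offline.

```python
import urllib.request


class LoggingHandler(urllib.request.BaseHandler):
    """Hypothetical pre-processor that records requested data: URLs."""

    handler_order = 100  # lower values sort earlier; BaseHandler uses 500
    seen = []            # shared log, kept on the class for simplicity

    def data_request(self, req):
        # <protocol>_request methods run before the request is opened.
        LoggingHandler.seen.append(req.full_url)
        return req


# A class is accepted because its constructor takes no parameters.
opener = urllib.request.build_opener(LoggingHandler)
handler_names = [type(h).__name__ for h in opener.handlers]

with opener.open("data:,ping") as response:
    payload = response.read()

print(handler_names, payload, LoggingHandler.seen)
```

The default handlers listed above are still present in handler_names because LoggingHandler does not subclass any of them.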
urllib.request.pathname2url(path)
Convert the pathname path from the local syntax for a path to the form used in the path component of a URL. This does not produce a complete URL. The return value will already be quoted using the quote() function.
urllib.request.url2pathname(path)
Convert the path component path from a percent-encoded URL to the local syntax for a path. This does not accept a complete URL. This function uses unquote() to decode path.
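A short round trip through both functions, as a sketch: the relative path below is an arbitrary example, built with os.path.join so that it uses the local separator on any platform.

```python
import os
from urllib.request import pathname2url, url2pathname

# Hypothetical local path containing a space, in the platform's own syntax.
local_path = os.path.join("docs", "my file.txt")

url_path = pathname2url(local_path)   # percent-encoded URL path component
round_trip = url2pathname(url_path)   # back to the local path syntax

print(url_path, round_trip)
```

Neither function adds or strips a scheme, so the result must still be combined with "file:" (or similar) to form a complete URL.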
urllib.request.getproxies()
This helper function returns a dictionary of scheme to proxy server URL mappings. On all operating systems it first scans the environment, case-insensitively, for variables named <scheme>_proxy. If it finds none, it looks for proxy information in the Mac OS X System Configuration on Mac OS X and in the Windows Systems Registry on Windows. If both lowercase and uppercase environment variables exist (and disagree), lowercase is preferred.
{note}If the environment variable REQUEST_METHOD is set, which usually indicates your script is running in a CGI environment, the environment variable HTTP_PROXY (uppercase _PROXY) will be ignored. This is because that variable can be injected by a client using the “Proxy:” HTTP header. If you need to use an HTTP proxy in a CGI environment, either use ProxyHandler explicitly, or make sure the variable name is in lowercase (or at least the _proxy suffix).
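The precedence and CGI rules above can be observed with a temporarily patched environment. This is a sketch under two assumptions: the proxy URLs are made up, and the platform has case-sensitive environment variables and no system-level proxy fallback (for example, Linux).

```python
import os
from unittest import mock
from urllib.request import getproxies

# Hypothetical proxy URLs that disagree between lowercase and uppercase names.
env = {
    "http_proxy": "http://lower.example:3128",
    "HTTP_PROXY": "http://upper.example:8080",
}
with mock.patch.dict(os.environ, env, clear=True):
    preferred = getproxies().get("http")       # the lowercase variable wins

# In a CGI-like environment, the uppercase HTTP_PROXY is ignored.
cgi_env = {
    "HTTP_PROXY": "http://upper.example:8080",
    "REQUEST_METHOD": "GET",
}
with mock.patch.dict(os.environ, cgi_env, clear=True):
    cgi_http_proxy = getproxies().get("http")  # no http proxy is reported

print(preferred, cgi_http_proxy)
```

mock.patch.dict restores the original environment when each with block exits, so the example leaves no proxy settings behind.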
The following classes are provided:
class urllib.request.Request(url, data=None, headers={}, origin_req_host=None, unverifiable=False, method=None)
This class is an abstraction of a URL request.
url should be a string containing a valid URL.
data must be an object specifying additional data to send to the server, or None if no such data is needed. Currently HTTP requests are the only ones that use data. The supported object types include bytes, file-like objects, and iterables of bytes-like objects. If neither a Content-Length nor a Transfer-Encoding header field has been provided, HTTPHandler will set these headers according to the type of data. Content-Length will be used to send bytes objects, while Transfer-Encoding: chunked as specified in RFC 7230, Section 3.3.1 will be used to send files and other iterables.
For an HTTP POST request method, data should be a buffer in the standard application/x-www-form-urlencoded format. The urllib.parse.urlencode()
function takes a mapping or sequence of 2-tuples and returns an ASCII string in this format. It should be encoded to bytes before being used as the data parameter.
headers should be a dictionary, and will be treated as if add_header() was called with each key and value as arguments. This is often used to “spoof” the User-Agent header value, which is used by a browser to identify itself; some HTTP servers only allow requests coming from common browsers as opposed to scripts. For example, Mozilla Firefox may identify itself as "Mozilla/5.0 (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11", while urllib’s default user agent string is "Python-urllib/2.6" (on Python 2.6).
An appropriate Content-Type header should be included if the data argument is present. If this header has not been provided and data is not None, Content-Type: application/x-www-form-urlencoded will be added as a default.
The next two arguments are only of interest for correct handling of third-party HTTP cookies:
origin_req_host should be the request-host of the origin transaction, as defined by RFC 2965. It defaults to http.cookiejar.request_host(self). This is the host name or IP address of the original request that was initiated by the user. For example, if the request is for an image in an HTML document, this should be the request-host of the request for the page containing the image.
unverifiable should indicate whether the request is unverifiable, as defined by RFC 2965. It defaults to False. An unverifiable request is one whose URL the user did not have the option to approve. For example, if the request is for an image in an HTML document, and the user had no option to approve the automatic fetching of the image, this should be true.
method should be a string that indicates the HTTP request method that will be used (e.g. 'HEAD'). If provided, its value is stored in the method attribute and is used by get_method(). The default is 'GET' if data is None or 'POST' otherwise. Subclasses may indicate a different default method by setting the method attribute in the class itself.
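The parameters described above can be combined as in the following sketch. The URL, form fields and User-Agent value are assumptions for illustration; the requests are only constructed, not sent.

```python
import urllib.parse
import urllib.request

URL = "https://www.example.com/search"  # placeholder URL

# urlencode() returns an ASCII str; encode it to bytes before use as data.
form = urllib.parse.urlencode({"q": "python urllib", "page": 1}).encode("ascii")

post_request = urllib.request.Request(
    URL,
    data=form,
    headers={"User-Agent": "example-client/0.1"},  # hypothetical UA string
)
head_request = urllib.request.Request(URL, method="HEAD")
get_request = urllib.request.Request(URL)

# Header names are stored capitalized, hence "User-agent" in the lookup.
print(post_request.get_method(), post_request.get_header("User-agent"))
print(head_request.get_method(), get_request.get_method())
```

Passing any of these objects to urlopen() or OpenerDirector.open() sends the request.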
{note}The request will not work as expected if the data object is unable to deliver its content more than once (e.g. a file or an iterable that can produce the content only once) and the request is retried for HTTP redirects or authentication. The data is sent to the HTTP server right away after the headers. There is no support for a 100-continue expectation in the library.
Changed in version 3.3: The Request.method argument was added to the Request class.
Changed in version 3.4: Default Request.method may be indicated at the class level.
Changed in version 3.6: Do not raise an error if the Content-Length has not been provided and data is neither None nor a bytes object. Fall back to use chunked transfer encoding instead.
class urllib.request.OpenerDirector
The OpenerDirector class opens URLs via BaseHandler instances chained together. It manages the chaining of handlers, and recovery from errors.
class urllib.request.BaseHandler
This is the base class for all registered handlers; it handles only the simple mechanics of registration.
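To illustrate how an OpenerDirector dispatches to registered handlers, here is a minimal sketch with a made-up "greet:" scheme. GreetHandler and the scheme are hypothetical and exist only for this example; the method name greet_open follows the <protocol>_open naming that OpenerDirector looks for.

```python
import io
from email.message import Message
from urllib.request import BaseHandler, OpenerDirector
from urllib.response import addinfourl


class GreetHandler(BaseHandler):
    """Hypothetical handler for an invented 'greet:' URL scheme."""

    def greet_open(self, req):
        # Everything after "greet:" is treated as the name to greet.
        name = req.full_url.split(":", 1)[1]
        body = f"hello, {name}".encode("utf-8")
        headers = Message()
        headers["Content-Type"] = "text/plain; charset=utf-8"
        return addinfourl(io.BytesIO(body), headers, req.full_url)


# A bare OpenerDirector has no handlers until they are registered.
opener = OpenerDirector()
opener.add_handler(GreetHandler())

with opener.open("greet:world") as response:
    greeting = response.read().decode("utf-8")

print(greeting)
```

In practice build_opener() is usually preferable, since it also registers the default handlers.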
class urllib.request.HTTPDefaultErrorHandler
A class which defines a default handler for HTTP error responses; all responses are turned into HTTPError exceptions.
class urllib.request.HTTPRedirectHandler
A class to handle redirections.
class urllib.request.HTTPCookieProcessor(cookiejar=None)
A class to handle HTTP Cookies.
class urllib.request.ProxyHandler(proxies=None)
Cause requests to go through a proxy. If proxies is given, it must be a dictionary mapping protocol names to URLs of proxies. The default is to read the list of proxies from the environment variables <protocol>_proxy. If no proxy environment variables are set, then in a Windows environment proxy settings are obtained from the registry’s Internet Settings section, and in a Mac OS X environment proxy information is retrieved from the OS X System Configuration Framework.
To disable autodetected proxies, pass an empty dictionary.
The no_proxy environment variable can be used to specify hosts which shouldn’t be reached via proxy; if set, it should be a comma-separated list of hostname suffixes, optionally with :port appended, for example cern.ch,ncsa.uiuc.edu,some.host:8080.
{note}HTTP_PROXY will be ignored if a variable REQUEST_METHOD is set; see the documentation on getproxies().
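The two ways of constructing a ProxyHandler explicitly can be sketched as follows. The proxy URL is a placeholder assumption, and no request is sent.

```python
import urllib.request

# An explicit mapping of protocol names to proxy URLs (placeholder host).
explicit = urllib.request.ProxyHandler({"http": "http://proxy.example:3128"})
proxied_opener = urllib.request.build_opener(explicit)

# An empty dictionary disables autodetected proxies for this opener.
direct_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def has_proxy_handler(opener):
    """Report whether any registered handler is a ProxyHandler."""
    return any(isinstance(h, urllib.request.ProxyHandler) for h in opener.handlers)


print(has_proxy_handler(proxied_opener), has_proxy_handler(direct_opener))
```

Because a ProxyHandler was passed in both cases, build_opener() does not add its own environment-based one.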
class urllib.request.HTTPPasswordMgr
Keep a database of (realm, uri) -> (user, password) mappings.
class urllib.request.HTTPPasswordMgrWithDefaultRealm
Keep a database of (realm, uri) -> (user, password) mappings. A realm of None is considered a catch-all realm, which is searched if no other realm fits.
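The catch-all behaviour of a None realm can be seen in this sketch. The host, realm name and credentials are invented for illustration.

```python
import urllib.request

password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()

# Register credentials under the catch-all realm (None) for a placeholder host.
password_mgr.add_password(None, "https://api.example.com/", "alice", "s3cret")

# A lookup with an unregistered realm falls back to the None realm.
found = password_mgr.find_user_password(
    "Some Realm", "https://api.example.com/v1/items"
)

# A different host has no matching entry at all.
missing = password_mgr.find_user_password("Some Realm", "https://other.example/")

# The manager is typically handed to an authentication handler.
auth_handler = urllib.request.HTTPBasicAuthHandler(password_mgr)
opener = urllib.request.build_opener(auth_handler)

print(found, missing)
```

With a plain HTTPPasswordMgr, the first lookup would also miss, because the realm name would have to match exactly.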
class urllib.request.HTTPPasswordMgrWithPriorAuth
A variant of HTTPPasswordMgrWithDefaultRealm that also has a database of uri -> is_authenticated mappings. Can be used by a BasicAuth handler to determine when to send authentication credentials immediately instead of waiting for a 401 response first.
New in version 3.5.
class urllib.request.AbstractBasicAuthHandler(password_mgr=None)
This is a mixin class that helps with HTTP authentication, both to the remote host and to a proxy. password_mgr, if given, should be something that is compatible with HTTPPasswordMgr; refer to section HTTPPasswordMgr Objects for information on the interface that must be supported. If password_mgr also provides is_authenticated and update_authenticated methods (see HTTPPasswordMgrWithPriorAuth Objects), then the handler will use the is_authenticated result for a given URI to determine whether or not to send authentication credentials with the request. If is_authenticated returns True for the URI, credentials are sent. If is_authenticated is False, credentials are not sent, and then if a 401 response is received the request is re-sent with the authentication credentials. If authentication succeeds, update_authenticated is called to set is_authenticated to True for the URI, so that subsequent requests to the URI or any of its super-URIs will automatically include the authentication credentials.
New in version 3.5: Added is_authenticated support.
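A sketch of the prior-authentication setup described above: the host and credentials are placeholders, and the example only inspects the manager's state without sending a request.

```python
import urllib.request

prior_auth_mgr = urllib.request.HTTPPasswordMgrWithPriorAuth()

# is_authenticated=True asks the handler to send credentials up front
# instead of waiting for a 401 response (placeholder host and credentials).
prior_auth_mgr.add_password(
    None, "https://api.example.com/", "alice", "s3cret", is_authenticated=True
)

known = prior_auth_mgr.is_authenticated("https://api.example.com/v1/items")
unknown = prior_auth_mgr.is_authenticated("https://other.example/")

# The handler consults the manager before each request it processes.
auth_handler = urllib.request.HTTPBasicAuthHandler(prior_auth_mgr)
opener = urllib.request.build_opener(auth_handler)

print(known, bool(unknown))
```

Sending credentials without a challenge saves a round trip, but it should only be enabled for URIs that are known to require them.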
class urllib.request.HTTPBasicAuthHandler(password_mgr=None)
Handle authentication with the remote host. password_mgr, if given, should be something that is compatible with HTTPPasswordMgr; refer to section HTTPPasswordMgr Objects for information on the interface that must be supported. HTTPBasicAuthHandler will raise a ValueError when presented with a wrong authentication scheme.
class urllib.request.ProxyBasicAuthHandler(password_mgr=None)
Handle authentication with the proxy. password_mgr, if given, should be something that is compatible with HTTPPasswordMgr; refer to section HTTPPasswordMgr Objects for information on the interface that must be supported.
class urllib.request.AbstractDigestAuthHandler(password_mgr=None)
This is a mixin class that helps with HTTP authentication, both to the remote host and to a proxy. password_mgr, if given, should be something that is compatible with HTTPPasswordMgr; refer to section HTTPPasswordMgr Objects for information on the interface that must be supported.