URL Components¶
A URL is composed of several components. This guide explains each component and how to access and modify them according to the WHATWG URL Standard.
URL Structure¶
A complete URL has the following structure per the URL representation:
https://user:pass@example.com:8080/path/page?query=value#section

protocol   https:
username   user
password   pass
hostname   example.com
port       8080
host       example.com:8080            (hostname and port)
origin     https://example.com:8080    (protocol and host)
pathname   /path/page
search     ?query=value
hash       #section
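As a rough cross-check of the breakdown above, the sketch below splits the same URL with Python's standard-library `urllib.parse.urlsplit`. This is a stand-in, not the `URL` class described in this guide: `urlsplit` follows RFC 3986 conventions rather than the WHATWG URL Standard, so the helper `whatwg_style_components` (a hypothetical name) adds the `:`, `?`, and `#` delimiters by hand and does not hide default ports.

```python
from urllib.parse import urlsplit


def whatwg_style_components(raw: str) -> dict:
    """Map urlsplit() results onto WHATWG-style property names (illustrative only)."""
    parts = urlsplit(raw)
    hostname = parts.hostname or ""
    port = "" if parts.port is None else str(parts.port)
    return {
        "protocol": parts.scheme + ":",
        "username": parts.username or "",
        "password": parts.password or "",
        "hostname": hostname,
        "port": port,
        # host is the hostname plus ":port" when a port is present
        "host": hostname + (":" + port if port else ""),
        "pathname": parts.path,
        # search and hash carry their leading delimiter when non-empty
        "search": "?" + parts.query if parts.query else "",
        "hash": "#" + parts.fragment if parts.fragment else "",
    }


components = whatwg_style_components(
    "https://user:pass@example.com:8080/path/page?query=value#section"
)
for name, value in components.items():
    print(f"{name:9} {value}")
```

A WHATWG-conformant parser additionally normalizes the input (for example, it drops a default port and serializes IPv6 hosts with brackets), which this sketch does not attempt.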
Component Properties¶
The following properties correspond to the URL interface in the WHATWG specification.
href¶
The serialized URL as a complete string:
url = URL("https://example.com/path")
print(url.href) # "https://example.com/path"
# Setting href re-parses the entire URL
url.href = "https://other.com/new"
print(url.hostname) # "other.com"
protocol¶
The scheme, followed by a colon (:):
url = URL("https://example.com")
print(url.protocol) # "https:"
url.protocol = "http:"
print(url.href) # "http://example.com/"
Protocol Restrictions
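Per the spec, the protocol setter ignores a change between a special scheme (ftp, file, http, https, ws, wss) and a non-special one. No error is raised and the URL is left unchanged, so check url.protocol after assigning if the change matters.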
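The special/non-special rule can be sketched in a few lines. The list of special schemes comes from the WHATWG URL Standard; the function names `is_special` and `scheme_change_allowed` are hypothetical helpers for illustration and are not part of this library's API. The sketch models only this one check, while the standard's protocol setter applies a few further conditions (for example, around `file:` URLs).

```python
# Special schemes as listed in the WHATWG URL Standard.
SPECIAL_SCHEMES = {"ftp", "file", "http", "https", "ws", "wss"}


def is_special(protocol: str) -> bool:
    """Accepts either 'https' or 'https:' and ignores case."""
    return protocol.rstrip(":").lower() in SPECIAL_SCHEMES


def scheme_change_allowed(current: str, new: str) -> bool:
    """A change is accepted only if both schemes are special or both are not."""
    return is_special(current) == is_special(new)


print(scheme_change_allowed("https:", "http:"))    # True
print(scheme_change_allowed("https:", "mailto:"))  # False
print(scheme_change_allowed("mailto:", "https:"))  # False
```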
hostname¶
The host domain name or IP address:
url = URL("https://example.com:8080/path")
print(url.hostname) # "example.com"
url.hostname = "other.com"
print(url.href) # "https://other.com:8080/path"
port¶
The port number as a string. It is empty when the URL has no explicit port or when the port is the default for the scheme:
url = URL("https://example.com:8080/path")
print(url.port) # "8080"
url = URL("https://example.com/path")
print(url.port) # "" (empty, using default 443)
url.port = "9000"
print(url.href) # "https://example.com:9000/path"
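The "empty if default" behaviour follows from the default-port table in the WHATWG URL Standard. A minimal sketch of that rule, assuming a hypothetical helper named `port_property` (not part of this library's API):

```python
from typing import Optional

# Default ports for special schemes, per the WHATWG URL Standard.
# "file" is special but has no default port.
DEFAULT_PORTS = {"ftp": 21, "http": 80, "https": 443, "ws": 80, "wss": 443}


def port_property(scheme: str, port: Optional[int]) -> str:
    """Return the port as a string, or "" when absent or equal to the default."""
    if port is None or DEFAULT_PORTS.get(scheme) == port:
        return ""
    return str(port)


print(repr(port_property("https", 8080)))  # '8080'
print(repr(port_property("https", 443)))   # ''
print(repr(port_property("http", 443)))    # '443'
```

The last call shows that a port is only hidden when it is the default for that particular scheme.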
host¶
The hostname, followed by a colon and the port when a non-default port is set:
url = URL("https://example.com:8080/path")
print(url.host) # "example.com:8080"
url = URL("https://example.com/path")
print(url.host) # "example.com" (no port for default)
url.host = "other.com:3000"
print(url.hostname) # "other.com"
print(url.port) # "3000"
origin¶
The origin (scheme, hostname, and port) — read-only:
url = URL("https://example.com:8080/path?query#hash")
print(url.origin) # "https://example.com:8080"
# Origin is read-only per the spec
# url.origin = "..." # This would raise an error
Origin Serialization
Origin is serialized according to the origin serialization algorithm in the HTML Standard.
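For orientation, here is a simplified sketch of that serialization built on the standard library's `urllib.parse.urlsplit` rather than on this library. The helper name `serialize_origin` is hypothetical. The sketch treats every scheme outside the default-port table as having an opaque origin, which serializes as "null", and it ignores special cases such as `blob:` URLs.

```python
from urllib.parse import urlsplit

# Schemes with tuple origins and their default ports (WHATWG URL Standard).
DEFAULT_PORTS = {"ftp": 21, "http": 80, "https": 443, "ws": 80, "wss": 443}


def serialize_origin(raw: str) -> str:
    """Simplified origin serialization: scheme://host[:port], or "null"."""
    parts = urlsplit(raw)
    if parts.scheme not in DEFAULT_PORTS:
        return "null"  # opaque origin
    host = parts.hostname or ""
    if ":" in host:
        host = f"[{host}]"  # urlsplit strips the brackets from IPv6 literals
    origin = f"{parts.scheme}://{host}"
    if parts.port is not None and parts.port != DEFAULT_PORTS[parts.scheme]:
        origin += f":{parts.port}"
    return origin


print(serialize_origin("https://example.com:8080/path?query#hash"))  # https://example.com:8080
print(serialize_origin("https://example.com:443/"))                  # https://example.com
print(serialize_origin("mailto:user@example.com"))                   # null
```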
username and password¶
Credentials in the URL:
url = URL("https://user:pass@example.com/path")
print(url.username) # "user"
print(url.password) # "pass"
url.username = "newuser"
url.password = "newpass"
print(url.href) # "https://newuser:newpass@example.com/path"
Security Note
Embedding credentials in URLs is generally discouraged for security reasons. Consider using HTTP authentication headers instead.
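One way to follow this advice is to move the credentials out of the URL and into a Basic `Authorization` header before making a request. The sketch below uses only the Python standard library; `split_credentials` is a hypothetical helper, and it assumes the username and password contain no percent-encoded characters.

```python
import base64
from urllib.parse import urlsplit, urlunsplit


def split_credentials(raw: str):
    """Return (url_without_credentials, headers) for a URL with embedded userinfo."""
    parts = urlsplit(raw)
    headers = {}
    if parts.username is not None:
        token = f"{parts.username}:{parts.password or ''}".encode("utf-8")
        headers["Authorization"] = "Basic " + base64.b64encode(token).decode("ascii")
    # Keep only what follows the last "@" in the authority.
    netloc = parts.netloc.rpartition("@")[2]
    clean = urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
    return clean, headers


clean_url, headers = split_credentials("https://user:pass@example.com/path")
print(clean_url)                 # https://example.com/path
print(headers["Authorization"])  # Basic dXNlcjpwYXNz
```

Basic authentication only encodes the credentials, so it should still be sent over HTTPS.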
pathname¶
The path portion of the URL:
url = URL("https://example.com/path/to/page")
print(url.pathname) # "/path/to/page"
url.pathname = "/new/path"
print(url.href) # "https://example.com/new/path"
search¶
The query string, including the leading ?:
url = URL("https://example.com/path?query=value")
print(url.search) # "?query=value"
url.search = "?new=query"
print(url.href) # "https://example.com/path?new=query"
# Remove query string
url.search = ""
print(url.href) # "https://example.com/path"
search_params¶
A URLSearchParams object for query manipulation (read-only property, but the object is mutable). Supports Pythonic dictionary-style access:
url = URL("https://example.com/path?a=1&b=2")
# Read parameters using dictionary syntax
print(url.search_params["a"]) # "1"
# Modify parameters (updates the URL automatically)
url.search_params["a"] = "100"
url.search_params.append("c", "3")
print(url.search) # "?a=100&b=2&c=3"
# Delete parameters
del url.search_params["b"]
print(url.search) # "?a=100&c=3"
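Conceptually, a query string is an ordered list of name/value pairs. The same sequence of edits can be sketched with the standard library's `parse_qsl` and `urlencode`; this is a stand-in to show the data model, not how `search_params` is implemented.

```python
from urllib.parse import parse_qsl, urlencode

# Parse the query (without the leading "?") into an ordered list of pairs.
pairs = parse_qsl("a=1&b=2")  # [("a", "1"), ("b", "2")]

# Set "a" to "100", keeping its position in the list.
pairs = [(name, "100") if name == "a" else (name, value) for name, value in pairs]

# Append a new pair at the end.
pairs.append(("c", "3"))
after_append = "?" + urlencode(pairs)
print(after_append)  # ?a=100&b=2&c=3

# Delete every pair named "b".
pairs = [(name, value) for name, value in pairs if name != "b"]
after_delete = "?" + urlencode(pairs)
print(after_delete)  # ?a=100&c=3
```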
hash¶
The fragment identifier, including the leading #:
url = URL("https://example.com/path#section")
print(url.hash) # "#section"
url.hash = "#new-section"
print(url.href) # "https://example.com/path#new-section"
# Remove hash
url.hash = ""
print(url.href) # "https://example.com/path"
IPv6 Addresses¶
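IPv6 addresses are enclosed in square brackets, per the host serialization rules in the URL Standard. For example, in https://[2001:db8::1]:8080/path the hostname is [2001:db8::1] and the host is [2001:db8::1]:8080.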
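The bracket rule can be illustrated with the standard library. Note the difference in conventions: `urllib.parse.urlsplit` returns the IPv6 hostname without brackets, whereas the WHATWG `hostname` property keeps them. The helper `bracketed_host` below is a hypothetical name used only for this sketch, and `ipaddress` compression is assumed to be close enough to the standard's IPv6 serializer for simple addresses.

```python
import ipaddress
from urllib.parse import urlsplit


def bracketed_host(hostname: str) -> str:
    """Wrap IPv6 literals in brackets; leave domains and IPv4 addresses alone."""
    try:
        address = ipaddress.ip_address(hostname)
    except ValueError:
        return hostname  # not an IP literal, so treat it as a domain name
    if address.version == 6:
        return f"[{address.compressed}]"
    return hostname


parts = urlsplit("https://[2001:db8::1]:8080/path")
print(parts.hostname)                  # 2001:db8::1
print(bracketed_host(parts.hostname))  # [2001:db8::1]
print(bracketed_host("example.com"))   # example.com
```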
Component Encoding¶
Components are automatically percent-encoded as needed:
url = URL("https://example.com/path")
# Spaces are encoded in paths
url.pathname = "/hello world"
print(url.pathname) # "/hello%20world"
# Special characters in query
url.search = "?name=John Doe&city=New York"
print(url.search) # "?name=John%20Doe&city=New%20York"
Encode Sets
Different URL components use different percent-encode sets. For example, ? is percent-encoded in the path but left as-is in the query.
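To make the idea of encode sets concrete, here is a minimal sketch of a percent-encoder parameterized by a set of characters. The two sets are simplified approximations of the standard's query and path percent-encode sets, written out by hand as an assumption; consult the URL Standard for their exact membership. `percent_encode` is a hypothetical helper, not this library's API.

```python
# Simplified approximations of two WHATWG percent-encode sets (assumption).
QUERY_SET = set(' "#<>')
PATH_SET = QUERY_SET | set("?`{}")


def percent_encode(text: str, encode_set: set) -> str:
    """Encode UTF-8 bytes that are controls, non-ASCII, or listed in encode_set."""
    out = []
    for byte in text.encode("utf-8"):
        char = chr(byte)
        if byte <= 0x1F or byte >= 0x7F or char in encode_set:
            out.append(f"%{byte:02X}")
        else:
            out.append(char)
    return "".join(out)


print(percent_encode("/hello world", PATH_SET))  # /hello%20world
print(percent_encode("a?b", PATH_SET))           # a%3Fb
print(percent_encode("a?b", QUERY_SET))          # a?b
```

The same input produces different output depending on which component it is written into, which is why the setters cannot share a single encoding routine with a fixed character list.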
Further Reading¶
- URL interface — Complete API specification
- URL record — Internal URL representation
Next Steps¶
- Learn about URLSearchParams for query string manipulation
- See the complete API Reference