URL Components¶

A URL is composed of several components. This guide explains each component and how to access and modify them according to the WHATWG URL Standard.

URL Structure¶

A complete URL has the following structure per the URL representation:

  https://user:pass@example.com:8080/path/page?query=value#section
  └─┬──┘ └──┬───┘ └────┬─────┘└─┬─┘└────┬────┘└─────┬────┘└───┬──┘
 protocol username   hostname  port  pathname    search     hash
           password            └────────┬───────┘
                                       host
          └────────────────┬───────────────────┘
                          origin

Component Properties¶

The following properties correspond to the URL interface in the WHATWG specification.

`href`¶

The serialized URL as a complete string:

url = URL("https://example.com/path")
print(url.href)  # "https://example.com/path"

# Setting href re-parses the entire URL
url.href = "https://other.com/new"
print(url.hostname)  # "other.com"

`protocol`¶

The scheme followed by ::

url = URL("https://example.com")
print(url.protocol)  # "https:"

url.protocol = "http:"
print(url.href)  # "http://example.com/"

Protocol Restrictions

Changing between special and non-special schemes may fail silently or produce unexpected results per the spec.

`hostname`¶

The host domain name or IP address:

url = URL("https://example.com:8080/path")
print(url.hostname)  # "example.com"

url.hostname = "other.com"
print(url.href)  # "https://other.com:8080/path"

`port`¶

The port number as a string (empty if default):

url = URL("https://example.com:8080/path")
print(url.port)  # "8080"

url = URL("https://example.com/path")
print(url.port)  # "" (empty, using default 443)

url.port = "9000"
print(url.href)  # "https://example.com:9000/path"

`host`¶

The host combining hostname and port:

url = URL("https://example.com:8080/path")
print(url.host)  # "example.com:8080"

url = URL("https://example.com/path")
print(url.host)  # "example.com" (no port for default)

url.host = "other.com:3000"
print(url.hostname)  # "other.com"
print(url.port)      # "3000"

`origin`¶

The origin (scheme, hostname, and port) — read-only:

url = URL("https://example.com:8080/path?query#hash")
print(url.origin)  # "https://example.com:8080"

# Origin is read-only per the spec
# url.origin = "..."  # This would raise an error

Origin Serialization

Origin is serialized according to the origin serialization algorithm in the HTML Standard.

`username` and `password`¶

Credentials in the URL:

url = URL("https://user:pass@example.com/path")
print(url.username)  # "user"
print(url.password)  # "pass"

url.username = "newuser"
url.password = "newpass"
print(url.href)  # "https://newuser:newpass@example.com/path"

Security Note

Embedding credentials in URLs is generally discouraged for security reasons. Consider using HTTP authentication headers instead.

`pathname`¶

The path portion of the URL:

url = URL("https://example.com/path/to/page")
print(url.pathname)  # "/path/to/page"

url.pathname = "/new/path"
print(url.href)  # "https://example.com/new/path"

`search`¶

The query string including ?:

url = URL("https://example.com/path?query=value")
print(url.search)  # "?query=value"

url.search = "?new=query"
print(url.href)  # "https://example.com/path?new=query"

# Remove query string
url.search = ""
print(url.href)  # "https://example.com/path"

`search_params`¶

A URLSearchParams object for query manipulation (read-only property, but the object is mutable). Supports Pythonic dictionary-style access:

url = URL("https://example.com/path?a=1&b=2")

# Read parameters using dictionary syntax
print(url.search_params["a"])  # "1"

# Modify parameters (updates the URL automatically)
url.search_params["a"] = "100"
url.search_params.append("c", "3")
print(url.search)  # "?a=100&b=2&c=3"

# Delete parameters
del url.search_params["b"]
print(url.search)  # "?a=100&c=3"

`hash`¶

The fragment identifier including #:

url = URL("https://example.com/path#section")
print(url.hash)  # "#section"

url.hash = "#new-section"
print(url.href)  # "https://example.com/path#new-section"

# Remove hash
url.hash = ""
print(url.href)  # "https://example.com/path"

IPv6 Addresses¶

IPv6 addresses are enclosed in brackets per host serialization:

url = URL("https://[::1]:8080/path")
print(url.hostname)  # "[::1]"
print(url.host)      # "[::1]:8080"

Component Encoding¶

Components are automatically percent-encoded as needed:

url = URL("https://example.com/path")

# Spaces are encoded in paths
url.pathname = "/hello world"
print(url.pathname)  # "/hello%20world"

# Special characters in query
url.search = "?name=John Doe&city=New York"
print(url.search)  # "?name=John%20Doe&city=New%20York"

Encode Sets

Different URL components use different percent-encode sets. For example, the query component uses a different set than the path component.

Next Steps¶

Learn about URLSearchParams for query string manipulation
See the complete API Reference

URL Components¶

URL Structure¶

Component Properties¶

href¶

protocol¶

hostname¶

port¶

host¶

origin¶

username and password¶

pathname¶

search¶

search_params¶

hash¶