URL Components

A URL is composed of several components. This guide explains each component and how to access and modify them according to the WHATWG URL Standard.

URL Structure

A complete URL has the following structure per the URL representation:

  https://user:pass@example.com:8080/path/page?query=value#section
  └─┬──┘  └───┬───┘ └────┬────┘ └┬─┘└───┬────┘└────┬─────┘└──┬───┘
 protocol  username   hostname  port pathname    search     hash
           password └──────┬───────┘
                          host
  └───────────────┬────────────────┘
      origin (excludes username and password)
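
For readers without this library at hand, the same breakdown can be approximated with Python's standard urllib.parse module. This is only a sketch: urlsplit follows RFC 3986-style rules rather than the WHATWG parser, and the helper name whatwg_style_parts is hypothetical, but it shows how the labels in the diagram map onto a concrete URL.

```python
from urllib.parse import urlsplit


def whatwg_style_parts(url: str) -> dict:
    """Split a URL and name the pieces like the diagram (hypothetical helper)."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme + ":",
        "username": parts.username or "",
        "password": parts.password or "",
        "hostname": parts.hostname or "",
        # urlsplit returns the port as an int (or None); the URL interface uses a string.
        "port": "" if parts.port is None else str(parts.port),
        "pathname": parts.path,
        # The URL interface keeps the leading "?" and "#" when the part is non-empty.
        "search": "?" + parts.query if parts.query else "",
        "hash": "#" + parts.fragment if parts.fragment else "",
    }


parts = whatwg_style_parts(
    "https://user:pass@example.com:8080/path/page?query=value#section"
)
for name, value in parts.items():
    print(f"{name:9} {value}")
```

The stdlib result will differ from a WHATWG parser on edge cases (for example backslashes or missing slashes after a special scheme), so treat it as an illustration of the component names only.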

Component Properties

The following properties correspond to the URL interface in the WHATWG specification.

href

The serialized URL as a complete string:

url = URL("https://example.com/path")
print(url.href)  # "https://example.com/path"

# Setting href re-parses the entire URL
url.href = "https://other.com/new"
print(url.hostname)  # "other.com"

protocol

The scheme followed by a colon (:):

url = URL("https://example.com")
print(url.protocol)  # "https:"

url.protocol = "http:"
print(url.href)  # "http://example.com/"

Protocol Restrictions

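Per the URL Standard, the protocol setter leaves the URL unchanged when the assignment would switch between a special scheme (ftp, file, http, https, ws, wss) and a non-special one. Such assignments can therefore fail silently or produce unexpected results.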
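
The special/non-special rule can be sketched as a small check. The helper name scheme_change_allowed is hypothetical and not part of this library; it covers only this one rule, while the URL Standard has further conditions (for example around the file scheme).

```python
# Special schemes as listed in the WHATWG URL Standard.
SPECIAL_SCHEMES = {"ftp", "file", "http", "https", "ws", "wss"}


def scheme_change_allowed(current: str, new: str) -> bool:
    """Return True if both schemes are special or both are non-special.

    Hypothetical helper: it models only the special/non-special rule.
    """
    return (current.lower() in SPECIAL_SCHEMES) == (new.lower() in SPECIAL_SCHEMES)


print(scheme_change_allowed("https", "http"))    # True
print(scheme_change_allowed("https", "mailto"))  # False
print(scheme_change_allowed("mailto", "tel"))    # True
```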

hostname

The host domain name or IP address:

url = URL("https://example.com:8080/path")
print(url.hostname)  # "example.com"

url.hostname = "other.com"
print(url.href)  # "https://other.com:8080/path"

port

The port number as a string (empty if default):

url = URL("https://example.com:8080/path")
print(url.port)  # "8080"

url = URL("https://example.com/path")
print(url.port)  # "" (empty, using default 443)

url.port = "9000"
print(url.href)  # "https://example.com:9000/path"
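
Why the port reads back as an empty string can be sketched with a table of default ports. The mapping reflects the defaults in the WHATWG URL Standard; the helper name port_property is an assumption for illustration, not this library's API.

```python
from typing import Optional

# Default ports for special schemes per the URL Standard ("file" has none).
DEFAULT_PORTS = {"ftp": 21, "http": 80, "https": 443, "ws": 80, "wss": 443}


def port_property(scheme: str, port: Optional[int]) -> str:
    """Return the port the way the URL interface exposes it (hypothetical helper)."""
    if port is None or DEFAULT_PORTS.get(scheme) == port:
        # A missing port or the scheme's default is reported as "".
        return ""
    return str(port)


print(repr(port_property("https", 8080)))  # '8080'
print(repr(port_property("https", 443)))   # ''
print(repr(port_property("http", 443)))    # '443'
```

Note that the default is per scheme: 443 is elided for https but kept for http.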

host

The host combining hostname and port:

url = URL("https://example.com:8080/path")
print(url.host)  # "example.com:8080"

url = URL("https://example.com/path")
print(url.host)  # "example.com" (no port for default)

url.host = "other.com:3000"
print(url.hostname)  # "other.com"
print(url.port)      # "3000"

origin

The origin (scheme, hostname, and port) — read-only:

url = URL("https://example.com:8080/path?query#hash")
print(url.origin)  # "https://example.com:8080"

# Origin is read-only per the spec
# url.origin = "..."  # This would raise an error

Origin Serialization

Origin is serialized according to the origin serialization algorithm in the HTML Standard.
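
As a rough illustration of that algorithm, the sketch below builds an origin string with the standard library. The helper serialize_origin is hypothetical; it treats every scheme outside the listed ones (and file) as an opaque origin that serializes to "null", and it ignores blob URLs, which the standards handle separately.

```python
from urllib.parse import urlsplit

# Schemes that produce a tuple origin in this sketch, with their default ports.
DEFAULT_PORTS = {"ftp": 21, "http": 80, "https": 443, "ws": 80, "wss": 443}


def serialize_origin(url: str) -> str:
    """Approximate origin serialization for a URL string (hypothetical helper)."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    if scheme not in DEFAULT_PORTS or not parts.hostname:
        # Opaque origins serialize as the string "null".
        return "null"
    # urlsplit strips the brackets from IPv6 literals, so restore them.
    host = f"[{parts.hostname}]" if ":" in parts.hostname else parts.hostname
    origin = f"{scheme}://{host}"
    if parts.port is not None and parts.port != DEFAULT_PORTS[scheme]:
        origin += f":{parts.port}"
    return origin


print(serialize_origin("https://example.com:8080/path?query#hash"))  # https://example.com:8080
print(serialize_origin("https://example.com:443/"))                  # https://example.com
print(serialize_origin("mailto:user@example.com"))                   # null
```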

username and password

Credentials in the URL:

url = URL("https://user:pass@example.com/path")
print(url.username)  # "user"
print(url.password)  # "pass"

url.username = "newuser"
url.password = "newpass"
print(url.href)  # "https://newuser:newpass@example.com/path"

Security Note

Embedding credentials in URLs is generally discouraged for security reasons. Consider using HTTP authentication headers instead.
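
One way to follow this advice is to move the credentials into an Authorization header before making a request. The sketch below uses only the standard library; the helper name split_credentials is hypothetical, and it assumes HTTP Basic authentication over TLS is acceptable for the target server.

```python
import base64
from urllib.parse import unquote, urlsplit, urlunsplit


def split_credentials(url: str):
    """Return (url_without_credentials, headers) for a URL with embedded userinfo.

    Hypothetical helper; assumes the server accepts HTTP Basic authentication.
    """
    parts = urlsplit(url)
    headers = {}
    if parts.username is not None:
        # Userinfo may be percent-encoded in the URL, so decode it first.
        userinfo = f"{unquote(parts.username)}:{unquote(parts.password or '')}"
        token = base64.b64encode(userinfo.encode("utf-8")).decode("ascii")
        headers["Authorization"] = f"Basic {token}"
    # Rebuild the authority without the userinfo part.
    netloc = parts.netloc.rpartition("@")[2]
    clean = urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
    return clean, headers


clean_url, headers = split_credentials("https://user:pass@example.com/path")
print(clean_url)  # https://example.com/path
print(headers)    # {'Authorization': 'Basic dXNlcjpwYXNz'}
```

Basic authentication only encodes the credentials, it does not encrypt them, so this is no safer than the URL form on an unencrypted connection. The benefit is that the credentials no longer appear in logs, history, or referrers that record the URL.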

pathname

The path portion of the URL:

url = URL("https://example.com/path/to/page")
print(url.pathname)  # "/path/to/page"

url.pathname = "/new/path"
print(url.href)  # "https://example.com/new/path"

search

The query string including ?:

url = URL("https://example.com/path?query=value")
print(url.search)  # "?query=value"

url.search = "?new=query"
print(url.href)  # "https://example.com/path?new=query"

# Remove query string
url.search = ""
print(url.href)  # "https://example.com/path"

search_params

A URLSearchParams object for query manipulation (read-only property, but the object is mutable). Supports Pythonic dictionary-style access:

url = URL("https://example.com/path?a=1&b=2")

# Read parameters using dictionary syntax
print(url.search_params["a"])  # "1"

# Modify parameters (updates the URL automatically)
url.search_params["a"] = "100"
url.search_params.append("c", "3")
print(url.search)  # "?a=100&b=2&c=3"

# Delete parameters
del url.search_params["b"]
print(url.search)  # "?a=100&c=3"

hash

The fragment identifier including #:

url = URL("https://example.com/path#section")
print(url.hash)  # "#section"

url.hash = "#new-section"
print(url.href)  # "https://example.com/path#new-section"

# Remove hash
url.hash = ""
print(url.href)  # "https://example.com/path"

IPv6 Addresses

IPv6 addresses are enclosed in brackets per host serialization:

url = URL("https://[::1]:8080/path")
print(url.hostname)  # "[::1]"
print(url.host)      # "[::1]:8080"
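
The bracket rule can be illustrated with the standard ipaddress module. The helper serialize_host is hypothetical, and ipaddress compresses addresses by its own rules, which are assumed here to match the URL Standard's IPv6 serializer for simple cases such as the loopback address.

```python
import ipaddress


def serialize_host(host: str) -> str:
    """Wrap IPv6 literals in brackets; leave other hosts unchanged (hypothetical helper)."""
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal: a domain name, or something this sketch does not validate.
        return host
    if address.version == 6:
        return f"[{address.compressed}]"
    return str(address)


print(serialize_host("0:0:0:0:0:0:0:1"))  # [::1]
print(serialize_host("127.0.0.1"))        # 127.0.0.1
print(serialize_host("example.com"))      # example.com
```

The brackets are what keep the colons inside the address from being confused with the colon that separates host and port.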

Component Encoding

Components are automatically percent-encoded as needed:

url = URL("https://example.com/path")

# Spaces are encoded in paths
url.pathname = "/hello world"
print(url.pathname)  # "/hello%20world"

# Special characters in query
url.search = "?name=John Doe&city=New York"
print(url.search)  # "?name=John%20Doe&city=New%20York"

Encode Sets

Different URL components use different percent-encode sets. For example, the query component uses a different set than the path component.
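
The difference between two of these sets can be sketched in a few lines. The sets below are abridged from the WHATWG URL Standard and should be checked against the specification before being relied on; the helper percent_encode is hypothetical and is not this library's implementation.

```python
# Abridged encode sets based on the URL Standard. Both also cover C0 controls
# and every code point above "~", which percent_encode handles directly.
QUERY_ENCODE_SET = set(' "#<>')
PATH_ENCODE_SET = QUERY_ENCODE_SET | set("?^`{}")


def percent_encode(text: str, encode_set: set) -> str:
    """Percent-encode the characters of text that fall in encode_set (hypothetical helper)."""
    out = []
    for ch in text:
        if ch in encode_set or ord(ch) < 0x20 or ord(ch) > 0x7E:
            # Encode each UTF-8 byte of the character as %XX.
            out.extend(f"%{byte:02X}" for byte in ch.encode("utf-8"))
        else:
            out.append(ch)
    return "".join(out)


sample = "a b?{c}"
print(percent_encode(sample, PATH_ENCODE_SET))   # a%20b%3F%7Bc%7D
print(percent_encode(sample, QUERY_ENCODE_SET))  # a%20b?{c}
```

The same input yields different output per component: "?" and the braces are escaped in a path, where "?" would otherwise start the query, but pass through unchanged inside the query itself.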
