URL Utilities

The Link class is the key data structure of the darc project. It contains all the information required to identify a URL’s proxy type, hostname, path prefix when saving, etc.

The link module also provides several wrapper functions for urllib.parse.

Bases: object

Parsed link.

Parameters
  • url (str) – original link

  • proxy (str) – proxy type

  • host (str) – URL’s hostname

  • base (str) – base folder for saving files

  • name (str) – hashed link for saving files

  • url_parse (urllib.parse.ParseResult) – parsed URL from urllib.parse.urlparse()

Returns

Parsed link object.

Return type

Link

Note

Link is a dataclass object. It is safely hashable: its hash is computed as hash(url).

__hash__()

Provide hash support to the Link object.

base: str = None

base folder for saving files

host: str = None

URL’s hostname

name: str = None

hashed link for saving files

proxy: str = None

proxy type

url: str = None

original link

url_parse: urllib.parse.ParseResult = None

parsed URL from urllib.parse.urlparse()
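The following is a minimal sketch of how a dataclass with these fields can be made hashable through its url, as the note above describes. The class name LinkSketch is hypothetical and the code is an illustration, not the darc implementation.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import ParseResult


@dataclass
class LinkSketch:
    """Hypothetical stand-in for Link, mirroring the documented fields."""

    url: Optional[str] = None                 # original link
    proxy: Optional[str] = None               # proxy type
    host: Optional[str] = None                # URL's hostname
    base: Optional[str] = None                # base folder for saving files
    name: Optional[str] = None                # hashed link for saving files
    url_parse: Optional[ParseResult] = None   # parsed URL

    def __hash__(self) -> int:
        # Hash on the original URL only; an explicitly defined __hash__
        # is kept by the dataclass decorator.
        return hash(self.url)


a = LinkSketch(url='https://www.example.com/')
b = LinkSketch(url='https://www.example.com/')
seen = {a, b}  # equal fields and equal hashes, so the set keeps one entry
```

Hashing on url alone means objects of this kind can be stored in sets or used as dictionary keys.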

darc.link.parse_link(link, host=None)

Parse link.

Parameters
  • link (str) – link to be parsed

  • host (Optional[str]) – hostname of the link

Returns

The parsed link object.

Return type

darc.link.Link

Note

If host is provided, it will override the hostname of the original link.

The parsing process of proxy type is as follows:

  1. If host is None and the parse result from urllib.parse.urlparse() has no netloc (or hostname) specified, then set hostname to (null); otherwise keep it as is.

  2. If the scheme is data, then the link is a data URI, set hostname as data and proxy as data.

  3. If the scheme is javascript, then the link is JavaScript code, set proxy as script.

  4. If the scheme is bitcoin, then the link is a Bitcoin address, set proxy as bitcoin.

  5. If the scheme is ed2k, then the link is an ED2K magnet link, set proxy as ed2k.

  6. If the scheme is magnet, then the link is a magnet link, set proxy as magnet.

  7. If the scheme is mailto, then the link is an email address, set proxy as mail.

  8. If the scheme is irc, then the link is an IRC link, set proxy as irc.

  9. If the scheme is neither http nor https, then set proxy to the scheme.

  10. If the host is None, set hostname to (null) and proxy to null.

  11. If the host is an onion (.onion) address, set proxy to tor.

  12. If the host is an I2P (.i2p) address, or any of localhost:7657 and localhost:7658, set proxy to i2p.

  13. If the host is localhost on ZERONET_PORT, and the path is not /, i.e. NOT root path, set proxy to zeronet; and set the first part of its path as hostname.

    Example:

    For a ZeroNet address, e.g. http://127.0.0.1:43110/1HeLLo4uzjaLetFx6NH3PMwFP3qbRbTf3D, parse_link() will parse the hostname as 1HeLLo4uzjaLetFx6NH3PMwFP3qbRbTf3D.

  14. If the host is localhost on FREENET_PORT, and the path is not /, i.e. NOT root path, set proxy to freenet; and set the first part of its path as hostname.

    Example:

    For a Freenet address, e.g. http://127.0.0.1:8888/USK@nwa8lHa271k2QvJ8aa0Ov7IHAV-DFOCFgmDt3X6BpCI,DuQSUZiI~agF8c-6tjsFFGuZ8eICrzWCILB60nT8KKo,AQACAAE/sone/77/, parse_link() will parse the hostname as USK@nwa8lHa271k2QvJ8aa0Ov7IHAV-DFOCFgmDt3X6BpCI,DuQSUZiI~agF8c-6tjsFFGuZ8eICrzWCILB60nT8KKo,AQACAAE.

  15. If none of the cases above is satisfied, the proxy will be set as null, marking it as a plain normal link.
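The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the darc implementation: the helper names classify and _is_local are hypothetical, the port values are copied from the example URLs (the real ZERONET_PORT and FREENET_PORT are configurable), and both localhost and 127.0.0.1 are assumed to count as localhost.

```python
from typing import Optional, Tuple
from urllib.parse import urlparse

# Assumed values, copied from the example URLs above.
ZERONET_PORT = 43110
FREENET_PORT = 8888

# Schemes that map directly to a proxy type (steps 3 to 8).
SCHEME_PROXY = {
    'javascript': 'script',
    'bitcoin': 'bitcoin',
    'ed2k': 'ed2k',
    'magnet': 'magnet',
    'mailto': 'mail',
    'irc': 'irc',
}


def _is_local(host: str, port: int) -> bool:
    # Assumption: both spellings of localhost are accepted.
    return host in (f'localhost:{port}', f'127.0.0.1:{port}')


def classify(link: str, host: Optional[str] = None) -> Tuple[str, str]:
    """Return (proxy, hostname) for a link (hypothetical helper)."""
    parsed = urlparse(link)
    if host is None:
        host = parsed.netloc or parsed.hostname  # step 1; may be empty or None
    hostname = host or '(null)'
    scheme = parsed.scheme

    if scheme == 'data':                      # step 2
        return 'data', 'data'
    if scheme in SCHEME_PROXY:                # steps 3 to 8
        return SCHEME_PROXY[scheme], hostname
    if scheme not in ('http', 'https'):       # step 9
        return scheme, hostname
    if not host:                              # step 10
        return 'null', '(null)'
    if host.endswith('.onion'):               # step 11
        return 'tor', hostname
    if host.endswith('.i2p') or host in ('localhost:7657', 'localhost:7658'):
        return 'i2p', hostname                # step 12
    # Steps 13 and 14: the first path segment becomes the hostname.
    first = parsed.path.strip('/').split('/')[0]
    if first and _is_local(host, ZERONET_PORT):
        return 'zeronet', first
    if first and _is_local(host, FREENET_PORT):
        return 'freenet', first
    return 'null', hostname                   # step 15
```

The order of the checks matters: scheme-based rules run before any host-based rule, so a non-HTTP link never reaches the Tor, I2P, ZeroNet or Freenet checks.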

The base of the parsed Link object is defined as

<root>/<proxy>/<scheme>/<hostname>/

where root is PATH_DB.

The name of the parsed Link object is the SHA-256 hash (cf. hashlib.sha256()) of the original link.
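As a rough illustration of the two definitions above, the sketch below builds base and name for a link. The helper name base_and_name and the root value 'data' are hypothetical, and using the hexadecimal digest of the UTF-8 encoded link is an assumption; the text only states that the name is the SHA-256 hash of the original link.

```python
import hashlib
import os
from typing import Tuple
from urllib.parse import urlparse

PATH_DB = 'data'  # assumed root folder, for illustration only


def base_and_name(link: str, proxy: str, hostname: str,
                  root: str = PATH_DB) -> Tuple[str, str]:
    """Return (base, name) for a link (hypothetical helper)."""
    scheme = urlparse(link).scheme
    # <root>/<proxy>/<scheme>/<hostname>/
    base = os.path.join(root, proxy, scheme, hostname)
    # Assumption: hex digest of the UTF-8 encoded original link.
    name = hashlib.sha256(link.encode('utf-8')).hexdigest()
    return base, name


base, name = base_and_name('https://www.example.com/', 'null', 'www.example.com')
```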

darc.link.quote(string, safe='/', encoding=None, errors=None)

Wrapper function for urllib.parse.quote().

Parameters
  • string (AnyStr) – string to be quoted

  • safe (AnyStr) – characters not to escape

  • encoding (Optional[str]) – string encoding

  • errors (Optional[str]) – encoding error handler

Returns

The quoted string.

Return type

str

Note

The function suppresses possible errors when calling urllib.parse.quote(). If an error occurs, it returns the original string.
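A wrapper with this behaviour could look like the sketch below. The name quote_sketch is hypothetical, and suppressing every Exception is an assumption, since the text does not say which errors are caught.

```python
import contextlib
import urllib.parse


def quote_sketch(string, safe='/', encoding=None, errors=None):
    """Quote a string, falling back to the input on failure (illustration only)."""
    # Assumption: any exception counts as a "possible error".
    with contextlib.suppress(Exception):
        return urllib.parse.quote(string, safe=safe,
                                  encoding=encoding, errors=errors)
    return string


quoted = quote_sketch('a b/c')
# urllib.parse.quote() raises TypeError when an encoding is given for bytes,
# so the sketch returns the input unchanged here.
fallback = quote_sketch(b'abc', encoding='utf-8')
```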

darc.link.unquote(string, encoding='utf-8', errors='replace')

Wrapper function for urllib.parse.unquote().

Parameters
  • string (AnyStr) – string to be unquoted

  • encoding (str) – string encoding

  • errors (str) – encoding error handler

Returns

The unquoted string.

Return type

str

Note

The function suppresses possible errors when calling urllib.parse.unquote(). If an error occurs, it returns the original string.
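The sketch below illustrates the documented defaults and the fallback. The name unquote_sketch is hypothetical, and catching every Exception is an assumption.

```python
import contextlib
import urllib.parse


def unquote_sketch(string, encoding='utf-8', errors='replace'):
    """Unquote a string, falling back to the input on failure (illustration only)."""
    # Assumption: any exception counts as a "possible error".
    with contextlib.suppress(Exception):
        return urllib.parse.unquote(string, encoding=encoding, errors=errors)
    return string


plain = unquote_sketch('a%20b')
# With errors='replace', a byte that is invalid in UTF-8 becomes U+FFFD
# instead of raising an error.
replaced = unquote_sketch('%ff')
# A non-string argument makes urllib.parse.unquote() raise, so the input
# comes back unchanged.
fallback = unquote_sketch(None)
```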

darc.link.urljoin(base, url, allow_fragments=True)

Wrapper function for urllib.parse.urljoin().

Parameters
  • base (AnyStr) – base URL

  • url (AnyStr) – URL to be joined

  • allow_fragments (bool) – whether to allow fragments

Returns

The joined URL.

Return type

str

Note

The function suppresses possible errors when calling urllib.parse.urljoin(). If an error occurs, it returns base/url directly.
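A minimal sketch of this fallback is shown below. The name urljoin_sketch is hypothetical, catching every Exception is an assumption, and "base/url" is interpreted here as the two strings joined with a single slash.

```python
import contextlib
import urllib.parse


def urljoin_sketch(base, url, allow_fragments=True):
    """Join two URLs, falling back to 'base/url' on failure (illustration only)."""
    # Assumption: any exception counts as a "possible error".
    with contextlib.suppress(Exception):
        return urllib.parse.urljoin(base, url, allow_fragments=allow_fragments)
    return f'{base}/{url}'


joined = urljoin_sketch('https://www.example.com/a/b', 'c')
# An unbalanced IPv6 bracket makes urllib.parse.urljoin() raise ValueError,
# which triggers the fallback.
fallback = urljoin_sketch('http://[::1', 'page')
```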

darc.link.urlparse(url, scheme='', allow_fragments=True)

Wrapper function for urllib.parse.urlparse().

Parameters
  • url (str) – URL to be parsed

  • scheme (str) – URL scheme

  • allow_fragments (bool) – whether to allow fragments

Returns

The parse result.

Return type

urllib.parse.ParseResult

Note

The function suppresses possible errors when calling urllib.parse.urlparse(). If an error occurs, it returns urllib.parse.ParseResult(scheme=scheme, netloc='', path=url, params='', query='', fragment='') directly.
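The fallback described above can be sketched as follows. The name urlparse_sketch is hypothetical, and catching every Exception is an assumption, since the text does not name the suppressed errors.

```python
import contextlib
import urllib.parse


def urlparse_sketch(url, scheme='', allow_fragments=True):
    """Parse a URL, falling back to a path-only result (illustration only)."""
    # Assumption: any exception counts as a "possible error".
    with contextlib.suppress(Exception):
        return urllib.parse.urlparse(url, scheme,
                                     allow_fragments=allow_fragments)
    return urllib.parse.ParseResult(scheme=scheme, netloc='', path=url,
                                    params='', query='', fragment='')


ok = urlparse_sketch('https://www.example.com/a?b=1')
# An unbalanced IPv6 bracket makes urllib.parse.urlparse() raise ValueError,
# so the whole input ends up in the path component.
bad = urlparse_sketch('http://[::1/x')
```

Returning a ParseResult in both cases lets callers read attributes such as path or netloc without checking whether parsing succeeded.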