URL Utilities
The Link class is the key data structure
of the darc project. It contains all the information
required to identify a URL's proxy type, hostname, path prefix
when saving, etc.
The link module also provides several wrapper
functions for urllib.parse.
-
class darc.link.Link(url, proxy, url_parse, host, base, name)
Bases: object
Parsed link.
- Parameters
url (str) – original link
proxy (str) – proxy type
host (str) – URL’s hostname
base (str) – base folder for saving files
name (str) – hashed link for saving files
url_parse (urllib.parse.ParseResult) – parsed URL from
urllib.parse.urlparse()
- Returns
Parsed link object.
- Return type
darc.link.Link
-
base: str = None
base folder for saving files
-
host: str = None
URL's hostname
-
name: str = None
hashed link for saving files
-
proxy: str = None
proxy type
-
url: str = None
original link
-
url_parse: urllib.parse.ParseResult = None
parsed URL from urllib.parse.urlparse()
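As a rough illustration of how these six fields fit together, the sketch below defines a stand-in container with the same field names. `LinkSketch` is a hypothetical name, the dataclass layout is an assumption made for this example, and the `base` and `name` values are placeholders rather than values darc would produce.

```python
import dataclasses
import urllib.parse


@dataclasses.dataclass
class LinkSketch:
    """Hypothetical stand-in mirroring the documented fields of Link."""

    url: str                               # original link
    proxy: str                             # proxy type
    url_parse: urllib.parse.ParseResult    # result of urllib.parse.urlparse()
    host: str                              # URL's hostname
    base: str                              # base folder for saving files
    name: str                              # hashed link for saving files


url = 'https://www.example.com/index.html'
parsed = urllib.parse.urlparse(url)
link = LinkSketch(
    url=url,
    proxy='null',                          # plain normal link
    url_parse=parsed,
    host=parsed.netloc,
    base='data/null/https/www.example.com',  # placeholder path
    name='0' * 64,                           # placeholder for a SHA-256 hex digest
)
```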
-
darc.link.parse_link(link, host=None)
Parse link.
- Parameters
link (str) – link to be parsed
host (Optional[str]) – hostname of the link
- Returns
The parsed link object.
- Return type
darc.link.Link
Note
If host is provided, it will override the hostname of the original link.
The parsing process of proxy type is as follows:
1. If host is None and the parse result from urllib.parse.urlparse() has no netloc (or hostname) specified, then set hostname as (null); else set it as is.
2. If the scheme is data, then the link is a data URI; set hostname as data and proxy as data.
3. If the scheme is javascript, then the link is JavaScript code; set proxy as script.
4. If the scheme is bitcoin, then the link is a Bitcoin address; set proxy as bitcoin.
5. If the scheme is ed2k, then the link is an ED2K magnet link; set proxy as ed2k.
6. If the scheme is magnet, then the link is a magnet link; set proxy as magnet.
7. If the scheme is mailto, then the link is an email address; set proxy as mail.
8. If the scheme is irc, then the link is an IRC link; set proxy as irc.
9. If the scheme is neither http nor https, then set proxy to the scheme.
10. If the host is None, set hostname to (null) and set proxy to null.
11. If the host is an onion (.onion) address, set proxy to tor.
12. If the host is an I2P (.i2p) address, or any of localhost:7657 and localhost:7658, set proxy to i2p.
13. If the host is localhost on ZERONET_PORT, and the path is not /, i.e. NOT the root path, set proxy to zeronet, and set the first part of its path as hostname.
    Example: For a ZeroNet address, e.g. http://127.0.0.1:43110/1HeLLo4uzjaLetFx6NH3PMwFP3qbRbTf3D, parse_link() will parse the hostname as 1HeLLo4uzjaLetFx6NH3PMwFP3qbRbTf3D.
14. If the host is localhost on FREENET_PORT, and the path is not /, i.e. NOT the root path, set proxy to freenet, and set the first part of its path as hostname.
    Example: For a Freenet address, e.g. http://127.0.0.1:8888/USK@nwa8lHa271k2QvJ8aa0Ov7IHAV-DFOCFgmDt3X6BpCI,DuQSUZiI~agF8c-6tjsFFGuZ8eICrzWCILB60nT8KKo,AQACAAE/sone/77/, parse_link() will parse the hostname as USK@nwa8lHa271k2QvJ8aa0Ov7IHAV-DFOCFgmDt3X6BpCI,DuQSUZiI~agF8c-6tjsFFGuZ8eICrzWCILB60nT8KKo,AQACAAE.
15. If none of the cases above is satisfied, proxy is set as null, marking it a plain normal link.
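The proxy-type rules in this note can be read as a first-match decision chain. The following is a minimal sketch of that chain using only the standard library; it is not darc's implementation. The function name `classify`, the helper `on_localhost`, the port values (taken from the example URLs), and the choice to treat both `localhost` and `127.0.0.1` as localhost are all assumptions made for illustration.

```python
import re
import urllib.parse

# Assumed values for this sketch; darc reads them from ZERONET_PORT / FREENET_PORT.
ZERONET_PORT = 43110
FREENET_PORT = 8888
# Assumption: both spellings count as "localhost".
LOCALHOST = ('localhost', '127.0.0.1')

# Schemes that map to a dedicated proxy type (data is handled separately).
SCHEME_PROXY = {
    'javascript': 'script',
    'bitcoin': 'bitcoin',
    'ed2k': 'ed2k',
    'magnet': 'magnet',
    'mailto': 'mail',
    'irc': 'irc',
}


def on_localhost(host, *ports):
    """Check whether host is a localhost address on one of the given ports."""
    return host in {f'{name}:{port}' for name in LOCALHOST for port in ports}


def classify(link, host=None):
    """Return (proxy, hostname) for link, checking the rules in documented order."""
    parsed = urllib.parse.urlparse(link)
    if host is None:
        host = parsed.netloc or None
    hostname = host if host is not None else '(null)'
    scheme = parsed.scheme
    # First path segment, used as hostname for ZeroNet and Freenet links.
    first = parsed.path.strip('/').split('/')[0]

    if scheme == 'data':
        return 'data', 'data'
    if scheme in SCHEME_PROXY:
        return SCHEME_PROXY[scheme], hostname
    if scheme not in ('http', 'https'):
        return scheme, hostname
    if host is None:
        return 'null', '(null)'
    bare = re.sub(r':\d+$', '', host).lower()  # host without its port
    if bare.endswith('.onion'):
        return 'tor', hostname
    if bare.endswith('.i2p') or on_localhost(host, 7657, 7658):
        return 'i2p', hostname
    if on_localhost(host, ZERONET_PORT) and first:
        return 'zeronet', first
    if on_localhost(host, FREENET_PORT) and first:
        return 'freenet', first
    return 'null', hostname
```

With these assumptions, `classify('http://127.0.0.1:43110/1HeLLo4uzjaLetFx6NH3PMwFP3qbRbTf3D')` yields the `zeronet` proxy type with the first path segment as hostname, matching the ZeroNet example in the note.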
The base for a parsed Link object is defined as
<root>/<proxy>/<scheme>/<hostname>/
where root is PATH_DB.
The name for a parsed Link object is the SHA-256 hash (c.f. hashlib.sha256()) of the original link.
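To make the layout concrete, here is a small sketch of how base and name could be derived. The helper `base_and_name` is hypothetical, the `PATH_DB` value is a placeholder, and using the hex digest of the UTF-8 encoded link is an assumption; the documentation above only states that SHA-256 is used.

```python
import hashlib
import os

# Placeholder root folder; in darc this is PATH_DB.
PATH_DB = 'data'


def base_and_name(link, proxy, scheme, hostname, root=PATH_DB):
    """Build <root>/<proxy>/<scheme>/<hostname> and the SHA-256 name of link."""
    base = os.path.join(root, proxy, scheme, hostname)
    # Assumption: the hash is taken over the UTF-8 bytes and stored as hex.
    name = hashlib.sha256(link.encode('utf-8')).hexdigest()
    return base, name


base, name = base_and_name('https://www.example.com/', 'null', 'https', 'www.example.com')
```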
-
darc.link.quote(string, safe='/', encoding=None, errors=None)
Wrapper function for urllib.parse.quote().
- Parameters
string (AnyStr) – string to be quoted
safe (AnyStr) – characters not to escape
encoding (Optional[str]) – string encoding
errors (Optional[str]) – encoding error handler
- Returns
The quoted string.
- Return type
str
Note
The function suppresses possible errors when calling
urllib.parse.quote(). If any occur, it returns the original string.
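The error-suppressing behaviour described in this note can be approximated as follows. This is only a sketch of the pattern, not darc's source; `quote_or_original` is a hypothetical name, and suppressing every `Exception` is an assumption about how broad the suppression is.

```python
import contextlib
import urllib.parse


def quote_or_original(string, safe='/', encoding=None, errors=None):
    """Quote string; on any error, hand back the input unchanged."""
    with contextlib.suppress(Exception):
        return urllib.parse.quote(string, safe, encoding=encoding, errors=errors)
    # Reached only when urllib.parse.quote() raised.
    return string


quoted = quote_or_original('a b/c')
# urllib.parse.quote() rejects an explicit encoding for bytes input,
# so the wrapper falls back to the original value here.
fallback = quote_or_original(b'abc', encoding='utf-8')
```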
-
darc.link.unquote(string, encoding='utf-8', errors='replace')
Wrapper function for urllib.parse.unquote().
- Parameters
string (AnyStr) – string to be unquoted
encoding (str) – string encoding
errors (str) – encoding error handler
- Returns
The unquoted string.
- Return type
str
Note
The function suppresses possible errors when calling
urllib.parse.unquote(). If any occur, it returns the original string.
-
darc.link.urljoin(base, url, allow_fragments=True)
Wrapper function for urllib.parse.urljoin().
- Parameters
base (AnyStr) – base URL
url (AnyStr) – URL to be joined
allow_fragments (bool) – whether to allow fragments
- Returns
The joined URL.
- Return type
str
Note
The function suppresses possible errors when calling
urllib.parse.urljoin(). If any occur, it returns base/url directly.
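A minimal sketch of this fallback, assuming that base/url means the two arguments joined with a single slash. `urljoin_or_concat` is a hypothetical name used only for this example.

```python
import contextlib
import urllib.parse


def urljoin_or_concat(base, url, allow_fragments=True):
    """Join base and url; on any error, fall back to plain '<base>/<url>'."""
    with contextlib.suppress(Exception):
        return urllib.parse.urljoin(base, url, allow_fragments=allow_fragments)
    # Reached only when urllib.parse.urljoin() raised.
    return f'{base}/{url}'


joined = urljoin_or_concat('https://example.com/a/', 'b.html')
# An unbalanced '[' in the netloc makes urllib raise ValueError,
# which triggers the fallback.
fallback = urljoin_or_concat('http://[invalid', 'x')
```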
-
darc.link.urlparse(url, scheme='', allow_fragments=True)
Wrapper function for urllib.parse.urlparse().
- Parameters
url (str) – URL to be parsed
scheme (str) – URL scheme
allow_fragments (bool) – whether to allow fragments
- Returns
The parse result.
- Return type
urllib.parse.ParseResult
Note
The function suppresses possible errors when calling
urllib.parse.urlparse(). If any occur, it returns urllib.parse.ParseResult(scheme=scheme, netloc='', path=url, params='', query='', fragment='') directly.
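The fallback result described in this note could be produced along these lines. Again this is a sketch under assumptions rather than darc's code; `urlparse_or_fallback` is a hypothetical name.

```python
import contextlib
import urllib.parse


def urlparse_or_fallback(url, scheme='', allow_fragments=True):
    """Parse url; on any error, return a result holding url as its path."""
    with contextlib.suppress(Exception):
        return urllib.parse.urlparse(url, scheme, allow_fragments=allow_fragments)
    # Reached only when urllib.parse.urlparse() raised.
    return urllib.parse.ParseResult(scheme=scheme, netloc='', path=url,
                                    params='', query='', fragment='')


ok = urlparse_or_fallback('https://example.com/x')
# The unbalanced '[' is rejected by urllib, so the whole input ends up in path.
bad = urlparse_or_fallback('http://[invalid')
```

Because the fallback is still a ParseResult, callers can read attributes such as `netloc` and `path` without checking first whether parsing succeeded.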