URL Utilities
The Link class is the key data structure of the darc project. It contains all the information required to identify a URL's proxy type, hostname, path prefix when saving, etc.
The link module also provides several wrapper functions for the urllib.parse module.
-
class darc.link.Link(url, proxy, url_parse, host, base, name)
Bases: object
Parsed link.
- Parameters
url – original link
proxy – proxy type
host – URL’s hostname
base – base folder for saving files
name – hashed link for saving files
url_parse – parsed URL from urllib.parse.urlparse()
- Returns
Parsed link object.
- Return type
-
__init__(url, proxy, url_parse, host, base, name)
Initialize self. See help(type(self)) for accurate signature.
-
url_parse: urllib.parse.ParseResult
Parsed URL from urllib.parse.urlparse().
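As a rough illustration of the fields listed above, the following sketch models a parsed link as a plain dataclass. The name LinkSketch and the sample values are hypothetical; this is not the darc implementation, only a stand-in with the same six fields.

```python
import dataclasses
import hashlib
import urllib.parse


@dataclasses.dataclass
class LinkSketch:
    """Hypothetical stand-in mirroring the documented Link fields."""
    url: str                               # original link
    proxy: str                             # proxy type
    url_parse: urllib.parse.ParseResult    # result of urllib.parse.urlparse()
    host: str                              # URL's hostname
    base: str                              # base folder for saving files
    name: str                              # hashed link for saving files


url = 'https://example.com/index.html'
link = LinkSketch(
    url=url,
    proxy='null',
    url_parse=urllib.parse.urlparse(url),
    host='example.com',
    base='data/null/https/example.com',    # placeholder path for illustration
    name=hashlib.sha256(url.encode()).hexdigest(),
)
print(link.proxy, link.host, link.url_parse.path)
```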
-
darc.link.parse_link(link, host=None)
Parse link.
- Parameters
- Returns
The parsed link object.
- Return type
Note
If host is provided, it will override the hostname of the original link.
The parsing process of proxy type is as follows:
1. If host is None and the parse result from urllib.parse.urlparse() has no netloc (or hostname) specified, then set hostname as (null); else set it as is.
2. If the scheme is data, then the link is a data URI; set hostname as data and proxy as data.
3. If the scheme is javascript, then the link is JavaScript code; set proxy as script.
4. If the scheme is bitcoin, then the link is a Bitcoin address; set proxy as bitcoin.
5. If the scheme is ed2k, then the link is an ED2K magnet link; set proxy as ed2k.
6. If the scheme is magnet, then the link is a magnet link; set proxy as magnet.
7. If the scheme is mailto, then the link is an email address; set proxy as mail.
8. If the scheme is irc, then the link is an IRC link; set proxy as irc.
9. If the scheme is neither http nor https, then set proxy to the scheme.
10. If the host is None, set hostname to (null) and proxy to null.
11. If the host is an onion (.onion) address, set proxy to tor.
12. If the host is an I2P (.i2p) address, or either of localhost:7657 and localhost:7658, set proxy to i2p.
13. If the host is localhost on ZERONET_PORT, and the path is not / (i.e. NOT the root path), set proxy to zeronet and set the first part of its path as hostname.
    Example: For a ZeroNet address, e.g. http://127.0.0.1:43110/1HeLLo4uzjaLetFx6NH3PMwFP3qbRbTf3D, parse_link() will parse the hostname as 1HeLLo4uzjaLetFx6NH3PMwFP3qbRbTf3D.
14. If the host is localhost on FREENET_PORT, and the path is not / (i.e. NOT the root path), set proxy to freenet and set the first part of its path as hostname.
    Example: For a Freenet address, e.g. http://127.0.0.1:8888/USK@nwa8lHa271k2QvJ8aa0Ov7IHAV-DFOCFgmDt3X6BpCI,DuQSUZiI~agF8c-6tjsFFGuZ8eICrzWCILB60nT8KKo,AQACAAE/sone/77/, parse_link() will parse the hostname as USK@nwa8lHa271k2QvJ8aa0Ov7IHAV-DFOCFgmDt3X6BpCI,DuQSUZiI~agF8c-6tjsFFGuZ8eICrzWCILB60nT8KKo,AQACAAE.
15. If none of the cases above is satisfied, the proxy will be set as null, marking it a plain normal link.
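The decision order described above can be sketched roughly as follows. This is a simplified illustration, not the darc source: guess_proxy is a hypothetical helper, the port values are assumed from the two example URLs, and localhost is taken to include 127.0.0.1 because the examples use that address.

```python
import urllib.parse

# Assumed values, read off the example URLs; the real ports are whatever
# ZERONET_PORT and FREENET_PORT are configured to.
ZERONET_PORT = 43110
FREENET_PORT = 8888

# scheme -> proxy type, for the schemes that are listed explicitly
SCHEME_PROXY = {
    'javascript': 'script',
    'bitcoin': 'bitcoin',
    'ed2k': 'ed2k',
    'magnet': 'magnet',
    'mailto': 'mail',
    'irc': 'irc',
}


def guess_proxy(link, host=None):
    """Return a (proxy, hostname) pair for ``link`` (hypothetical helper)."""
    parsed = urllib.parse.urlparse(link)
    if host is None:
        host = parsed.netloc or None
    hostname = '(null)' if host is None else host

    # proxy type decided by the scheme
    scheme = parsed.scheme
    if scheme == 'data':
        return 'data', 'data'
    if scheme in SCHEME_PROXY:
        return SCHEME_PROXY[scheme], hostname
    if scheme not in ('http', 'https'):
        return scheme, hostname

    # proxy type decided by the host
    if host is None:
        return 'null', '(null)'
    if host.endswith('.onion'):
        return 'tor', hostname
    if host.endswith('.i2p') or host in ('localhost:7657', 'localhost:7658'):
        return 'i2p', hostname

    # ZeroNet and Freenet sites live under a local gateway; the first path
    # segment identifies the site, and the root path is left as a plain link.
    first_part = parsed.path.strip('/').split('/')[0]
    for port, proxy in ((ZERONET_PORT, 'zeronet'), (FREENET_PORT, 'freenet')):
        if host in (f'localhost:{port}', f'127.0.0.1:{port}') and first_part:
            return proxy, first_part

    return 'null', hostname


print(guess_proxy('http://127.0.0.1:43110/1HeLLo4uzjaLetFx6NH3PMwFP3qbRbTf3D'))
print(guess_proxy('mailto:someone@example.com'))
```

The sketch matches hosts by exact string or suffix, so a host carrying an explicit port (such as an onion address followed by :80) would fall through to null here; how darc treats such hosts is not stated above.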
The base for a parsed Link object is defined as <root>/<proxy>/<scheme>/<hostname>/ where root is PATH_DB.
The name for a parsed Link object is the sha256 hash (c.f. hashlib.sha256()) of the original link.
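A minimal sketch of how base and name could be derived from these definitions. The helper names, the placeholder root value, the UTF-8 encoding of the link and the use of the hexadecimal digest are all assumptions made for illustration.

```python
import hashlib
import os

# Placeholder root; in darc this is whatever PATH_DB is configured to.
PATH_DB = 'data'


def link_base(proxy, scheme, hostname, root=PATH_DB):
    """Build <root>/<proxy>/<scheme>/<hostname> (hypothetical helper)."""
    return os.path.join(root, proxy, scheme, hostname)


def link_name(link):
    """Hash the original link with SHA-256 (hex digest assumed)."""
    return hashlib.sha256(link.encode('utf-8')).hexdigest()


print(link_base('tor', 'http', 'example.onion'))
print(link_name('http://example.onion/page'))
```

Hashing the full link gives every URL a fixed-length file name that is safe on any file system, while the base path keeps files grouped by proxy type, scheme and host.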
-
darc.link.quote(string, safe='/', encoding=None, errors=None)
Wrapper function for urllib.parse.quote().
- Parameters
- Returns
The quoted string.
- Return type
Note
The function suppresses any error raised when calling urllib.parse.quote(); if one occurs, it returns the original string.
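The fallback behaviour described in the note might look roughly like the sketch below. The name safe_quote is hypothetical, and the broad except clause is an assumption about what "suppresses any error" means in practice.

```python
import urllib.parse


def safe_quote(string, safe='/', encoding=None, errors=None):
    """Quote ``string``; return it unchanged if quoting fails."""
    try:
        return urllib.parse.quote(string, safe=safe, encoding=encoding, errors=errors)
    except Exception:
        # Swallow the error and hand back the input untouched.
        return string


print(safe_quote('/a b/é'))       # /a%20b/%C3%A9
print(repr(safe_quote('\udcff')))  # lone surrogate cannot be encoded, returned as is
```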
-
darc.link.unquote(string, encoding='utf-8', errors='replace')
Wrapper function for urllib.parse.unquote().
- Parameters
- Returns
The unquoted string.
- Return type
Note
The function suppresses any error raised when calling urllib.parse.unquote(); if one occurs, it returns the original string.
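For illustration, a sketch of the same pattern applied to unquoting. The name safe_unquote is hypothetical. With errors='replace', undecodable bytes become the Unicode replacement character rather than raising, so in this sketch the fallback is only reached for inputs that urllib.parse.unquote() rejects outright, such as None.

```python
import urllib.parse


def safe_unquote(string, encoding='utf-8', errors='replace'):
    """Unquote ``string``; return it unchanged if unquoting fails."""
    try:
        return urllib.parse.unquote(string, encoding=encoding, errors=errors)
    except Exception:
        # Swallow the error and hand back the input untouched.
        return string


print(safe_unquote('a%20b'))       # a b
print(repr(safe_unquote('%FF')))   # invalid UTF-8 byte, replaced with U+FFFD
print(safe_unquote(None))          # not a string, returned as is
```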
-
darc.link.urljoin(base, url, allow_fragments=True)
Wrapper function for urllib.parse.urljoin().
- Parameters
base (AnyStr) – base URL
url (AnyStr) – URL to be joined
allow_fragments (bool) – whether to allow fragments
- Returns
The joined URL.
- Return type
Note
The function suppresses any error raised when calling urllib.parse.urljoin(); if one occurs, it returns base/url directly.
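A sketch of the fallback described in the note, assuming that base/url means the two strings joined with a single slash. The name safe_urljoin is hypothetical. The failing input uses an unbalanced IPv6 bracket, which makes urllib.parse raise ValueError.

```python
import urllib.parse


def safe_urljoin(base, url, allow_fragments=True):
    """Join ``base`` and ``url``; fall back to plain concatenation on error."""
    try:
        return urllib.parse.urljoin(base, url, allow_fragments=allow_fragments)
    except Exception:
        # Assumed reading of "base/url": the two parts joined by a slash.
        return f'{base}/{url}'


print(safe_urljoin('http://example.com/a/b', 'c'))  # http://example.com/a/c
print(safe_urljoin('http://[::1', 'page'))          # http://[::1/page
```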
-
darc.link.urlparse(url, scheme='', allow_fragments=True)
Wrapper function for urllib.parse.urlparse().
- Parameters
- Returns
The parse result.
- Return type
Note
The function suppresses any error raised when calling urllib.parse.urlparse(); if one occurs, it returns urllib.parse.ParseResult(scheme=scheme, netloc='', path=url, params='', query='', fragment='') directly.
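The fallback result can be reproduced with a short sketch. The name safe_urlparse is hypothetical; the ParseResult built in the except branch follows the note above, with the whole input placed in path.

```python
import urllib.parse


def safe_urlparse(url, scheme='', allow_fragments=True):
    """Parse ``url``; on error, return a result holding the raw URL as path."""
    try:
        return urllib.parse.urlparse(url, scheme, allow_fragments=allow_fragments)
    except Exception:
        return urllib.parse.ParseResult(
            scheme=scheme, netloc='', path=url,
            params='', query='', fragment='',
        )


print(safe_urlparse('http://example.com/a?b=1'))
print(safe_urlparse('http://[::1'))  # unbalanced bracket, falls back
```

Returning a ParseResult in both branches lets callers read attributes such as scheme and path without checking whether parsing succeeded.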