URL Utilities
The Link class is the key data structure of the darc project. It contains all the information required to identify a URL's proxy type, hostname, path prefix when saving, etc.
The link module also provides several wrapper functions for the urllib.parse module.
-
class darc.link.Link(url, proxy, host, base, name, url_parse, url_backref=None)
Bases: object
Parsed link.
- Parameters
url (str) – original link
proxy (str) – proxy type
base (str) – base folder for saving files
name (str) – hashed link for saving files
url_parse (ParseResult) – parsed URL from urllib.parse.urlparse()
url_backref (Optional[Link]) – optional Link instance from which the current link was extracted
- Returns
Parsed link object.
- Return type
Link
-
url_backref: Optional[darc.link.Link] = None
Optional Link instance from which the current link was extracted.
-
url_parse: urllib.parse.ParseResult
Parsed URL from urllib.parse.urlparse().
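To make the shape of this data structure concrete, here is a minimal sketch written as a dataclass. The class name LinkSketch and the sample values are hypothetical; only the field names come from the signature above, and the real Link class may be implemented differently.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import ParseResult, urlparse


@dataclass
class LinkSketch:
    """Illustrative stand-in for darc.link.Link (not the real class)."""
    url: str                 # original link
    proxy: str               # proxy type
    host: str                # hostname used when saving
    base: str                # base folder for saving files
    name: str                # hashed link for saving files
    url_parse: ParseResult   # result of urllib.parse.urlparse()
    url_backref: Optional['LinkSketch'] = None  # link this one was found on


url = 'https://www.example.com/index.html'
link = LinkSketch(
    url=url,
    proxy='null',
    host='www.example.com',
    base='<root>/null/https/www.example.com',  # placeholder path
    name='<sha256 of url>',                    # placeholder hash
    url_parse=urlparse(url),
)
print(link.url_parse.netloc)
```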
-
darc.link.parse_link(link, host=None, *, backref=None)
Parse link.
- Parameters
link (str) – link to be parsed
- Keyword Arguments
backref (Optional[darc.link.Link]) – optional Link instance from which the current link was extracted
- Return type
Link
- Returns
The parsed link object.
Note
If host is provided, it will override the hostname of the original link.
The parsing process of proxy type is as follows:
1. If host is None and the parse result from urllib.parse.urlparse() has no netloc (or hostname) specified, then set hostname as (null); else set it as is.
2. If the scheme is data, then the link is a data URI; set hostname as data and proxy as data.
3. If the scheme is javascript, then the link is JavaScript code; set proxy as script.
4. If the scheme is bitcoin, then the link is a Bitcoin address; set proxy as bitcoin.
5. If the scheme is ethereum, then the link is an Ethereum address; set proxy as ethereum.
6. If the scheme is ed2k, then the link is an ED2K magnet link; set proxy as ed2k.
7. If the scheme is magnet, then the link is a magnet link; set proxy as magnet.
8. If the scheme is mailto, then the link is an email address; set proxy as mail.
9. If the scheme is irc, then the link is an IRC link; set proxy as irc.
10. If the scheme is neither http nor https, then set proxy to the scheme.
11. If the host is None, set hostname to (null) and proxy to null.
12. If the host is an onion (.onion) address, set proxy to tor.
13. If the host is an I2P (.i2p) address, or any of localhost:7657 and localhost:7658, set proxy to i2p.
14. If the host is localhost on ZERONET_PORT, and the path is not /, i.e. NOT the root path, set proxy to zeronet and set the first part of its path as hostname.
    Example: For a ZeroNet address, e.g. http://127.0.0.1:43110/1HeLLo4uzjaLetFx6NH3PMwFP3qbRbTf3D, parse_link() will parse the hostname as 1HeLLo4uzjaLetFx6NH3PMwFP3qbRbTf3D.
15. If the host is localhost on FREENET_PORT, and the path is not /, i.e. NOT the root path, set proxy to freenet and set the first part of its path as hostname.
    Example: For a Freenet address, e.g. http://127.0.0.1:8888/USK@nwa8lHa271k2QvJ8aa0Ov7IHAV-DFOCFgmDt3X6BpCI,DuQSUZiI~agF8c-6tjsFFGuZ8eICrzWCILB60nT8KKo,AQACAAE/sone/77/, parse_link() will parse the hostname as USK@nwa8lHa271k2QvJ8aa0Ov7IHAV-DFOCFgmDt3X6BpCI,DuQSUZiI~agF8c-6tjsFFGuZ8eICrzWCILB60nT8KKo,AQACAAE.
16. If the host is a proxied onion (.onion.sh) address, set proxy to tor2web.
17. If none of the cases above is satisfied, proxy is set to null, marking the link as a plain normal link.
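The decision steps listed in the note above can be sketched as a small standalone function. This is an illustration under stated assumptions, not the project's implementation: the function name guess_proxy is hypothetical, the port values are taken from the example URLs in the note, "localhost" is assumed to mean localhost or 127.0.0.1, and the host handling ignores IPv6 literals and userinfo.

```python
from urllib.parse import urlparse

# Assumed values, taken from the example URLs in the note; the real project
# reads ZERONET_PORT and FREENET_PORT from its own configuration.
ZERONET_PORT = 43110
FREENET_PORT = 8888

# Assumption: which host names count as "localhost".
LOCALHOST = ('localhost', '127.0.0.1')

# Schemes whose proxy name is fixed by the note.
SCHEME_PROXY = {
    'javascript': 'script',
    'bitcoin': 'bitcoin',
    'ethereum': 'ethereum',
    'ed2k': 'ed2k',
    'magnet': 'magnet',
    'mailto': 'mail',
    'irc': 'irc',
}


def guess_proxy(link, host=None):
    """Return (proxy, hostname) for link, following the listed steps."""
    parsed = urlparse(link)
    if host is None:
        host = parsed.netloc or None
    hostname = '(null)' if host is None else host

    scheme = parsed.scheme
    if scheme == 'data':
        return 'data', 'data'
    if scheme in SCHEME_PROXY:
        return SCHEME_PROXY[scheme], hostname
    if scheme not in ('http', 'https'):
        return scheme, hostname
    if host is None:
        return 'null', '(null)'

    name, _, port = host.partition(':')
    if name.endswith('.onion'):
        return 'tor', hostname
    if name.endswith('.i2p') or host in ('localhost:7657', 'localhost:7658'):
        return 'i2p', hostname

    # First path component; empty for the root path (or no path at all).
    first = parsed.path.strip('/').split('/', 1)[0]
    if name in LOCALHOST and first:
        if port == str(ZERONET_PORT):
            return 'zeronet', first
        if port == str(FREENET_PORT):
            return 'freenet', first

    if name.endswith('.onion.sh'):
        return 'tor2web', hostname
    return 'null', hostname


print(guess_proxy('http://127.0.0.1:43110/1HeLLo4uzjaLetFx6NH3PMwFP3qbRbTf3D'))
print(guess_proxy('mailto:user@example.com'))
```

The checks are ordered as in the note, so a scheme-based match always wins over a host-based one.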
The base for a parsed Link object is defined as <root>/<proxy>/<scheme>/<hostname>/, where root is PATH_DB.
The name for a parsed Link object is the sha256 hash (c.f. hashlib.sha256()) of the original link.
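As a rough illustration of how base and name could be derived, consider the sketch below. The helper name base_and_name and the PATH_DB value are hypothetical, and the choices of UTF-8 encoding and a hexadecimal digest are assumptions; the text above only states that the name is the sha256 hash of the original link.

```python
import hashlib
import posixpath
from urllib.parse import urlparse

# Hypothetical root folder; the real PATH_DB comes from darc's configuration.
PATH_DB = 'data'


def base_and_name(link, proxy, hostname, root=PATH_DB):
    """Build <root>/<proxy>/<scheme>/<hostname> and the sha256 name."""
    scheme = urlparse(link).scheme
    base = posixpath.join(root, proxy, scheme, hostname)
    # Assumption: the link is hashed as UTF-8 bytes and stored as hex.
    name = hashlib.sha256(link.encode('utf-8')).hexdigest()
    return base, name


base, name = base_and_name('https://www.example.com/', 'null', 'www.example.com')
print(base)
print(name)
```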
-
darc.link.quote(string, safe='/', encoding=None, errors=None)
Wrapper function for urllib.parse.quote().
- Returns
The quoted string.
Note
The function suppresses possible errors when calling urllib.parse.quote(). If any error occurs, it returns the original string.
-
darc.link.unquote(string, encoding='utf-8', errors='replace')
Wrapper function for urllib.parse.unquote().
- Returns
The unquoted string.
Note
The function suppresses possible errors when calling urllib.parse.unquote(). If any error occurs, it returns the original string.
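The error-suppressing behaviour described for quote() and unquote() can be sketched as follows. The names safe_quote and safe_unquote are hypothetical, and catching Exception broadly is an assumption, since the notes do not say which error classes the real wrappers suppress.

```python
import contextlib
import urllib.parse


def safe_quote(string, safe='/', encoding=None, errors=None):
    """Quote string, returning it unchanged if quoting fails."""
    with contextlib.suppress(Exception):  # assumption: suppress all errors
        return urllib.parse.quote(string, safe, encoding=encoding, errors=errors)
    return string


def safe_unquote(string, encoding='utf-8', errors='replace'):
    """Unquote string, returning it unchanged if unquoting fails."""
    with contextlib.suppress(Exception):  # assumption: suppress all errors
        return urllib.parse.unquote(string, encoding=encoding, errors=errors)
    return string


print(safe_quote('a b'))
print(safe_unquote('a%20b'))
# A lone surrogate cannot be encoded as UTF-8, so the original is returned.
print(safe_quote('\udcff') == '\udcff')
```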
-
darc.link.urljoin(base, url, allow_fragments=True)
Wrapper function for urllib.parse.urljoin().
- Returns
The joined URL.
Note
The function suppresses possible errors when calling urllib.parse.urljoin(). If any error occurs, it returns base/url directly.
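A minimal sketch of this fallback is shown below. The name safe_urljoin is hypothetical, and catching Exception broadly is an assumption. The malformed input used here has an unbalanced bracket, which makes the standard library raise ValueError.

```python
import contextlib
import urllib.parse


def safe_urljoin(base, url, allow_fragments=True):
    """Join base and url, falling back to plain 'base/url' on error."""
    with contextlib.suppress(Exception):  # assumption: suppress all errors
        return urllib.parse.urljoin(base, url, allow_fragments=allow_fragments)
    return f'{base}/{url}'


print(safe_urljoin('http://example.com/a/', 'b'))
print(safe_urljoin('http://example.com', 'http://[::1'))
```

Note that the fallback is a simple string concatenation, so the result is not guaranteed to be a well-formed URL.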
-
darc.link.urlparse(url, scheme='', allow_fragments=True)
Wrapper function for urllib.parse.urlparse().
- Returns
The parse result.
Note
The function suppresses possible errors when calling urllib.parse.urlparse(). If any error occurs, it returns urllib.parse.ParseResult(scheme=scheme, netloc='', path=url, params='', query='', fragment='') directly.
-
darc.link.urlsplit(url, scheme='', allow_fragments=True)
Wrapper function for urllib.parse.urlsplit().
- Returns
The split result.
Note
The function suppresses possible errors when calling urllib.parse.urlsplit(). If any error occurs, it returns urllib.parse.SplitResult(scheme=scheme, netloc='', path=url, params='', query='', fragment='') directly.
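The fallback values described for urlparse() and urlsplit() can be sketched as below. The names safe_urlparse and safe_urlsplit are hypothetical, and catching Exception broadly is an assumption. One caveat: the standard library's SplitResult has five fields (scheme, netloc, path, query, fragment) and no params field, so this sketch leaves params out of the urlsplit fallback.

```python
import contextlib
from urllib.parse import ParseResult, SplitResult, urlparse, urlsplit


def safe_urlparse(url, scheme='', allow_fragments=True):
    """Parse url; on error, treat the whole input as the path."""
    with contextlib.suppress(Exception):  # assumption: suppress all errors
        return urlparse(url, scheme, allow_fragments=allow_fragments)
    return ParseResult(scheme=scheme, netloc='', path=url,
                       params='', query='', fragment='')


def safe_urlsplit(url, scheme='', allow_fragments=True):
    """Split url; on error, treat the whole input as the path."""
    with contextlib.suppress(Exception):  # assumption: suppress all errors
        return urlsplit(url, scheme, allow_fragments=allow_fragments)
    # SplitResult has no "params" field, so it is omitted here.
    return SplitResult(scheme=scheme, netloc='', path=url,
                       query='', fragment='')


bad = 'http://[::1'  # unbalanced bracket: the stdlib raises ValueError
print(safe_urlparse(bad))
print(safe_urlsplit(bad))
```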