URL Utilities
The Link class is the key data structure of the darc project. It contains all the information required to identify a URL's proxy type, its hostname, the path prefix used when saving it, etc.
The link module also provides several wrapper functions around the urllib.parse module.
- class darc.link.Link(url, proxy, host, base, name, url_parse, url_backref=None)
Bases: object
Parsed link.
- Parameters
  - url (str) – original link
  - proxy (str) – proxy type
  - base (str) – base folder for saving files
  - name (str) – hashed link for saving files
  - url_parse (ParseResult) – parsed URL from urllib.parse.urlparse()
  - url_backref (Optional[Link]) – optional Link instance from which the current link was extracted
- Returns
  Parsed link object.
- Return type
  Link
- url_backref: Optional[darc.link.Link] = None
  Optional Link instance from which the current link was extracted.
- url_parse: urllib.parse.ParseResult
  Parsed URL from urllib.parse.urlparse().
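To make the fields concrete, here is a minimal sketch of a Link-like record. LinkSketch and backref_chain() are hypothetical names, not part of darc, and the base and name values are placeholders rather than what darc would compute. The point is how url_backref lets you walk from a link back to the page it was extracted from.

```python
from dataclasses import dataclass
from typing import List, Optional
from urllib.parse import ParseResult, urlparse


@dataclass
class LinkSketch:
    """Hypothetical stand-in with the fields documented for darc.link.Link."""
    url: str                                    # original link
    proxy: str                                  # proxy type
    host: str                                   # hostname
    base: str                                   # base folder for saving files
    name: str                                   # hashed link for saving files
    url_parse: ParseResult                      # urllib.parse.urlparse(url)
    url_backref: Optional['LinkSketch'] = None  # link this one was extracted from


def backref_chain(link: LinkSketch) -> List[str]:
    """Follow url_backref pointers, returning URLs from this link back to the root."""
    chain = []
    current: Optional[LinkSketch] = link
    while current is not None:
        chain.append(current.url)
        current = current.url_backref
    return chain


# Illustrative values only: base and name are placeholders, not real darc output.
page = LinkSketch('https://example.com/', 'null', 'example.com',
                  'placeholder-base', 'placeholder-name',
                  urlparse('https://example.com/'))
child = LinkSketch('https://example.com/about', 'null', 'example.com',
                   'placeholder-base', 'placeholder-name',
                   urlparse('https://example.com/about'), url_backref=page)

print(backref_chain(child))  # ['https://example.com/about', 'https://example.com/']
```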
- darc.link.parse_link(link, host=None, *, backref=None)
Parse link.
- Parameters
  - link (str) – link to be parsed
- Keyword Arguments
  - backref (Optional[darc.link.Link]) – optional Link instance from which the current link was extracted
- Returns
  The parsed link object.
- Return type
  darc.link.Link
Note
If host is provided, it will override the hostname of the original link.
The proxy type is parsed as follows:
- If host is None and the parse result from urllib.parse.urlparse() has no netloc (or hostname) specified, set hostname to (null); otherwise use it as is.
- If the scheme is data, the link is a data URI: set hostname to data and proxy to data.
- If the scheme is javascript, the link is JavaScript code: set proxy to script.
- If the scheme is bitcoin, the link is a Bitcoin address: set proxy to bitcoin.
- If the scheme is ethereum, the link is an Ethereum address: set proxy to ethereum.
- If the scheme is ed2k, the link is an ED2K link: set proxy to ed2k.
- If the scheme is magnet, the link is a magnet link: set proxy to magnet.
- If the scheme is mailto, the link is an email address: set proxy to mail.
- If the scheme is irc, the link is an IRC link: set proxy to irc.
- If the scheme is neither http nor https, set proxy to the scheme.
- If the host is None, set hostname to (null) and proxy to null.
- If the host is an onion (.onion) address, set proxy to tor.
- If the host is an I2P (.i2p) address, or either localhost:7657 or localhost:7658, set proxy to i2p.
- If the host is localhost on ZERONET_PORT and the path is not / (i.e. NOT the root path), set proxy to zeronet and use the first part of the path as hostname.
  Example: for a ZeroNet address such as http://127.0.0.1:43110/1HeLLo4uzjaLetFx6NH3PMwFP3qbRbTf3D, parse_link() will parse the hostname as 1HeLLo4uzjaLetFx6NH3PMwFP3qbRbTf3D.
- If the host is localhost on FREENET_PORT and the path is not / (i.e. NOT the root path), set proxy to freenet and use the first part of the path as hostname.
  Example: for a Freenet address such as http://127.0.0.1:8888/USK@nwa8lHa271k2QvJ8aa0Ov7IHAV-DFOCFgmDt3X6BpCI,DuQSUZiI~agF8c-6tjsFFGuZ8eICrzWCILB60nT8KKo,AQACAAE/sone/77/, parse_link() will parse the hostname as USK@nwa8lHa271k2QvJ8aa0Ov7IHAV-DFOCFgmDt3X6BpCI,DuQSUZiI~agF8c-6tjsFFGuZ8eICrzWCILB60nT8KKo,AQACAAE.
- If the host is a proxied onion (.onion.sh) address, set proxy to tor2web.
- If none of the cases above is satisfied, set proxy to null, marking it as a plain normal link.
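The rules above can be sketched as a small classifier that returns (proxy, hostname). This is not darc's implementation: classify(), SCHEME_PROXY and LOCALHOSTS are hypothetical names; ZERONET_PORT and FREENET_PORT are assumed to be 43110 and 8888 (the ports used in the examples above); both localhost and 127.0.0.1 are assumed to count as "localhost"; and an empty path is treated like the root path /.

```python
from typing import Optional, Tuple
from urllib.parse import urlparse

# Assumed values: the docs only name these constants; these match the example URLs.
ZERONET_PORT = 43110
FREENET_PORT = 8888
LOCALHOSTS = {'localhost', '127.0.0.1'}  # assumption: what counts as "localhost"

# Schemes mapped straight to a proxy type.
SCHEME_PROXY = {
    'javascript': 'script',
    'bitcoin': 'bitcoin',
    'ethereum': 'ethereum',
    'ed2k': 'ed2k',
    'magnet': 'magnet',
    'mailto': 'mail',
    'irc': 'irc',
}


def classify(link: str, host: Optional[str] = None) -> Tuple[str, str]:
    """Hypothetical sketch: return (proxy, hostname), checking the rules top to bottom."""
    parse = urlparse(link)
    if host is None:
        host = parse.hostname  # None when there is no netloc; note it is lowercased
    hostname = host if host else '(null)'

    if parse.scheme == 'data':
        return 'data', 'data'
    if parse.scheme in SCHEME_PROXY:
        return SCHEME_PROXY[parse.scheme], hostname
    if parse.scheme not in ('http', 'https'):
        return parse.scheme, hostname
    if host is None:
        return 'null', '(null)'
    if host.endswith('.onion'):
        return 'tor', hostname
    if host.endswith('.i2p') or (host in LOCALHOSTS and parse.port in (7657, 7658)):
        return 'i2p', hostname
    # First path segment, used as hostname for ZeroNet / Freenet links.
    first_part = parse.path.strip('/').split('/')[0]
    if host in LOCALHOSTS and parse.path not in ('', '/'):
        if parse.port == ZERONET_PORT:
            return 'zeronet', first_part
        if parse.port == FREENET_PORT:
            return 'freenet', first_part
    if host.endswith('.onion.sh'):
        return 'tor2web', hostname
    return 'null', hostname


print(classify('http://example.onion/'))  # ('tor', 'example.onion')
```

Because the checks return early, a more specific case (for example a data URI) wins over the generic "scheme is neither http nor https" case that follows it.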
The base of a parsed Link object is defined as <root>/<proxy>/<scheme>/<hostname>/, where root is PATH_DB.
The name of a parsed Link object is the SHA-256 hash (cf. hashlib.sha256()) of the original link.
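A minimal sketch of how base and name could be derived. It assumes PATH_DB is '/tmp/darc' (a hypothetical value; darc configures its own) and that "SHA-256 hash" means the hex digest of the UTF-8 encoded link. link_base() and link_name() are illustrative helpers, not darc functions.

```python
import hashlib
import posixpath

PATH_DB = '/tmp/darc'  # hypothetical root; darc sets PATH_DB itself


def link_base(proxy: str, scheme: str, hostname: str, root: str = PATH_DB) -> str:
    """Build <root>/<proxy>/<scheme>/<hostname>/ with POSIX separators."""
    return posixpath.join(root, proxy, scheme, hostname) + '/'


def link_name(link: str) -> str:
    """Assumption: the name is the hex SHA-256 digest of the UTF-8 encoded link."""
    return hashlib.sha256(link.encode('utf-8')).hexdigest()


print(link_base('null', 'https', 'example.com'))  # /tmp/darc/null/https/example.com/
```

Hashing the full link gives a fixed-length, filesystem-safe file name even when the URL itself contains characters such as ?, : or / that cannot appear in a path component.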
- darc.link.quote(string, safe='/', encoding=None, errors=None)
Wrapper function for urllib.parse.quote().
- Parameters
  Same as for urllib.parse.quote().
- Returns
  The quoted string.
Note
The function suppresses possible errors when calling urllib.parse.quote(). If any occur, it returns the original string.
- darc.link.unquote(string, encoding='utf-8', errors='replace')
Wrapper function for urllib.parse.unquote().
- Parameters
  Same as for urllib.parse.unquote().
- Returns
  The unquoted string.
Note
The function suppresses possible errors when calling urllib.parse.unquote(). If any occur, it returns the original string.
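Both wrappers follow the same pattern: call the urllib.parse function and fall back to the input if it raises. Here is a sketch, assuming every exception is suppressed (the docs do not say which exceptions darc catches). safe_quote() and safe_unquote() are hypothetical names.

```python
import urllib.parse


def safe_quote(string, safe='/', encoding=None, errors=None):
    """Like urllib.parse.quote(), but return the input unchanged on failure."""
    try:
        return urllib.parse.quote(string, safe, encoding, errors)
    except Exception:  # assumption: any exception is suppressed
        return string


def safe_unquote(string, encoding='utf-8', errors='replace'):
    """Like urllib.parse.unquote(), but return the input unchanged on failure."""
    try:
        return urllib.parse.unquote(string, encoding, errors)
    except Exception:  # assumption: any exception is suppressed
        return string


print(safe_quote('a b/c'))                   # a%20b/c
# A lone surrogate cannot be UTF-8 encoded, so quote() raises and the input comes back.
print(safe_quote('bad\udcff') == 'bad\udcff')  # True
print(safe_unquote('%41%20B'))               # A B
# With errors='strict', the invalid UTF-8 byte %ff makes unquote() raise.
print(safe_unquote('%ff', errors='strict'))  # %ff
```

With the default errors='replace', unquote() rarely fails on str input: '%ff' decodes to the replacement character U+FFFD instead of raising.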
- darc.link.urljoin(base, url, allow_fragments=True)
Wrapper function for urllib.parse.urljoin().
- Parameters
  Same as for urllib.parse.urljoin().
- Returns
  The joined URL.
Note
The function suppresses possible errors when calling urllib.parse.urljoin(). If any occur, it returns base and url joined by a slash (base/url) directly.
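A sketch of the urljoin() fallback, reading "base/url" as base and url concatenated with a single slash. safe_urljoin() is a hypothetical name and suppressing every exception is an assumption. An unbalanced IPv6 bracket in url is one input that makes urllib.parse.urljoin() raise ValueError.

```python
import urllib.parse


def safe_urljoin(base, url, allow_fragments=True):
    """Like urllib.parse.urljoin(), but fall back to f'{base}/{url}' on failure."""
    try:
        return urllib.parse.urljoin(base, url, allow_fragments)
    except Exception:  # assumption: any exception is suppressed
        return f'{base}/{url}'


print(safe_urljoin('http://example.com/a/b', 'c'))        # http://example.com/a/c
# '[::1' has no closing bracket, so urlsplit() raises "Invalid IPv6 URL".
print(safe_urljoin('http://example.com', 'http://[::1/x'))  # http://example.com/http://[::1/x
```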
- darc.link.urlparse(url, scheme='', allow_fragments=True)
Wrapper function for urllib.parse.urlparse().
- Parameters
  Same as for urllib.parse.urlparse().
- Returns
  The parse result.
- Return type
  urllib.parse.ParseResult
Note
The function suppresses possible errors when calling urllib.parse.urlparse(). If any occur, it returns urllib.parse.ParseResult(scheme=scheme, netloc='', path=url, params='', query='', fragment='') directly.
- darc.link.urlsplit(url, scheme='', allow_fragments=True)
Wrapper function for urllib.parse.urlsplit().
- Parameters
  Same as for urllib.parse.urlsplit().
- Returns
  The split result.
- Return type
  urllib.parse.SplitResult
Note
The function suppresses possible errors when calling urllib.parse.urlsplit(). If any occur, it returns urllib.parse.SplitResult(scheme=scheme, netloc='', path=url, query='', fragment='') directly.
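A sketch of the urlparse() and urlsplit() fallbacks. safe_urlparse() and safe_urlsplit() are hypothetical names, and suppressing every exception is an assumption. urllib.parse.SplitResult has only five fields (scheme, netloc, path, query, fragment), so unlike ParseResult its fallback takes no params argument.

```python
import urllib.parse


def safe_urlparse(url, scheme='', allow_fragments=True):
    """Like urllib.parse.urlparse(), but return a path-only ParseResult on failure."""
    try:
        return urllib.parse.urlparse(url, scheme, allow_fragments)
    except Exception:  # assumption: any exception is suppressed
        return urllib.parse.ParseResult(scheme=scheme, netloc='', path=url,
                                        params='', query='', fragment='')


def safe_urlsplit(url, scheme='', allow_fragments=True):
    """Like urllib.parse.urlsplit(), but return a path-only SplitResult on failure."""
    try:
        return urllib.parse.urlsplit(url, scheme, allow_fragments)
    except Exception:  # assumption: any exception is suppressed
        # SplitResult has no `params` field.
        return urllib.parse.SplitResult(scheme=scheme, netloc='', path=url,
                                        query='', fragment='')


bad = 'http://[::1/x'  # unbalanced IPv6 bracket: urlsplit() raises ValueError
print(safe_urlparse(bad))
print(safe_urlsplit(bad, scheme='http'))
```

Putting the whole input into path keeps the original string recoverable from the fallback result without inventing a hostname.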