No Proxy
The darc.proxy.null module contains auxiliary functions for managing and processing normal websites that require no proxy.
- darc.proxy.null.fetch_sitemap(link, force=False)
Fetch sitemap.
The function first fetches robots.txt, then fetches the sitemaps accordingly.
- Parameters:
- Return type:
- Returns:
Contents of robots.txt and sitemaps.
See also
darc.parse.get_sitemap()
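The two-step flow described above (robots.txt first, then the sitemaps it points to) can be sketched in Python as follows. This is not darc's implementation: sketch_fetch_sitemap and its fetch callable are hypothetical stand-ins for darc's HTTP layer, the sketch works on plain URL strings rather than darc's link objects, and the fallback to /sitemap.xml when robots.txt declares no sitemaps is an assumption.

```python
from typing import Callable, Iterator, List
from urllib.parse import urljoin


def sketch_fetch_sitemap(base_url: str, fetch: Callable[[str], str]) -> Iterator[str]:
    """Yield the robots.txt text, then the body of each sitemap it declares.

    Hypothetical sketch: ``fetch`` maps a URL to its response text and
    stands in for whatever HTTP client the real module uses.
    """
    robots_text = fetch(urljoin(base_url, "/robots.txt"))
    yield robots_text

    # Collect "Sitemap: <url>" directives (the field name is case-insensitive).
    sitemaps: List[str] = [
        line.split(":", 1)[1].strip()
        for line in robots_text.splitlines()
        if line.strip().lower().startswith("sitemap:")
    ]
    # Assumption: fall back to the conventional location when none are declared.
    if not sitemaps:
        sitemaps = [urljoin(base_url, "/sitemap.xml")]

    for sitemap_url in sitemaps:
        yield fetch(urljoin(base_url, sitemap_url))
```

Passing the __getitem__ of an in-memory dict that maps URLs to response text as fetch lets the flow be exercised without network access.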
- darc.proxy.null.get_sitemap(link, text, host=None)
Fetch links to other sitemaps from a sitemap.
- Parameters:
- Return type:
- Returns:
List of links to sitemaps.
Note
As specified in the sitemap protocol, a sitemap may contain links to other sitemaps. [*]
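Under the sitemap protocol, a sitemap index is a <sitemapindex> document whose <sitemap><loc> entries point to further sitemaps. A minimal sketch of extracting those links with the standard library's xml.etree.ElementTree follows; sketch_get_sitemap is hypothetical, ignores the host parameter, and may differ from how darc actually parses sitemaps.

```python
import xml.etree.ElementTree as ET
from typing import List
from urllib.parse import urljoin

# XML namespace defined by the sitemap protocol.
NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sketch_get_sitemap(link: str, text: str) -> List[str]:
    """Return links to other sitemaps if ``text`` is a sitemap index."""
    root = ET.fromstring(text)
    # A plain <urlset> lists pages, not sitemaps, so it yields nothing here.
    if root.tag != NS + "sitemapindex":
        return []
    # Resolve each <sitemap><loc> against the URL of the index itself.
    return [
        urljoin(link, loc.text.strip())
        for loc in root.findall(f"{NS}sitemap/{NS}loc")
        if loc.text
    ]
```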
- darc.proxy.null.read_robots(link, text, host=None)
Read robots.txt to fetch links to sitemaps.
- Parameters:
- Return type:
- Returns:
List of links to sitemaps.
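Python's standard library already parses Sitemap directives: urllib.robotparser.RobotFileParser.site_maps() (Python 3.8+) returns the declared sitemap URLs, or None if there are none. The sketch below builds on it. sketch_read_robots is hypothetical, takes plain strings instead of darc's link objects, ignores host, and its /sitemap.xml fallback is an assumption.

```python
from typing import List
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser


def sketch_read_robots(link: str, text: str) -> List[str]:
    """Return absolute sitemap URLs declared in a robots.txt body."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    # site_maps() returns None when no "Sitemap:" line is present.
    found = parser.site_maps() or []
    if not found:
        # Assumption: fall back to the conventional /sitemap.xml location.
        return [urljoin(link, "/sitemap.xml")]
    return [urljoin(link, url) for url in found]
```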
- darc.proxy.null.save_invalid(link)
Save a link with an invalid scheme.
The function saves the link with an invalid scheme to the file defined in PATH.
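Saving invalid links amounts to writing to a shared file while holding a lock, so that concurrent workers do not interleave their writes. A minimal sketch, assuming one link per line in append mode; INVALID_LOCK and sketch_save_invalid are hypothetical stand-ins for LOCK and save_invalid, and the real file format may differ.

```python
import threading
from pathlib import Path

# Hypothetical stand-in for darc.proxy.null.LOCK.
INVALID_LOCK = threading.Lock()


def sketch_save_invalid(url: str, path: Path) -> None:
    """Append ``url`` as one line to ``path`` while holding the I/O lock."""
    with INVALID_LOCK:
        path.parent.mkdir(parents=True, exist_ok=True)
        with path.open("a", encoding="utf-8") as file:
            file.write(url + "\n")
```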
- darc.proxy.null.save_robots(link, text)
Save robots.txt.
- Parameters:
- Return type:
- Returns:
Saved path to robots.txt, i.e. <root>/<proxy>/<scheme>/<hostname>/robots.txt.
See also
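The returned path of save_robots follows the layout <root>/<proxy>/<scheme>/<hostname>/robots.txt. The sketch below builds that path with pathlib and urllib.parse.urlsplit; sketch_save_robots, its root and proxy arguments, and the example proxy name "null" are hypothetical, since the documentation does not state how darc derives them.

```python
from pathlib import Path
from urllib.parse import urlsplit


def sketch_save_robots(root: Path, proxy: str, url: str, text: str) -> Path:
    """Write ``text`` to <root>/<proxy>/<scheme>/<hostname>/robots.txt."""
    parts = urlsplit(url)
    if not parts.hostname:
        raise ValueError(f"URL has no hostname: {url!r}")
    # urlsplit lowercases scheme and hostname, so Example.com maps to example.com.
    directory = root / proxy / parts.scheme / parts.hostname
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / "robots.txt"
    path.write_text(text, encoding="utf-8")
    return path
```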
- darc.proxy.null.save_sitemap(link, text)
Save sitemap.
- Parameters:
- Return type:
- Returns:
Saved path to sitemap, i.e. <root>/<proxy>/<scheme>/<hostname>/sitemap_<hash>.xml.
See also
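In the save_sitemap layout, the <hash> component lets several sitemaps from the same host coexist in one directory. The documentation does not say which hash is used; the sketch below assumes a SHA-256 hex digest of the sitemap URL, and sketch_save_sitemap with its root and proxy arguments is hypothetical.

```python
import hashlib
from pathlib import Path
from urllib.parse import urlsplit


def sketch_save_sitemap(root: Path, proxy: str, url: str, text: str) -> Path:
    """Write ``text`` to <root>/<proxy>/<scheme>/<hostname>/sitemap_<hash>.xml."""
    parts = urlsplit(url)
    if not parts.hostname:
        raise ValueError(f"URL has no hostname: {url!r}")
    # Assumption: the hash is the SHA-256 hex digest of the full sitemap URL.
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    directory = root / proxy / parts.scheme / parts.hostname
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"sitemap_{digest}.xml"
    path.write_text(text, encoding="utf-8")
    return path
```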
- darc.proxy.null.PATH = '{PATH_MISC}/invalid.txt'
Path to the file that stores links with invalid schemes.
See also
- darc.proxy.null.LOCK: multiprocessing.Lock | threading.Lock | contextlib.nullcontext
I/O lock for saving links with invalid schemes to PATH.
See also
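The annotated type of LOCK suggests it is a process lock, a thread lock, or a no-op context manager depending on how darc runs. A sketch of that selection follows, assuming a hypothetical mode string; sketch_make_lock and its mode values are not part of darc.

```python
import contextlib
import multiprocessing
import threading


def sketch_make_lock(mode: str) -> contextlib.AbstractContextManager:
    """Pick an I/O lock for a hypothetical concurrency ``mode``."""
    if mode == "process":
        # Shared across worker processes.
        return multiprocessing.Lock()
    if mode == "thread":
        # Shared across threads of one process.
        return threading.Lock()
    # Single worker: nothing to guard, so use a do-nothing context manager.
    return contextlib.nullcontext()
```

Because all three objects support the with statement, code that writes to PATH can use the lock the same way in every mode.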