Sites Customisation
As websites may impose authentication requirements, etc., on their content, the darc.sites module provides sites customisation hooks for both the requests and selenium crawling processes.
To customise behaviours over requests, your sites customisation module should have a crawler() function, e.g. crawler().
The function takes the requests.Session object with proxy settings and a Link object representing the link to be crawled, then returns a requests.Response object containing the final data of the crawling process.
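As an illustration, a crawler() function in a sites customisation module might look roughly like the sketch below. It is not taken from the darc sources: the Link stand-in with a url attribute, the extra header, the timeout value, and the (session, link) parameter order are assumptions based on the description above. A real module would receive a darc.link.Link and a requests.Session.

```python
from dataclasses import dataclass
from typing import TYPE_CHECKING

if TYPE_CHECKING:  # only needed for type hints, not at runtime
    import requests


@dataclass
class Link:
    """Hypothetical stand-in for darc.link.Link; only ``url`` is assumed."""
    url: str


def crawler(session: 'requests.Session', link: Link) -> 'requests.Response':
    """Fetch the link with the prepared session and return the response."""
    # Site-specific preparation would go here (illustrative header only).
    session.headers.update({'Accept-Language': 'en'})
    # The session already carries the proxy settings; just issue the request.
    response = session.get(link.url, timeout=30)
    return response
```

Anything the site needs before the final request (cookies, login calls, custom headers) would be done on the session inside this function, so the caller only ever sees the final response.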
darc.sites.crawler_hook(link, session)
Customisation as to requests sessions.
- Parameters
link (darc.link.Link) – Link object to be crawled.
session (requests.Session) – Session object with proxy settings.
- Returns
The final response object with crawled data.
- Return type
requests.Response
See also
darc.sites.SITEMAP
To customise behaviours over selenium, your sites customisation module should have a loader() function, e.g. loader().
The function takes the selenium.webdriver.Chrome object with proxy settings and a Link object representing the link to be loaded, then returns the selenium.webdriver.Chrome object containing the final data of the loading process.
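A loader() function could be sketched along the same lines. Again this is only an illustration under assumptions: the Link stand-in, its url attribute, and the (driver, link) parameter order are not confirmed by the text above beyond the description of the inputs and the return value.

```python
from dataclasses import dataclass
from typing import TYPE_CHECKING

if TYPE_CHECKING:  # only needed for type hints, not at runtime
    from selenium import webdriver


@dataclass
class Link:
    """Hypothetical stand-in for darc.link.Link; only ``url`` is assumed."""
    url: str


def loader(driver: 'webdriver.Chrome', link: Link) -> 'webdriver.Chrome':
    """Load the link in the prepared web driver and return the driver."""
    # Navigate the (already proxied) browser to the target page.
    driver.get(link.url)
    # Site-specific interaction (login forms, waits, scrolling) would go here.
    return driver
```

The same driver object is returned, so the caller can read the loaded page from it afterwards.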
darc.sites.loader_hook(link, driver)
Customisation as to selenium drivers.
- Parameters
link (darc.link.Link) – Link object to be loaded.
driver (selenium.webdriver.Chrome) – Web driver object with proxy settings.
- Returns
The web driver object with loaded data.
- Return type
selenium.webdriver.Chrome
See also
darc.sites.SITEMAP
To tell the darc project which sites customisation module it should use for a certain hostname, you can register that module in the SITEMAP mapping dictionary.
darc.sites.SITEMAP: DefaultDict[str, str]

    SITEMAP = collections.defaultdict(lambda: 'default', {
        # 'www.sample.com': 'sample',  # darc.sites.sample
    })

The mapping dictionary from hostnames to sites customisation modules.
The fallback value is default, c.f. darc.sites.default.
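The registration and fallback behaviour can be tried out on a local dictionary of the same shape. The sketch below does not import darc; the hostnames and the module name sample are illustrative only. Against the real project, the assumption is that one would assign into darc.sites.SITEMAP in the same way.

```python
import collections

# Local stand-in mirroring the shape of darc.sites.SITEMAP.
SITEMAP = collections.defaultdict(lambda: 'default', {
    'www.sample.com': 'sample',  # hypothetical module darc.sites.sample
})

# Registering another hostname is a plain dictionary assignment.
SITEMAP['forum.sample.com'] = 'sample'

print(SITEMAP['www.sample.com'])    # sample
print(SITEMAP['unknown.example'])   # default
```

Note that looking up an unregistered hostname in a defaultdict also stores the fallback value under that key.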
darc.sites._get_spec(link)
Load spec if any.
If the sites customisation fails to import, it will fall back to the default hooks, default.
- Parameters
link (darc.link.Link) – Link object to fetch sites customisation module.
- Returns
The sites customisation module.
- Return type
- Warns
SiteNotFoundWarning – If the sites customisation failed to import.
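The import-with-fallback behaviour described for _get_spec can be approximated as follows. This is a sketch, not the darc implementation: load_spec, its parameters, and the local SiteNotFoundWarning class are hypothetical names introduced for illustration.

```python
import importlib
import warnings
from types import ModuleType
from typing import Mapping


class SiteNotFoundWarning(ImportWarning):
    """Local stand-in for the warning named in the documentation."""


def load_spec(hostname: str, sitemap: Mapping[str, str],
              package: str = 'darc.sites',
              fallback: str = 'default') -> ModuleType:
    """Import the module registered for ``hostname``, or the fallback."""
    # Unregistered hostnames resolve to the fallback module name.
    name = sitemap.get(hostname, fallback)
    try:
        return importlib.import_module(f'{package}.{name}')
    except ImportError:
        # Warn instead of failing, then use the default hooks.
        warnings.warn(f'site customisation not found: {name}',
                      SiteNotFoundWarning)
        return importlib.import_module(f'{package}.{fallback}')
```

Because the package and fallback names are parameters, the function can be exercised without darc installed, e.g. `load_spec('a.example', {'a.example': 'decoder'}, package='json', fallback='encoder')` imports json.decoder from the standard library.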
See also