Sites Customisation

As websites may impose authentication and other requirements on access to their content, the darc.sites module provides sites customisation hooks for both the requests and selenium crawling processes.

To customise behaviour over requests, your sites customisation module should have a crawler() function.

The function takes the requests.Session object with proxy settings and a Link object representing the link to be crawled, then returns a requests.Response object containing the final data of the crawling process.
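As an illustration of the contract described above, a sites customisation module might define crawler() roughly as follows. This is only a sketch: the Link class is a hypothetical stand-in for darc.link.Link, the argument order (session first, then link) follows the order of the prose and is an assumption, and the extra header stands in for whatever a particular site requires.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Link:
    """Hypothetical stand-in for darc.link.Link with only the field used here."""
    url: str


def crawler(session: Any, link: Link) -> Any:
    """Crawl ``link`` with ``session`` and return the final response.

    ``session`` is expected to behave like a requests.Session that already
    carries the proxy settings; for a real session the return value is a
    requests.Response.
    """
    # Assumed site-specific requirement: an extra header on every request.
    session.headers.update({'Accept-Language': 'en'})
    # Follow redirects so the returned response holds the final data.
    return session.get(link.url, allow_redirects=True)
```

The session is typed loosely here so the sketch does not depend on requests being importable; in a real module the annotations would be requests.Session and requests.Response.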

darc.sites.crawler_hook(link, session)

Customisation as to requests sessions.

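Parameters

link (darc.link.Link) – Link object to be crawled.

session (requests.Session) – Session object with proxy settings.

Returns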

The final response object with crawled data.

Return type

requests.Response

To customise behaviour over selenium, your sites customisation module should have a loader() function.

The function takes the selenium.webdriver.Chrome object with proxy settings and a Link object representing the link to be loaded, then returns the selenium.webdriver.Chrome object containing the final data of the loading process.
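A loader() matching the description above could look like the following sketch. The Link class is again a hypothetical stand-in for darc.link.Link, the argument order (driver first, then link) is an assumption taken from the order of the prose, and the scrolling step is an invented example of a site-specific action.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Link:
    """Hypothetical stand-in for darc.link.Link with only the field used here."""
    url: str


def loader(driver: Any, link: Link) -> Any:
    """Load ``link`` in ``driver`` and return the same driver.

    ``driver`` is expected to behave like a selenium.webdriver.Chrome
    instance that already carries the proxy settings.
    """
    # Navigate to the page to be loaded.
    driver.get(link.url)
    # Assumed site-specific step: scroll down so lazily loaded content appears.
    driver.execute_script('window.scrollTo(0, document.body.scrollHeight);')
    # The driver itself holds the loaded page, so it is the return value.
    return driver
```

Returning the driver rather than the page source keeps the loaded state available to the caller, which is consistent with the return type documented for the hook.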

darc.sites.loader_hook(link, driver)

Customisation as to selenium drivers.

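Parameters

link (darc.link.Link) – Link object to be loaded.

driver (selenium.webdriver.Chrome) – Web driver object with proxy settings.

Returns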

The web driver object with loaded data.

Return type

selenium.webdriver.Chrome

To tell the darc project which sites customisation module to use for a given hostname, register that module in the SITEMAP mapping dictionary.

darc.sites.SITEMAP: DefaultDict[str, str]
SITEMAP = collections.defaultdict(lambda: 'default', {
    # 'www.sample.com': 'sample',  # darc.sites.sample
})

The dictionary mapping hostnames to sites customisation modules.

The fallback value is default, cf. darc.sites.default.
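The behaviour of such a mapping can be tried in isolation. The sketch below rebuilds a dictionary of the same shape locally rather than importing darc; the hostnames and module names are placeholders.

```python
import collections

# Local copy with the same shape as darc.sites.SITEMAP.
SITEMAP = collections.defaultdict(lambda: 'default', {
    'www.sample.com': 'sample',
})

# Register a (hypothetical) customisation module for another hostname.
SITEMAP['www.example.org'] = 'example'

registered = SITEMAP['www.sample.com']    # explicitly registered hostname
fallback = SITEMAP['unknown.invalid']     # unregistered hostname

print(registered)  # sample
print(fallback)    # default
```

Note that indexing a defaultdict with a missing key also stores the fallback value under that key; use SITEMAP.get(host, 'default') if the lookup should not modify the mapping.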

darc.sites._get_spec(link)

Load spec if any.

If the sites customisation module fails to import, it falls back to the default hooks, default.

Parameters

link (darc.link.Link) – Link object to fetch sites customisation module.

Returns

The sites customisation module.

Return type

types.ModuleType

Warns

SiteNotFoundWarning – If the sites customisation module fails to import.
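The import-with-fallback behaviour described for _get_spec() can be sketched with importlib. This is not the darc implementation: get_spec(), its parameters, and the local SiteNotFoundWarning class are hypothetical names for illustration, and the hostname is passed directly instead of a Link object.

```python
import importlib
import types
import warnings
from typing import Mapping


class SiteNotFoundWarning(UserWarning):
    """Local stand-in for darc's warning of the same name."""


def get_spec(host: str, sitemap: Mapping[str, str], package: str,
             fallback: str = 'default') -> types.ModuleType:
    """Import the customisation module registered for ``host``.

    If the import fails, warn and import ``package.fallback`` instead.
    """
    name = sitemap[host]
    try:
        return importlib.import_module(f'{package}.{name}')
    except ImportError:
        warnings.warn(f'sites customisation not found: {name}',
                      SiteNotFoundWarning)
        return importlib.import_module(f'{package}.{fallback}')
```

For example, with a defaultdict such as SITEMAP passed as sitemap and 'darc.sites' as package, an unregistered hostname resolves to the default module directly, while a registered module that cannot be imported triggers the warning before the default module is returned.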