Sites Customisation

As websites may impose authentication or other access requirements on their content, the darc.sites module provides sites customisation hooks for both the requests and the selenium crawling processes.

To customise behaviours over requests, your sites customisation module should define a crawler() function, cf. the fallback implementation darc.sites.default.crawler().

The function takes a requests.Session object (already configured with proxy settings) and a Link object representing the link to be crawled, and returns a requests.Response object containing the final data of the crawling process.
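As a minimal sketch, a hypothetical module darc/sites/sample.py could attach an authentication header before fetching the page. The AUTH_TOKEN value and the Authorization header are illustrative assumptions, as is the idea that the Link object exposes its target as link.url and lives in darc.link. The requests and darc imports are only used for type hints, so they are guarded by TYPE_CHECKING.

```python
# darc/sites/sample.py: hypothetical sites customisation module (sketch)
from typing import TYPE_CHECKING

if TYPE_CHECKING:  # imported for type hints only
    import requests

    from darc.link import Link  # assumed location of the Link class

# Hypothetical credential for the sample site; not part of darc.
AUTH_TOKEN = 'example-token'


def crawler(session: 'requests.Session', link: 'Link') -> 'requests.Response':
    """Fetch ``link`` with an extra Authorization header.

    ``session`` already carries darc's proxy settings, so only the
    site-specific header is added, and only for this request.
    """
    headers = {'Authorization': f'Bearer {AUTH_TOKEN}'}
    # Follow redirects so the returned response holds the final page.
    response = session.get(link.url, headers=headers, allow_redirects=True)
    return response
```

Passing the header per request, rather than updating session.headers, keeps the credential from leaking into later requests made with the same session.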

To customise behaviours over selenium, your sites customisation module should define a loader() function, cf. the fallback implementation darc.sites.default.loader().

The function takes a WebDriver object (already configured with proxy settings) and a Link object representing the link to be loaded, and returns the WebDriver object containing the final data of the loading process.
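A comparable sketch for selenium, again in a hypothetical darc/sites/sample.py, loads the page with a site-specific session cookie. The cookie name and value are illustrative assumptions, and the sketch assumes link.url holds the target URL. It visits the page before calling add_cookie() because Selenium only accepts cookies for the domain currently loaded.

```python
# darc/sites/sample.py: hypothetical loader() hook (sketch)
from typing import TYPE_CHECKING

if TYPE_CHECKING:  # imported for type hints only
    from selenium.webdriver.remote.webdriver import WebDriver

    from darc.link import Link  # assumed location of the Link class

# Hypothetical session cookie for the sample site; not part of darc.
SESSION_COOKIE = {'name': 'session_id', 'value': 'example-session'}


def loader(driver: 'WebDriver', link: 'Link') -> 'WebDriver':
    """Load ``link`` in the browser with a site-specific cookie set."""
    # Selenium rejects cookies for a domain that is not currently loaded,
    # so visit the page once before adding the cookie.
    driver.get(link.url)
    driver.add_cookie(SESSION_COOKIE)
    # Reload so the page is rendered with the cookie applied.
    driver.refresh()
    return driver
```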

To tell the darc project which sites customisation module it should use for a certain hostname, register that module in the SITEMAP mapping dictionary.

darc.sites.SITEMAP: DefaultDict[str, str]
SITEMAP = collections.defaultdict(lambda: 'default', {
    # 'www.sample.com': 'sample',  # darc.sites.sample
})

Mapping from hostnames to sites customisation module names.

The fallback value is 'default', cf. darc.sites.default.
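The sketch below rebuilds a mapping of the same shape to show how a registered hostname resolves to its module while any other hostname falls back to 'default'. The helper site_module_name() and the use of urlsplit() to extract the hostname are illustrative assumptions; darc's own lookup code may differ.

```python
import collections
from typing import DefaultDict
from urllib.parse import urlsplit

# Same shape as darc.sites.SITEMAP: unknown hostnames fall back to 'default'.
SITEMAP: DefaultDict[str, str] = collections.defaultdict(lambda: 'default', {
    'www.sample.com': 'sample',  # handled by darc.sites.sample
})


def site_module_name(url: str) -> str:
    """Return the dotted path of the module whose hooks would handle ``url``.

    Hypothetical helper for illustration only.
    """
    # urlsplit() lowercases the hostname, so keys should be lowercase too.
    hostname = urlsplit(url).hostname or ''
    # Missing hostnames resolve to 'default' (defaultdict also stores them).
    return f'darc.sites.{SITEMAP[hostname]}'


# Registering another (hypothetical) site at runtime:
SITEMAP['forum.example.org'] = 'forum'
```

The resulting dotted path could then be passed to importlib.import_module() to obtain the module providing crawler() and loader().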