Default Hooks
The darc.sites.default module is the fallback for site
customisation.
-
class darc.sites.default.DefaultSite
Bases: darc.sites._abc.BaseSite
Default hooks.
-
static crawler(timestamp, session, link)
Default crawler hook.
- Parameters
timestamp (datetime.datetime) – Timestamp of the worker node reference.
session (requests.Session) – Session object with proxy settings.
link (darc.link.Link) – Link object to be crawled.
- Returns
The final response object with crawled data.
- Return type
See also
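As a rough illustration of the crawler hook's contract (take a session and a link, return the final response), the following is a minimal sketch, not the actual darc implementation. It assumes the link object exposes its address as `link.url`; `default_crawler` and `FakeSession` are hypothetical names introduced here, and the fake session stands in for `requests.Session` so the example runs without network access.

```python
from types import SimpleNamespace


def default_crawler(timestamp, session, link):
    """Sketch of a crawler hook: fetch the link through the given session."""
    # Assumption: ``link`` exposes the target address as ``link.url``.
    # ``timestamp`` is accepted to match the documented signature but is
    # not used in this sketch.
    response = session.get(link.url)
    return response


class FakeSession:
    """Hypothetical stand-in for ``requests.Session`` (no network access)."""

    def get(self, url):
        # Mimic the two response attributes used in the usage lines below.
        return SimpleNamespace(url=url, status_code=200)


# Hypothetical stand-in for a ``darc.link.Link`` object.
link = SimpleNamespace(url="https://example.com/")
response = default_crawler(None, FakeSession(), link)
```

With a real `requests.Session`, the same call shape would return a `requests.Response`; proxy settings would already be attached to the session passed in.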
-
static loader(timestamp, driver, link)
Default loader hook.
When loading, if SE_WAIT is a valid time lapse, the function sleeps for that long to give the page time to finish loading its contents.
- Parameters
timestamp (datetime.datetime) – Timestamp of the worker node reference.
driver (selenium.webdriver.Chrome) – Web driver object with proxy settings.
link (darc.link.Link) – Link object to be loaded.
- Returns
The web driver object with loaded data.
- Return type
selenium.webdriver.Chrome
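The load-then-sleep behaviour described for the loader hook can be sketched as follows. This is an illustration under stated assumptions rather than the darc source: the link is assumed to expose `link.url`, "a valid time lapse" is interpreted as a positive number of seconds, and `default_loader`, `FakeDriver`, and the `SE_WAIT` value are hypothetical stand-ins so the example runs without a browser.

```python
import time
from types import SimpleNamespace

# Hypothetical value in seconds; the real SE_WAIT setting is configured elsewhere.
SE_WAIT = 0.1


def default_loader(timestamp, driver, link, wait=SE_WAIT):
    """Sketch of a loader hook: open the link, then optionally wait."""
    # Assumption: ``link`` exposes the target address as ``link.url``.
    driver.get(link.url)
    # Assumption: a "valid time lapse" means a positive number of seconds.
    if isinstance(wait, (int, float)) and wait > 0:
        time.sleep(wait)
    return driver


class FakeDriver:
    """Hypothetical stand-in for ``selenium.webdriver.Chrome``."""

    def __init__(self):
        self.current_url = None

    def get(self, url):
        # Record the requested URL instead of driving a real browser.
        self.current_url = url


driver = FakeDriver()
link = SimpleNamespace(url="https://example.com/")  # stand-in for darc.link.Link
start = time.monotonic()
loaded = default_loader(None, driver, link)
elapsed = time.monotonic() - start
```

The hook returns the same driver object it was given, so the caller can read the page source from it after the wait.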
Note
Internally, selenium waits for the browser to finish loading the page before returning (i.e. the web API event DOMContentLoaded). However, some extra scripts may take more time to run after the event.
See also
-
static