Submission Data Models
The darc.model.web module defines the data models
used to store the data crawled by the darc project.
See also
Please refer to the darc.submit module for more information
about data submission.
Hostname Records
The darc.model.web.hostname module defines the data model
representing hostnames, specifically from new_host submission.
See also
Please refer to darc.submit.submit_new_host() for more
information.
class darc.model.web.hostname.HostnameModel(*args, **kwargs)
Bases: darc.model.abc.BaseModelWeb
Data model for a hostname record.

Important
The alive status of a hostname is toggled when crawler() successfully requests a URL with that hostname.

DoesNotExist
alias of HostnameModelDoesNotExist

discovery: datetime.datetime = <DateTimeField: HostnameModel.discovery>
Timestamp of first new_host submission.

hosts

id = <AutoField: HostnameModel.id>

last_seen: datetime.datetime = <DateTimeField: HostnameModel.last_seen>
Timestamp of last related submission.

proxy: darc.model.utils.Proxy = <IntEnumField: HostnameModel.proxy>
Proxy type (c.f. link.proxy).

robots

since: datetime.datetime = <DateTimeField: HostnameModel.since>
The hostname has been active since this timestamp.

sitemaps

urls
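The timestamp fields above may be easier to follow with a small stand-in. The sketch below is not darc code: it uses a plain dataclass (the hypothetical `HostnameRecord`, with hypothetical helpers `seen()` and `mark_alive()`) to illustrate one plausible reading of how discovery, last_seen, since, and the alive status relate. That since restarts only when the alive status actually changes is an assumption, not something the source confirms.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class HostnameRecord:
    """Hypothetical stand-in for a hostname record (not the darc model)."""

    hostname: str
    discovery: datetime   # first new_host submission
    last_seen: datetime   # last related submission
    since: datetime       # assumed: when the current alive status began
    alive: bool = False

    def seen(self, when: datetime) -> None:
        """Record a later related submission."""
        self.last_seen = max(self.last_seen, when)

    def mark_alive(self, alive: bool, when: datetime) -> None:
        """Set the alive status; assumed to restart `since` on change only."""
        if alive != self.alive:
            self.alive = alive
            self.since = when


t0 = datetime(2020, 1, 1)
record = HostnameRecord("example.com", discovery=t0, last_seen=t0, since=t0)
record.seen(t0 + timedelta(hours=1))
record.mark_alive(True, t0 + timedelta(hours=2))
record.mark_alive(True, t0 + timedelta(hours=3))  # no change, `since` is kept
```

In this reading, discovery never moves, last_seen follows every submission, and since moves only when the status flips.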
URL Records
The darc.model.web.url module defines the data model
representing URLs, specifically from requests and
selenium submissions.
See also
Please refer to darc.submit.submit_requests() and
darc.submit.submit_selenium() for more information.
class darc.model.web.url.URLModel(*args, **kwargs)
Bases: darc.model.abc.BaseModelWeb
Data model for a requested URL.

Important
The alive status of a URL is toggled when crawler() successfully requests that URL and the status code is ok.

DoesNotExist
alias of URLModelDoesNotExist

discovery: datetime.datetime = <DateTimeField: URLModel.discovery>
Timestamp of first submission.

hostname: darc.model.web.hostname.HostnameModel = <ForeignKeyField: URLModel.hostname>
Hostname (c.f. link.host).

hostname_id = <ForeignKeyField: URLModel.hostname>

id = <AutoField: URLModel.id>

last_seen: datetime.datetime = <DateTimeField: URLModel.last_seen>
Timestamp of last submission.

proxy: darc.model.utils.Proxy = <IntEnumField: URLModel.proxy>
Proxy type (c.f. link.proxy).

requests

selenium

since: datetime.datetime = <DateTimeField: URLModel.since>
The URL has been active since this timestamp.
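The hostname / hostname_id pair on URLModel, together with the urls attribute on HostnameModel, suggests a foreign key with a reverse accessor. As a rough illustration only, the sketch below reproduces that shape with the standard sqlite3 module. The table and column names (`hostname`, `url`, `name`, `address`) are assumptions made for this example and are not taken from the darc schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE hostname (
        id   INTEGER PRIMARY KEY AUTOINCREMENT,
        name TEXT NOT NULL
    );
    CREATE TABLE url (
        id          INTEGER PRIMARY KEY AUTOINCREMENT,
        address     TEXT NOT NULL,
        hostname_id INTEGER NOT NULL REFERENCES hostname (id)
    );
    """
)

# One hostname row, then two URL rows that reference it.
host_id = conn.execute(
    "INSERT INTO hostname (name) VALUES (?)", ("example.com",)
).lastrowid
conn.executemany(
    "INSERT INTO url (address, hostname_id) VALUES (?, ?)",
    [("https://example.com/", host_id), ("https://example.com/about", host_id)],
)

# Reverse lookup: every URL row that points at this hostname.
urls = [
    row[0]
    for row in conn.execute(
        "SELECT address FROM url WHERE hostname_id = ? ORDER BY id", (host_id,)
    )
]
```

The final query plays the role of the reverse accessor: starting from a hostname, it collects the URL rows that reference it.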
robots.txt Records
The darc.model.web.robots module defines the data model
representing robots.txt data, specifically from new_host
submission.
See also
Please refer to darc.submit.submit_new_host() for more
information.
class darc.model.web.robots.RobotsModel(*args, **kwargs)
Bases: darc.model.abc.BaseModelWeb
Data model for robots.txt data.

DoesNotExist
alias of RobotsModelDoesNotExist

host: darc.model.web.hostname.HostnameModel = <ForeignKeyField: RobotsModel.host>
Hostname (c.f. link.host).

host_id = <ForeignKeyField: RobotsModel.host>

id = <AutoField: RobotsModel.id>

timestamp: datetime.datetime = <DateTimeField: RobotsModel.timestamp>
Timestamp of the submission.
sitemap.xml Records
The darc.model.web.sitemap module defines the data model
representing sitemap.xml data, specifically from new_host
submission.
See also
Please refer to darc.submit.submit_new_host() for more
information.
class darc.model.web.sitemap.SitemapModel(*args, **kwargs)
Bases: darc.model.abc.BaseModelWeb
Data model for sitemap.xml data.

DoesNotExist
alias of SitemapModelDoesNotExist

host: darc.model.web.hostname.HostnameModel = <ForeignKeyField: SitemapModel.host>
Hostname (c.f. link.host).

host_id = <ForeignKeyField: SitemapModel.host>

id = <AutoField: SitemapModel.id>

timestamp: datetime.datetime = <DateTimeField: SitemapModel.timestamp>
Timestamp of the submission.
hosts.txt Records
The darc.model.web.hosts module defines the data model
representing hosts.txt data, specifically from new_host
submission.
See also
Please refer to darc.submit.submit_new_host() for more
information.
class darc.model.web.hosts.HostsModel(*args, **kwargs)
Bases: darc.model.abc.BaseModelWeb
Data model for hosts.txt data.

DoesNotExist
alias of HostsModelDoesNotExist

host: darc.model.web.hostname.HostnameModel = <ForeignKeyField: HostsModel.host>
Hostname (c.f. link.host).

host_id = <ForeignKeyField: HostsModel.host>

id = <AutoField: HostsModel.id>

timestamp: datetime.datetime = <DateTimeField: HostsModel.timestamp>
Timestamp of the submission.
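RobotsModel, SitemapModel, and HostsModel share the same shape: a host reference plus a submission timestamp. Assuming that several submissions may accumulate for one host (the source does not say whether they do), picking the most recent one could look like the sketch below. `Submission` and `latest_per_host` are hypothetical names, and plain strings stand in for HostnameModel rows.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable


@dataclass(frozen=True)
class Submission:
    """Hypothetical stand-in for a robots.txt / sitemap.xml / hosts.txt row."""

    host: str            # stands in for the HostnameModel foreign key
    timestamp: datetime  # timestamp of the submission


def latest_per_host(rows: Iterable[Submission]) -> Dict[str, Submission]:
    """Keep only the newest submission for each host."""
    latest: Dict[str, Submission] = {}
    for row in rows:
        current = latest.get(row.host)
        if current is None or row.timestamp > current.timestamp:
            latest[row.host] = row
    return latest


rows = [
    Submission("example.com", datetime(2020, 1, 1)),
    Submission("example.com", datetime(2020, 3, 1)),
    Submission("example.org", datetime(2020, 2, 1)),
]
newest = latest_per_host(rows)
```

With real models the same selection would more likely be expressed as a database query ordered by timestamp; the loop above only makes the intent explicit.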
Crawler Records
The darc.model.web.requests module defines the data models
representing crawler records, specifically
from requests submissions.
See also
Please refer to darc.submit.submit_requests() for more
information.
class darc.model.web.requests.RequestsHistoryModel(*args, **kwargs)
Bases: darc.model.abc.BaseModelWeb
Data model for history records from requests submission.

DoesNotExist
alias of RequestsHistoryModelDoesNotExist

Response cookies.

id = <AutoField: RequestsHistoryModel.id>

model: darc.model.web.requests.RequestsModel = <ForeignKeyField: RequestsHistoryModel.model>
Original record.

model_id = <ForeignKeyField: RequestsHistoryModel.model>

timestamp: datetime.datetime = <DateTimeField: RequestsHistoryModel.timestamp>
Timestamp of the submission.

class darc.model.web.requests.RequestsModel(*args, **kwargs)
Bases: darc.model.abc.BaseModelWeb
Data model for documents from requests submission.

DoesNotExist
alias of RequestsModelDoesNotExist

Response cookies.

history

id = <AutoField: RequestsModel.id>

timestamp: datetime.datetime = <DateTimeField: RequestsModel.timestamp>
Timestamp of the submission.

url: darc.model.web.url.URLModel = <ForeignKeyField: RequestsModel.url>
Original URL (c.f. link.url).

url_id = <ForeignKeyField: RequestsModel.url>
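The model field on RequestsHistoryModel and the history attribute on RequestsModel describe the two ends of one relationship. The sketch below mirrors that link with plain dataclasses so the direction of each reference is visible. `RequestsRecord` and `HistoryRecord` are hypothetical stand-ins rather than the darc classes, and the fields shown are a reduced subset chosen for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass(eq=False)
class RequestsRecord:
    """Hypothetical stand-in for a RequestsModel row."""

    url: str             # stands in for the URLModel foreign key
    timestamp: datetime  # timestamp of the submission
    history: List["HistoryRecord"] = field(default_factory=list, repr=False)


@dataclass(eq=False)
class HistoryRecord:
    """Hypothetical stand-in for a RequestsHistoryModel row."""

    model: RequestsRecord  # the original record this entry belongs to
    timestamp: datetime

    def __post_init__(self) -> None:
        # Register with the parent so `history` acts as the reverse accessor.
        self.model.history.append(self)


document = RequestsRecord("https://example.com/", datetime(2020, 1, 1, 12, 0))
first = HistoryRecord(document, datetime(2020, 1, 1, 11, 59))
second = HistoryRecord(document, datetime(2020, 1, 1, 11, 59, 30))
```

Each history entry points at exactly one original record through model, while the original record reaches all of its entries through history.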
Loader Records
The darc.model.web.selenium module defines the data model
representing loader records, specifically
from selenium submissions.
See also
Please refer to darc.submit.submit_selenium() for more
information.
class darc.model.web.selenium.SeleniumModel(*args, **kwargs)
Bases: darc.model.abc.BaseModelWeb
Data model for documents from selenium submission.

DoesNotExist
alias of SeleniumModelDoesNotExist

id = <AutoField: SeleniumModel.id>

screenshot: Optional[bytes] = <BlobField: SeleniumModel.screenshot>
Screenshot in PNG format as bytes.

timestamp: datetime.datetime = <DateTimeField: SeleniumModel.timestamp>
Timestamp of the submission.

url: darc.model.web.url.URLModel = <ForeignKeyField: SeleniumModel.url>
Original URL (c.f. link.url).

url_id = <ForeignKeyField: SeleniumModel.url>
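Because screenshot is typed as Optional[bytes] and documented as PNG data, code that reads it has to handle both a missing value and raw bytes. A minimal check, assuming only the standard 8-byte PNG signature, might look like this. `looks_like_png` is a hypothetical helper and is not part of darc.

```python
from typing import Optional

# Fixed 8-byte signature that starts every PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def looks_like_png(screenshot: Optional[bytes]) -> bool:
    """Return True when a value is present and starts with the PNG signature."""
    return screenshot is not None and screenshot.startswith(PNG_SIGNATURE)
```

This only inspects the header; it does not validate the rest of the image.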