Task Queues

The darc.model.tasks module defines the data models for the task queues of darc.

See also

Refer to the darc.db module for more information about the task queues.

Hostname Queue

Important

The hostname queue is a set named queue_hostname in a Redis-based task queue.

The darc.model.tasks.hostname module contains the data model defined for the hostname queue.

class darc.model.tasks.hostname.HostnameQueueModel(*args, **kwargs)[source]

Bases: BaseModel

Hostname task queue.

DoesNotExist

alias of HostnameQueueModelDoesNotExist

hostname: str = <CharField: HostnameQueueModel.hostname>

Hostname (cf. Link.host).

id = <AutoField: HostnameQueueModel.id>
timestamp: datetime = <DateTimeField: HostnameQueueModel.timestamp>

Timestamp of last update.
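
As a rough illustration of the record layout described above (id, hostname, timestamp), the following sketch uses the standard-library sqlite3 module rather than the project's own model classes. The table name hostname_queue, the uniqueness constraint on hostname, and the helper save_hostname are assumptions made for this example only; they are not taken from darc.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical table mirroring the documented fields (id, hostname, timestamp).
SCHEMA = """
CREATE TABLE hostname_queue (
    id        INTEGER PRIMARY KEY AUTOINCREMENT,
    hostname  TEXT NOT NULL UNIQUE,   -- uniqueness is an assumption
    timestamp TEXT NOT NULL           -- ISO 8601 time of last update
)
"""


def save_hostname(conn, hostname, when=None):
    """Insert the hostname, or refresh its timestamp if already queued."""
    stamp = (when or datetime.now(timezone.utc)).isoformat()
    cur = conn.execute(
        "UPDATE hostname_queue SET timestamp = ? WHERE hostname = ?",
        (stamp, hostname),
    )
    if cur.rowcount == 0:
        # No existing row was updated, so this hostname is new.
        conn.execute(
            "INSERT INTO hostname_queue (hostname, timestamp) VALUES (?, ?)",
            (hostname, stamp),
        )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
save_hostname(conn, "example.onion")
save_hostname(conn, "example.onion")  # second call only updates the timestamp
rows = conn.execute("SELECT id, hostname FROM hostname_queue").fetchall()
```

Saving the same hostname twice leaves a single row, which matches the set-like behaviour of the queue and the "last update" meaning of the timestamp field.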

Crawler Queue

Important

The crawler queue is a sorted set named queue_requests in a Redis-based task queue.

The darc.model.tasks.requests module contains the data model defined for the crawler queue.

class darc.model.tasks.requests.RequestsQueueModel(*args, **kwargs)[source]

Bases: BaseModel

Task queue for crawler().

DoesNotExist

alias of RequestsQueueModelDoesNotExist

hash: str = <CharField: RequestsQueueModel.hash>

SHA-256 hash value (cf. Link.name).

id = <AutoField: RequestsQueueModel.id>

Pickled target Link instance.

text: str = <TextField: RequestsQueueModel.text>

URL as raw text (cf. Link.url).

timestamp: datetime = <DateTimeField: RequestsQueueModel.timestamp>

Timestamp of last update.
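
The text, hash, and timestamp fields above can be illustrated with a small standard-library sketch. It assumes that the hash is the hexadecimal SHA-256 digest of the UTF-8 encoded URL; the documentation only states that it is a SHA-256 value corresponding to Link.name, so the exact input is an assumption. The helper build_task_fields is hypothetical and not part of darc.

```python
import hashlib
from datetime import datetime, timezone


def build_task_fields(url):
    """Return text, hash, and timestamp values for one queued URL.

    Assumption: the hash is the SHA-256 hex digest of the UTF-8 URL.
    """
    return {
        "text": url,
        "hash": hashlib.sha256(url.encode("utf-8")).hexdigest(),
        "timestamp": datetime.now(timezone.utc),
    }


fields = build_task_fields("https://www.example.com/")
```

A hex-encoded SHA-256 digest is always 64 characters long, so it fits a fixed-width character column regardless of the URL length, while the full URL is kept in the unbounded text field.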

Loader Queue

Important

The loader queue is a sorted set named queue_selenium in a Redis-based task queue.

The darc.model.tasks.selenium module contains the data model defined for the loader queue.

class darc.model.tasks.selenium.SeleniumQueueModel(*args, **kwargs)[source]

Bases: BaseModel

Task queue for loader().

DoesNotExist

alias of SeleniumQueueModelDoesNotExist

hash: str = <CharField: SeleniumQueueModel.hash>

SHA-256 hash value (cf. Link.name).

id = <AutoField: SeleniumQueueModel.id>

Pickled target Link instance.

text: str = <TextField: SeleniumQueueModel.text>

URL as raw text (cf. Link.url).

timestamp: datetime = <DateTimeField: SeleniumQueueModel.timestamp>

Timestamp of last update.
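
The crawler and loader queues are both described above as sorted sets. The sketch below imitates sorted-set semantics in plain Python: each member is unique, carries a numeric score, and re-adding a member replaces its score instead of creating a duplicate. Using the current time as the default score is an assumption for illustration, and the class SortedSetQueue is hypothetical; it is not darc's implementation and does not talk to Redis.

```python
import time


class SortedSetQueue:
    """In-memory stand-in for a sorted set (member -> score)."""

    def __init__(self):
        self._scores = {}

    def add(self, member, score=None):
        # Re-adding an existing member only replaces its score.
        self._scores[member] = time.time() if score is None else score

    def pop_oldest(self):
        # Return the member with the lowest score, or None if empty.
        if not self._scores:
            return None
        member = min(self._scores, key=self._scores.get)
        del self._scores[member]
        return member

    def __len__(self):
        return len(self._scores)


queue = SortedSetQueue()
queue.add("b", score=2.0)
queue.add("a", score=1.0)
queue.add("b", score=0.5)  # updates the score, does not duplicate
first = queue.pop_oldest()
```

With scores interpreted as timestamps, popping the lowest score yields the entry that has waited longest since its last update.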