Link Database
The darc project utilises a Redis-based database to provide tele-process communication.
Note
In its first implementation, the darc project used Queue to support such communication. However, as observed at runtime, the Queue object is heavily affected by a lack of memory.
There are three databases, all following the same naming convention with the queue_ prefix:
- the hostname database – queue_hostname (HostnameQueueModel)
- the requests database – queue_requests (RequestsQueueModel)
- the selenium database – queue_selenium (SeleniumQueueModel)
queue_hostname, queue_requests and queue_selenium are all of the Redis sorted set data type.
If FLAG_DB is True, then the module uses the RDS storage described by the peewee models as its backend.
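As a rough illustration of the backend switch, a public helper can dispatch on FLAG_DB. This is only a sketch: the stand-in functions below merely record which backend was chosen, links are plain strings rather than Link objects, and the real module may be organised differently.

```python
from typing import List

FLAG_DB = False  # assumed module-level flag; True would select the RDS (peewee) backend

calls: List[str] = []  # records which backend handled each call (illustration only)


def _drop_hostname_db(link: str) -> None:
    """Stand-in for the peewee-backed implementation."""
    calls.append(f"db:{link}")


def _drop_hostname_redis(link: str) -> None:
    """Stand-in for the Redis-backed implementation."""
    calls.append(f"redis:{link}")


def drop_hostname(link: str) -> None:
    """Dispatch to the backend selected by FLAG_DB."""
    if FLAG_DB:
        return _drop_hostname_db(link)
    return _drop_hostname_redis(link)


drop_hostname("example.onion")
```

With FLAG_DB left as False, the call is routed to the Redis stand-in.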
-
darc.db._db_operation(operation, *args, **kwargs)
Retry operation on database.
- Parameters
operation (Callable[…, ~_T]) – Callable / method to perform.
*args (Any) – Arbitrary positional arguments.
- Keyword Arguments
**kwargs (Any) – Arbitrary keyword arguments.
- Return type
~_T
- Returns
Any return value from a successful operation call.
-
darc.db._drop_hostname_db(link)
Remove link from the hostname database.
The function updates the HostnameQueueModel table.
-
darc.db._drop_hostname_redis(link)
Remove link from the hostname database.
The function updates the queue_hostname database.
-
darc.db._drop_requests_db(link)
Remove link from the requests database.
The function updates the RequestsQueueModel table.
-
darc.db._drop_requests_redis(link)
Remove link from the requests database.
The function updates the queue_requests database.
-
darc.db._drop_selenium_db(link)
Remove link from the selenium database.
The function updates the SeleniumQueueModel table.
-
darc.db._drop_selenium_redis(link)
Remove link from the selenium database.
The function updates the queue_selenium database.
-
darc.db._gen_arg_msg(*args, **kwargs)
Sanitise arguments representation string.
- Parameters
*args (Any) – Arbitrary arguments.
- Keyword Arguments
**kwargs (Any) – Arbitrary keyword arguments.
- Returns
Sanitised arguments representation string.
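The exact sanitisation rules are not described here. As a minimal sketch, one plausible approach is to join the repr of each argument and truncate overly long values. The name gen_arg_msg and the MAX_LEN limit are hypothetical and used for illustration only.

```python
from typing import Any

MAX_LEN = 32  # hypothetical cap on the length of each argument's repr


def _shorten(text: str) -> str:
    """Truncate long representations so log lines stay readable."""
    return text if len(text) <= MAX_LEN else text[:MAX_LEN] + "..."


def gen_arg_msg(*args: Any, **kwargs: Any) -> str:
    """Build a compact, single-line representation of call arguments."""
    parts = [_shorten(repr(arg)) for arg in args]
    parts.extend(f"{key}={_shorten(repr(value))}" for key, value in kwargs.items())
    return ", ".join(parts)


message = gen_arg_msg("queue_requests", 0, withscores=True)
long_message = gen_arg_msg("x" * 100)
```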
-
darc.db._have_hostname_db(link)
Check if current link is a new host.
The function checks the HostnameQueueModel table.
-
darc.db._have_hostname_redis(link)
Check if current link is a new host.
The function checks the queue_hostname database.
-
darc.db._load_requests_db()
Load link from the requests database.
The function reads the RequestsQueueModel table.
Note
At runtime, the function loads at most MAX_POOL links to limit memory usage.
-
darc.db._load_requests_redis()
Load link from the requests database.
The function reads the queue_requests database.
Note
At runtime, the function loads at most MAX_POOL links to limit memory usage.
-
darc.db._load_selenium_db()
Load link from the selenium database.
The function reads the SeleniumQueueModel table.
Note
At runtime, the function loads at most MAX_POOL links to limit memory usage.
-
darc.db._load_selenium_redis()
Load link from the selenium database.
The function reads the queue_selenium database.
- Parameters
check – Whether to perform checks on loaded links; defaults to CHECK.
- Returns
List of loaded links from the selenium database.
Note
At runtime, the function loads at most MAX_POOL links to limit memory usage.
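The MAX_POOL cap can be pictured with an in-memory stand-in for a sorted set. In this sketch a dict maps members to scores, and the assumption that the lowest-scoring links are loaded first is made for illustration and is not stated by this page.

```python
import heapq
from typing import Dict, List

MAX_POOL = 3  # small value for illustration; the documented default is 1_000


def load_links(queue: Dict[str, float], limit: int = MAX_POOL) -> List[str]:
    """Return at most ``limit`` members, lowest score first."""
    return heapq.nsmallest(limit, queue, key=queue.__getitem__)


queue = {"a": 5.0, "b": 1.0, "c": 3.0, "d": 2.0, "e": 4.0}
loaded = load_links(queue)
```

Only three of the five queued members are returned, so memory use is bounded by the cap rather than by the size of the queue.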
-
darc.db._redis_command(command, *args, **kwargs)
Wrapper function for Redis command.
- Parameters
command (str) – Command name.
*args (Any) – Arbitrary arguments for the Redis command.
- Keyword Arguments
**kwargs (Any) – Arbitrary keyword arguments for the Redis command.
- Returns
Values returned from the Redis command.
- Warns
RedisCommandFailed – Warns at each round when the command fails.
See also
Between each retry, the function sleeps for RETRY_INTERVAL second(s) if such value is NOT None.
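The retry behaviour can be sketched as follows. This is an approximation under stated assumptions, not the module's actual code: FakeRedis is a hypothetical client that fails once, RedisCommandFailed is redefined locally as a stand-in, and the client is passed explicitly so the example runs without a Redis server.

```python
import time
import warnings
from typing import Any, Optional

RETRY_INTERVAL: Optional[float] = 0.01  # seconds; None disables the pause


class RedisCommandFailed(Warning):
    """Local stand-in for the warning category named above."""


class FakeRedis:
    """Hypothetical client whose first ``zcard`` call fails."""

    def __init__(self) -> None:
        self.calls = 0

    def zcard(self, name: str) -> int:
        self.calls += 1
        if self.calls == 1:
            raise ConnectionError("connection reset")
        return 7


def redis_command(client: Any, command: str, *args: Any, **kwargs: Any) -> Any:
    """Look up ``command`` on ``client`` and retry it until it succeeds."""
    method = getattr(client, command)
    while True:
        try:
            return method(*args, **kwargs)
        except Exception as error:
            # Warn on every failed round, then optionally pause before retrying.
            warnings.warn(f"{command} failed: {error!r}", RedisCommandFailed)
            if RETRY_INTERVAL is not None:
                time.sleep(RETRY_INTERVAL)


client = FakeRedis()
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    size = redis_command(client, "zcard", "queue_requests")
```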
-
darc.db._redis_get_lock(key)
Get a lock for Redis operations.
- Parameters
key (Literal['queue_hostname', 'queue_requests', 'queue_selenium']) – Lock target key.
- Return type
Union[Redlock, AbstractContextManager[+T_co]]
- Returns
Return a new pottery.redlock.Redlock object using key key that mimics the behavior of threading.Lock.
See also
If REDIS_LOCK is False, returns a contextlib.nullcontext instead.
-
darc.db._save_requests_db(entries, single=False, score=None, nx=False, xx=False)
Save link to the requests database.
The function updates the RequestsQueueModel table.
- Parameters
entries (Union[Link, List[Link]]) – Links to be added to the requests database. It can be either a list of links, or a single link string (if single is set to True).
single (bool) – Indicates whether entries is a list of links or a single link string.
score (Optional[float]) – Score for the Redis sorted set.
nx (bool) – Only create new elements and do not update scores for elements that already exist.
xx (bool) – Only update scores of elements that already exist. New elements will not be added.
-
darc.db._save_requests_redis(entries, single=False, score=None, nx=False, xx=False)
Save link to the requests database.
The function updates the queue_requests database.
- Parameters
entries (Union[Link, List[Link]]) – Links to be added to the requests database. It can be either a list of links, or a single link string (if single is set to True).
single (bool) – Indicates whether entries is a list of links or a single link string.
score (Optional[float]) – Score for the Redis sorted set.
nx (bool) – Forces ZADD to only create new elements and not to update scores for elements that already exist.
xx (bool) – Forces ZADD to only update scores of elements that already exist. New elements will not be added.
-
darc.db._save_selenium_db(entries, single=False, score=None, nx=False, xx=False)
Save link to the selenium database.
The function updates the SeleniumQueueModel table.
- Parameters
entries (Union[Link, List[Link]]) – Links to be added to the selenium database. It can be either a list of links, or a single link string (if single is set to True).
single (bool) – Indicates whether entries is a list of links or a single link string.
score (Optional[float]) – Score for the Redis sorted set.
nx (bool) – Only create new elements and do not update scores for elements that already exist.
xx (bool) – Only update scores of elements that already exist. New elements will not be added.
-
darc.db._save_selenium_redis(entries, single=False, score=None, nx=False, xx=False)
Save link to the selenium database.
The function updates the queue_selenium database.
- Parameters
entries (Union[Link, List[Link]]) – Links to be added to the selenium database. It can be either an iterable of links, or a single link string (if single is set to True).
single (bool) – Indicates whether entries is an iterable of links or a single link string.
score (Optional[float]) – Score for the Redis sorted set.
nx (bool) – Forces ZADD to only create new elements and not to update scores for elements that already exist.
xx (bool) – Forces ZADD to only update scores of elements that already exist. New elements will not be added.
When entries is a list of Link instances, the function tries to perform bulk updates to ease memory consumption. The bulk size is defined by BULK_SIZE.
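The effect of the nx and xx flags can be demonstrated without a Redis server. The sketch below mimics the ZADD semantics on a plain dict (member to score). The zadd helper is hypothetical and is not the redis-py API.

```python
from typing import Dict


def zadd(zset: Dict[str, float], mapping: Dict[str, float],
         nx: bool = False, xx: bool = False) -> int:
    """Mimic ZADD's NX/XX flags on a dict; return the number of new members."""
    if nx and xx:
        raise ValueError("nx and xx are mutually exclusive")
    added = 0
    for member, score in mapping.items():
        exists = member in zset
        if (nx and exists) or (xx and not exists):
            continue  # NX skips existing members; XX skips new ones
        zset[member] = score
        added += 0 if exists else 1
    return added


queue = {"http://a.example": 1.0}

# nx=True: the existing score is kept, the new member is added.
zadd(queue, {"http://a.example": 9.0, "http://b.example": 2.0}, nx=True)
after_nx = dict(queue)

# xx=True: the existing score is updated, the new member is ignored.
zadd(queue, {"http://a.example": 5.0, "http://c.example": 3.0}, xx=True)
after_xx = dict(queue)
```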
-
darc.db.drop_hostname(link)
Remove link from the hostname database.
- Parameters
link (darc.link.Link)
-
darc.db.drop_requests(link)
Remove link from the requests database.
- Parameters
link (darc.link.Link)
-
darc.db.drop_selenium(link)
Remove link from the selenium database.
- Parameters
link (darc.link.Link)
-
darc.db.load_requests(check=False)
Load link from the requests database.
- Parameters
check (bool) – Whether to perform checks on loaded links; defaults to CHECK.
- Returns
List of loaded links from the requests database.
Note
At runtime, the function loads at most MAX_POOL links to limit memory usage.
-
darc.db.load_selenium(check=False)
Load link from the selenium database.
- Parameters
check (bool) – Whether to perform checks on loaded links; defaults to CHECK.
- Returns
List of loaded links from the selenium database.
Note
At runtime, the function loads at most MAX_POOL links to limit memory usage.
-
darc.db.save_requests(entries, single=False, score=None, nx=False, xx=False)
Save link to the requests database.
The function updates the queue_requests database.
- Parameters
entries (Union[Link, List[Link]]) – Links to be added to the requests database. It can be either a list of links, or a single link string (if single is set to True).
single (bool) – Indicates whether entries is a list of links or a single link string.
score (Optional[float]) – Score for the Redis sorted set.
nx (bool) – Only create new elements and do not update scores for elements that already exist.
xx (bool) – Only update scores of elements that already exist. New elements will not be added.
When entries is a list of Link instances, the function tries to perform bulk updates to ease memory consumption. The bulk size is defined by BULK_SIZE.
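Bulk updating amounts to splitting the entries into fixed-size chunks and writing one chunk at a time. The sketch below shows only the chunking step. The iter_bulks helper is hypothetical, links are plain strings, and the BULK_SIZE value is chosen for illustration.

```python
from typing import Iterator, List, Sequence

BULK_SIZE = 3  # illustrative value; the real setting is configured elsewhere


def iter_bulks(entries: Sequence[str], size: int = BULK_SIZE) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most ``size`` entries."""
    for start in range(0, len(entries), size):
        yield list(entries[start:start + size])


links = [f"http://example.com/{i}" for i in range(7)]
bulks = list(iter_bulks(links))
```

Each chunk would then be passed to a single write, so only BULK_SIZE entries need to be prepared in memory at a time.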
-
darc.db.save_selenium(entries, single=False, score=None, nx=False, xx=False)
Save link to the selenium database.
- Parameters
entries (Union[Link, List[Link]]) – Links to be added to the selenium database. It can be either a list of links, or a single link string (if single is set to True).
single (bool) – Indicates whether entries is a list of links or a single link string.
score (Optional[float]) – Score for the Redis sorted set.
nx (bool) – Only create new elements and do not update scores for elements that already exist.
xx (bool) – Only update scores of elements that already exist. New elements will not be added.
When entries is a list of Link instances, the function tries to perform bulk updates to ease memory consumption. The bulk size is defined by BULK_SIZE.
-
darc.db.LOCK_TIMEOUT: Optional[float]
- Default
10
- Environ
DARC_LOCK_TIMEOUT
Lock blocking timeout.
Note
If set to infinity (inf), no timeout will be applied.
See also
Get a lock from darc.db.get_lock().
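One way such a setting could be read from the environment is sketched below. The parse_lock_timeout helper is hypothetical, and mapping an infinite value to None is an assumption suggested by the Optional[float] type rather than something this page confirms.

```python
import math
import os
from typing import Mapping, Optional


def parse_lock_timeout(environ: Mapping[str, str] = os.environ) -> Optional[float]:
    """Read DARC_LOCK_TIMEOUT; treat an infinite value as "no timeout" (None)."""
    value = float(environ.get("DARC_LOCK_TIMEOUT", "10"))
    return None if math.isinf(value) else value


default_timeout = parse_lock_timeout({})
no_timeout = parse_lock_timeout({"DARC_LOCK_TIMEOUT": "inf"})
```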
-
darc.db.MAX_POOL: int
- Default
1_000
- Environ
Maximum number of links loaded from the database.
Note
If set to infinity (inf), no limit will be applied.