Link Database
The darc project uses a Redis-based database to provide inter-process communication.
Note
In its first implementation, the darc project used Queue to support such communication. However, as observed at runtime, the Queue object is heavily affected by a lack of memory.
There are three databases, all following the same naming convention with the queue_ prefix:
- the hostname database – queue_hostname
- the requests database – queue_requests
- the selenium database – queue_selenium
queue_hostname is a Redis set, while queue_requests and queue_selenium are both Redis sorted sets.
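To make the difference between the two data types concrete, here is a minimal in-memory sketch. It does not talk to a Redis server; a Python set and a dict of member to score merely stand in for the Redis set and sorted set, and the host name, URLs and scores are hypothetical values chosen for illustration.

```python
# In-memory stand-ins for the two Redis data types (illustration only;
# the real databases live on a Redis server).

# Redis set: unordered collection of unique members (like queue_hostname).
queue_hostname = set()
queue_hostname.add("example.com")
queue_hostname.add("example.com")  # adding the same member twice has no effect

# Redis sorted set: unique members, each carrying a numeric score
# (like queue_requests and queue_selenium).
queue_requests = {}
queue_requests["https://example.com/a"] = 2.0
queue_requests["https://example.com/b"] = 1.0

# Reading members by ascending score, similar in spirit to ZRANGE.
ordered = sorted(queue_requests, key=queue_requests.get)
```

The set only answers membership questions, whereas the sorted set can also return its members in score order.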
darc.db.drop_hostname(link)
Remove link from the hostname database.
The function updates the queue_hostname database.
Parameters
link (darc.link.Link) – Link to be removed.
darc.db.drop_requests(link)
Remove link from the requests database.
The function updates the queue_requests database.
Parameters
link (darc.link.Link) – Link to be removed.
darc.db.drop_selenium(link)
Remove link from the selenium database.
The function updates the queue_selenium database.
Parameters
link (darc.link.Link) – Link to be removed.
darc.db.get_lock(name, timeout=None, sleep=0.1, blocking_timeout=None, lock_class=None, thread_local=True)
Get a lock for Redis operations.
Parameters
name (str) – Lock name.
timeout (Optional[float]) – Maximum life for the lock.
sleep (float) – Amount of time to sleep per loop iteration when the lock is in blocking mode and another client is currently holding the lock.
blocking_timeout (Optional[float]) – Maximum amount of time in seconds to spend trying to acquire the lock.
lock_class (Optional[redis.lock.Lock]) – Lock implementation.
thread_local (bool) – Whether the lock token is placed in thread-local storage.
Returns
A new redis.lock.Lock object using key name that mimics the behavior of threading.Lock.
Return type
Union[redis.lock.Lock, contextlib.nullcontext]
See also
If REDIS_LOCK is False, returns a contextlib.nullcontext instead.
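The fallback to contextlib.nullcontext means callers can always write a with block, whether or not locking is enabled. The sketch below illustrates that pattern only; it is not the darc implementation. The name get_lock_sketch and the use_lock parameter are hypothetical, and threading.Lock stands in for redis.lock.Lock so that the example runs without a Redis server.

```python
import contextlib
import threading

REDIS_LOCK = False  # assumed flag, mirroring the documented darc.db.REDIS_LOCK


def get_lock_sketch(name, use_lock=REDIS_LOCK):
    """Return a context manager guarding a critical section.

    ``name`` is unused here; a real Redis lock would use it as the key.
    """
    if not use_lock:
        # Locking disabled: hand back a no-op context manager.
        return contextlib.nullcontext()
    # Stand-in for redis.lock.Lock (assumption for illustration).
    return threading.Lock()


counter = 0
with get_lock_sketch("queue_requests"):
    counter += 1  # same calling code whether or not a real lock is used
```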
darc.db.have_hostname(link)
Check if the current link is a new host.
The function checks the queue_hostname database.
Parameters
link (darc.link.Link) – Link to check against.
Returns
Whether the link is a new host.
Return type
darc.db.load_requests(check=False)
Load links from the requests database.
The function reads the queue_requests database.
Parameters
check (bool) – Whether to perform checks on loaded links; defaults to CHECK.
Returns
List of loaded links from the requests database.
Return type
List[darc.link.Link]
Note
At runtime, the function loads at most MAX_POOL links to limit memory usage.
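As a rough illustration of the MAX_POOL cap, the sketch below trims an in-memory score mapping to a bounded number of links. The function name load_links_sketch, the lowest-score-first ordering and the small MAX_POOL value are assumptions made for the example; the real function reads from Redis.

```python
import math

MAX_POOL = 3  # hypothetical small value; the documented default is 1_000


def load_links_sketch(scored_links, max_pool=MAX_POOL):
    """Return at most ``max_pool`` links, lowest score first (assumed order)."""
    ordered = sorted(scored_links, key=scored_links.get)
    if math.isinf(max_pool):
        # An infinite pool size means no limit is applied.
        return ordered
    return ordered[:int(max_pool)]


# Five hypothetical links with scores 0.0 through 4.0.
pool = {f"https://example.com/{i}": float(i) for i in range(5)}
loaded = load_links_sketch(pool)
```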
darc.db.load_selenium(check=False)
Load links from the selenium database.
The function reads the queue_selenium database.
Parameters
check (bool) – Whether to perform checks on loaded links; defaults to CHECK.
Returns
List of loaded links from the selenium database.
Return type
List[darc.link.Link]
Note
At runtime, the function loads at most MAX_POOL links to limit memory usage.
darc.db.redis_command(command, *args, **kwargs)
Wrapper function for Redis commands.
Parameters
command (str) – Command name.
*args – Arbitrary arguments for the Redis command.
Keyword Arguments
**kwargs – Arbitrary keyword arguments for the Redis command.
Returns
Values returned from the Redis command.
Warns
RedisCommandFailed – Warns at each round when the command fails.
Return type
Any
See also
Between retries, the function sleeps for REDIS_RETRY second(s) if that value is not None.
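The retry behaviour described above can be sketched as a loop that warns on every failed round and optionally sleeps before trying again. This is an illustrative sketch, not the darc source: retry_command, RedisCommandFailedSketch and flaky_command are hypothetical names, the callable stands in for a Redis command, and catching every Exception is a simplification.

```python
import time
import warnings


class RedisCommandFailedSketch(Warning):
    """Hypothetical stand-in for the RedisCommandFailed warning."""


def retry_command(func, *args, retry=None, **kwargs):
    """Call ``func`` until it succeeds, warning after each failed round."""
    while True:
        try:
            return func(*args, **kwargs)
        except Exception as error:  # simplification: a real wrapper would be narrower
            warnings.warn(f"command failed: {error!r}", RedisCommandFailedSketch)
            if retry is not None:
                time.sleep(retry)  # pause between rounds only when an interval is set


attempts = []


def flaky_command(value):
    """Fail on the first two calls, then succeed."""
    attempts.append(value)
    if len(attempts) < 3:
        raise ConnectionError("temporary failure")
    return value * 2


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = retry_command(flaky_command, 21, retry=0.01)
```

Note that this loop retries indefinitely; whether the real wrapper bounds the number of rounds is not stated in this section.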
darc.db.save_requests(entries, single=False, score=None, nx=False, xx=False)
Save links to the requests database.
The function updates the queue_requests database.
Parameters
entries (List[darc.link.Link]) – Links to be added to the requests database. It can be either a list of links, or a single link string (if single is set to True).
single (bool) – Indicates whether entries is a list of links or a single link string.
score – Score for the Redis sorted set.
nx – Forces ZADD to only create new elements and not to update scores for elements that already exist.
xx – Forces ZADD to only update scores of elements that already exist. New elements will not be added.
When entries is a list of Link instances, the function tries to perform bulk updates to ease memory consumption. The bulk size is defined by BULK_SIZE.
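The interplay of bulk updates with the nx and xx flags can be sketched on a plain dict. Everything here is a stand-in for illustration: zadd_sketch mimics the documented ZADD flag semantics in memory, save_links_sketch and the BULK_SIZE value of 2 are hypothetical, and plain strings replace Link instances.

```python
BULK_SIZE = 2  # hypothetical value for illustration


def zadd_sketch(store, mapping, nx=False, xx=False):
    """Mimic ZADD's NX/XX flags on a dict of member -> score."""
    if nx and xx:
        raise ValueError("nx and xx are mutually exclusive")
    for member, score in mapping.items():
        exists = member in store
        # nx: skip members that already exist; xx: skip members that do not.
        if (nx and exists) or (xx and not exists):
            continue
        store[member] = score


def save_links_sketch(store, links, score, nx=False, xx=False, bulk_size=BULK_SIZE):
    """Write links in chunks of ``bulk_size``; return the number of chunks."""
    chunks = 0
    for start in range(0, len(links), bulk_size):
        chunk = links[start:start + bulk_size]
        zadd_sketch(store, {link: score for link in chunk}, nx=nx, xx=xx)
        chunks += 1
    return chunks


queue = {"a": 1.0}
# With nx=True the existing member "a" keeps its score; "b" and "c" are added.
chunks = save_links_sketch(queue, ["a", "b", "c"], score=5.0, nx=True)
```

Writing in chunks keeps each individual update small, which is the memory benefit the paragraph above refers to.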
darc.db.save_selenium(entries, single=False, score=None, nx=False, xx=False)
Save links to the selenium database.
The function updates the queue_selenium database.
Parameters
entries (List[darc.link.Link]) – Links to be added to the selenium database. It can be either an iterable of links, or a single link string (if single is set to True).
single (bool) – Indicates whether entries is an iterable of links or a single link string.
score – Score for the Redis sorted set.
nx – Forces ZADD to only create new elements and not to update scores for elements that already exist.
xx – Forces ZADD to only update scores of elements that already exist. New elements will not be added.
When entries is a list of Link instances, the function tries to perform bulk updates to ease memory consumption. The bulk size is defined by BULK_SIZE.
darc.db.LOCK_TIMEOUT: Optional[float]
Default
10
Environ
DARC_LOCK_TIMEOUT
Lock blocking timeout.
Note
If the value is infinite (inf), no timeout will be applied.
See also
Get a lock from darc.db.get_lock().
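One plausible way to turn such an environment variable into an Optional[float] is sketched below. The helper name timeout_from_env is hypothetical, and mapping an infinite value to None is an assumption inferred from the Optional[float] type and the note above, not confirmed darc behaviour.

```python
import math
import os


def timeout_from_env(name, default):
    """Read a float setting; treat a non-finite value as "no timeout" (None)."""
    value = float(os.getenv(name, default))
    if not math.isfinite(value):
        return None  # assumed representation of "no timeout"
    return value


# Example: an infinite setting disables the timeout.
os.environ["DARC_LOCK_TIMEOUT"] = "inf"
LOCK_TIMEOUT = timeout_from_env("DARC_LOCK_TIMEOUT", 10)
```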
darc.db.MAX_POOL: int
Default
1_000
Environ
Maximum number of links loaded from the database.
Note
If the value is infinite (inf), no limit will be applied.
darc.db.REDIS_LOCK: bool
Default
Environ
DARC_REDIS_LOCK
Whether to use a Redis (Lua) lock to ensure process- and thread-safe operations.
See also
Toggles the behaviour of darc.db.get_lock().
darc.db.REDIS_RETRY: int
Default
10
Environ
DARC_REDIS_RETRY
Retry interval, in seconds, after each Redis command failure.
Note
If the value is infinite (inf), no interval will be applied.
See also
Toggles the behaviour of darc.db.redis_command().