Link Database
The darc project uses a Redis-based database
to provide inter-process communication.
Note
In its first implementation, the darc project used
Queue to support such communication.
However, at runtime the Queue object proved to be
heavily affected by memory shortage.
There are three databases, all following the same naming
convention with the queue_ prefix:
- the hostname database – queue_hostname (HostnameQueueModel)
- the requests database – queue_requests (RequestsQueueModel)
- the selenium database – queue_selenium (SeleniumQueueModel)
queue_hostname, queue_requests and queue_selenium
are all Redis sorted sets.
If FLAG_DB is True, the module instead uses
the RDS storage described by the peewee
models as backend.
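The backend switch described above can be pictured as a small dispatch. The following is a minimal sketch, assuming the public helper simply picks between the `_db` and `_redis` variants documented in this section. The stub bodies, their return values and the `flag_db` override parameter are hypothetical and only there to make the example runnable.

```python
# Sketch only: stand-in stubs, not the darc implementation.
FLAG_DB = False  # assumed module-level flag, as named in the text


def _load_requests_db():
    """Stand-in for the peewee-backed loader (hypothetical body)."""
    return ['db://example']


def _load_requests_redis():
    """Stand-in for the Redis-backed loader (hypothetical body)."""
    return ['redis://example']


def load_requests(flag_db=None):
    """Pick the backend based on the flag (hypothetical dispatch)."""
    use_db = FLAG_DB if flag_db is None else flag_db
    return _load_requests_db() if use_db else _load_requests_redis()
```

Keeping both variants behind one entry point means callers never need to know which storage is active.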
- darc.db._drop_hostname_db(link)
Remove link from the hostname database.
The function updates the HostnameQueueModel table.
- darc.db._drop_hostname_redis(link)
Remove link from the hostname database.
The function updates the queue_hostname database.
- darc.db._drop_requests_db(link)
Remove link from the requests database.
The function updates the RequestsQueueModel table.
- darc.db._drop_requests_redis(link)
Remove link from the requests database.
The function updates the queue_requests database.
- darc.db._drop_selenium_db(link)
Remove link from the selenium database.
The function updates the SeleniumQueueModel table.
- darc.db._drop_selenium_redis(link)
Remove link from the selenium database.
The function updates the queue_selenium database.
- darc.db._have_hostname_db(link)
Check if current link is a new host.
The function checks the HostnameQueueModel table.
- darc.db._have_hostname_redis(link)
Check if current link is a new host.
The function checks the queue_hostname database.
- darc.db._load_requests_db()
Load link from the requests database.
The function reads the RequestsQueueModel table.
Note
At runtime, the function loads at most MAX_POOL links
to limit memory usage.
- darc.db._load_requests_redis()
Load link from the requests database.
The function reads the queue_requests database.
Note
At runtime, the function loads at most MAX_POOL links
to limit memory usage.
- darc.db._load_selenium_db()
Load link from the selenium database.
The function reads the SeleniumQueueModel table.
Note
At runtime, the function loads at most MAX_POOL links
to limit memory usage.
- darc.db._load_selenium_redis()
Load link from the selenium database.
The function reads the queue_selenium database.
- Parameters:
check – Whether to perform checks on loaded links; defaults to CHECK.
- Return type:
- Returns:
List of loaded links from the selenium database.
Note
At runtime, the function loads at most MAX_POOL links
to limit memory usage.
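The MAX_POOL cap in the notes above can be illustrated without a Redis server. This is a minimal sketch, assuming the sorted set is modelled as a plain `{link: score}` dict and that links are taken lowest score first. Both the ordering and the `load_links` helper are assumptions for illustration, not the project's loader.

```python
import math

MAX_POOL = 1_000  # default stated in this section


def load_links(queue, max_pool=MAX_POOL):
    """Return at most max_pool links from a {link: score} dict, lowest score first."""
    # Ties on score are broken by the link itself to keep the order deterministic.
    ordered = sorted(queue, key=lambda link: (queue[link], link))
    if math.isinf(max_pool):
        # An infinite limit means no cap at all.
        return ordered
    return ordered[:int(max_pool)]
```

Capping the slice keeps the in-memory list bounded regardless of how large the queue grows.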
- darc.db._redis_command(command, *args, **kwargs)
Wrapper function for Redis command.
- Parameters:
- Keyword Arguments:
**kwargs – Arbitrary keyword arguments for the Redis command.
- Return type:
- Returns:
Values returned from the Redis command.
- Warns:
RedisCommandFailed – Warns at each round when the command fails.
See also
Between each retry, the function sleeps for
RETRY_INTERVAL second(s) if that value is not None.
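The retry behaviour described above (warn on every failed round, optionally sleep, try again) might look roughly like the sketch below. It is an illustration under assumptions: it wraps an arbitrary callable rather than a Redis command name, the `RedisCommandFailed` class is a local stand-in, and the `max_rounds` bound is hypothetical and added only so the example terminates.

```python
import time
import warnings

RETRY_INTERVAL = None  # seconds between retries; None means no sleep


class RedisCommandFailed(Warning):
    """Local stand-in for the warning category named in the text."""


def redis_command(func, *args, max_rounds=3, **kwargs):
    """Call func until it succeeds, warning on every failed round."""
    for round_no in range(1, max_rounds + 1):
        try:
            return func(*args, **kwargs)
        except Exception as error:
            warnings.warn(f'round {round_no} failed: {error!r}', RedisCommandFailed)
            if round_no == max_rounds:
                # Give up after the last allowed round (hypothetical bound).
                raise
            if RETRY_INTERVAL is not None:
                time.sleep(RETRY_INTERVAL)
```

Issuing a warning rather than raising on each round keeps transient connection errors visible without stopping the worker.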
- darc.db._redis_get_lock(key)
Get a lock for Redis operations.
- Parameters:
key (Literal['queue_hostname', 'queue_requests', 'queue_selenium']) – Lock target key.
- Return type:
Union[Redlock, AbstractContextManager[TypeVar(T_co, covariant=True)]]
- Returns:
A new pottery.redlock.Redlock object using key key that mimics the behavior of threading.Lock.
- See Also:
If REDIS_LOCK is False, returns a contextlib.nullcontext instead.
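The fallback to a no-op context manager can be sketched as follows. This is a minimal illustration, assuming `threading.Lock` as a local stand-in for `pottery.redlock.Redlock` so that the example runs without Redis. The `get_lock` name, the `_LOCKS` registry and the `redis_lock` override parameter are hypothetical.

```python
import contextlib
import threading

REDIS_LOCK = False  # assumed flag, as named in the text
_LOCKS = {}  # one stand-in lock per key (hypothetical registry)


def get_lock(key, redis_lock=None):
    """Return a real lock when locking is enabled, else a no-op context manager."""
    enabled = REDIS_LOCK if redis_lock is None else redis_lock
    if not enabled:
        return contextlib.nullcontext()
    # threading.Lock stands in for pottery.redlock.Redlock here.
    return _LOCKS.setdefault(key, threading.Lock())
```

Because both return values support the `with` statement, calling code such as `with get_lock('queue_requests'): ...` stays the same whether locking is enabled or not.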
- darc.db._save_requests_db(entries, single=False, score=None, nx=False, xx=False)
Save link to the requests database.
The function updates the RequestsQueueModel table.
- Parameters:
entries (Union[Link, List[Link]]) – Links to be added to the requests database. It can be either a list of links, or a single link string (if single is set to True).
single (bool) – Indicates whether entries is a list of links or a single link string.
score (Optional[float]) – Score for the Redis sorted set.
nx (bool) – Only create new elements; do not update scores of elements that already exist.
xx (bool) – Only update scores of elements that already exist. New elements will not be added.
- Return type:
- darc.db._save_requests_redis(entries, single=False, score=None, nx=False, xx=False)
Save link to the requests database.
The function updates the queue_requests database.
- Parameters:
entries (Union[Link, List[Link]]) – Links to be added to the requests database. It can be either a list of links, or a single link string (if single is set to True).
single (bool) – Indicates whether entries is a list of links or a single link string.
score (Optional[float]) – Score for the Redis sorted set.
nx (bool) – Forces ZADD to only create new elements and not to update scores of elements that already exist.
xx (bool) – Forces ZADD to only update scores of elements that already exist. New elements will not be added.
- Return type:
- darc.db._save_selenium_db(entries, single=False, score=None, nx=False, xx=False)
Save link to the selenium database.
The function updates the SeleniumQueueModel table.
- Parameters:
entries (Union[Link, List[Link]]) – Links to be added to the selenium database. It can be either a list of links, or a single link string (if single is set to True).
single (bool) – Indicates whether entries is a list of links or a single link string.
score (Optional[float]) – Score for the Redis sorted set.
nx (bool) – Only create new elements; do not update scores of elements that already exist.
xx (bool) – Only update scores of elements that already exist. New elements will not be added.
- Return type:
- darc.db._save_selenium_redis(entries, single=False, score=None, nx=False, xx=False)
Save link to the selenium database.
The function updates the queue_selenium database.
- Parameters:
entries (Union[Link, List[Link]]) – Links to be added to the selenium database. It can be either an iterable of links, or a single link string (if single is set to True).
single (bool) – Indicates whether entries is an iterable of links or a single link string.
score (Optional[float]) – Score for the Redis sorted set.
nx (bool) – Forces ZADD to only create new elements and not to update scores of elements that already exist.
xx (bool) – Forces ZADD to only update scores of elements that already exist. New elements will not be added.
- Return type:
When entries is a list of Link instances, the function tries to perform bulk updates to ease memory consumption. The bulk size is defined by BULK_SIZE.
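The nx and xx flags in the parameter list above follow the Redis ZADD options of the same name. The sketch below emulates those rules on a plain `{member: score}` dict so that the effect can be checked without a server. The `zadd` helper is a hypothetical stand-in written for this illustration, not a call into redis-py or darc.

```python
def zadd(zset, mapping, nx=False, xx=False):
    """Apply ZADD-like NX/XX rules to a {member: score} dict.

    Returns the number of newly added members, as ZADD does by default.
    """
    if nx and xx:
        raise ValueError('nx and xx are mutually exclusive')
    added = 0
    for member, score in mapping.items():
        exists = member in zset
        # NX skips existing members; XX skips missing members.
        if (nx and exists) or (xx and not exists):
            continue
        zset[member] = score
        added += 0 if exists else 1
    return added
```

With nx set, a link that is already queued keeps its original score. With xx set, only links that are already queued are rescored.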
- darc.db.load_requests(check=False)
Load link from the requests database.
- Parameters:
check (bool) – Whether to perform checks on loaded links; defaults to CHECK.
- Return type:
- Returns:
List of loaded links from the requests database.
Note
At runtime, the function loads at most MAX_POOL links
to limit memory usage.
- darc.db.load_selenium(check=False)
Load link from the selenium database.
- Parameters:
check (bool) – Whether to perform checks on loaded links; defaults to CHECK.
- Return type:
- Returns:
List of loaded links from the selenium database.
Note
At runtime, the function loads at most MAX_POOL links
to limit memory usage.
- darc.db.save_requests(entries, single=False, score=None, nx=False, xx=False)
Save link to the requests database.
The function updates the queue_requests database.
- Parameters:
entries (Union[Link, List[Link]]) – Links to be added to the requests database. It can be either a list of links, or a single link string (if single is set to True).
single (bool) – Indicates whether entries is a list of links or a single link string.
score (Optional[float]) – Score for the Redis sorted set.
nx (bool) – Only create new elements; do not update scores of elements that already exist.
xx (bool) – Only update scores of elements that already exist. New elements will not be added.
- Return type:
When entries is a list of Link instances, the function tries to perform bulk updates to ease memory consumption. The bulk size is defined by BULK_SIZE.
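The bulk update mentioned above amounts to writing the entries in fixed-size slices. This is a minimal sketch, assuming a BULK_SIZE of 100 purely for illustration (the actual default is not stated in this section). The `iter_bulks` and `save_in_bulks` helpers and the `write` callback are hypothetical.

```python
BULK_SIZE = 100  # assumed value for illustration only


def iter_bulks(entries, bulk_size=BULK_SIZE):
    """Yield consecutive slices of entries, each holding at most bulk_size items."""
    for start in range(0, len(entries), bulk_size):
        yield entries[start:start + bulk_size]


def save_in_bulks(entries, write, single=False, bulk_size=BULK_SIZE):
    """Send entries to write() one bulk at a time."""
    if single:
        # A single link is wrapped so both cases share one code path.
        entries = [entries]
    for bulk in iter_bulks(entries, bulk_size):
        write(bulk)
```

Writing in slices bounds the size of each request, which is the memory saving the paragraph above refers to.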
- darc.db.save_selenium(entries, single=False, score=None, nx=False, xx=False)
Save link to the selenium database.
- Parameters:
entries (Union[Link, List[Link]]) – Links to be added to the selenium database. It can be either a list of links, or a single link string (if single is set to True).
single (bool) – Indicates whether entries is a list of links or a single link string.
score (Optional[float]) – Score for the Redis sorted set.
nx (bool) – Only create new elements; do not update scores of elements that already exist.
xx (bool) – Only update scores of elements that already exist. New elements will not be added.
- Return type:
When entries is a list of Link instances, the function tries to perform bulk updates to ease memory consumption. The bulk size is defined by BULK_SIZE.
- darc.db.LOCK_TIMEOUT: float | None
- Default:
10
- Environ:
DARC_LOCK_TIMEOUT
Lock blocking timeout.
Note
If the value is infinite (inf), no timeout will be applied.
See also
Get a lock from darc.db.get_lock().
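One way to read such a setting from the environment is sketched below. It assumes that an infinite value is mapped to None to signal "no timeout", which matches the `float | None` annotation above but is not confirmed by this section. The `read_lock_timeout` helper is hypothetical.

```python
import math
import os


def read_lock_timeout(environ=os.environ):
    """Parse DARC_LOCK_TIMEOUT; an infinite value maps to None (no timeout)."""
    # Fall back to the documented default of 10 when the variable is unset.
    value = float(environ.get('DARC_LOCK_TIMEOUT', '10'))
    return None if math.isinf(value) else value
```

Passing a plain dict instead of `os.environ` makes the helper easy to exercise, for example `read_lock_timeout({'DARC_LOCK_TIMEOUT': 'inf'})`.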
- darc.db.MAX_POOL: int
- Default:
1_000
- Environ:
Maximum number of links to load from the database.
Note
If the value is infinite (inf), no limit will be applied.