Link Database
The darc project uses a file-system-based database
to provide inter-process communication.
Note
In its first implementation, the darc project used
multiprocessing.Queue to support such communication. However, it turned
out at runtime that the multiprocessing.Queue object is heavily affected
by memory shortage.
There are two databases, both located at the root of the
data storage path PATH_DB: the requests database (_queue_requests.txt)
and the selenium database (_queue_selenium.txt).
At runtime, after reading a database, darc
keeps a backup of it with a .tmp suffix appended
to its file extension.
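The read-then-backup behaviour described above could look roughly like the sketch below. It is an illustration only, not darc's own code: the helper name `read_with_backup`, the one-link-per-line layout, and the use of `shutil.copy2` are all assumptions, and a temporary directory stands in for PATH_DB.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List


def read_with_backup(path: Path) -> List[str]:
    """Read a line-based database file, then keep a ``.tmp`` backup of it.

    Hypothetical helper: name, file layout and copy strategy are assumptions.
    """
    if not path.exists():
        return []
    # Keep non-empty lines only, one link per line (assumed layout).
    lines = [line.strip() for line in path.read_text().splitlines() if line.strip()]
    # Append ".tmp" to the existing extension, e.g. "x.txt" -> "x.txt.tmp".
    backup = path.with_name(path.name + ".tmp")
    shutil.copy2(path, backup)
    return lines


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:  # stands in for PATH_DB
        db = Path(root) / "_queue_requests.txt"
        db.write_text("https://example.com/\nhttps://example.org/\n")
        print(read_with_backup(db))
        print(sorted(p.name for p in Path(root).iterdir()))
```

Copying after the read means the backup always reflects the state that was last loaded, which is one plausible reason to keep it.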
-
darc.db.drop_hostname(link)
Remove link from the hostname database.
- Parameters
link (darc.link.Link) – Link to be removed.
-
darc.db.drop_requests(link)
Remove link from the requests database.
- Parameters
link (darc.link.Link) – Link to be removed.
-
darc.db.drop_selenium(link)
Remove link from the selenium database.
- Parameters
link (darc.link.Link) – Link to be removed.
-
darc.db.get_lock(name, timeout=None, sleep=0.1, blocking_timeout=None, lock_class=None, thread_local=True)
Get a lock for Redis operations.
- Parameters
name (str) – Lock name.
timeout (Optional[float]) – Maximum life for the lock.
sleep (float) – Amount of time to sleep per loop iteration when the lock is in blocking mode and another client is currently holding the lock.
blocking_timeout (Optional[float]) – Maximum amount of time in seconds to spend trying to acquire the lock.
lock_class (Optional[redis.lock.Lock]) – Lock implementation.
thread_local (bool) – Whether the lock token is placed in thread-local storage.
- Returns
A new redis.lock.Lock object using key name that mimics the behavior of threading.Lock.
- Return type
Union[redis.lock.Lock, contextlib.nullcontext]
Notes
If REDIS_LOCK is False, returns a contextlib.nullcontext instead.
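The "real lock or no-op context" pattern noted above can be sketched as follows. This is not darc's implementation: `get_lock_sketch`, the explicit `client` argument and the `enabled` override are hypothetical, added so the example runs without a Redis server. The keyword arguments forwarded to `client.lock()` are those of redis-py's `Redis.lock()` method.

```python
import contextlib
from typing import Any, Optional

REDIS_LOCK = False  # assumed module-level flag, mirroring the setting named above


def get_lock_sketch(client: Any, name: str, timeout: Optional[float] = None,
                    sleep: float = 0.1, blocking_timeout: Optional[float] = None,
                    lock_class: Any = None, thread_local: bool = True,
                    enabled: Optional[bool] = None):
    """Return ``client.lock(...)`` when locking is enabled, else a no-op context.

    ``client`` and ``enabled`` are hypothetical parameters for illustration.
    """
    if enabled is None:
        enabled = REDIS_LOCK
    if not enabled:
        # Callers can still write ``with lock:``; nothing is acquired.
        return contextlib.nullcontext()
    # redis-py clients expose ``lock()`` with these keyword arguments.
    return client.lock(name, timeout=timeout, sleep=sleep,
                       blocking_timeout=blocking_timeout,
                       lock_class=lock_class, thread_local=thread_local)


with get_lock_sketch(None, "queue_requests"):
    pass  # critical section; the lock is a no-op because REDIS_LOCK is False
```

Returning a context manager in both cases keeps call sites identical whether or not locking is switched on.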
-
darc.db.have_hostname(link)
Check if the current link is a new host.
- Parameters
link (darc.link.Link) – Link to check against.
- Returns
Whether the link is a new host.
- Return type
-
darc.db.load_requests(check=False)
Load links from the requests database.
- Parameters
check (bool) – Whether to perform checks on loaded links; defaults to CHECK.
- Returns
List of loaded links from the requests database.
- Return type
List[darc.link.Link]
Note
At runtime, the function loads at most MAX_POOL links to limit memory usage.
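The MAX_POOL cap in the note above amounts to taking a bounded batch from a larger store. A minimal sketch, assuming the stored links can be iterated lazily; `load_capped` is a hypothetical name, and the value given to `MAX_POOL` here is purely illustrative:

```python
from itertools import islice
from typing import Iterable, List

MAX_POOL = 3  # illustrative value only; the real setting is configured elsewhere


def load_capped(stored: Iterable[str], limit: int = MAX_POOL) -> List[str]:
    """Return at most ``limit`` entries, leaving the rest of ``stored`` unread."""
    return list(islice(stored, limit))


# A lazy source of ten links; only the first MAX_POOL are materialised.
queue = iter("https://example.com/%d" % i for i in range(10))
batch = load_capped(queue)
print(len(batch))  # 3
```

Because only `limit` items are ever held in the returned list, memory use stays bounded regardless of how large the backing store grows.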
-
darc.db.load_selenium(check=False)
Load links from the selenium database.
- Parameters
check (bool) – Whether to perform checks on loaded links; defaults to CHECK.
- Returns
List of loaded links from the selenium database.
- Return type
List[darc.link.Link]
Note
At runtime, the function loads at most MAX_POOL links to limit memory usage.
-
darc.db.save_requests(entries, single=False, score=None, nx=False, xx=False)
Save links to the requests database.
- Parameters
entries (List[darc.link.Link]) – Links to be added to the requests database. It can be either a list of links, or a single link string (if single is set to True).
single (bool) – Indicates whether entries is a list of links or a single link string.
score – Score for the Redis sorted set.
nx – Forces ZADD to only create new elements and not to update scores for elements that already exist.
xx – Forces ZADD to only update scores of elements that already exist. New elements will not be added.
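To make the nx and xx semantics concrete, the sketch below emulates them on a plain dictionary standing in for a Redis sorted set. `zadd_sketch` is a hypothetical helper written for illustration; it does not talk to Redis and is not part of darc.

```python
from typing import Dict


def zadd_sketch(zset: Dict[str, float], mapping: Dict[str, float],
                nx: bool = False, xx: bool = False) -> int:
    """Emulate ZADD's NX/XX options on a dict; return the number of new members."""
    if nx and xx:
        raise ValueError("nx and xx are mutually exclusive")
    added = 0
    for member, score in mapping.items():
        exists = member in zset
        if nx and exists:
            continue  # NX: never touch members that already exist
        if xx and not exists:
            continue  # XX: never create new members
        if not exists:
            added += 1
        zset[member] = score
    return added


pool = {"a": 1.0}
zadd_sketch(pool, {"a": 5.0, "b": 2.0}, nx=True)  # pool == {"a": 1.0, "b": 2.0}
zadd_sketch(pool, {"a": 9.0, "c": 3.0}, xx=True)  # pool == {"a": 9.0, "b": 2.0}
```

With nx, a link that is already queued keeps its original score; with xx, only links that are already queued can be rescored.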
-
darc.db.save_selenium(entries, single=False, score=None, nx=False, xx=False)
Save links to the selenium database.
- Parameters
entries (List[darc.link.Link]) – Links to be added to the selenium database. It can be either an iterable of links, or a single link string (if single is set to True).
single (bool) – Indicates whether entries is an iterable of links or a single link string.
score – Score for the Redis sorted set.
nx – Forces ZADD to only create new elements and not to update scores for elements that already exist.
xx – Forces ZADD to only update scores of elements that already exist. New elements will not be added.
-
darc.db.QR_LOCK: multiprocessing.Lock
I/O lock for the requests database _queue_requests.txt.
-
darc.db.QS_LOCK: Union[multiprocessing.Lock, threading.Lock, contextlib.nullcontext]
I/O lock for the selenium database _queue_selenium.txt.
If FLAG_MP is True, it will be an instance of multiprocessing.Lock. If FLAG_TH is True, it will be an instance of threading.Lock. If neither is set, it will be an instance of contextlib.nullcontext.
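The flag-dependent choice of lock type described above can be sketched as a small factory. The helper `choose_io_lock` and its parameters are hypothetical stand-ins for the FLAG_MP and FLAG_TH settings; this is an assumption about the shape of the logic, not darc's code.

```python
import contextlib
import multiprocessing
import threading


def choose_io_lock(flag_mp: bool, flag_th: bool):
    """Pick an I/O lock matching the concurrency mode (hypothetical helper)."""
    if flag_mp:
        return multiprocessing.Lock()  # shared between worker processes
    if flag_th:
        return threading.Lock()  # shared between worker threads
    return contextlib.nullcontext()  # single worker: nothing to guard


lock = choose_io_lock(flag_mp=False, flag_th=True)
with lock:
    pass  # write to the queue file here
```

All three objects support the `with` statement, so code that guards file access does not need to know which mode is active.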