Auxiliary Scripts

Since the darc project can be deployed through its Docker Integration, we provide some auxiliary scripts to help with the deployment.

Health Check

File location:
  • Entry point: extra/healthcheck.py

  • Systemd service: extra/healthcheck.service

usage: healthcheck [-h] [-f FILE] [-i INTERVAL] ...

health check running container

positional arguments:
  services              name of services

optional arguments:
  -h, --help            show this help message and exit
  -f FILE, --file FILE  path to compose file
  -i INTERVAL, --interval INTERVAL
                        interval (in seconds) of health check

This script watches the running status of containers managed by Docker Compose. If a container has stopped or is in an unhealthy state, the script will bring it back alive.

Also, since the internal program may halt unexpectedly whilst the container itself remains healthy, the script watches the program's output messages to check whether it is still active. If inactive, the script restarts the containers.
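
For illustration, below is a minimal sketch of such a watchdog loop, not the actual extra/healthcheck.py. It assumes the docker-compose(1) and docker(1) command line tools are available; the compose file path, service name and interval are hypothetical stand-ins for the -f/-i options and the positional arguments, and for brevity only the container-status check is covered, not the output-message monitoring.

   import subprocess
   import time

   COMPOSE_FILE = 'docker-compose.yml'  # hypothetical; see --file
   SERVICES = ['darc']                  # hypothetical; see positional arguments
   INTERVAL = 60                        # hypothetical; see --interval

   def container_ids(service):
       """Resolve the container IDs backing a Compose service."""
       out = subprocess.check_output(
           ['docker-compose', '-f', COMPOSE_FILE, 'ps', '-q', service])
       return out.decode().split()

   def is_healthy(cid):
       """Return True if the container is running and not marked unhealthy."""
       out = subprocess.check_output(
           ['docker', 'inspect', '--format',
            '{{.State.Status}} {{if .State.Health}}{{.State.Health.Status}}{{end}}',
            cid])
       status, _, health = out.decode().strip().partition(' ')
       return status == 'running' and health != 'unhealthy'

   while True:
       for service in SERVICES:
           ids = container_ids(service)
           if not ids or not all(map(is_healthy, ids)):
               # bring the stopped/unhealthy containers back alive
               subprocess.check_call(
                   ['docker-compose', '-f', COMPOSE_FILE, 'restart', service])
       time.sleep(INTERVAL)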

Upload API Submission Files

File location:
  • Entry point: extra/upload.py

  • Helper script: extra/upload.sh

  • Cron sample: extra/upload.cron

usage: upload [-h] [-p PATH] -H HOST [-U USER]

upload API submission files

optional arguments:
  -h, --help            show this help message and exit
  -p PATH, --path PATH  path to data storage
  -H HOST, --host HOST  upstream hostname
  -U USER, --user USER  upstream user credential

This script automatically uploads API submission files (cf. darc.submit) using curl(1). The --user option is passed through to the option of the same name in curl(1).
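
In essence, for each submission file the script hands the transfer over to curl(1) along the lines of the sketch below; the host, credential, file path and upload URL shown are hypothetical placeholders, not the script's actual endpoint layout.

   import subprocess

   HOST = 'example.com'               # hypothetical; see --host
   USER = 'alice:password'            # hypothetical; forwarded to curl --user
   FILE = 'data/api/submission.json'  # hypothetical; found under --path

   # Hand the file over to curl(1) for the actual transfer; --fail makes
   # curl exit non-zero on HTTP errors, so failures are easy to detect.
   subprocess.check_call([
       'curl', '--fail',
       '--user', USER,
       '--upload-file', FILE,
       f'https://{HOST}/',  # hypothetical upstream URL
   ])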

Important

Since darc.submit.save_submit() categorises saved API submission files by their actual dates, the script also uploads such files grouped by those dates. Therefore, as the cron(8) sample suggests, the script had better be run every day shortly after 12:00 AM (00:00 in 24-hour format), as shown below.
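
For reference, a crontab(5) entry along the following lines would achieve that; the bundled extra/upload.cron sample may differ in the exact path and time.

   # hypothetical crontab entry: run the upload helper daily at 00:05
   5 0 * * * /path/to/darc/extra/upload.sh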

Remove Repeated Lines

File location:
  • Entry point: extra/uniq.py

This script works the same as uniq(1), except that it filters the input one line at a time, so that it puts no pressure on memory utilisation even for large inputs.
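
The idea can be sketched in a few lines of Python (the actual extra/uniq.py may differ in details): only the previous line is held in memory while adjacent duplicates are dropped.

   import sys

   # Filter out adjacent duplicate lines, as uniq(1) does, holding only
   # the previous line in memory regardless of the input size.
   last = None
   for line in sys.stdin:
       if line != last:
           sys.stdout.write(line)
           last = line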

Redis Clinic

File location:
  • Entry point: extra/clinic.py

  • Helper script: extra/clinic.lua

  • Cron sample: extra/clinic.cron

usage: clinic [-h] -r REDIS [-f FILE] [-t TIMEOUT] ...

memory clinic for Redis

positional arguments:
  services              name of services

optional arguments:
  -h, --help            show this help message and exit
  -r REDIS, --redis REDIS
                        URI to the Redis server
  -f FILE, --file FILE  path to compose file
  -t TIMEOUT, --timeout TIMEOUT
                        shutdown timeout in seconds

Since Redis may consume more and more memory as the crawled data and task queues grow, this script truncates the Redis task queues (queue_requests & queue_selenium), as well as the corresponding pickle caches of darc.link.Link objects.
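
Conceptually, the cleanup amounts to something like the sketch below. The two queue names come from above, whilst the 'darc:*' pattern for the pickle caches is an assumption for illustration, as the actual key layout may differ; also note that the real script pushes this work onto the server via Lua, as the note below explains.

   import redis  # redis-py

   # hypothetical server URI; see --redis
   r = redis.Redis.from_url('redis://localhost:6379')

   # truncate the two task queues
   r.delete('queue_requests', 'queue_selenium')

   # drop the corresponding pickle caches of darc.link.Link objects;
   # NOTE: the 'darc:*' key pattern is an assumption for illustration
   for key in r.scan_iter(match='darc:*'):
       r.delete(key)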

Note

We use a Lua script to slightly accelerate the whole procedure, as running it through a Redis client would put extra burden on the host server.
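
A hypothetical sketch of this approach (the bundled extra/clinic.lua may differ): the scan-and-delete loop runs entirely on the server, so each batch of keys costs a single round trip instead of one per key.

   import redis

   r = redis.Redis.from_url('redis://localhost:6379')  # hypothetical URI

   # scan and delete matching keys on the server side, in batches
   LUA = """
   local reply = redis.call('SCAN', ARGV[1], 'MATCH', ARGV[2], 'COUNT', 1000)
   for _, key in ipairs(reply[2]) do
       redis.call('DEL', key)
   end
   return reply[1]  -- next cursor
   """
   clean = r.register_script(LUA)

   cursor = '0'
   while True:
       cursor = clean(args=[cursor, 'darc:*'])  # hypothetical key pattern
       if cursor == b'0':
           break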

Warning

Due to restrictions in the Alibaba Cloud (Aliyun) customised version of Redis, i.e. ApsaraDB for Redis, this Lua script is not allowed to be executed. Until an alternative solution is found, it is recommended to clean up the database manually.