
A collection of scripts to easily use the API of OCD Datalake

Project description

 ____        _        _       _          ____            _       _ 
|  _ \  __ _| |_ __ _| | __ _| | _____  / ___|  ___ _ __(_)_ __ | |_ ___
| | | |/ _` | __/ _` | |/ _` | |/ / _ \ \___ \ / __| '__| | '_ \| __/ __|
| |_| | (_| | || (_| | | (_| |   <  __/  ___) | (__| |  | | |_) | |_\__ \
|____/ \__,_|\__\__,_|_|\__,_|_|\_\___| |____/ \___|_|  |_| .__/ \__|___/
                                                          |_|

How to use

Datalake Scripts is developed by the Datalake developers to help you use the Datalake API

You can use this repository either as a library or as a CLI

Installation

With Python 3.6+:

$ pip install datalake-scripts
$ pip3 install datalake-scripts

Using as a library

To use the library, first create a Datalake instance, then call the methods of its classes.

The library tutorial is available in the documentation.

Example:

from datalake import Datalake, AtomType, Output

dtl = Datalake(username='username', password='password')
dtl.Threats.lookup(
    atom_value='mayoclinic.org',
    atom_type=AtomType.DOMAIN,
    hashkey_only=False,
    output=Output.JSON
)
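The documented `lookup` call can be wrapped to check several atoms in one pass. The helper below is an illustrative sketch: the `collect_lookups` name and its loop are not part of the library; only the `lookup` signature comes from the example above.

```python
def collect_lookups(threats_api, atom_values, atom_type, output):
    """Look up each atom value using the `lookup` signature shown above,
    and collect the responses keyed by atom value."""
    results = {}
    for value in atom_values:
        results[value] = threats_api.lookup(
            atom_value=value,
            atom_type=atom_type,
            hashkey_only=False,
            output=output,
        )
    return results

# With the real library (requires credentials) this would be driven as:
# from datalake import Datalake, AtomType, Output
# dtl = Datalake(username='username', password='password')
# report = collect_lookups(dtl.Threats, ['mayoclinic.org'], AtomType.DOMAIN, Output.JSON)
```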

Using as a CLI

The CLI can be used with:

$ ocd-dtl <command> <parameter>

Check ocd-dtl -h for help, including the list of commands available.

You can also run a script directly with: <script_name> <script_options>.

/!\ Make sure to use UTF-8 without a BOM when providing a file as input (-i, --input parameter)
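Since input files must be UTF-8 without a BOM, a quick pre-flight check can save a confusing failure. This is a generic sketch, not part of the CLI:

```python
import codecs

def has_utf8_bom(path):
    """Return True if the file starts with the UTF-8 byte-order mark."""
    with open(path, 'rb') as f:
        return f.read(len(codecs.BOM_UTF8)) == codecs.BOM_UTF8

def strip_utf8_bom(path):
    """Rewrite the file in place without a leading BOM, if one is present."""
    with open(path, 'rb') as f:
        data = f.read()
    if data.startswith(codecs.BOM_UTF8):
        with open(path, 'wb') as f:
            f.write(data[len(codecs.BOM_UTF8):])
```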

CLI parameters

Common parameters for all commands:

  • -e, --env <preprod|prod> : Datalake environment. Defaults to prod
  • -o, --output <OUTPUT_PATH> : writes the API response as-is to the given file. No default
  • -D, --debug : raises the verbosity of the program by displaying additional DEBUG messages. The default log level is INFO
  • -q, --quiet : lowers the verbosity of the program (ERROR / WARNING messages are still shown). The default log level is INFO

Commands can also have additional mandatory or optional parameters

For information about each command and more, please check the documentation directory

Environment variables

Authentication

There are two methods of authentication:

  • The first uses a username and password. Every request to the API then uses fresh tokens periodically created from these credentials.
  • The second uses a long term token. Long term tokens are created through the GUI, can have more restricted permissions than your account, and several can be created for one account.

If you don't want to enter credentials for each command and you are on a secure terminal, set these variables:

  • OCD_DTL_LONGTERM_TOKEN: a long term token associated with your Datalake account. Note that if this variable is set, the long term token is used for every request to the Datalake API, even if the username and password environment variables below are also set. This matters because some endpoints / requests do not accept long term tokens and need fresh tokens (i.e. a Datalake instance with username and password). Check each endpoint's description for whether fresh tokens are required.

or

  • OCD_DTL_USERNAME: the email address used to log in to the Datalake API/GUI.
  • OCD_DTL_PASSWORD: the password used to log in to the Datalake API/GUI.

These last two are independent; one can be used without the other if you wish.
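A minimal sketch of picking these variables up from the environment, preferring the long term token as described above. The variable names come from this page; the `longterm_token` keyword name is a hypothetical placeholder, and the constructor call is commented out because it needs real credentials:

```python
import os

def auth_kwargs_from_env():
    """Build keyword arguments for the Datalake constructor from the
    environment, giving precedence to the long term token.
    (`longterm_token` is a hypothetical kwarg name, for illustration.)"""
    token = os.environ.get('OCD_DTL_LONGTERM_TOKEN')
    if token:
        return {'longterm_token': token}
    return {
        'username': os.environ.get('OCD_DTL_USERNAME'),
        'password': os.environ.get('OCD_DTL_PASSWORD'),
    }

# from datalake import Datalake
# dtl = Datalake(**auth_kwargs_from_env())
```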

Using a Proxy

You can set the following environment variables:

  • HTTP_PROXY
  • HTTPS_PROXY

We use the format accepted by the requests Python library. See its documentation for other kinds of proxy that can be set up.
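Because requests reads these variables from the environment by default, the standard library resolves them the same way, which makes for a quick sanity check. The proxy URL below is a placeholder:

```python
import os
import urllib.request

# Placeholder proxy URL: replace with your real proxy address.
os.environ['HTTP_PROXY'] = 'http://proxy.example.com:3128'
os.environ['HTTPS_PROXY'] = 'http://proxy.example.com:3128'

# urllib.request.getproxies() resolves the *_PROXY environment variables
# the same way requests does by default.
print(urllib.request.getproxies())
```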

Throttling and retries

To throttle requests, these two environment variables can be used:

  • OCD_DTL_QUOTA_TIME defines, in seconds, the time before the request limit is reset. Default: 1 second.
  • OCD_DTL_REQUESTS_PER_QUOTA_TIME defines the maximum number of requests for the given time. Default: 5 queries. We recommend lowering this value if you encounter too many 429 errors.

Please don't exceed the quota documented for each endpoint.
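The two variables describe a simple fixed-window limit. The limiter below is an illustrative client-side sketch of that behaviour, not the library's implementation (the injectable `clock`/`sleep` parameters exist only to make the sketch testable):

```python
import os
import time

class QuotaLimiter:
    """Fixed-window limiter mirroring the OCD_DTL_QUOTA_TIME /
    OCD_DTL_REQUESTS_PER_QUOTA_TIME semantics (illustrative only)."""

    def __init__(self, clock=time.monotonic, sleep=time.sleep):
        self.quota_time = float(os.environ.get('OCD_DTL_QUOTA_TIME', 1))
        self.max_requests = int(os.environ.get('OCD_DTL_REQUESTS_PER_QUOTA_TIME', 5))
        self._clock = clock
        self._sleep = sleep
        self._window_start = None
        self._count = 0

    def acquire(self):
        """Block until a request may be sent within the current quota."""
        now = self._clock()
        if self._window_start is None or now - self._window_start >= self.quota_time:
            # Start a fresh window.
            self._window_start = now
            self._count = 0
        if self._count >= self.max_requests:
            # Window exhausted: wait for it to reset.
            self._sleep(self._window_start + self.quota_time - now)
            self._window_start = self._clock()
            self._count = 0
        self._count += 1
```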

Only network errors and HTTP response codes 429, 500, 502, 503 and 504 trigger retries. You can control the number of retries with the environment variable OCD_DTL_MAX_RETRIES, which defaults to 3.
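A client-side sketch of the same retry policy (status-code handling only; real network-error handling is omitted, and `call_with_retries` is illustrative, not the library's code):

```python
import os

# Status codes listed above as retryable.
RETRYABLE_STATUS = {429, 500, 502, 503, 504}

def call_with_retries(do_request, max_retries=None):
    """Call `do_request()` (returning an HTTP status code) and retry on
    retryable statuses, up to OCD_DTL_MAX_RETRIES extra attempts."""
    if max_retries is None:
        max_retries = int(os.environ.get('OCD_DTL_MAX_RETRIES', 3))
    status = do_request()
    for _ in range(max_retries):
        if status not in RETRYABLE_STATUS:
            break
        status = do_request()
    return status
```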

Contributing

To develop on this repository, please refer to the contributing guide.

Download files

Source Distribution

datalake_scripts-3.0.0.tar.gz (67.0 kB)

Built Distribution

datalake_scripts-3.0.0-py3-none-any.whl (90.5 kB)

File details

Details for the file datalake_scripts-3.0.0.tar.gz.

File metadata

  • Download URL: datalake_scripts-3.0.0.tar.gz
  • Size: 67.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for datalake_scripts-3.0.0.tar.gz:

  • SHA256: 5bcf9b8187098e15d7ca30381ff6165ac473beae582a8f22b0361b92aababfa6
  • MD5: 5bbcdbdc97d27612841344c2bae8c067
  • BLAKE2b-256: 3a262895f759de4403b3c5d406f700e98fb7744db63f20a081b1db154aea458d

Provenance

The following attestation bundles were made for datalake_scripts-3.0.0.tar.gz:

Publisher: python-publish.yml on cert-orangecyberdefense/datalake

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file datalake_scripts-3.0.0-py3-none-any.whl.

File hashes

Hashes for datalake_scripts-3.0.0-py3-none-any.whl:

  • SHA256: fadd4346f72806cbb22d006d08f621bf7e7fd68b13b7a0b488b6c25a2db57804
  • MD5: 5212f2f32e7de3108bf44467cb74f5a9
  • BLAKE2b-256: 5234dbd0ebaaf711dbf4468b6d58d3c9df9cf9a1f20fb2b887386355e85736e0

Provenance

The following attestation bundles were made for datalake_scripts-3.0.0-py3-none-any.whl:

Publisher: python-publish.yml on cert-orangecyberdefense/datalake

