Classify trades using trade classification algorithms 🐍

Trade Classification With Python


Documentation ✒️: https://karelze.github.io/tclf/

Source Code 🐍: https://github.com/KarelZe/tclf

tclf is a scikit-learn-compatible implementation of trade classification algorithms that classify financial market transactions into buyer- and seller-initiated trades.

The key features are:

  • Easy: Easy to use and learn.
  • Sklearn-compatible: Compatible with the sklearn API. Use sklearn metrics and visualizations.
  • Feature complete: Wide range of supported algorithms. Use the algorithms individually or stack them like LEGO blocks.

Installation

python -m pip install tclf

Supported Algorithms

  • (Rev.) CLNV rule[^1]
  • (Rev.) EMO rule[^2]
  • (Rev.) LR algorithm[^6]
  • (Rev.) Tick test[^5]
  • Depth rule[^3]
  • Quote rule[^4]
  • Tradesize rule[^3]

For a primer on trade classification rules visit the rules section 🆕 in our docs.
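
To give a flavor of what these rules do, here is a minimal sketch of the tick test in plain NumPy. It is not tclf's implementation: the function name `tick_test` and the sample prices are made up for illustration, and the sketch assumes trades are sorted by time with no missing prices. A trade is labelled a buy (1) if its price is above the most recent different price and a sell (-1) if it is below. The reverse tick test applies the same idea to the following price instead of the preceding one.

```python
import numpy as np


def tick_test(trade_price):
    """Illustrative tick test: 1 = buy, -1 = sell, NaN = unclassified."""
    price = np.asarray(trade_price, dtype=float)
    labels = np.full(price.shape, np.nan)
    last_direction = np.nan  # no price change seen yet
    for i in range(1, len(price)):
        if price[i] > price[i - 1]:
            last_direction = 1.0
        elif price[i] < price[i - 1]:
            last_direction = -1.0
        # An unchanged price keeps the direction of the last price change.
        labels[i] = last_direction
    return labels


# Hypothetical price series, sorted by time.
labels = tick_test([10.0, 10.1, 10.1, 10.0, 10.0])
print(labels)  # first trade is unclassified, then 1, 1, -1, -1
```

The first trade has no preceding price, so it stays unclassified.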

Minimal Example

Let's start simple: classify all trades with the quote rule, and randomly assign a side to any trade the quote rule cannot classify.

Create a main.py with:

import numpy as np
import pandas as pd

from tclf.classical_classifier import ClassicalClassifier

X = pd.DataFrame(
    [
        [1.5, 1, 3],
        [2.5, 1, 3],
        [1.5, 3, 1],
        [2.5, 3, 1],
        [1, np.nan, 1],
        [3, np.nan, np.nan],
    ],
    columns=["trade_price", "bid_ex", "ask_ex"],
)

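# Apply the quote rule to exchange-level quotes; unclassified trades are assigned randomly.
clf = ClassicalClassifier(layers=[("quote", "ex")], strategy="random")
clf.fit(X)
probs = clf.predict_proba(X)
print(probs)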

Run your script with:

$ python main.py

In this example, input data is available as a pd.DataFrame with columns conforming to our naming conventions.

The parameter layers=[("quote", "ex")] sets the quote rule at the exchange level and strategy="random" specifies the fallback strategy for unclassified trades.
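
For intuition, the textbook quote rule compares the trade price with the midpoint of the bid and ask quotes. The following is a minimal sketch in plain NumPy, not tclf's implementation: the function name `quote_rule` and the sample values are made up for illustration, and details such as the handling of crossed quotes (bid above ask) may differ in the library.

```python
import numpy as np


def quote_rule(trade_price, bid, ask):
    """Illustrative quote rule: 1 = buy, -1 = sell, NaN = unclassified."""
    trade_price = np.asarray(trade_price, dtype=float)
    mid = 0.5 * (np.asarray(bid, dtype=float) + np.asarray(ask, dtype=float))
    # Comparisons with NaN are False, so trades with missing quotes stay NaN.
    return np.where(trade_price > mid, 1.0, np.where(trade_price < mid, -1.0, np.nan))


# Hypothetical trades: below the midpoint, above it, at it, and with a missing bid.
labels = quote_rule(
    trade_price=[1.5, 2.5, 2.0, 1.0],
    bid=[1.0, 1.0, 1.0, np.nan],
    ask=[3.0, 3.0, 3.0, 1.0],
)
print(labels)  # sell, buy, and two unclassified trades
```

Trades at the midpoint or with missing quotes remain unclassified, which is where a fallback such as strategy="random" comes in.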

Advanced Example

Often it is desirable to classify on both exchange-level data and NBBO (national best bid and offer) data. Also, data might only be available as a NumPy array. So let's extend the previous example: apply the quote rule at the exchange level first, then at the NBBO level, and classify all remaining trades randomly.

import numpy as np
from sklearn.metrics import accuracy_score

from tclf.classical_classifier import ClassicalClassifier

X = np.array(
    [
        [1.5, 1, 3, 2, 2.5],
        [2.5, 1, 3, 1, 3],
        [1.5, 3, 1, 1, 3],
        [2.5, 3, 1, 1, 3],
        [1, np.nan, 1, 1, 3],
        [3, np.nan, np.nan, 1, 3],
    ]
)
y_true = np.array([-1, 1, 1, -1, -1, 1])
features = ["trade_price", "bid_ex", "ask_ex", "bid_best", "ask_best"]

clf = ClassicalClassifier(
    layers=[("quote", "ex"), ("quote", "best")], strategy="random", features=features
)
clf.fit(X)
acc = accuracy_score(y_true, clf.predict(X))

In this example, input data is available as a np.ndarray containing both exchange ("ex") and NBBO ("best") data. We set the layers parameter to layers=[("quote", "ex"), ("quote", "best")] to classify trades first on subset "ex" and then the remaining trades on subset "best". Because a plain array carries no column names, we also have to set ClassicalClassifier(..., features=features) to pass column information to the classifier.

Like before, column/feature names must follow our naming conventions.
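
Conceptually, stacking layers means that each rule only fills in trades that earlier rules left unclassified. The following plain-NumPy sketch illustrates this idea under stated assumptions. It is not how tclf is implemented internally: the helper names `quote_rule` and `classify_layered`, the `seed` parameter, and the sample data are all hypothetical.

```python
import numpy as np


def quote_rule(trade_price, bid, ask):
    """Illustrative quote rule: 1 = buy, -1 = sell, NaN = unclassified."""
    mid = 0.5 * (bid + ask)
    return np.where(trade_price > mid, 1.0, np.where(trade_price < mid, -1.0, np.nan))


def classify_layered(trade_price, quotes, seed=0):
    """Apply the quote rule per (bid, ask) pair in order, then fall back to random."""
    labels = np.full(trade_price.shape, np.nan)
    for bid, ask in quotes:
        open_mask = np.isnan(labels)
        # Only trades left unclassified by earlier layers are updated.
        labels[open_mask] = quote_rule(trade_price, bid, ask)[open_mask]
    # Random fallback for trades that no layer could classify.
    open_mask = np.isnan(labels)
    rng = np.random.default_rng(seed)
    labels[open_mask] = rng.choice([-1.0, 1.0], size=int(open_mask.sum()))
    return labels


# Hypothetical data: the last trade sits at the midpoint of both quote sources.
trade_price = np.array([1.5, 2.5, 1.0, 3.0, 2.0])
bid_ex = np.array([1.0, 1.0, np.nan, np.nan, 1.0])
ask_ex = np.array([3.0, 3.0, 1.0, np.nan, 3.0])
bid_best = np.array([1.0, 1.0, 1.0, 1.0, 1.0])
ask_best = np.array([3.0, 3.0, 3.0, 3.0, 3.0])

labels = classify_layered(trade_price, [(bid_ex, ask_ex), (bid_best, ask_best)])
print(labels)
```

Here the first two trades are classified from exchange quotes, the next two from the second quote source, and only the last one falls through to the random fallback.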

Other Examples

For more practical examples, see our examples section.

Development

We use pixi for dependency management and as a workflow tool.

pixi install
pixi run postinstall
pixi run test

Citation

If you are using the package in publications, please cite as:

@software{bilz_tclf_2023,
    author = {Bilz, Markus},
    license = {BSD 3},
    month = jan,
    title = {{tclf} -- trade classification with python},
    url = {https://github.com/KarelZe/tclf},
    version = {0.0.8},
    year = {2024}
}

Footnotes

[^1]: Chakrabarty, B., Li, B., Nguyen, V., & Van Ness, R. A. (2007). Trade classification algorithms for electronic communications network trades. Journal of Banking & Finance, 31(12), 3806–3821. https://doi.org/10.1016/j.jbankfin.2007.03.003
[^2]: Ellis, K., Michaely, R., & O’Hara, M. (2000). The accuracy of trade classification rules: Evidence from Nasdaq. The Journal of Financial and Quantitative Analysis, 35(4), 529–551. https://doi.org/10.2307/2676254
[^3]: Grauer, C., Schuster, P., & Uhrig-Homburg, M. (2023). Option trade classification. https://doi.org/10.2139/ssrn.4098475
[^4]: Harris, L. (1989). A day-end transaction price anomaly. The Journal of Financial and Quantitative Analysis, 24(1), 29. https://doi.org/10.2307/2330746
[^5]: Hasbrouck, J. (2009). Trading costs and returns for U.S. equities: Estimating effective costs from daily data. The Journal of Finance, 64(3), 1445–1477. https://doi.org/10.1111/j.1540-6261.2009.01469.x
[^6]: Lee, C., & Ready, M. J. (1991). Inferring trade direction from intraday data. The Journal of Finance, 46(2), 733–746. https://doi.org/10.1111/j.1540-6261.1991.tb02683.x
