TWItter STock market Machine Learning package
TwistML
=======
Disclaimer
----------
This package is still very much under development.
At this point most of the intended functionality is in place, but
the documentation is still very spotty.
Installation
------------
You can use pip to install TwistML like so::

    $ pip install twistml

Please make sure you **have numpy, scipy and gensim installed** as
well. I have opted out of adding them to the install_requires as this
has caused problems in my own tests on Windows machines. (For numpy the
problem is described `here
<https://github.com/numpy/numpy/issues/2434>`_.) So these packages will
not be installed automatically by pip.
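
Since pip will not pull in these three packages, it can help to confirm they are importable before using TwistML. The following is a minimal sketch using only the standard library; the helper name ``missing_packages`` is made up for this example and is not part of TwistML.

```python
import importlib.util

# Packages the installation notes say must be installed by hand.
REQUIRED = ("numpy", "scipy", "gensim")


def missing_packages(names=REQUIRED):
    """Return the names that cannot be found on the current import path.

    Hypothetical helper for illustration; TwistML does not ship it.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("All required packages are available.")
```

Any package reported as missing can then be installed separately with pip or another package manager.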
Known Issues & Planned Improvements
===================================
- Implement a DateRange class and replace all occurrences of fromdate,
  todate, dateformat.
- Implement find_files() without date ranges at all. It should be
  possible to simply process all files within a directory (also
  recursively).
- TwistML currently assumes raw Twitter data to be available as one
  JSON file per day. Make sure the Internet Archive's file scheme is
  supported as well.
- Add support for hourly time resolution instead of daily only.
- The evaluation subpackage can only deal with binary classification.
  Possibly explore adding multiclass.
- The way logging is currently set up is weird and should be reworked.
- gensim's LabeledSentence is deprecated; use TaggedDocument instead.
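
To illustrate the first two items, here is a rough sketch of what a DateRange class and a date-free, recursive file search could look like. Nothing here is TwistML's actual API: the class layout, the default ``dateformat``, the ``find_json_files`` name and the ``*.json`` file pattern are all assumptions made for the example.

```python
from datetime import date, datetime, timedelta
from pathlib import Path


class DateRange:
    """Hypothetical inclusive range of days (not part of TwistML)."""

    def __init__(self, fromdate, todate, dateformat="%Y-%m-%d"):
        self.start = self._parse(fromdate, dateformat)
        self.end = self._parse(todate, dateformat)
        if self.end < self.start:
            raise ValueError("todate must not be earlier than fromdate")

    @staticmethod
    def _parse(value, dateformat):
        # Accept a datetime, a date, or a string in the given format.
        if isinstance(value, datetime):
            return value.date()
        if isinstance(value, date):
            return value
        return datetime.strptime(value, dateformat).date()

    def __iter__(self):
        # Yield each day once, from start to end inclusive.
        day = self.start
        while day <= self.end:
            yield day
            day += timedelta(days=1)

    def __len__(self):
        return (self.end - self.start).days + 1

    def __contains__(self, day):
        return self.start <= day <= self.end


def find_json_files(directory):
    """Return all *.json files below directory, searched recursively.

    Assumes the raw data files use a .json extension.
    """
    return sorted(Path(directory).rglob("*.json"))


if __name__ == "__main__":
    for day in DateRange("2016-02-27", "2016-03-01"):
        print(day.isoformat())
```

Bundling the three values into one object would let functions take a single argument instead of fromdate, todate and dateformat, and the recursive search covers the case where no date range is given at all.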
Changes
=======
Version 0.2.2
-------------
- Added sentiment features based on TextBlob sentiments
Version 0.2.1
-------------
- Added functionality for complex category subsets to
  tml-generate-features
- Also improved documentation for tml-generate-features (on the command
  line as well as in the docstring)
- Improved test coverage
Version 0.2.0
-------------
- Changed Development Status to Alpha
- Removed Sentence2Vec as that functionality is included in current
  gensim versions' Doc2Vec class
- Added Changelog