Download data using pandas with multi-threading and multi-processing.
Concurrent-Pandas
=================
Concurrent Pandas
-------------
**Concurrent Pandas** is a Python library that lets you use Pandas and/or Quandl to download bulk data concurrently using threads or processes. What does concurrency do for you? Your data downloads simultaneously instead of one key at a time. Concurrent Pandas automatically spawns an optimal number of processes or threads based on the number of processors available on your machine.

Note: Concurrent Pandas is not associated with Quandl or Python Pandas; it just allows you to access them faster.
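
The idea of sizing a worker pool from the machine's processor count can be sketched with the Python 3 standard library alone. This is a minimal illustration, not Concurrent Pandas' actual implementation: `fetch_one`, `download_all`, and the five-threads-per-CPU multiplier are assumptions made for the example, and `fetch_one` is a stand-in that performs no network access.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def fetch_one(key):
    """Hypothetical stand-in for a real download; returns a fake payload."""
    return "data for " + key


def download_all(keys, fetch=fetch_one, max_workers=None):
    """Fetch every key with a thread pool and return a dict of key -> result."""
    keys = list(keys)
    if max_workers is None:
        # Assumption: downloads are I/O bound, so use a few threads per CPU.
        max_workers = (os.cpu_count() or 1) * 5
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs each key with its result.
        return dict(zip(keys, pool.map(fetch, keys)))


results = download_all(["aapl", "xom", "msft"])
print(results["aapl"])
```

Swapping `ThreadPoolExecutor` for `ProcessPoolExecutor` would give the process-based variant, which tends to suit CPU-heavy work better than network-bound downloads.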
---
#### Features
- **Works in Python 2 and 3**
- **Sequential downloading of keys**
- **Concurrent downloading of keys using thread or process pools**
- **All concurrent downloads automatically pick an optimal number of threads or processes to use for your system**
- **Recursive data structure unpacking for key insertion**
    - Pass one or many:
        - Lists
        - Sets
        - Deques
        - Any other data structure that inherits from the abstract base class *Container*, provided it does not also inherit from Python's *basestring* and it allows iteration.
- **Automatic re-attempts if a download fails or times out**
    - The wait before retrying increases with each successive failure
- **Variety of data sources supported**
    - Quandl
    - Federal Reserve Economic Data
    - Google Finance
    - Yahoo Finance
    - More coming soon!
- **Data is returned in a hash map for fast lookups** (*O(1) average case*)
    - Hash map keys are the strings entered for lookup; the values are your pandas DataFrames
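
The recursive unpacking described in the feature list can be sketched as follows. This is an illustration under stated assumptions rather than the library's own code: `unpack_keys` is a hypothetical helper, and on Python 3 `str` plays the role that `basestring` plays on Python 2.

```python
from collections import deque
from collections.abc import Container, Iterable


def unpack_keys(item, out=None):
    """Recursively collect string keys from arbitrarily nested containers."""
    if out is None:
        out = []
    if isinstance(item, str):
        # Strings are containers too, so treat them as leaf keys first.
        out.append(item)
    elif isinstance(item, Container) and isinstance(item, Iterable):
        for child in item:
            unpack_keys(child, out)
    else:
        raise TypeError("unsupported key type: %r" % (item,))
    return out


nested = ["aapl", {"xom"}, deque(["msft", ["goog", "tsla"]])]
print(unpack_keys(nested))
```

Checking for strings before checking for containers matters: without it, each string would be iterated character by character and the recursion would never terminate.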
---
#### Easy to use
```
import concurrentpandas

# Define your keys
yahoo_keys = ["aapl", "xom", "msft", "goog", "brk-b", "TSLA", "IRBT"]
# Instantiate Concurrent Pandas
fast_panda = concurrentpandas.ConcurrentPandas()
# Set your data source
fast_panda.set_source_yahoo_finance()
# Insert your keys
fast_panda.insert_keys(yahoo_keys)
# Choose either asynchronous threads, processes, or a single sequential download
fast_panda.consume_keys_asynchronous_threads()
# The Concurrent Pandas object contains a dict of your results now
mymap = fast_panda.return_map()
# Easily pull the data out of the map for your research
# (head is a method, so it must be called to show the first rows)
print(mymap["aapl"].head())
```
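
Downloads like the ones in the example above can fail transiently, which is why the feature list mentions automatic re-attempts with a growing wait. A minimal sketch of that pattern follows. The names `fetch_with_retries` and `flaky_fetch`, the doubling delay, and the attempt limit are all assumptions for illustration; the library's actual retry schedule is not documented here.

```python
import time


def fetch_with_retries(fetch, key, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch(key), waiting longer after each failure before retrying."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(key)
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Assumption: the delay doubles each time (1s, 2s, 4s, ...).
            sleep(base_delay * 2 ** (attempt - 1))


calls = {"count": 0}


def flaky_fetch(key):
    """Hypothetical download that fails twice before succeeding."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise IOError("temporary failure")
    return key.upper()


# Record the requested delays instead of actually sleeping.
delays = []
print(fetch_with_retries(flaky_fetch, "aapl", sleep=delays.append))
print(delays)
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting; in normal use the default `time.sleep` applies.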
---
##### Installation Instructions

Note: only tested on Linux.

To install, run:
```
pip install ConcurrentPandas
```
---
##### Updates

New in 0.1.2:

- Ability to interact with stock options
- Now requires BeautifulSoup4 and Pandas 0.16 or newer
---
##### Misc

Tested on Python 2.7.6 and Python 3.4.0.

To see what else I'm building, or to follow or contact me, check out my [GitHub][1], [Twitter][3], and [personal site][2].

[1]: https://github.com/briwilcox
[2]: http://brianmwilcox.com/
[3]: https://twitter.com/brian_m_wilcox
Authors
==============
- Brian Wilcox
Download files
Source Distribution

- ConcurrentPandas-0.1.2.tar.gz (10.3 kB)
    - Tags: Source
    - Uploaded using Trusted Publishing? No

Algorithm | Hash digest
---|---
SHA256 | 30f08d8e9295f6dc5b16031f3463d4ca199598ea23ed627767eee9e3f0779017
MD5 | a7bc24ff34d9c5c4574e0494335830e5
BLAKE2b-256 | 418fc336d944f7f95d629844c437a414e58c277ce04025d70441aa83495b6dc6

Built Distribution

- ConcurrentPandas-0.1.2-py2.py3-none-any.whl (17.5 kB)
    - Tags: Python 2, Python 3
    - Uploaded using Trusted Publishing? No

Algorithm | Hash digest
---|---
SHA256 | 1ed2d0239fd34fa7ac7e3567db67fc451895bba0ef88affec5e242f59461fb01
MD5 | a6c02d410f9476d5c431b61c4cb215df
BLAKE2b-256 | 7ced8a24cb768c649bc0b90dd99d9861a0205c525c19b4f8b4bfb04cfbbb68d9
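
The SHA256 digests listed above can be used to check a downloaded file before installing it. A minimal sketch with the standard library's `hashlib`; the helper name `sha256_matches` and the local file path in the usage comment are assumptions for illustration.

```python
import hashlib


def sha256_matches(path, expected_hex, chunk_size=65536):
    """Return True if the file at path has the given SHA256 hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.strip().lower()


# Usage (assumes the sdist was saved to the current directory):
# sha256_matches(
#     "ConcurrentPandas-0.1.2.tar.gz",
#     "30f08d8e9295f6dc5b16031f3463d4ca199598ea23ed627767eee9e3f0779017",
# )
```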