Download data using pandas with multi-threading and multi-processing.

Concurrent-Pandas
=================


Concurrent Pandas
-------------

**Concurrent Pandas** is a Python library that lets you use Pandas and/or Quandl to download bulk data concurrently using threads or processes. What does concurrency do for you? Your data downloads simultaneously instead of one key at a time. Concurrent Pandas automatically spawns a number of processes or threads chosen based on the number of processors available on your machine.
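
As a rough illustration of the idea (not the library's actual implementation), the sketch below sizes a thread pool from the machine's CPU count and downloads keys concurrently. The names `pick_worker_count`, `fetch`, and `download_all` are hypothetical, the one-worker-per-CPU heuristic is an assumption, and the code targets Python 3 only.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def pick_worker_count(num_keys, per_cpu=1):
    """Hypothetical heuristic: per_cpu workers per CPU, capped by the key count."""
    cpus = os.cpu_count() or 1  # cpu_count() may return None
    return max(1, min(num_keys, cpus * per_cpu))


def fetch(key):
    """Stand-in for a real download; returns a fake payload for the key."""
    return key.upper()


def download_all(keys):
    """Fetch every key on a thread pool and return a dict of key -> result."""
    keys = list(keys)
    workers = pick_worker_count(len(keys))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(fetch, keys)  # map() preserves input order
        return dict(zip(keys, results))
```

For network-bound downloads, threads usually suffice because the work spends most of its time waiting on I/O; a process pool becomes more attractive when parsing the responses is CPU-heavy.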

Note: Concurrent Pandas is not associated with Quandl or Python Pandas; it just allows you to access them faster.

---
#### Features

- **Working in Python 2 and 3**
- **Sequential Downloading of Keys**
- **Concurrent downloading of keys using thread or process pools**
- **Concurrent downloads automatically pick a suitable number of threads or processes for your system**
- **Recursive data structure unpacking for key insertion**
    - Pass one or many:
        - Lists
        - Sets
        - Deques
        - Any other data structure that inherits from the abstract base class *Container*, provided it does not also inherit from Python's *basestring* and it supports iteration.
- **Automatic re-attempts if the download fails or times out**
    - The wait before each retry increases with each successive failure
- **Variety of data sources supported**
    - Quandl
    - Federal Reserve Economic Data
    - Google Finance
    - Yahoo Finance
    - More coming soon!
- **Data is returned in a hashmap for fast lookups** (*O(1) average case*)
    - Keys are the strings you entered for lookup; each value is the corresponding Pandas data frame
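
The recursive unpacking described in the feature list can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the library's code: `unpack_keys` is a hypothetical name, and because Python 3 has no `basestring`, the sketch checks against `str` instead.

```python
from collections import deque
from collections.abc import Container


def unpack_keys(item, out=None):
    """Recursively collect string keys from arbitrarily nested containers."""
    if out is None:
        out = []
    if isinstance(item, str):
        # Strings are Containers too, so test for them first
        out.append(item)
    elif isinstance(item, Container) and hasattr(item, "__iter__"):
        for child in item:
            unpack_keys(child, out)
    else:
        raise TypeError("unsupported key type: %r" % type(item))
    return out


nested = ["aapl", {"msft"}, deque(["goog", ["xom"]])]
print(unpack_keys(nested))  # ['aapl', 'msft', 'goog', 'xom']
```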

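The automatic re-attempts with a growing wait could look something like the sketch below. The README only says the wait increases with each failure; the linear growth, the default values, and the name `fetch_with_retries` are all assumptions made for illustration.

```python
import time


def fetch_with_retries(fetch, key, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch(key), waiting longer after each failure before retrying."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(key)
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Assumed linear growth: base_delay, 2 * base_delay, 3 * base_delay, ...
            sleep(base_delay * attempt)
```

Passing `sleep` as a parameter keeps the sketch easy to test without real waiting.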

---
#### Easy to use
```
import concurrentpandas

# Define your keys
yahoo_keys = ["aapl", "xom", "msft", "goog", "brk-b", "TSLA", "IRBT"]
# Instantiate Concurrent Pandas
fast_panda = concurrentpandas.ConcurrentPandas()
# Set your data source
fast_panda.set_source_yahoo_finance()
# Insert your keys
fast_panda.insert_keys(yahoo_keys)
# Choose either asynchronous threads, processes, or a single sequential download
fast_panda.consume_keys_asynchronous_threads()
# The Concurrent Pandas object contains a dict of your results now
mymap = fast_panda.return_map()
# Easily pull the data out of the map for your research;
# head() must be called to preview the first rows of the data frame
print(mymap["aapl"].head())
```

---
##### Installation Instructions

Note: only tested on Linux.

To install, run:

```
pip install ConcurrentPandas
```


---
##### Updates

New in 0.1.2: ability to interact with stock options.

Now requires BeautifulSoup4 and Pandas 0.16 or newer.

---
##### Misc

Tested on Python 2.7.6 and Python 3.4.0.

To see what else I'm building, or to follow or contact me, check out my [GitHub][1], [Twitter][3], and [personal site][2].

[1]: https://github.com/briwilcox
[2]: http://brianmwilcox.com/
[3]: https://twitter.com/brian_m_wilcox


Authors
==============
- Brian Wilcox
