
Wrapper for Twitter's Premium and Enterprise search APIs


Python Twitter Search API
=========================

This library serves as a Python interface to the Twitter premium and
enterprise search APIs.
It provides a command-line utility and a library usable from within a
Python program. It comes with tools for assisting in dynamic generation
of search rules and for parsing tweets.

Pretty docs are also available for this project.


- Command-line utility is pipeable to other tools (e.g., ``jq``).
- Automatically handles pagination of results with specifiable limits.
- Delivers a stream of data to the user for low in-memory requirements.
- Handles Enterprise and Premium authentication methods.
- Flexible usage within a Python program.
- Compatible with our group's Tweet Parser for rapid extraction of
  relevant data fields from each tweet payload.
- Supports the Search Counts endpoint, which can reduce API call usage
  and provide rapid insights if you only need Tweet volumes and not
  Tweet payloads.


The ``searchtweets`` library is on PyPI:

.. code:: bash

pip install searchtweets

Or you can install the development version locally via:

.. code:: bash

git clone
cd search-tweets-python
pip install -e .


Credential Handling
-------------------

The premium and enterprise Search APIs use different authentication
methods, and we attempt to provide a seamless way to handle
authentication for all customers. We support both YAML-file-based
methods and environment variables for access.

A YAML credential file should look like this:

.. code:: yaml

username: <USERNAME>
password: <PW>
bearer_token: <TOKEN>

Premium clients will require the ``bearer_token`` and ``endpoint``
fields; Enterprise clients require ``username``, ``password``, and
``endpoint``. If you do not specify the ``account_type``, we attempt to
infer the account type and issue a warning about this behavior. The
``load_credentials`` function also allows ``account_type`` to be set
explicitly.
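The inference step can be pictured as a check on which credential fields are present. This is an illustrative sketch under that assumption; the ``infer_account_type`` helper is hypothetical and not part of the library:

```python
import warnings

def infer_account_type(creds):
    """Guess the account type from the available credential fields.

    Illustrative sketch only -- the real ``load_credentials`` logic
    may differ.
    """
    if "bearer_token" in creds:
        guess = "premium"     # premium auth uses a bearer token
    elif "username" in creds and "password" in creds:
        guess = "enterprise"  # enterprise auth uses basic auth
    else:
        raise KeyError("cannot infer account type from credentials")
    warnings.warn("account_type was not specified; inferred '%s'" % guess)
    return guess

# infer_account_type({"username": "me", "password": "pw"}) -> 'enterprise'
```

Passing ``account_type`` explicitly skips the guess and the warning.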

Our credential reader will look for this file at
``"~/.twitter_keys.yaml"``, but you can pass the relevant location as
needed. You can also specify a different key in the yaml file, which can
be useful if you have different endpoints, e.g., ``dev``, ``test``,
``prod``, etc. The file might look like this:

.. code:: yaml

    search_tweets_dev:
      account_type: premium
      bearer_token: <TOKEN>

    search_tweets_prod:
      account_type: premium
      bearer_token: <TOKEN>

If you want or need to pass credentials via environment variables, you
can set the appropriate variables of the following:

- ``SEARCHTWEETS_ENDPOINT``
- ``SEARCHTWEETS_USERNAME``
- ``SEARCHTWEETS_PASSWORD``
- ``SEARCHTWEETS_BEARER_TOKEN``
- ``SEARCHTWEETS_ACCOUNT_TYPE``

The ``load_credentials`` function will attempt to find these variables
if it cannot load fields from the yaml file, and it will **overwrite any
found credentials from the YAML file** if they have been parsed. This
behavior can be changed by setting the ``load_credentials`` parameter
``env_overwrite`` to ``False``.
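That precedence can be sketched as a dictionary merge over the ``SEARCHTWEETS_*`` variables; the ``merge_credentials`` helper below is hypothetical and only illustrates the overwrite behavior, not the library's actual implementation:

```python
import os

def merge_credentials(yaml_creds, env_overwrite=True):
    """Merge YAML-parsed credentials with SEARCHTWEETS_* environment
    variables; hypothetical helper, not the library's implementation."""
    env_creds = {}
    for field in ("username", "password", "bearer_token", "endpoint"):
        value = os.environ.get("SEARCHTWEETS_" + field.upper())
        if value:
            env_creds[field] = value
    if env_overwrite:
        return {**yaml_creds, **env_creds}  # environment variables win
    return {**env_creds, **yaml_creds}      # YAML fields win

os.environ["SEARCHTWEETS_ENDPOINT"] = "https://env-endpoint"
creds = merge_credentials({"endpoint": "https://yaml-endpoint", "username": "me"})
# creds["endpoint"] is now "https://env-endpoint"; username still comes from YAML
```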

The following examples demonstrate credential handling, both in the
command-line app and the Python library.

.. code:: python

from searchtweets import load_credentials

.. code:: python

    load_credentials(filename="~/.twitter_keys.yaml",
                     account_type="enterprise",
                     env_overwrite=False)

::

    {'endpoint': '<MY_ENDPOINT>',
     'password': '<MY_PASSWORD>',
     'username': '<MY_USERNAME>'}

.. code:: python

    load_credentials(filename="~/.twitter_keys.yaml",
                     account_type="premium",
                     env_overwrite=False)

::

    {'bearer_token': '<A_VERY_LONG_MAGIC_STRING>',
     'endpoint': ''}

Environment Variable Overrides
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If we set our environment variables, the program will look for them
regardless of a YAML file’s validity or existence.

.. code:: python

import os
os.environ["SEARCHTWEETS_ENDPOINT"] = "https://endpoint"

load_credentials(filename="nothing", yaml_key="no_key_here")


cannot read file nothing
Error parsing YAML file; searching for valid environment variables


{'endpoint': 'https://endpoint',
'password': 'ENV_PW',
'username': 'ENV_USERNAME'}


Using the Command Line Application
----------------------------------

The library includes an application in the ``tools`` directory that
provides rapid access to Tweets.

Note that the ``--results-per-call`` flag specifies an argument to the
API call (``maxResults``, the number of results returned per CALL), not
a hard maximum on the number of results returned from this program. The
argument ``--max-results`` defines the maximum number of results to
return for a given session. All examples assume that your credentials
are set up correctly in the default location (``.twitter_keys.yaml``)
or in environment variables.
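To illustrate how the two flags interact, here is a sketch of the paginated requests a session would make; the ``plan_calls`` helper is hypothetical and only demonstrates the arithmetic:

```python
def plan_calls(max_results, results_per_call):
    """Return the (call_number, batch_size) pairs a session would request.

    ``results_per_call`` maps to the API's per-request ``maxResults``;
    ``max_results`` caps the total returned across the whole session.
    Hypothetical helper for illustration only.
    """
    batches, remaining, call = [], max_results, 1
    while remaining > 0:
        size = min(results_per_call, remaining)  # last batch may be partial
        batches.append((call, size))
        remaining -= size
        call += 1
    return batches

# --max-results 1000 --results-per-call 100 -> ten calls of 100 results each
calls = plan_calls(1000, 100)
```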

**Stream json results to stdout without saving**

.. code:: bash

    python \
      --max-results 1000 \
      --results-per-call 100 \
      --filter-rule "beyonce has:hashtags" \
      --print-stream

**Stream json results to stdout and save to a file**

.. code:: bash

    python \
      --max-results 1000 \
      --results-per-call 100 \
      --filter-rule "beyonce has:hashtags" \
      --filename-prefix beyonce_geo \
      --print-stream

**Save to file without output**

.. code:: bash

    python \
      --max-results 100 \
      --results-per-call 100 \
      --filter-rule "beyonce has:hashtags" \
      --filename-prefix beyonce_geo \
      --no-print-stream

Options can be passed via a configuration file (either ini or YAML). An
example file can be found in the ``tools/api_config_example.config`` or
``./tools/api_yaml_example.yaml`` files, which might look like this:

.. code:: ini

    [search_rules]
    from_date = 2017-06-01
    to_date = 2017-09-01
    pt_rule = beyonce has:geo

    [search_params]
    results_per_call = 500
    max_results = 500

    [output_params]
    save_file = True
    filename_prefix = beyonce
    results_per_file = 10000000

Or this:

.. code:: yaml

    search_rules:
        from-date: 2017-06-01
        to-date: 2017-09-01 01:01
        pt-rule: kanye

    search_params:
        results-per-call: 500
        max-results: 500

    output_params:
        save_file: True
        filename_prefix: kanye
        results_per_file: 10000000

When using a config file in conjunction with the command-line utility,
you need to specify your config file via the ``--config-file``
parameter. Additional command-line arguments will either be *added* to
the config file args or **overwrite** the config file args if both are
specified and present.
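That precedence rule can be sketched as a simple merge, where explicitly passed command-line values overrule the config file; the ``resolve_args`` helper is hypothetical:

```python
def resolve_args(config_args, cli_args):
    """Combine config-file args with command-line args; explicitly passed
    CLI values overrule the config file. Hypothetical helper."""
    merged = dict(config_args)       # start from the config file
    for key, value in cli_args.items():
        if value is not None:        # only args actually given on the CLI
            merged[key] = value
    return merged

config = {"pt_rule": "beyonce has:geo", "max_results": 500}
cli = {"max_results": 1000, "results_per_call": None}
merged = resolve_args(config, cli)
# -> {'pt_rule': 'beyonce has:geo', 'max_results': 1000}
```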



.. code:: bash

    python \
      --config-file myapiconfig.config


Full options are listed below:


::

    $ -h
    usage: [-h] [--credential-file CREDENTIAL_FILE]
           [--credential-file-key CREDENTIAL_YAML_KEY]
           [--env-overwrite ENV_OVERWRITE]
           [--config-file CONFIG_FILENAME]
           [--account-type {premium,enterprise}]
           [--count-bucket COUNT_BUCKET]
           [--start-datetime FROM_DATE] [--end-datetime TO_DATE]
           [--filter-rule PT_RULE]
           [--results-per-call RESULTS_PER_CALL]
           [--max-results MAX_RESULTS] [--max-pages MAX_PAGES]
           [--results-per-file RESULTS_PER_FILE]
           [--filename-prefix FILENAME_PREFIX]
           [--no-print-stream] [--print-stream] [--debug]

    optional arguments:
      -h, --help            show this help message and exit
      --credential-file CREDENTIAL_FILE
                            Location of the yaml file used to hold your
                            credentials.
      --credential-file-key CREDENTIAL_YAML_KEY
                            the key in the credential file used for this session's
                            credentials. Defaults to search_tweets_api
      --env-overwrite ENV_OVERWRITE
                            Overwrite YAML-parsed credentials with any set
                            environment variables. See API docs or readme for
                            details.
      --config-file CONFIG_FILENAME
                            configuration file with all parameters. Far easier to
                            use than the command-line args version. If a valid
                            file is found, all args will be populated from there.
                            Remaining command-line args will overrule args found
                            in the config file.
      --account-type {premium,enterprise}
                            The account type you are using
      --count-bucket COUNT_BUCKET
                            Bucket size for counts API. Options: day, hour,
                            minute (default is 'day').
      --start-datetime FROM_DATE
                            Start of datetime window, format 'YYYY-mm-DDTHH:MM'
                            (default: -30 days)
      --end-datetime TO_DATE
                            End of datetime window, format 'YYYY-mm-DDTHH:MM'
                            (default: most recent date)
      --filter-rule PT_RULE
                            PowerTrack filter rule
      --results-per-call RESULTS_PER_CALL
                            Number of results to return per call (default 100; max
                            500) - corresponds to 'maxResults' in the API
      --max-results MAX_RESULTS
                            Maximum results to return for this session (defaults
                            to 500; see -a option)
      --max-pages MAX_PAGES
                            Maximum number of pages/API calls to use for this
                            session.
      --results-per-file RESULTS_PER_FILE
                            Maximum tweets to save per file.
      --filename-prefix FILENAME_PREFIX
                            prefix for the filename where tweet json data will be
                            stored.
      --no-print-stream     disable print streaming
      --print-stream        Print tweet stream to stdout
      --debug               print all info and warning messages


Using the Twitter Search APIs' Python Wrapper
---------------------------------------------

Working with the API within a Python program is straightforward both for
Premium and Enterprise clients.

We’ll assume that credentials are in the default location,
``~/.twitter_keys.yaml``.

.. code:: python

from searchtweets import ResultStream, gen_rule_payload, load_credentials

Enterprise Setup
~~~~~~~~~~~~~~~~

.. code:: python

    enterprise_search_args = load_credentials("~/.twitter_keys.yaml",
                                              account_type="enterprise")

Premium Setup
~~~~~~~~~~~~~

.. code:: python

    premium_search_args = load_credentials("~/.twitter_keys.yaml",
                                           account_type="premium")

There is a function that formats search API rules into valid JSON
queries called ``gen_rule_payload``. It has sensible defaults, such as
pulling more Tweets per call than the default 100 (but note that a
sandbox environment can have a max of only 100 here, so if you get
errors, please check this), not including dates, and defaulting to
hourly counts when using the counts API. Discussing the finer points of
generating search rules is out of scope for these examples; I encourage
you to see the docs to learn the nuances, but for now let’s see what a
rule looks like.

.. code:: python

rule = gen_rule_payload("beyonce", results_per_call=100) # testing with a sandbox account



This rule will match tweets that have the text ``beyonce`` in them.
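The payload behind such a rule can be sketched as plain JSON with the API's ``query`` and ``maxResults`` fields; the ``sketch_rule_payload`` function below is an illustration of the idea, not the library's ``gen_rule_payload``:

```python
import json

def sketch_rule_payload(pt_rule, results_per_call=500, from_date=None, to_date=None):
    """Build a search-API JSON payload; an illustration of the idea,
    not the library's ``gen_rule_payload``."""
    payload = {"query": pt_rule, "maxResults": results_per_call}
    if from_date:
        payload["fromDate"] = from_date    # optional datetime window start
    if to_date:
        payload["toDate"] = to_date        # optional datetime window end
    return json.dumps(payload)

rule = sketch_rule_payload("beyonce", results_per_call=100)
# -> '{"query": "beyonce", "maxResults": 100}'
```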

From this point, there are two ways to interact with the API. There is
a quick method to collect smaller amounts of Tweets to memory that
requires less thought and knowledge, and interaction with the
``ResultStream`` object, which will be introduced later.

Fast Way
~~~~~~~~

We’ll use the ``search_args`` variable to power the configuration point
for the API. The object also takes a valid PowerTrack rule and has
options to cut off the search when hitting limits on both the number of
Tweets and API calls.

We’ll be using the ``collect_results`` function, which has three
parameters:

- ``rule``: a valid PowerTrack rule, referenced earlier
- ``max_results``: as the API handles pagination, it will stop
  collecting when we reach this number
- ``result_stream_args``: configuration args that we’ve already
  specified

For the remaining examples, please change the args to either premium or
enterprise depending on your usage.

Let’s see how it goes:

.. code:: python

from searchtweets import collect_results

.. code:: python

    tweets = collect_results(rule,
                             max_results=500,
                             result_stream_args=enterprise_search_args)  # change this if you need to

By default, Tweet payloads are lazily parsed into a ``Tweet`` object.
An overwhelming number of Tweet attributes are made available directly,
as such:

.. code:: python

[print(tweet.all_text, end='\n\n') for tweet in tweets[0:10]];


Jay-Z &amp; Beyoncé sat across from us at dinner tonight and, at one point, I made eye contact with Beyoncé. My limbs turned to jello and I can no longer form a coherent sentence. I have seen the eyes of the lord.

Beyoncé and it isn't close.

As you could guess.. Signs by Beyoncé will always be my shit.

When Beyoncé adopts a dog 🙌🏾

Hold up, you can't just do that to Beyoncé

Why y'all keep using Rihanna and Beyoncé gifs to promote the show when y'all let Bey lose the same award she deserved 3 times and let Rihanna leave with nothing but the clothes on her back?

30) anybody tell you that you look like Beyoncé

Mi Beyoncé favorita
Beyoncé necesita ver esto. Que diosa @TiniStoessel 🔥🔥🔥

Joanne Pearce Is now playing IF I WAS A BOY - BEYONCE.mp3 by !

I'm trynna see beyoncé's finsta before I die

.. code:: python

[print(tweet.created_at_datetime) for tweet in tweets[0:10]];


2018-01-17 00:08:50
2018-01-17 00:08:49
2018-01-17 00:08:44
2018-01-17 00:08:42
2018-01-17 00:08:42
2018-01-17 00:08:42
2018-01-17 00:08:40
2018-01-17 00:08:38
2018-01-17 00:08:37
2018-01-17 00:08:37

.. code:: python

[print(tweet.generator.get("name")) for tweet in tweets[0:10]];


Twitter for iPhone
Twitter for iPhone
Twitter for iPhone
Twitter for iPhone
Twitter for iPhone
Twitter for iPhone
Twitter for Android
Twitter for iPhone
Airtime Pro
Twitter for iPhone

Voila, we have some Tweets. For interactive environments and other cases
where you don’t care about collecting your data in a single load or
don’t need to operate on the stream of Tweets or counts directly, I
recommend using this convenience function.

Working with the ResultStream
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The ResultStream object is powered by the ``search_args`` and takes the
rule and other configuration parameters, including a hard stop on the
number of pages to limit your API call usage.
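The hard stops can be pictured as a generator that quits after either limit is reached; this ``stream_pages`` sketch is a hypothetical stand-in for the real paginated requests, not the library's code:

```python
def stream_pages(fetch_page, max_pages=10, max_results=500):
    """Yield results page by page, stopping at either hard limit.

    ``fetch_page`` is any callable returning a list of results for a given
    page number -- an illustrative stand-in for real paginated API calls.
    """
    yielded = 0
    for page in range(max_pages):
        for item in fetch_page(page):
            yield item
            yielded += 1
            if yielded >= max_results:
                return

# a fake "API" that always returns a full page of 100 items
fake_api = lambda page: ["tweet-{}-{}".format(page, i) for i in range(100)]
first_250 = list(stream_pages(fake_api, max_pages=5, max_results=250))
# -> exactly 250 items, spread over 3 of the 5 allowed pages
```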

.. code:: python

    rs = ResultStream(rule_payload=rule,
                      max_results=500,
                      **enterprise_search_args)



There is a function, ``.stream``, that seamlessly handles requests and
pagination for a given query. It returns a generator, and to grab our
500 Tweets that mention ``beyonce`` we can do this:

.. code:: python

    tweets = list(rs.stream())

Tweets are lazily parsed using our Tweet Parser, so tweet data is very
easily extractable.

.. code:: python

[print(tweet.all_text) for tweet in tweets[0:10]];


gente socorro kkkkkkkkkk BEYONCE
Jay-Z &amp; Beyoncé sat across from us at dinner tonight and, at one point, I made eye contact with Beyoncé. My limbs turned to jello and I can no longer form a coherent sentence. I have seen the eyes of the lord.
Beyoncé and it isn't close.
As you could guess.. Signs by Beyoncé will always be my shit.
When Beyoncé adopts a dog 🙌🏾
Hold up, you can't just do that to Beyoncé
Why y'all keep using Rihanna and Beyoncé gifs to promote the show when y'all let Bey lose the same award she deserved 3 times and let Rihanna leave with nothing but the clothes on her back?
30) anybody tell you that you look like Beyoncé
Mi Beyoncé favorita
Beyoncé necesita ver esto. Que diosa @TiniStoessel 🔥🔥🔥
Joanne Pearce Is now playing IF I WAS A BOY - BEYONCE.mp3 by !

Counts Endpoint
~~~~~~~~~~~~~~~

We can also use the Search API Counts endpoint to get counts of Tweets
that match our rule. Each request will return up to *30* results, and
each count request can be done on a minutely, hourly, or daily basis.
The underlying ``ResultStream`` object will handle converting your
endpoint to the counts endpoint; you just need to specify the
``count_bucket`` argument when making a rule.
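Count responses pair each bucket with a ``timePeriod`` string in ``YYYYmmDDHHmm`` form (e.g., ``201801170000``). Bucketing timestamps by day can be sketched like this; the ``day_buckets`` helper is illustrative, not the API's implementation:

```python
from collections import Counter
from datetime import datetime

def day_buckets(timestamps):
    """Aggregate tweet timestamps into daily counts keyed by the API's
    ``timePeriod`` format (YYYYmmDDHHmm); illustrative helper only."""
    counts = Counter(ts.strftime("%Y%m%d0000") for ts in timestamps)
    return [{"timePeriod": period, "count": n}
            for period, n in sorted(counts.items(), reverse=True)]

stamps = [datetime(2018, 1, 17, h) for h in (0, 5, 12)] + [datetime(2018, 1, 16, 9)]
buckets = day_buckets(stamps)
# -> [{'timePeriod': '201801170000', 'count': 3},
#     {'timePeriod': '201801160000', 'count': 1}]
```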

The process is very similar to grabbing Tweets, but has some minor
differences.
*Caveat - premium sandbox environments do NOT have access to the Search
API counts endpoint.*

.. code:: python

count_rule = gen_rule_payload("beyonce", count_bucket="day")

counts = collect_results(count_rule, result_stream_args=enterprise_search_args)

Our results are pretty straightforward and can be rapidly used.

.. code:: python

    counts
[{'count': 366, 'timePeriod': '201801170000'},
{'count': 44580, 'timePeriod': '201801160000'},
{'count': 61932, 'timePeriod': '201801150000'},
{'count': 59678, 'timePeriod': '201801140000'},
{'count': 44014, 'timePeriod': '201801130000'},
{'count': 46607, 'timePeriod': '201801120000'},
{'count': 41523, 'timePeriod': '201801110000'},
{'count': 47056, 'timePeriod': '201801100000'},
{'count': 65506, 'timePeriod': '201801090000'},
{'count': 95251, 'timePeriod': '201801080000'},
{'count': 162883, 'timePeriod': '201801070000'},
{'count': 106344, 'timePeriod': '201801060000'},
{'count': 93542, 'timePeriod': '201801050000'},
{'count': 110415, 'timePeriod': '201801040000'},
{'count': 127523, 'timePeriod': '201801030000'},
{'count': 131952, 'timePeriod': '201801020000'},
{'count': 176157, 'timePeriod': '201801010000'},
{'count': 57229, 'timePeriod': '201712310000'},
{'count': 72277, 'timePeriod': '201712300000'},
{'count': 72051, 'timePeriod': '201712290000'},
{'count': 76371, 'timePeriod': '201712280000'},
{'count': 61578, 'timePeriod': '201712270000'},
{'count': 55118, 'timePeriod': '201712260000'},
{'count': 59115, 'timePeriod': '201712250000'},
{'count': 106219, 'timePeriod': '201712240000'},
{'count': 114732, 'timePeriod': '201712230000'},
{'count': 73327, 'timePeriod': '201712220000'},
{'count': 89171, 'timePeriod': '201712210000'},
{'count': 192381, 'timePeriod': '201712200000'},
{'count': 85554, 'timePeriod': '201712190000'},
{'count': 57829, 'timePeriod': '201712180000'}]

Dated Searches / Full Archive Search
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Let’s make a new rule and pass it dates this time.

``gen_rule_payload`` takes dates of the forms ``YYYY-mm-DD`` and
``YYYYmmDDHHmm``.

**Note that this will only work with the full archive search option**,
which is available to my account only via the enterprise options. Full
archive search will likely require a different endpoint or access
method; please see your developer console for details.
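Since count buckets are keyed by ``YYYYmmDDHHmm`` strings, a date like ``2017-09-01`` corresponds to ``201709010000``. A hypothetical normalization helper (not the library's code) might look like:

```python
from datetime import datetime

def to_api_date(date_str):
    """Normalize 'YYYY-mm-DD' (or 'YYYY-mm-DD HH:MM') into the API's
    YYYYmmDDHHmm form; a hypothetical helper for illustration."""
    for fmt in ("%Y-%m-%d %H:%M", "%Y-%m-%d"):
        try:
            return datetime.strptime(date_str, fmt).strftime("%Y%m%d%H%M")
        except ValueError:
            continue  # try the next accepted format
    raise ValueError("unrecognized date format: " + date_str)

# to_api_date("2017-09-01") -> '201709010000'
```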

.. code:: python

    rule = gen_rule_payload("from:jack",
                            from_date="2017-09-01",  # illustrative date window
                            to_date="2017-10-30",
                            results_per_call=500)



.. code:: python

tweets = collect_results(rule, max_results=500, result_stream_args=enterprise_search_args)

.. code:: python

[print(tweet.all_text) for tweet in tweets[0:10]];


More clarity on our private information policy and enforcement. Working to build as much direct context into the product too
To provide more clarity on our private information policy, we’ve added specific examples of what is/is not a violation and insight into what we need to remove this type of content from the service.
Launching violent groups and hateful images/symbols policy on November 22nd
We will now launch our policies on violent groups and hateful imagery and hate symbols on Nov 22. During the development process, we received valuable feedback that we’re implementing before these are published and enforced. See more on our policy development process here 👇
@WillStick @lizkelley Happy birthday Liz!
Off-boarding advertising from all accounts owned by Russia Today (RT) and Sputnik.

We’re donating all projected earnings ($1.9mm) to support external research into the use of Twitter in elections, including use of malicious automation and misinformation.
@TMFJMo @anthonynoto Thank you
@gasca @stratechery @Lefsetz letter
@gasca @stratechery Bridgewater’s Daily Observations
Yup!!!! ❤️❤️❤️❤️ #davechappelle
@ndimichino Sometimes
Setting up at @CampFlogGnaw

.. code:: python

    rule = gen_rule_payload("from:jack",
                            from_date="2017-09-20",
                            to_date="2017-10-30",
                            count_bucket="day",
                            results_per_call=500)



.. code:: python

counts = collect_results(rule, max_results=500, result_stream_args=enterprise_search_args)

.. code:: python

[print(c) for c in counts];


{'timePeriod': '201710290000', 'count': 0}
{'timePeriod': '201710280000', 'count': 0}
{'timePeriod': '201710270000', 'count': 3}
{'timePeriod': '201710260000', 'count': 6}
{'timePeriod': '201710250000', 'count': 4}
{'timePeriod': '201710240000', 'count': 4}
{'timePeriod': '201710230000', 'count': 0}
{'timePeriod': '201710220000', 'count': 0}
{'timePeriod': '201710210000', 'count': 3}
{'timePeriod': '201710200000', 'count': 2}
{'timePeriod': '201710190000', 'count': 1}
{'timePeriod': '201710180000', 'count': 6}
{'timePeriod': '201710170000', 'count': 2}
{'timePeriod': '201710160000', 'count': 2}
{'timePeriod': '201710150000', 'count': 1}
{'timePeriod': '201710140000', 'count': 64}
{'timePeriod': '201710130000', 'count': 3}
{'timePeriod': '201710120000', 'count': 4}
{'timePeriod': '201710110000', 'count': 8}
{'timePeriod': '201710100000', 'count': 4}
{'timePeriod': '201710090000', 'count': 1}
{'timePeriod': '201710080000', 'count': 0}
{'timePeriod': '201710070000', 'count': 0}
{'timePeriod': '201710060000', 'count': 1}
{'timePeriod': '201710050000', 'count': 3}
{'timePeriod': '201710040000', 'count': 5}
{'timePeriod': '201710030000', 'count': 8}
{'timePeriod': '201710020000', 'count': 5}
{'timePeriod': '201710010000', 'count': 0}
{'timePeriod': '201709300000', 'count': 0}
{'timePeriod': '201709290000', 'count': 0}
{'timePeriod': '201709280000', 'count': 9}
{'timePeriod': '201709270000', 'count': 41}
{'timePeriod': '201709260000', 'count': 13}
{'timePeriod': '201709250000', 'count': 6}
{'timePeriod': '201709240000', 'count': 7}
{'timePeriod': '201709230000', 'count': 3}
{'timePeriod': '201709220000', 'count': 0}
{'timePeriod': '201709210000', 'count': 1}
{'timePeriod': '201709200000', 'count': 7}
