This is a pre-production deployment of Warehouse, however changes made here WILL affect the production instance of PyPI.
Latest Version Dependencies status unknown Test status unknown Test coverage unknown
Project Description
# DMArchiver
A tool to archive **all** the direct messages from your private conversations on Twitter.

## Introduction
Have you ever need to retrieve old information from a chat with your friends on Twitter? Or maybe you would just like to backup all these cheerful moments and keep them safe.

I have made this tool to retrieve all the tweets from my private conversations and transform them in an _IRC-like_ log for archiving.

**Output sample:**
```
[2016-09-07 10:35:55] <michael> [Media-image] https://ton.twitter.com/1.1/ton/data/dm/773125478562429059/773401254876366208/mfeDmXXj.jpg I am so a Dexter fan...
[2016-09-07 10:37:12] <kathy> He is so sexy. [Flushed face] I love him. [Heavy red heart]
[2016-09-07 10:38:10] <steve> You guys are ridiculous! [Face with tears of joy]
```

Emoji are currently kept with their description to prevent encoding issues.

This tool is also able to **download all the uploaded images** in their original resolution and, as a bonus, also retrieve the **GIFs** you used in your conversations as MP4 files (the format used by Twitter to optimized them and save space).

You may have found suggestions to use the Twitter's archive feature to do the same but Direct Messages are not included in the generated archive.

The script does not leverage the Twitter API because of its very restrictive limitations in regard of the handling of the Direct Messages. Actually, it is currently possible to retrieve only the latest 200 messages of a private conversation.

Because it is still possible to retrieve older messages from a Conversation by scrolling up, this script only simulates this behavior to automatically get the messages.

**Disclaimer:**
Using this tool will only behave like you using the Twitter web site with your browser, so there is nothing illegal to use it to retrieve your own data. However, depending on your conversations' length, it may trigger a lot of requests to the site that could be suspicious for Twitter. No one are reported issues upon now but use it at your own risk.

## Installation & Quick start
### Ubuntu


```
$ pip install dmarchiver
$ dmarchiver
```
### Windows
Download a Windows build from the [project releases](https://github.com/Mincka/DMArchiver/releases)

Then run the tool in a Command Prompt.
```
> C:\Temp\DMArchiver.exe
```

### Mac OS X / macOS

To build and run the package, you need to have **Xcode** (≈ 130 MB), **Homebrew** and **Python 3** (≈ 20 MB):

```
$ xcode-select --install
$ /usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
$ brew install python3
```

Once your environment is setup properly, you can install and run the tool:
```
$ pip3 install dmarchiver
$ dmarchiver
```

If Python bin path in not in your environment PATH variable, the program will not be found. Just run it with the complete path (location may vary...):
```
$ /Library/Frameworks/Python.framework/Versions/3.5/bin/dmarchiver
```

## Upgrade
```
$ pip install dmarchiver --upgrade
```

## Usage

### Command line tool
```
$ dmarchiver [-h] [-id CONVERSATION_ID] [-di] [-dg]

$ dmarchiver --help
usage: cmdline.py [-h] [-id CONVERSATION_ID] [-di] [-dg]

optional arguments:
-h, --help show this help message and exit
-id CONVERSATION_ID, --conversation_id CONVERSATION_ID
Conversation ID
-di, --download-images
Download images
-dg, --download-gifs Download GIFs (as MP4)
```

### Examples

#### Archive all conversations with images:
`$ dmarchiver -di`

The script output will be the `645754097571131337.txt` file with the conversation formatted in an _IRC-like_ style.

The images and GIFs files can be respectively found in the `645754097571131337/images` and `645754097571131337/mp4` folders.

#### Archive a specific conversation (only text):
To retrieve only one conversation with the ID `645754097571131337`:

`$ dmarchiver -id "645754097571131337"`

The script output will be the `645754097571131337.txt` file with the conversation formatted in an _IRC-like_ style.

#### How to get a `conversation_id`?

The `conversation-id` is the identifier of the conversation you want to backup. Here how to find it manually:
- Open the _Network_ panel in the _Development Tools_ of your favorite browser.
- Open the desired conversation on Twitter and have a look at the requests.
- Identify a request with the following arguments:
`https://twitter.com/messages/with/conversation?id=645754097571131337&max_entry_id=78473919348771337`
- Use the `id` value as your `conversation-id`. This identifier can contain special characters such as '-'.

### Module import
```python
>>> from dmarchiver.core import Crawler
>>> crawler = Crawler()
>>> crawler.authenticate('username', 'password')
>>> crawler.crawl('conversation_id')
```

## Development setup
```shell
$ git clone https://github.com/Mincka/DMArchiver.git
$ cd dmarchiver
$ virtualenv venv
$ source venv/bin/activate # "venv/Scripts/Activate.bat" on Windows
$ pip install -r requirements.txt
$ python setup.py install
```

### Windows binary build
You can build it with `pyinstaller`.

```
> pip install pyinstaller
> pyinstaller --onefile dmarchiver\cmdline.py -n dmarchiver.exe
> cd dist
> dmarchiver.exe
```

## Troubleshooting
You may encounter building issues with the `lxml` library on Windows (`error: Unable to find vcvarsall.bat`). The most simple and straightforward fix is to download and install a precompiled binary from [this site](http://www.lfd.uci.edu/~gohlke/pythonlibs/#lxml) and install the package locally:

`$ pip install lxml-3.6.4-cp35-cp35m-win_amd64.whl`

## FAQ

### What happens to my password and my messages? Are they sent to a third-party service?
Not at all. Everything happens on your computer. Your username and your password are only sent once to Twitter using a secured connection. Your messages are downloaded from your connection, and are written on your computer at the end of the script execution, so are the images and the GIFs if you chose to download them.

### I received an e-mail from Twitter saying a suspicious connection occured on Twitter, should I be worried about it?
Not at all. The tool simulates a Firefox browser on Windows 10. Consequently, if you do not use usually this configuration, Twitter warns you about this. You can safely ignore this message if you received it at the same time the tool was used.

## License

Copyright (C) 2016 Julien EHRHART

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program. If not, see <http: www.gnu.org="" licenses=""/>.
Release History

Release History

0.1.0

This version

History Node

TODO: Figure out how to actually get changelog content.

Changelog content for this version goes here.

Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Show More

0.0.10

History Node

TODO: Figure out how to actually get changelog content.

Changelog content for this version goes here.

Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Show More

0.0.9

History Node

TODO: Figure out how to actually get changelog content.

Changelog content for this version goes here.

Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Show More

0.0.8

History Node

TODO: Figure out how to actually get changelog content.

Changelog content for this version goes here.

Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Show More

0.0.7

History Node

TODO: Figure out how to actually get changelog content.

Changelog content for this version goes here.

Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Show More

0.0.5

History Node

TODO: Figure out how to actually get changelog content.

Changelog content for this version goes here.

Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Show More

0.0.4

History Node

TODO: Figure out how to actually get changelog content.

Changelog content for this version goes here.

Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Show More

0.0.3

History Node

TODO: Figure out how to actually get changelog content.

Changelog content for this version goes here.

Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Show More

Download Files

Download Files

TODO: Brief introduction on what you do with files - including link to relevant help section.

File Name & Checksum SHA256 Checksum Help Version File Type Upload Date
dmarchiver-0.1.0.zip (19.0 kB) Copy SHA256 Checksum SHA256 Source Nov 3, 2016

Supported By

WebFaction WebFaction Technical Writing Elastic Elastic Search Pingdom Pingdom Monitoring Dyn Dyn DNS HPE HPE Development Sentry Sentry Error Logging CloudAMQP CloudAMQP RabbitMQ Heroku Heroku PaaS Kabu Creative Kabu Creative UX & Design Fastly Fastly CDN DigiCert DigiCert EV Certificate Rackspace Rackspace Cloud Servers DreamHost DreamHost Log Hosting