Some command-line utilities to interact with files stored in AWS S3, including versioned ones.
Project description
If you use AWS S3, this can be a handy tool for you.
This package offers a few command-line utilities which allow a bit more than the scripts provided by boto.
Some special features:
List versions in a defined time period, see versioning
Fetch versions specified in a CSV file (list-file)
Generate temporary URL links for a set of keys in buckets
Installation
Any of the following methods shall work, pip being the recommended one.
Using easy_install or pip
$ pip install ttr.aws.utils.s3
or:
$ easy_install ttr.aws.utils.s3
Using setup.py
Unpack the package
go to the directory where setup.py is located
run: python setup.py install
The resulting scripts are then located in the Python Scripts directory.
Using buildout
From the root of the source directory:
$ python bootstrap.py
$ bin/buildout
And you get your scripts in the directory bin/.
Quick start
We want to fetch versions of the feed named my/versioned/feed.xml in the bucket mybucket.
Be sure you have your boto credentials configured. You shall have a file of the form:
[Credentials]
aws_access_key_id = <your access key>
aws_secret_access_key = <your secret key>
somewhere on disk, and have the environment variable BOTO_CONFIG set to the complete path to this file. For more, see BotoConfig.
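The credentials file is a plain INI file. A minimal sketch (with placeholder key values, not real credentials) of writing such a file and checking that it parses into the structure boto expects:

```python
import configparser
import os
import tempfile

# A boto-style credentials file with placeholder values.
cfg_text = """[Credentials]
aws_access_key_id = AKIDEXAMPLE
aws_secret_access_key = secretEXAMPLE
"""

path = os.path.join(tempfile.mkdtemp(), "boto.cfg")
with open(path, "w") as f:
    f.write(cfg_text)

# boto reads the file pointed to by BOTO_CONFIG; here we only parse it
# ourselves to confirm the expected [Credentials] section is present.
parser = configparser.ConfigParser()
parser.read(path)
access_key = parser["Credentials"]["aws_access_key_id"]
print(access_key)  # AKIDEXAMPLE
```

In real use you would point BOTO_CONFIG at this path instead of parsing the file yourself.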
Create a csv file for the given feed and time period:
$ s3lsvers -from 2012-05-24T00:15 -to 2012-05-24T01:15 -list-file list.csv mybucket my/versioned/feed.xml
You shall then find the file list.csv on your disk.
Review the records in list.csv and delete all lines with versions which are not of interest to you.
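If hand editing is inconvenient, the uninteresting lines can also be filtered out with a short script. A sketch, assuming the semicolon-separated column layout shown in the s3getvers description; the rows and cut-off time are illustrative:

```python
import csv
import io

# Rows in the s3lsvers csv format: key_name;version_id;size;last_modified;age
rows = """my/versioned/feed.xml;OrUr6XO8KSKEHbd8mQ.MloGcGlsh7Sir;191345;2012-05-23T20:45:10.000Z;39
my/versioned/feed.xml;xhkVOy.dJfjSfUwse8tsieqjDicp0owq;192790;2012-05-23T20:44:31.000Z;62
my/versioned/feed.xml;oKneK.N2wS8pW8.EmLqjldYlgcFwxN3V;193912;2012-05-23T20:43:29.000Z;58
"""

# Keep only versions modified at 20:44 or later. ISO 8601 strings of this
# form sort chronologically, so plain string comparison is enough.
keep = [r for r in csv.reader(io.StringIO(rows), delimiter=";")
        if r[3] >= "2012-05-23T20:44"]

print(len(keep))  # 2
```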
Using list.csv, ask s3getvers to fetch all versions specified in the file. Be sure to run it in an empty directory:
$ s3getvers mybucket list.csv
You will see each version being downloaded and saved into your current directory.
Finally, you can try generating a temporary URL to your feed (pointing to the latest version):
$ s3tmpgen mybucket my/versioned/feed.xml
bucket_key_tmpurl:
  mybucket:
    my/versioned/feed.xml: https://mybucket.s3.amazonaws.com/my/versioned/feed.xml?Signature=o..A%3D&Expires=1342304229&AWSAccessKeyId=A....A
expire: '2012-07-14T22:17:09Z'
Provided commands
s3lsvers
Lists the versions of a feed. Can write the output to a CSV file (-list-file) and/or an html chart (-html-file).
$ s3lsvers.exe -h
usage: s3lsvers-script.py [-h] [-from None] [-to None] [-list-file None]
                          [-html-file None] [-version-id None]
                          bucket_name key_name

Lists all versions of the given key, optionally filtered by a from - to
range of version last_modified times. Allows writing the listing into a
csv file and/or an html chart.

Listing shows:

key_name       "file name". Can repeat if the file has more versions.
version_id     unique identifier of the given version in the given bucket.
               It has the form of a string, not a number. Identifiers are
               "random"; do not expect them to be sorted alphabetically.
size           size of the file in bytes.
last_modified  ISO 8601 formatted time of file modification,
               e.g. `2011-06-22T03:05:09.000Z`
age            difference between last_modified of the given version and of
               the preceding version. It is a sort of current update
               interval for that version.

Sample use:

Lists to the screen all versions of file keyname in the bucketname bucket::

  $ s3lsvers bucketname keyname

Lists all versions younger than the given time (from the given time till now)::

  $ s3lsvers -from 2011-07-19T12:00:00 bucketname keyname

Lists all versions older than the given time (from the very first version
till the given date)::

  $ s3lsvers -to 2011-07-19T12:00:00 bucketname keyname

Lists all versions in the period between the from and to times::

  $ s3lsvers -from 2010-01-01 -to 2011-07-19T12:00:00 bucketname keyname

Lists all versions and writes them into a csv file named versions.csv::

  $ s3lsvers -list-file versions.csv bucketname keyname

Lists all versions and writes them into an html chart file named chart.html::

  $ s3lsvers -html-file chart.html bucketname keyname

Prints to the screen, writes to csv and creates an html chart, all for
versions in the given time period::

  $ s3lsvers -from 2010-01-01 -to 2011-07-19T12:00:00 -list-file versions.csv -html-file chart.html bucketname keyname

positional arguments:
  bucket_name       name of the AWS S3 bucket to search.
  key_name          name of the key to list. Typically this is the complete
                    name of a key, but a truncated name (prefix) can also be
                    given; in that case all keys sharing this prefix are
                    listed.

optional arguments:
  -h, --help        show this help message and exit
  -from None, --from-time None
                    Modification time of the oldest expected version,
                    expressed in ISO 8601 format. Can be truncated.
                    (default: goes to the oldest version)
  -to None, --to-time None
                    Modification time of the youngest expected version,
                    expressed in ISO 8601 format. Can be truncated.
                    (default: goes to the latest version)
  -list-file None   Name of the file the result is written to in csv format.
                    If set, the file is always overwritten.
  -html-file None   Name of the file the result is written to in html format
                    (as a chart). If set, the file is always overwritten.
  -version-id None  Optional version id. If specified, the listing does not
                    start from the freshest version, but starts searching
                    from the given VERSION_ID and continues to older and
                    older versions. This can speed up the listing when you
                    need rather old files and know a VERSION_ID which came
                    later than the time scope you are going to list.
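The age column is the gap between a version's last_modified and that of the preceding version. A sketch of that computation with two illustrative ISO 8601 timestamps (this mirrors the documented meaning, not the tool's actual code):

```python
from datetime import datetime

# Two successive last_modified values in the form s3lsvers prints.
newer = "2012-05-23T20:45:10.000Z"
older = "2012-05-23T20:44:31.000Z"

def parse(ts):
    # Strip nothing; the trailing 'Z' is matched literally. All times
    # are assumed to be UTC throughout.
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ")

# Age of the newer version: seconds since its predecessor was posted.
age_seconds = int((parse(newer) - parse(older)).total_seconds())
print(age_seconds)  # 39
```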
s3getvers
$ s3getvers -h
usage: s3getvers-script.py [-h] [-output-version-id-names] [-no-decompression]
                           bucket_name csv_version_file

Fetch file versions as listed in the provided csv file.

A typical csv file (as produced by default by s3lsvers) is::

  my/versioned/feed.xml;OrUr6XO8KSKEHbd8mQ.MloGcGlsh7Sir;191345;2012-05-23T20:45:10.000Z;39
  my/versioned/feed.xml;xhkVOy.dJfjSfUwse8tsieqjDicp0owq;192790;2012-05-23T20:44:31.000Z;62
  my/versioned/feed.xml;oKneK.N2wS8pW8.EmLqjldYlgcFwxN3V;193912;2012-05-23T20:43:29.000Z;58

and has columns:

:key_name: name of the feed (not containing the bucket name itself)
:version_id: string identifying a unique version. Any following columns
  can contain anything.
:size: size in bytes. This column is not used and can be missing.
:last_modified: date when the version was posted. This column is not used
  and can be missing.

Typical use (assuming the above csv file is available under the name
verlist.csv)::

  $ s3getvers-script.py yourbucketname verlist.csv

which will create the following files in the current directory:

* my/versioned/feed.xml.2012-05-23T20_45_10.xml
* my/versioned/feed.xml.2012-05-23T20_44_31.xml
* my/versioned/feed.xml.2012-05-23T20_43_29.xml

Even though these files are gzipped on the server, they will be decompressed
on the local disk.

positional arguments:
  bucket_name           bucket name (default: None)
  csv_version_file      name of the CSV file with version_id

optional arguments:
  -h, --help            show this help message and exit
  -output-version-id-names
                        Resulting file names use version_id to become
                        distinguished (default is to use the timestamp of
                        file creation)
  -no-decompression     Keep the files as they come; do not decompress them
                        if they come compressed
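The timestamped output names listed above follow a simple scheme: key name, then the last_modified timestamp with colons made filesystem safe, then the original suffix again. A sketch reproducing it (the helper `local_name` is hypothetical, not part of the package):

```python
import csv
import io

# Sample rows in the format produced by s3lsvers (taken from the help text).
csv_text = (
    "my/versioned/feed.xml;OrUr6XO8KSKEHbd8mQ.MloGcGlsh7Sir;191345;2012-05-23T20:45:10.000Z;39\n"
    "my/versioned/feed.xml;xhkVOy.dJfjSfUwse8tsieqjDicp0owq;192790;2012-05-23T20:44:31.000Z;62\n"
)

def local_name(key_name, last_modified):
    """Mimic the documented naming: key name + timestamp (colons replaced
    by underscores, fractional seconds dropped) + the original suffix."""
    suffix = key_name.rpartition(".")[2]
    stamp = last_modified.split(".")[0].replace(":", "_")
    return "%s.%s.%s" % (key_name, stamp, suffix)

names = []
for row in csv.reader(io.StringIO(csv_text), delimiter=";"):
    key_name, version_id, size, last_modified, age = row
    names.append(local_name(key_name, last_modified))

print(names[0])  # my/versioned/feed.xml.2012-05-23T20_45_10.xml
```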
s3tmpgen
$ s3tmpgen.exe -h
usage: s3tmpgen-script.py [-h] [-input-config INPUT_CONFIG] [-output OUTPUT]
                          [-boto-config None] [-expiration None]
                          [bucket_name] [key_names [key_names ...]]

Generate temporary links for keys in the given bucket, optionally using
explicitly defined user credentials.

Plain command line use, generating tmpurls for three keys in bucket
mybucket::

  $ s3tmpgen.py mybucket key/name/A key/name/B key/name/C
  bucket_key_tmpurl:
    mybucket:
      key/name/A: https://mybucket.s3.amazonaws.com/key/name/A?Signature=Y..bw%3D&Expires=1342299038&AWSAccessKeyId=AxxxxxxxxxxxxxxxxxxA
      key/name/B: https://mybucket.s3.amazonaws.com/key/name/B?Signature=K..dE%3D&Expires=1342299038&AWSAccessKeyId=AxxxxxxxxxxxxxxxxxxA
      key/name/C: https://mybucket.s3.amazonaws.com/key/name/C?Signature=b..c0%3D&Expires=1342299038&AWSAccessKeyId=AxxxxxxxxxxxxxxxxxxA
  expire: '2012-07-14T20:50:38Z'

The default credentials configured for boto (e.g. by BOTO_CONFIG) are used.
The default expiration is set to now plus 30 days. Output is printed to
stdout.

Reading key names from a YAML file
----------------------------------

Using the option -input-config, key names and their buckets can be read
from a YAML formatted file of the following form::

  mybucket:
  - key/name/A
  - key/name/B
  - key/name/C
  anotherbucket:
  - key/name/D
  - key/name/E
  - key/name/F

Specify expiration date and time
--------------------------------

Using the option -expiration, the expiration date and time can be
specified. Many formats are recognized; try it, and if spaces are included,
use quotes.

* 2012-06-14T23:01Z
* Tuesday
* Tue
* "15th of June"

Use an explicit credential file
-------------------------------

Using the option -boto-config, a configuration file in the following format
can be used::

  [Credentials]
  aws_access_key_id=AxxxxxxxxxxxxxxxxxxA
  aws_secret_access_key=Mxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxl

Write output to a specific file
-------------------------------

Using the option -output, the output can be written directly into a
specific file.
Complex call, using all possible options
========================================

::

  $ s3tmpgen.py -input-config key_list.txt -exp "15th of May" -boto-config my/ini/file.cfg -output out.yaml cmdlinebucket key/from/cmd key/from/CMD

Nothing is printed out; the result is written into the out.yaml file::

  bucket_key_tmpurl:
    anotherbucket:
      key/name/D: https://anotherbucket.s3.amazonaws.com/key/name/D?Signature=N...A%3D&Expires=1337032800&AWSAccessKeyId=AxxxxxxxxxxxxxxxxxxA
      key/name/E: https://anotherbucket.s3.amazonaws.com/key/name/E?Signature=h..w%3D&Expires=1337032800&AWSAccessKeyId=AxxxxxxxxxxxxxxxxxxA
      key/name/F: https://anotherbucket.s3.amazonaws.com/key/name/F?Signature=n..4%3D&Expires=1337032800&AWSAccessKeyId=AxxxxxxxxxxxxxxxxxxA
    cmdlinebucket:
      key/from/CMD: https://cmdlinebucket.s3.amazonaws.com/key/from/CMD?Signature=B..bC%2Be9U%3D&Expires=1337032800&AWSAccessKeyId=AxxxxxxxxxxxxxxxxxxA
      key/from/cmd: https://cmdlinebucket.s3.amazonaws.com/key/from/cmd?Signature=C..8%3D&Expires=1337032800&AWSAccessKeyId=AxxxxxxxxxxxxxxxxxxA
    mybucket:
      key/name/A: https://mybucket.s3.amazonaws.com/key/name/A?Signature=X..Mk%3D&Expires=1337032800&AWSAccessKeyId=AxxxxxxxxxxxxxxxxxxA
      key/name/B: https://mybucket.s3.amazonaws.com/key/name/B?Signature=y..zg%3D&Expires=1337032800&AWSAccessKeyId=AxxxxxxxxxxxxxxxxxxA
      key/name/C: https://mybucket.s3.amazonaws.com/key/name/C?Signature=1..v4%3D&Expires=1337032800&AWSAccessKeyId=AxxxxxxxxxxxxxxxxxxA
  expire: '2012-05-15T00:00:00Z'

.. note:: The command does not check whether the bucket or keys exist, nor
   whether the credentials are really usable and correct; test it yourself.

.. note:: Key names from the input file and from the command line are
   combined together.
positional arguments:
  bucket_name           name of the AWS S3 bucket
  key_names             key name(s)

optional arguments:
  -h, --help            show this help message and exit
  -input-config INPUT_CONFIG
                        name of a yaml formatted file listing buckets and
                        key names to process
  -output OUTPUT        name of the output file (default: stdout)
  -boto-config None     Path to the INI file with credentials, as used by
                        boto (default: uses the system value of BOTO_CONFIG)
  -expiration None      Expiration date and time. Accepts many forms
                        (default: now plus 30 days)
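The default expiration of now plus 30 days, and the Expires= epoch seconds seen in the generated urls, can be sketched as follows. The fixed "now" is chosen to match the sample output above; the real tool uses the current time:

```python
import calendar
from datetime import datetime, timedelta

# Fixed "now" for illustration (matches the sample s3tmpgen output).
now = datetime(2012, 6, 14, 20, 50, 38)

# Default expiration: now plus 30 days.
expire = now + timedelta(days=30)

# Epoch seconds in UTC, as they appear in the Expires= query parameter.
expires_epoch = calendar.timegm(expire.timetuple())

print(expire.strftime("%Y-%m-%dT%H:%M:%SZ"))  # 2012-07-14T20:50:38Z
print(expires_epoch)  # 1342299038
```

Note how 1342299038 is exactly the Expires= value shown in the sample urls for the expire time '2012-07-14T20:50:38Z'.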
Configuring AWS S3 credentials
Credentials for accessing AWS S3 must be set. Authorization is currently done by boto means, as described in the article BotoConfig, including the comment from May 22, 2011.
On Windows, I recommend using the BOTO_CONFIG variable pointing to the file with the required credentials.
Credits
This work is built on top of the boto module, a great Python library for accessing AWS services, created by Mitch Garnaat.
Copyright © 2011, Jan Vlcinsky
Copyright © 2012, TamTam Research s.r.o. http://www.tamtamresearch.com
All rights reserved.
News
0.3.0
Release date: 2012-06-15
command s3tmpgen - generating temporary urls for selected keys in buckets
0.2.3
Release date: 2012-05-28
command s3lsvers - to list key versions
command s3getvers - to fetch versions listed in csv file