Skip to main content

SDK For publishing to Socrata

Project description

# publish-py
experimental sdk for the socrata publishing api

## Documentation
[Is available here](https://socrata.github.io/publish-py/docs)

## Example
Try the command line example with
```bash
python -m examples.create 'Police Reports' ~/Desktop/catalog.data.gov/Seattle_Real_Time_Fire_911_Calls.csv 'pete-test.test-socrata.com' --username $SOCRATA_USERNAME --password $SOCRATA_PASSWORD
```
## Using

### Boilerplate
```python
# Import some stuff
from socrata.authorization import Authorization
from socrata.publish import Publish
import os

# Boilerplate...
# Make an auth object
auth = Authorization(
"pete-test.test-socrata.com",
os.environ['SOCRATA_USERNAME'],
os.environ['SOCRATA_PASSWORD']
)

```

### Simple usage
To create a dataset with as little code as possible, you can do this:

```python
with open('path/to/my/file.csv', 'rb') as file:
# Upload the data, validate it
output = Publish(auth).create(
name = "cool dataset",
description = "a description"
).csv(file)

# It's wise to make sure you have no type errors, if that's what you expect
assert output.attributes['error_count'] == 0

# Publish the dataset - this will make it public and available to make
# visualizations from
output.apply()
```

Similar to the `csv` method are the `xls`, `xlsx`, and `tsv` methods, which upload
those files.


### Advanced usage

#### Create a revision

```python
# This is our publishing object, using the auth variable from above
publishing = Publish(auth)

(ok, view) = publishing.new({'name': 'cool dataset'})
assert ok, view
# This is our view
fourfour = view['id']

# Make a revision
(ok, rev) = publishing.revisions.create(fourfour)
assert ok

# rev is a Revision object, we can print it
print(rev)
Revision({'created_by': {'display_name': 'rozap',
'email': 'chris.duranti@socrata.com',
'user_id': 'tugg-ikce'},
'fourfour': 'ij46-xpxe',
'id': 346,
'inserted_at': '2017-02-27T23:05:08.522796',
'metadata': None,
'update_seq': 285,
'upsert_jobs': []})

# We can also access the attributes of the revision
print(rev.attributes['fourfour'])
'ij46-xpxe'
```

### Make an upload, given a revision
```python
# Using that revision, we can create an upload
(ok, upload) = rev.create_upload({'filename': "foo.csv"})
assert ok

# And print it
print(upload)
Upload({'content_type': None,
'created_by': {'display_name': 'rozap',
'email': 'chris.duranti@socrata.com',
'user_id': 'tugg-ikce'},
'filename': 'foo.csv',
'finished_at': None,
'id': 290,
'inserted_at': '2017-02-27T23:07:18.309676',
'schemas': []})
```
### Upload a csv, given an upload
```python
# And using that upload we just created, we can put bytes into it
with open('test/fixtures/simple.csv', 'rb') as f:
(ok, input_schema) = upload.csv(f)
assert ok
```

### Transform a file, given an input schema
```python
# Putting bytes into an upload gives us an input schema. We can call `transform` on the
# input schema to get a new output schema with our transforms applied.
(ok, output_schema) = input_schema.transform({
'output_columns': [
{
"field_name": "b",
"display_name": "b, but as a number",
"position": 0,
"description": "b but with a bunch of errors",
"transform": {
"transform_expr": "to_number(b)"
}
}
]})
)

# Wait for the transformation to finish
(ok, output_schema) = output_schema.wait_for_finish()
assert ok, output_schema

# Look at how many validation errors happened while trying to transform our dataset
print(output_schema.attributes['error_count'])
```


### Validating rows
```python
(ok, rows) = output_schema.rows(offset = 0, limit = 20)
asssert ok,

self.assertEqual(rows, [
{'b': {'ok': ' bfoo'}},
{'b': {'ok': ' bfoo'}},
{'b': {'ok': ' bfoo'}},
{'b': {'ok': ' bfoo'}}
])
```

### Do the upsert!
```python
# Now we have transformed our data into the shape we want, let's do an upsert
(ok, upsert_job) = output_schema.apply()

# This will complete the upsert behind the scenes. If we want to
# re-fetch the current state of the upsert job, we can do so
(ok, upsert_job) = upsert_job.show()

# To get the progress
print(upsert_job.attributes['log'])
[
{'details': {'Errors': 0, 'Rows Created': 0, 'Rows Updated': 0, 'By RowIdentifier': 0, 'By SID': 0, 'Rows Deleted': 0}, 'time': '2017-02-28T20:20:59', 'stage': 'upsert_complete'},
{'details': {'created': 1}, 'time': '2017-02-28T20:20:59', 'stage': 'columns_created'},
{'details': {'created': 1}, 'time': '2017-02-28T20:20:59', 'stage': 'columns_created'},
{'details': None, 'time': '2017-02-28T20:20:59', 'stage': 'started'}
]


# So maybe we just want to wait here, printing the progress, until the job is done
upsert_job.wait_for_finish(progress = lambda job: sys.stdout.write(str(job.attributes['log'][0]) + "\n"))

# So now if we go look at our original four-four, our data will be there
```


# Testing
Configuration is set in `test/auth.py` for tests. It reads the domain, username, and password from environment variables. If you want to run the tests, set those environment variables to something that will work.

If I wanted to run the tests against my local instance, I would run:
```bash
SOCRATA_DOMAIN=localhost SOCRATA_USERNAME=$SOCRATA_LOCAL_USER SOCRATA_PASSWORD=$SOCRATA_LOCAL_PASS bin/test
```

# Docs
make the docs by running
```bash
make docs
```

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

socrata-publish-py-0.0.6.tar.gz (8.9 kB view hashes)

Uploaded Source

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page