
Brazil deaths by city as pandas dataframe or csv file

Project description

A web scraping package for Brazilian death records, by city and month.


Installation

First install the package:

pip install brazil-monthly-deaths

Then install ChromeDriver so that Selenium can drive a browser; see the Selenium documentation and the ChromeDriver download page for more information.
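To confirm the driver is visible to your environment, a quick check (a minimal sketch; it only verifies that an executable named chromedriver is on your PATH, which is one common setup):

import shutil

# Prints the driver's path if it is found on PATH, otherwise None
print(shutil.which("chromedriver"))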

Usage

Assuming you have installed ChromeDriver:

from brazil_monthly_deaths import brazil_deaths, data, update_df

# data is the data from 2015 to 2020
print(data)

# Everyday there are new records,
# so you should get the most recent data.
# Depending on your internet connection
# it may take up to 6 minutes for each month
# if you run for all states. Consider selecting
# only the states you want to work on.
new_data = brazil_deaths(years=[2020], months=[5])

# update the lagging data provided by this package
current_data = update_df(data, new_data)
print(current_data)
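For example, to limit the scrape to a single state (the states parameter is documented in the API section below; this particular call is illustrative):

# Restrict the scrape to one state to cut the runtime
new_data_sp = brazil_deaths(years=[2020], months=[5], states=["São Paulo"])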

Data example

city_id    year  month  region     state           city         deaths
3516805    2020  1      Southeast  Rio de Janeiro  Tracunhaém   8
21835289   2020  1      Southeast  Rio de Janeiro  Trindade     13
10791950   2020  1      Southeast  Rio de Janeiro  Triunfo      16
81875827   2020  1      Southeast  Rio de Janeiro  Tupanatinga  18
99521011   2020  1      Southeast  Rio de Janeiro  Tuparetama   4

API

Dataframes

This package exports several pandas dataframes with the following columns:

  • city_id : unique integer derived from the state and city,

  • year : from 2015 to 2020,

  • month : from 1 to 12,

  • region : one of [North, Northeast, South, Southeast, Center_West],

  • state : one of the 27 federative units of Brazil (26 states plus the Distrito Federal),

  • city : city name,

  • deaths : number of deaths.

from brazil_monthly_deaths import (
  data, # full data
  data_2015,
  data_2016,
  data_2017,
  data_2018,
  data_2019,
  data_2020 # always out of date, you need to update it
)
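
As a quick illustration of working with these columns (ordinary pandas usage, not a package API):

from brazil_monthly_deaths import data

# Monthly death totals for one state
sp = data[data["state"] == "São Paulo"]
monthly = sp.groupby(["year", "month"])["deaths"].sum().reset_index()
print(monthly.head())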

brazil_deaths

You can use this function to scrape new data directly from the Civil Registry Offices website. Just make sure you have installed ChromeDriver, as noted above.

Official note about the legal deadlines:

The family has up to 24 hours after a death to report it to the registry office, which in turn has up to five days to perform the death registration, and then up to eight days to send the completed record to the National Civil Registry Information Center (CRC Nacional), which updates the platform.

In other words: the most recent 13 days of data are always subject to change.
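A rough way to see which (year, month) pairs that mutable window can touch (a minimal sketch using only the standard library):

from datetime import date, timedelta

# The 13-day window may span a month boundary, so collect both endpoints
today = date.today()
window_start = today - timedelta(days=13)
months_to_refresh = {(d.year, d.month) for d in (window_start, today)}
print(months_to_refresh)  # one pair, or two pairs near a month boundary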

from brazil_monthly_deaths import brazil_deaths

Since this function accesses an external website, runtime depends on your internet connection and location. Consider selecting only the states you need: with all states included, each month of a single year can take up to six minutes to scrape.

df = brazil_deaths(
    years=[2015, 2016, 2017, 2018, 2019, 2020],
    months=range(1, 13, 1),
    regions=_regions_names,
    states=_states,
    filename="data",
    return_df=True,
    save_csv=True,
    verbose=True,
    *args,
    **kwargs
)

The default value of _regions_names is:

["North", "Northeast", "South", "Southeast", "Center_West"]

The default value of _states is:

[
  "Acre", "Amazonas", "Amapá", "Pará",
  "Rondônia", "Roraima", "Tocantins", "Paraná",
  "Rio Grande do Sul", "Santa Catarina", "Espírito Santo",
  "Minas Gerais", "Rio de Janeiro", "São Paulo",
  "Distrito Federal", "Goiás", "Mato Grosso do Sul",
  "Mato Grosso", "Alagoas", "Bahia", "Ceará",
  "Maranhão", "Paraíba", "Pernambuco",
  "Piauí", "Rio Grande do Norte", "Sergipe"
]

The *args and **kwargs are passed down to df.to_csv(..., *args, **kwargs).
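For instance, any keyword that pandas.DataFrame.to_csv accepts, such as sep, should pass straight through (an illustrative call, assuming the forwarding behaves as described):

# Save the CSV with a semicolon separator via the forwarded kwargs
df = brazil_deaths(years=[2020], months=[5], filename="may_2020", sep=";")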

update_df

Use this function after you have scraped recent data from the Civil Registry Offices website to update the data provided in this package.

from brazil_monthly_deaths import brazil_deaths, data, update_df

new_data = brazil_deaths(years=[2020], months=[5])
current_data = update_df(data, new_data)

It essentially appends the new data below the old data, then drops duplicate rows (comparing all columns except deaths), keeping the most recent entries.
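The effect is roughly the following pandas logic (a sketch of the described behavior, not the package's internal code):

import pandas as pd

combined = pd.concat([data, new_data], ignore_index=True)
key_cols = [c for c in combined.columns if c != "deaths"]
# keep="last" prefers the rows that came from new_data
current_data = combined.drop_duplicates(subset=key_cols, keep="last")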

get_city_id

Get the unique id for the combination of a state and a city.

from brazil_monthly_deaths import get_city_id

sao_paulo_id = get_city_id(state='São Paulo', city='São Paulo')

print(sao_paulo_id) # 89903871
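The id can then be used to slice the dataframes, for example:

from brazil_monthly_deaths import data, get_city_id

# All monthly records for the city of São Paulo
sp_id = get_city_id(state="São Paulo", city="São Paulo")
sp_rows = data[data["city_id"] == sp_id]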
