bw2io

Tools for importing and exporting life cycle inventory databases

Brightway2 input and output
===========================

.. warning:: brightway2-io is under heavy development, and is not yet ready for you* to use (*`unless you're Dutch <https://www.python.org/dev/peps/pep-0020/>`__).

.. note:: You must create the core migrations files using ``bw2io.create_core_migrations()`` before doing anything else!

This package provides tools for the management of inventory databases and impact assessment methods. It is part of the `Brightway2 LCA framework <http://brightwaylca.org>`_. `Online documentation <https://brightway2.readthedocs.org/en/latest/>`_ is available, and the source code is hosted on `Bitbucket <https://bitbucket.org/cmutel/brightway2-io>`_.

In contrast with previous IO functionality in Brightway2, brightway2-io uses an iterative approach to importing and linking data. First, data is *extracted* into a common format. Next, a series of *strategies* is employed to uniquely identify each dataset and link datasets internally and to the biosphere. Following internal linking, linking to other background datasets can be performed. Finally, database data is written to disk.

This approach offers a number of benefits that help mitigate some of the serious problems in existing inventory data formats: the number of unlinked exchanges can be easily seen, linking strategies can be iteratively applied, and intermediate results can be saved.

Here is a typical usage:

.. code-block:: python

    In [1]: from bw2io import *

    In [2]: so = SingleOutputEcospold2Importer("/path/to/ecoinvent/3.1/cutoff/datasets", "ecoinvent 3.1 cutoff")
    11301/11301 (100%) |||||||||||||||||||||||||||||||||||||||||||||||||||||||||| Time: 0:01:56
    Converting to unicode
    Extracted 11301 datasets in 262.63 seconds

    In [3]: so.apply_strategies()
    Applying strategy: remove_zero_amount_coproducts
    Applying strategy: remove_zero_amount_inputs_with_no_activity
    Applying strategy: es2_assign_only_product_with_amount_as_reference_product
    Applying strategy: assign_single_product_as_activity
    Applying strategy: create_composite_code
    Applying strategy: link_biosphere_by_flow_uuid
    Applying strategy: link_internal_technosphere_by_composite_code
    Applying strategy: delete_exchanges_missing_activity
    Applying strategy: delete_ghost_exchanges
    Applying strategy: mark_unlinked_exchanges

    In [4]: so.statistics()
    11301 datasets
    521712 exchanges
    0 unlinked exchanges
    Out[4]: (11301, 521712, 0)

    In [5]: so.write_database()

Note that brightway2-io can't magically make problems in databases go away.
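
The iterative approach can be pictured as a list of plain functions applied in order to the extracted data. The following is a minimal sketch and not bw2io's implementation: the dataset layout and the names ``link_by_name``, ``apply_strategies`` and ``statistics`` are assumptions made for illustration only.

```python
def link_by_name(datasets):
    """Hypothetical strategy: link each exchange to the dataset with the same name."""
    codes = {ds["name"]: ds["code"] for ds in datasets}
    for ds in datasets:
        for exc in ds["exchanges"]:
            if "input" not in exc and exc["name"] in codes:
                exc["input"] = codes[exc["name"]]
    return datasets


def apply_strategies(datasets, strategies):
    """Apply each strategy in order; every strategy returns the modified data."""
    for strategy in strategies:
        datasets = strategy(datasets)
    return datasets


def statistics(datasets):
    """Return (number of datasets, number of exchanges, number of unlinked exchanges)."""
    exchanges = [exc for ds in datasets for exc in ds["exchanges"]]
    unlinked = [exc for exc in exchanges if "input" not in exc]
    return len(datasets), len(exchanges), len(unlinked)


# Made-up data: "coal" exists as a dataset, "ore" does not
data = [
    {"name": "steel", "code": "a1", "exchanges": [{"name": "coal"}, {"name": "ore"}]},
    {"name": "coal", "code": "b2", "exchanges": []},
]
before = statistics(data)
after = statistics(apply_strategies(data, [link_by_name]))
```

Because the statistics can be recomputed after every pass, it is easy to see whether a new strategy actually reduced the number of unlinked exchanges.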

Brightway2-io provides the following importers:

* Ecospold 1 (single & multioutput)
* Ecospold 1 impact assessment
* Ecospold 2
* SimaPro CSV (single & multioutput)
* SimaPro CSV impact assessment

As well as the following exporters:

* Excel
* Gephi GEXF
* Matlab

Additionally, data can be imported or exported into Brightway packages, and the entire data directory can be snapshotted.

Importing an LCI database
=========================

LCI databases can be imported from ecospold 1 (both single- and multioutput), ecospold 2, and SimaPro CSV (single- and multioutput). Multioutput datasets are allocated to single-output datasets.

Importing from ecospold 1
-------------------------

Importing from ecospold 1 is relatively simple. The strategy ``es1_allocate_multioutput`` allocates multioutput products to single-output products using the given allocation factors. The reference product is then assigned using the strategy ``assign_only_product_as_production``.

Next, some basic data cleanup is performed. Integer codes are removed, as these are not used consistently by different LCA software (``clean_integer_codes``). Unspecified subcategories are removed (i.e. ``('air', 'unspecified')`` is changed to ``('air',)``) using ``drop_unspecified_subcategories``. Biosphere exchange names and categories are normalized using ``normalize_biosphere_names`` and ``normalize_biosphere_categories``. Locations are removed from biosphere exchanges, as biosphere flows do not have locations (``strip_biosphere_exc_locations``).
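
Two of these cleanup steps can be illustrated on a single exchange. This is only a sketch under assumed data structures: the function ``clean_exchange`` and the dictionary keys are hypothetical and do not reproduce the actual bw2io strategies.

```python
def clean_exchange(exc):
    """Return a cleaned copy of one exchange dictionary (illustrative only)."""
    cleaned = dict(exc)
    categories = tuple(cleaned.get("categories", ()))
    # ('air', 'unspecified') becomes ('air',)
    if len(categories) > 1 and categories[-1] == "unspecified":
        categories = categories[:-1]
    if categories:
        cleaned["categories"] = categories
    # Biosphere flows have no location, so drop it if present
    if cleaned.get("type") == "biosphere":
        cleaned.pop("location", None)
    return cleaned


exc = {
    "name": "Carbon dioxide",
    "type": "biosphere",
    "categories": ("air", "unspecified"),
    "location": "GLO",
}
result = clean_exchange(exc)
```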

Next, a unique activity code is generated for each dataset, using a combination of the name, categories, location, and unit (``set_code_by_activity_hash``).
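
The idea of deriving a code from a few identifying fields can be sketched as follows. The choice of SHA-256, the lower-casing, and the separators are assumptions for this example; the hash that bw2io actually computes may differ.

```python
import hashlib


def activity_hash(dataset, fields=("name", "categories", "location", "unit")):
    """Build a deterministic code from the identifying fields of a dataset."""
    parts = []
    for field in fields:
        value = dataset.get(field) or ""
        if isinstance(value, (list, tuple)):
            value = "::".join(value)
        parts.append(str(value).lower())
    joined = "|".join(parts)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


a = {
    "name": "Electricity production",
    "categories": ("energy",),
    "location": "CH",
    "unit": "kilowatt hour",
}
b = dict(a, comment="fields outside the list are ignored")
c = dict(a, unit="megajoule")
```

Here ``a`` and ``b`` receive the same code, while ``c`` differs because its unit differs.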

Finally, biosphere flows are linked to the default biosphere database, and internal technosphere flows are linked using ``link_technosphere_by_activity_hash``.

Importing from ecospold 2
-------------------------

Importing from ecospold 2 is a bit complex, because although ecospold 2 gives unique IDs for many fields, which helps in linking, the current implementation has some `known issues <http://www.ecoinvent.org/database/ecoinvent-version-3/ecoinvent-v30/known-data-issues/>`__ which have to be resolved or ignored by the importer.

.. warning:: Brightway2 cannot reproduce the LCI and LCIA results given by the ecoinvent centre. The technosphere matrix used by ecoinvent cannot be reproduced from the provided unit process datasets. However, the differences for most products are quite small.

We start by removing some exchanges from most datasets. Specifically, we remove exchanges with amounts of zero, both coproducts and technosphere or biosphere inputs (``remove_zero_amount_coproducts`` and ``remove_zero_amount_inputs_with_no_activity``).
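
A minimal sketch of this filtering step is given below. The exchange layout is hypothetical, and the rule that a zero-amount input is kept when it lists an activity is an assumption inferred from the strategy name, not taken from the bw2io source.

```python
def remove_zero_amount_exchanges(dataset):
    """Return a copy of the dataset without zero-amount coproducts or
    zero-amount inputs that list no activity (illustrative only)."""

    def keep(exc):
        if exc["amount"] != 0:
            return True
        if exc["type"] == "production":
            return False  # zero-amount coproduct
        return exc.get("activity") is not None  # zero-amount input

    return dict(dataset, exchanges=[exc for exc in dataset["exchanges"] if keep(exc)])


dataset = {
    "name": "example process",
    "exchanges": [
        {"name": "main product", "type": "production", "amount": 1.0},
        {"name": "allocated coproduct", "type": "production", "amount": 0},
        {"name": "unused input", "type": "technosphere", "amount": 0, "activity": None},
        {"name": "zero input with activity", "type": "technosphere", "amount": 0, "activity": "some-uuid"},
        {"name": "CO2", "type": "biosphere", "amount": 2.5},
    ],
}
cleaned = remove_zero_amount_exchanges(dataset)
names = [exc["name"] for exc in cleaned["exchanges"]]
```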

We then assign reference products. Although each unit process should have a single output, coproducts which have been allocated away are often still included, with amounts of zero. We use two strategies to choose the reference product: ``es2_assign_only_product_with_amount_as_reference_product`` and ``assign_only_product_as_production``.

Next, a composite code is generated, using the UUID of the activity and the product (``create_composite_code``).

Biosphere flow exchanges are now normalized (``drop_unspecified_subcategories``) and linked (``link_biosphere_by_flow_uuid``). Internal technosphere exchanges are also linked, using the composite codes (``link_internal_technosphere_by_composite_code``).
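
Linking by composite code can be sketched like this. The way the two UUIDs are combined (joined with an underscore) and the keys ``activity`` and ``product`` are assumptions for illustration; bw2io may build and store the code differently.

```python
def composite_code(activity_uuid, product_uuid):
    """Combine both UUIDs into one code; the separator is an assumption."""
    return f"{activity_uuid}_{product_uuid}"


def link_internal_technosphere(datasets, database_name):
    """Give every dataset a composite code, then link matching exchanges."""
    for ds in datasets:
        ds["code"] = composite_code(ds["activity"], ds["product"])
    known = {ds["code"] for ds in datasets}
    for ds in datasets:
        for exc in ds["exchanges"]:
            if exc["type"] != "technosphere":
                continue
            code = composite_code(exc.get("activity"), exc["product"])
            if code in known:
                exc["input"] = (database_name, code)
    return datasets


datasets = [
    {"activity": "act-1", "product": "prod-1", "exchanges": [
        {"type": "technosphere", "activity": "act-2", "product": "prod-2"},
        {"type": "technosphere", "activity": "act-2", "product": "prod-9"},
    ]},
    {"activity": "act-2", "product": "prod-2", "exchanges": []},
]
linked = link_internal_technosphere(datasets, "example db")
```

The second exchange stays unlinked because no dataset has the combination of ``act-2`` and ``prod-9``.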

Not all technosphere exchanges are linked, however. We need to drop two different types of exchanges, as we have no way of linking them. First, there are some exchanges with listed products but no listed activities - and no activity in the database produces these products. Removal is done with the strategy ``delete_exchanges_missing_activity``.

Additionally, there are some exchanges with listed products and activities - but the given activity doesn't produce the listed product. These exchanges also have to be deleted, using the strategy ``delete_ghost_exchanges``.

.. note:: As of March 2015, only the cutoff version completely avoids the two problems listed above.
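
The two kinds of unlinkable exchanges can be told apart with a small check. This sketch uses a hypothetical data layout and a single helper, ``delete_unlinkable_exchanges``, instead of the two separate strategies named above.

```python
def delete_unlinkable_exchanges(datasets):
    """Drop technosphere exchanges that cannot be linked (illustrative only)."""
    produced = {(ds["activity"], ds["product"]) for ds in datasets}
    available_products = {product for _, product in produced}

    def linkable(exc):
        if exc["type"] != "technosphere":
            return True
        if exc.get("activity") is None:
            # Case 1: no listed activity; keep only if some activity makes the product
            return exc["product"] in available_products
        # Case 2: "ghost" exchange if the listed activity does not make the product
        return (exc["activity"], exc["product"]) in produced

    for ds in datasets:
        ds["exchanges"] = [exc for exc in ds["exchanges"] if linkable(exc)]
    return datasets


datasets = [
    {"activity": "act-1", "product": "prod-1", "exchanges": [
        {"type": "technosphere", "activity": "act-2", "product": "prod-2"},
        {"type": "technosphere", "activity": "act-2", "product": "prod-9"},
        {"type": "technosphere", "activity": None, "product": "prod-7"},
        {"type": "technosphere", "activity": None, "product": "prod-2"},
        {"type": "biosphere", "flow": "flow-1"},
    ]},
    {"activity": "act-2", "product": "prod-2", "exchanges": []},
]
remaining = delete_unlinkable_exchanges(datasets)[0]["exchanges"]
```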

Importing from SimaPro
----------------------

Importing SimaPro CSV files is also a bit of a headache. Pré, the makers of SimaPro, have done a lot of work to make LCA software accessible and understandable. This work includes making changes to process names and other metadata, which makes linking these processes back to original ecoinvent data difficult. Fortunately, Pré has been very helpful in supplying correspondence files, which we can use to move (to the best of our ability) from the "SimaPro world" to the "ecoinvent world".

.. note:: Importing SimaPro XML export files is not recommended, as there are bugs with exporting ecoinvent 3 processes.

What to do with unmatched exchanges?
------------------------------------

If there are unlinked exchanges, you have several options. If you aren't sure what to do yet, you can save a temporary copy (that can be loaded later) using ``.write_unlinked("some name")``.

Calling ``.statistics()`` will show what kind of exchanges aren't linked, e.g.:

.. code-block:: python

    In [4]: sp.statistics()
    366 datasets
    3991 exchanges
    2639 unlinked exchanges
    Type biosphere: 170 unique unlinked exchanges
    Type technosphere: 330 unique unlinked exchanges

The options to examine or resolve the unlinked exchanges are:

* You can write a spreadsheet of the exchanges, including their linking status, with ``.write_excel("some name")``.
* You can apply new linking strategies with ``.apply_strategies([some_new_strategy])``. Note that this method requires a *list* of strategies.
* You can match technosphere or biosphere exchanges to other background databases using ``.match_database("another database")``.
* TODO: Add unlinked tech processes to current database
* To resolve unlinked biosphere exchanges which simply don't exist in your current biosphere database, you can:

  * Add them to the biosphere database with ``add_unlinked_flows_to_biosphere_database()``
  * Create a new biosphere database with ``create_new_biosphere("new biosphere name")``
  * Add the biosphere flows to the database you are currently working on (LCI databases can include both process and biosphere flows) with TODO: ``add_unlinked_biosphere_flows_to_current_database()``

.. note:: These methods have several options, and you should understand what they do and read their documentation before choosing between them.

.. note:: You can't write an LCI database with unlinked exchanges.
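
The per-type summary of unique unlinked exchanges can be approximated with a few lines of plain Python. This is a sketch under assumptions: an exchange counts as unlinked when it has no ``input`` key, and uniqueness is judged by name, unit and categories. The real ``.statistics()`` method may use other criteria.

```python
from collections import defaultdict


def unlinked_by_type(datasets):
    """Count unique unlinked exchanges per exchange type (illustrative only)."""
    unique = defaultdict(set)
    for ds in datasets:
        for exc in ds["exchanges"]:
            if "input" not in exc:
                key = (exc["name"], exc.get("unit"), tuple(exc.get("categories", ())))
                unique[exc["type"]].add(key)
    return {kind: len(keys) for kind, keys in unique.items()}


datasets = [
    {"name": "p1", "exchanges": [
        {"name": "Water", "type": "biosphere", "unit": "kilogram", "categories": ("air",)},
        {"name": "steel", "type": "technosphere", "unit": "kilogram"},
        {"name": "steel", "type": "technosphere", "unit": "kilogram", "input": ("db", "abc")},
    ]},
    {"name": "p2", "exchanges": [
        {"name": "Water", "type": "biosphere", "unit": "kilogram", "categories": ("air",)},
        {"name": "coal", "type": "technosphere", "unit": "kilogram"},
    ]},
]
summary = unlinked_by_type(datasets)
```

The unlinked "Water" exchange appears in two datasets but is counted once, which is why the number of unique unlinked exchanges can be far smaller than the total number of unlinked exchanges.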

Migrations
==========

Sometimes the only way to correctly link activities or biosphere flows is by applying a list of name (or other field) transforms. For example, SimaPro will export a process named "[sulfonyl]urea-compound {RoW}| production | Alloc Rec, S", which corresponds to the ecoinvent process "[sulfonyl]urea-compound production", with reference product "[sulfonyl]urea-compound" and location "RoW". In another example, in ecoinvent 2, emissions of water to air were measured in kilograms, and in ecoinvent 3, emissions of water to air are measured in cubic meters. In this case, our migration would look like this:

.. code-block:: python

    {
        'fields': ['name', 'categories', 'type', 'unit'],
        'data': [
            (
                # First element is input data in the order of `fields` above
                ('Water', ('air',), 'biosphere', 'kilogram'),
                # Second element is new values to substitute
                {
                    'unit': 'cubic meter',
                    'multiplier': 0.001
                }
            )
        ]
    }

We call the application of transform lists "migrations", and they are applied with the ``.migrate(migrations_name)`` method.

TODO: Because migrations can be tricky, a log file is kept for each migration, and should be examined.

If the numeric values in an exchange need to be changed, the special key ``multiplier`` is used, where ``new_amount = multiplier * old_amount``. Uncertainty information and formulas are adjusted automatically, if possible (see ``utils.rescale_exchange``).
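
To make the mechanics concrete, here is a minimal sketch of applying such a migration to a list of exchanges. It is not the code behind ``.migrate()``: the function ``apply_migration`` is hypothetical, and it only rescales the amount, without touching uncertainty information or formulas.

```python
def hashable(value):
    """Convert lists to tuples so field values can be used as dictionary keys."""
    return tuple(value) if isinstance(value, list) else value


def apply_migration(exchanges, migration):
    """Return migrated copies of the exchanges (illustrative only)."""
    fields = migration["fields"]
    lookup = {
        tuple(hashable(value) for value in old): new
        for old, new in migration["data"]
    }
    migrated = []
    for exc in exchanges:
        exc = dict(exc)
        key = tuple(hashable(exc.get(field)) for field in fields)
        changes = dict(lookup.get(key, {}))
        multiplier = changes.pop("multiplier", None)
        exc.update(changes)
        if multiplier is not None:
            exc["amount"] = multiplier * exc["amount"]
        migrated.append(exc)
    return migrated


migration = {
    "fields": ["name", "categories", "type", "unit"],
    "data": [
        (
            ("Water", ("air",), "biosphere", "kilogram"),
            {"unit": "cubic meter", "multiplier": 0.001},
        ),
    ],
}
exchanges = [
    {"name": "Water", "categories": ("air",), "type": "biosphere", "unit": "kilogram", "amount": 1000.0},
    {"name": "Water", "categories": ("water",), "type": "biosphere", "unit": "kilogram", "amount": 5.0},
]
result = apply_migration(exchanges, migration)
```

Only the first exchange matches all four fields, so only it is converted to cubic meters; the second is returned unchanged.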

A few additional notes:

* Migrations change the underlying data, but do not do any linking - you will also have to apply linking strategies after a migration.
* Migrations can specify any number of fields, but of course the fields must be present in the importing database.
* TODO: Migrations can be specified in an Excel template. Template files must be processed using ``convert_migration_file``.
* Subcategories are not expanded automatically, so a separate row in the migrations file would be needed for e.g. ``water (air, non-urban air or from high stacks)``.

Importing an LCIA method
========================

LCIA methods can be imported from ecospold 1 XML files (``EcoinventLCIAImporter``) and SimaPro CSV files (``SimaProLCIACSVImporter``).

When importing an LCIA method or set of LCIA methods, you should specify the biosphere database to link against, e.g. ``EcoinventLCIAImporter("some file path", "some biosphere database name")``. If no biosphere database name is provided, the default ``biosphere3`` database is used.

Both importers will attempt to normalize biosphere flow names and categories to the ecospold2 standard, using the strategies:

* ``normalize_simapro_lcia_biosphere_categories``
* ``normalize_simapro_biosphere_names``
* ``normalize_biosphere_names``
* ``normalize_biosphere_categories``

Next, the characterization factors are examined to see if they are only given for root categories, e.g. ``('air',)`` and not ``('air', 'urban air close to ground')``. If only root categories are characterized, then we assume that the characterization factors also apply to all subcategories, using the strategy ``match_subcategories``.
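
The subcategory expansion can be sketched as follows. The function below is a hypothetical stand-in that only borrows the name of the strategy; the data layout is assumed and the amount is a placeholder value, not a real characterization factor.

```python
def match_subcategories(cfs, biosphere_flows):
    """Copy root-category factors to matching subcategory flows (illustrative only)."""
    # Only expand when every factor is given for a root category
    if any(len(cf["categories"]) > 1 for cf in cfs):
        return list(cfs)
    expanded = list(cfs)
    for cf in cfs:
        for flow in biosphere_flows:
            same_flow = flow["name"] == cf["name"]
            is_subcategory = (
                len(flow["categories"]) > 1
                and flow["categories"][0] == cf["categories"][0]
            )
            if same_flow and is_subcategory:
                expanded.append(dict(cf, categories=flow["categories"]))
    return expanded


cfs = [{"name": "Methane", "categories": ("air",), "amount": 1.0}]
flows = [
    {"name": "Methane", "categories": ("air",)},
    {"name": "Methane", "categories": ("air", "urban air close to ground")},
    {"name": "Methane", "categories": ("water", "ocean")},
]
expanded = match_subcategories(cfs, flows)
```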

Finally, linking to the given or default biosphere database is attempted, using the strategy ``link_iterable_by_fields`` and the standard fields: name, categories, unit, location. Note that biosphere flows do not actually have a location.
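
Linking on a tuple of fields can be sketched with a dictionary lookup. This is an illustration under assumptions, not the ``link_iterable_by_fields`` implementation: the function name ``link_by_fields`` and the data layout are hypothetical. A field missing on both sides, such as the location of a biosphere flow, compares as ``None`` and therefore does not prevent a match.

```python
def link_by_fields(cfs, flows, database_name,
                   fields=("name", "categories", "unit", "location")):
    """Set cf['input'] when all fields match a flow (illustrative only)."""

    def key(obj):
        return tuple(
            tuple(obj[field]) if isinstance(obj.get(field), list) else obj.get(field)
            for field in fields
        )

    lookup = {key(flow): (database_name, flow["code"]) for flow in flows}
    for cf in cfs:
        target = lookup.get(key(cf))
        if target is not None:
            cf["input"] = target
    return cfs


flows = [
    {"name": "Methane", "categories": ("air",), "unit": "kilogram", "code": "f1"},
]
cfs = [
    {"name": "Methane", "categories": ["air"], "unit": "kilogram", "amount": 1.0},
    {"name": "Unknown substance", "categories": ("air",), "unit": "kilogram", "amount": 2.0},
]
linked = link_by_fields(cfs, flows, "biosphere3")
unlinked_count = sum(1 for cf in linked if "input" not in cf)
```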

You can now check the linking statistics. If all biosphere flows are linked, write the LCIA methods with ``.write_methods()``. Note that attempting to write an existing method will raise a ``ValueError`` unless you use ``.write_methods(overwrite=True)``, and trying to write methods which aren't completely linked will also raise a ``ValueError``.

If there are unlinked characterization factors, you have several options. If you aren't sure what to do yet, you can save a temporary copy (that can be loaded later) using ``.write_unlinked("some name")``. The options to examine or resolve the unlinked characterization factors are:

* You can write a spreadsheet of the characterization factors, including their linking status, with ``.write_excel("some name")``.
* You can apply new linking strategies with ``.apply_strategies([some_new_strategy])``. Note that this method requires a *list* of strategies.
* TODO: You can write all biosphere flows to a new biosphere database with ``.create_new_biosphere("some name")``.
* If you are satisfied that you don't care about the unlinked characterization factors, you can drop them with ``.drop_unlinked()``.
* Alternatively, you can add the missing biosphere flows to the biosphere database using ``.add_missing_cfs()``.

Testing
=======

Tests should (eventually) have 100% coverage, with most effort going to testing edge cases for strategies, and for importing real-world databases.

Tests are run using `nose <https://nose.readthedocs.org/en/latest/>`__.

To run tests in parallel::

    nosetests --processes=<num_cpus_desired> --process-timeout=20

To generate a test coverage report::

    nosetests --with-coverage --cover-html --cover-package=bw2io

TODO
====

* Tests for each strategy
* New migrations module

  - ecoinvent 2.2 > 3.01 (each system model)
  - ecoinvent 3.01 > 3.1 (each system model)
  - SimaPro > ecoinvent biosphere

* US LCI importer

  - Add DUMMY processes (strategy to add unlinked activities)
  - Fix names

    + Easy way to get missing and matching values in new version?

* SimaPro CSV: Can uncertainty values be specified if amount is a formula? What would that mean?
* SimaPro CSV: Extract and apply unit conversions
* Comparison chart of all freely available databases

  - USDA
  - US LCI
  - GreenDelta nexus website

* Specific issues

  - SimaPro LCIA importer - waste types seem incorrect
  - Need to find a clever way to replace formula names that conflict with Python keywords

This version: 0.2.dev7 (pre-release), Dec 9, 2015
