Python functions for CSVW files providing extra functionality beyond the CSVW standards
csvw_functions_extra
Installation
pip install git+https://github.com/stevenkfirth/csvw_functions_extra
The Python package csvw_functions will also need to be installed.
API
get_normalized_metadata_table_group_dict
Description: Returns a normalized version of a CSVW metadata file.
csvw_functions_extra.get_normalized_metadata_table_group_dict(
metadata_document_location
)
- metadata_document_location (str): The filepath or url of the CSVW metadata file containing a Table Group object.
Returns (dict): A dictionary of the normalized CSVW Table Group object.
get_available_csv_file_names
Description: Returns the CSV file names of all tables in a CSVW metadata file.
csvw_functions_extra.get_available_csv_file_names(
metadata_document_location
)
- metadata_document_location (str): The filepath or url of the CSVW metadata file containing a Table Group object.
Returns (list): A list of the https://purl.org/berg/csvw_functions_extra/vocab/csv_file_name value in each table.
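As a rough illustration of what this lookup involves, the sketch below collects the csv_file_name values from a hand-written Table Group dictionary. The dictionary contents and the helper name list_csv_file_names are hypothetical, and real normalized metadata may wrap the values differently; this is not the package's own implementation.

```python
# Vocabulary key documented for the csv_file_name property.
CSV_FILE_NAME = "https://purl.org/berg/csvw_functions_extra/vocab/csv_file_name"

# Hypothetical, hand-written Table Group dict (assumption: plain string values).
table_group = {
    "tables": [
        {"url": "a.csv", CSV_FILE_NAME: "a.csv"},
        {"url": "b.csv", CSV_FILE_NAME: "b.csv"},
    ]
}


def list_csv_file_names(table_group_dict):
    """Collect the csv_file_name value of every table that has one."""
    return [
        table[CSV_FILE_NAME]
        for table in table_group_dict.get("tables", [])
        if CSV_FILE_NAME in table
    ]


csv_file_names = list_csv_file_names(table_group)
```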
download_table_group
Description: Reads a CSVW metadata file and downloads the CSV files from remote locations. This makes use of the https://purl.org/berg/csvw_functions_extra vocabulary.
csvw_functions_extra.download_table_group(
metadata_document_location,
data_folder,
csv_file_names=None,
overwrite_existing_files=False,
verbose=False
)
For each table in the TableGroup object, the method is:
i) For individual CSV files:
- The CSV file is downloaded using the url in https://purl.org/berg/csvw_functions_extra/vocab/csv_download_url.
- The CSV is saved in the data_folder using the filename in https://purl.org/berg/csvw_functions_extra/vocab/csv_file_name.
ii) For ZIP files:
- The ZIP file is downloaded using the url in https://purl.org/berg/csvw_functions_extra/vocab/zip_download_url.
- The ZIP is saved in the data_folder using the filename in https://purl.org/berg/csvw_functions_extra/vocab/zip_file_name.
- The CSV file is extracted from the ZIP file using the path in https://purl.org/berg/csvw_functions_extra/vocab/csv_zip_extract_path.
- The CSV is saved in the data_folder using the filename in https://purl.org/berg/csvw_functions_extra/vocab/csv_file_name.
iii) If an associated metadata file is present (this is separate from the CSVW metadata file):
- This is downloaded using the url in https://purl.org/berg/csvw_functions_extra/vocab/metadata_download_url.
- If this download occurs, the associated metadata file is saved in the data_folder using the filename in https://purl.org/berg/csvw_functions_extra/vocab/csv_file_name with the additional suffix in https://purl.org/berg/csvw_functions_extra/vocab/metadata_file_suffix.
iv) For all cases:
- A new version of the normalized CSVW metadata file is also saved in the data folder, with Table url values linking to the newly downloaded CSV files.
Arguments:
- metadata_document_location (str): The filepath or url of the CSVW metadata file containing a Table Group object.
- data_folder (str): The filepath of a local folder where the downloaded CSV data is to be saved to.
- csv_file_names (str or list): The csv_file_name values of the tables to be downloaded. If None then all tables are downloaded.
- overwrite_existing_files (bool): If True, then any existing CSV files in data_folder will be overwritten. If False, then no download occurs if there is an existing CSV file in data_folder.
- verbose (bool): If True, then this function prints intermediate variables and other useful information.
Returns (str): The local filename of the updated CSVW metadata file containing the new URLs for the newly downloaded tables.
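The ZIP case in the method above can be sketched with the standard library alone. The example below builds a small ZIP archive in memory (so no network access is needed), then extracts one member and saves it under a new name, mirroring the roles of csv_zip_extract_path and csv_file_name. The helper name extract_csv_from_zip and the file names are hypothetical; the package may handle this step differently.

```python
import io
import os
import tempfile
import zipfile


def extract_csv_from_zip(zip_bytes, csv_zip_extract_path, data_folder, csv_file_name):
    """Write one member of a ZIP archive into data_folder under a new name."""
    target = os.path.join(data_folder, csv_file_name)
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        with archive.open(csv_zip_extract_path) as src, open(target, "wb") as dst:
            dst.write(src.read())
    return target


# Build a small in-memory ZIP standing in for a downloaded file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("inner/data.csv", "id,name\n1,a\n")

data_folder = tempfile.mkdtemp()
saved_path = extract_csv_from_zip(
    buffer.getvalue(), "inner/data.csv", data_folder, "my_table.csv"
)

# Read the saved file back as bytes to check its contents.
with open(saved_path, "rb") as f:
    saved_text = f.read().decode("utf-8")
```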
get_metadata_table_group_dict
Description: Returns a CSVW metadata Table Group object.
csvw_functions_extra.get_metadata_table_group_dict(
data_folder,
metadata_filename
)
Arguments:
- data_folder (str): The filepath of a local folder where the normalized CSVW metadata file is saved.
- metadata_filename (str): The filename of a CSVW metadata file which has been created by the download_table_group method and is located in the data folder.
Returns (dict): A dictionary of the CSVW Table Group object.
get_metadata_table_dict
Description: Returns a CSVW metadata Table object.
csvw_functions_extra.get_metadata_table_dict(
sql_table_name,
metadata_table_group_dict=None,
data_folder=None,
metadata_filename=None
)
Arguments:
- sql_table_name (str): The https://purl.org/berg/csvw_functions_extra/vocab/sql_table_name value of the table.
- metadata_table_group_dict (dict): A dictionary of a metadata Table Group object, such as the return value of get_metadata_table_group_dict.
- data_folder (str): The filepath of a local folder where the normalized CSVW metadata file is saved.
- metadata_filename (str): The filename of a CSVW metadata file which has been created by the download_table_group method and is located in the data folder.
Returns (dict): A dictionary of the CSVW Table object.
Notes: If supplied, the metadata_table_group_dict will be used to access the table object. If not supplied, then the table object is accessed using the file located using data_folder and metadata_filename.
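To make the lookup concrete, here is a minimal sketch of finding a Table object by its sql_table_name value in an already loaded Table Group dictionary. The dictionary and the helper name find_table are hypothetical and assume plain string values; the package's own lookup may differ.

```python
# Vocabulary key documented for the sql_table_name property.
SQL_TABLE_NAME = "https://purl.org/berg/csvw_functions_extra/vocab/sql_table_name"

# Hypothetical, hand-written Table Group dict.
table_group = {
    "tables": [
        {"url": "a.csv", SQL_TABLE_NAME: "table_a"},
        {"url": "b.csv", SQL_TABLE_NAME: "table_b"},
    ]
}


def find_table(table_group_dict, sql_table_name):
    """Return the first table whose sql_table_name matches, else raise KeyError."""
    for table in table_group_dict["tables"]:
        if table.get(SQL_TABLE_NAME) == sql_table_name:
            return table
    raise KeyError(sql_table_name)


table_b = find_table(table_group, "table_b")
```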
get_metadata_column_dict
Description: Returns a CSVW metadata Column object.
csvw_functions_extra.get_metadata_column_dict(
column_name,
sql_table_name,
metadata_table_group_dict=None,
data_folder=None,
metadata_filename=None
)
Arguments:
- column_name (str): The name value of a column in a CSVW TableSchema object.
- sql_table_name (str): The https://purl.org/berg/csvw_functions_extra/vocab/sql_table_name value of the table.
- metadata_table_group_dict (dict): A dictionary of a metadata Table Group object, such as the return value of get_metadata_table_group_dict.
- data_folder (str): The filepath of a local folder where the normalized CSVW metadata file is saved.
- metadata_filename (str): The filename of a CSVW metadata file which has been created by the download_table_group method and is located in the data folder.
Notes: If supplied, the metadata_table_group_dict will be used to access the table object. If not supplied, then the table object is accessed using the file located using data_folder and metadata_filename.
Returns (dict): A dictionary of the CSVW Column object.
get_metadata_sql_table_names
Description: Returns a list of the SQL table names in the CSVW metadata file.
csvw_functions_extra.get_metadata_sql_table_names(
metadata_table_group_dict=None,
data_folder=None,
metadata_filename=None
)
Arguments:
- metadata_table_group_dict (dict): A dictionary of a metadata Table Group object, such as the return value of get_metadata_table_group_dict.
- data_folder (str): The filepath of a local folder where the normalized CSVW metadata file is saved.
- metadata_filename (str): The filename of a CSVW metadata file which has been created by the download_table_group method and is located in the data folder.
Notes: If supplied, the metadata_table_group_dict will be used to access the table object. If not supplied, then the table object is accessed using the file located using data_folder and metadata_filename.
Returns (list): A list of the https://purl.org/berg/csvw_functions_extra/vocab/sql_table_name values of all tables in the CSVW metadata file.
get_metadata_columns_codes
Description: Returns lookup dictionaries for the lookup codes for one or more columns.
csvw_functions_extra.get_metadata_columns_codes(
sql_table_name,
column_names = None,
metadata_table_group_dict = None,
data_folder = None,
metadata_filename=None
)
Arguments:
- sql_table_name (str): The https://purl.org/berg/csvw_functions_extra/vocab/sql_table_name value of the table.
- column_names (str, list or None): The name value of one or more columns in a CSVW TableSchema object. If None, then all columns are returned.
- metadata_table_group_dict (dict): A dictionary of a metadata Table Group object, such as the return value of get_metadata_table_group_dict.
- data_folder (str): The filepath of a local folder where the normalized CSVW metadata file is saved.
- metadata_filename (str): The filename of a CSVW metadata file which has been created by the download_table_group method and is located in the data folder.
Returns (dict of dicts): A dictionary with:
- keys: the names of the column(s)
- values: a dictionary with keys as lookup codes and values as code descriptions.
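The shape of the returned dictionary of dictionaries can be illustrated with a small sketch. It reads the https://purl.org/berg/csvw_functions/vocab/codes property from each column of a hand-written Table object. The table contents and the helper name columns_codes are hypothetical, and returning an empty dictionary for columns without codes is an assumption rather than documented behaviour.

```python
# Vocabulary key documented for the codes property on column objects.
CODES = "https://purl.org/berg/csvw_functions/vocab/codes"

# Hypothetical Table object with one coded column and one uncoded column.
table = {
    "tableSchema": {
        "columns": [
            {"name": "region", CODES: {"1": "North", "2": "South"}},
            {"name": "value"},
        ]
    }
}


def columns_codes(table_dict, column_names=None):
    """Map each selected column name to its {code: description} dictionary."""
    if isinstance(column_names, str):
        column_names = [column_names]
    result = {}
    for column in table_dict["tableSchema"]["columns"]:
        if column_names is None or column["name"] in column_names:
            # Assumption: columns without codes map to an empty dictionary.
            result[column["name"]] = column.get(CODES, {})
    return result


all_codes = columns_codes(table)
region_codes = columns_codes(table, "region")
```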
import_table_group_to_sqlite
Description: Reads a CSVW metadata file and imports the CSV data into a SQLite database. This makes use of the https://purl.org/berg/csvw_functions_extra vocabulary.
Method:
- If not already present, a SQLite database named database_name is created in the data_folder.
- For each table in the TableGroup object, the local CSV file is located in the data_folder using https://purl.org/berg/csvw_functions_extra/vocab/csv_file_name.
- The CSV file is imported into the SQLite database into a table named using https://purl.org/berg/csvw_functions_extra/vocab/sql_table_name.
- Primary key field(s) are set up using the information in the CSVW TableSchema primaryKey value.
- Indexes are set up on columns if https://purl.org/berg/csvw_functions_extra/vocab/sqlsetindex is True.
Call signature:
csvw_functions_extra.import_table_group_to_sqlite(
metadata_filename,
data_folder,
database_name,
csv_file_names=None,
remove_existing_tables=False,
verbose=False
)
Arguments:
- metadata_filename (str): The filename of a CSVW metadata file which has been created by the download_table_group method and is located in the data folder.
- data_folder (str): The filepath of a local folder where the downloaded CSV data is located and the SQLite database is stored.
- database_name (str): The name of the SQLite database, relative to the data_folder.
- csv_file_names (str or list): The csv_file_name values of the tables to be imported. If None then all CSV files are imported.
- overwrite_existing_tables (bool): If True, then before importing the CSV data any associated existing table in the database is removed and recreated.
- verbose (bool): If True, then this function prints intermediate variables and other useful information.
Returns: None
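The import steps in the method above (create a table with a primary key, load the CSV rows, add an index) can be sketched with the standard csv and sqlite3 modules. The CSV text, table name, column types and index name below are all hypothetical, and an in-memory database is used so the example leaves no files behind; the package's own import may work differently.

```python
import csv
import io
import sqlite3

# Hypothetical CSV content standing in for a downloaded file.
csv_text = "id,region\n1,North\n2,South\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

conn = sqlite3.connect(":memory:")

# Table with a primary key, as would be derived from the TableSchema primaryKey.
conn.execute('CREATE TABLE "my_table" ("id" INTEGER, "region" TEXT, PRIMARY KEY ("id"))')

# Named placeholders are filled from each row dictionary.
conn.executemany('INSERT INTO "my_table" ("id", "region") VALUES (:id, :region)', rows)

# Index on a column flagged for indexing.
conn.execute('CREATE INDEX "my_table_region" ON "my_table" ("region")')
conn.commit()

row_count = conn.execute('SELECT COUNT(*) FROM "my_table"').fetchone()[0]
index_names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' AND name = 'my_table_region'"
    )
]
conn.close()
```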
add_index
Description: Adds a SQLite index to one or more columns in a SQLite database table.
csvw_functions_extra.add_index(
fields,
table_name,
data_folder,
database_name,
unique=False,
verbose=False
)
Arguments:
- fields (str or list): The field(s) (i.e. columns) to add the index to.
- table_name (str): The name of the table in the SQLite database.
- data_folder (str): The filepath of a local folder where the SQLite database is stored.
- database_name (str): The name of the SQLite database, relative to the data_folder.
- unique (bool): If True, then a unique index is created.
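For orientation, the underlying SQL is a CREATE INDEX statement, optionally with UNIQUE. The sketch below issues one through sqlite3 on an in-memory database. The helper name create_index and the index naming scheme (table name joined with field names) are assumptions made for the example, not the package's behaviour.

```python
import sqlite3


def create_index(conn, fields, table_name, unique=False):
    """Create a (optionally unique) index on one or more fields; return its name."""
    if isinstance(fields, str):
        fields = [fields]
    # Assumed naming scheme: <table>_<field1>_<field2>...
    index_name = table_name + "_" + "_".join(fields)
    field_string = ",".join('"{}"'.format(name) for name in fields)
    unique_string = "UNIQUE " if unique else ""
    conn.execute(
        'CREATE {}INDEX IF NOT EXISTS "{}" ON "{}" ({})'.format(
            unique_string, index_name, table_name, field_string
        )
    )
    return index_name


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "t" ("a" INTEGER, "b" TEXT)')
created = create_index(conn, ["a", "b"], "t", unique=True)

# PRAGMA index_list rows are (seq, name, unique, origin, partial).
index_rows = conn.execute("PRAGMA index_list('t')").fetchall()
```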
get_all_table_names_in_database
Description: Returns a list of all table names in the database.
csvw_functions_extra.get_all_table_names_in_database(
data_folder,
database_name
)
Arguments:
- data_folder (str): The filepath of a local folder where the SQLite database is stored.
- database_name (str): The name of the SQLite database, relative to the data_folder.
Returns (list): A list of all table names in the SQLite database.
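In SQLite, table names can be read from the sqlite_master table. A minimal sketch on an in-memory database, with hypothetical table names, is:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "table_a" ("id" INTEGER)')
conn.execute('CREATE TABLE "table_b" ("id" INTEGER)')

# sqlite_master holds one row per schema object; filter to tables only.
table_names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
```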
get_field_names
Description: Returns a list of the field names in a database table.
csvw_functions_extra.get_field_names(
table_name,
data_folder,
database_name,
verbose=False
)
Arguments:
- table_name (str): The name of the table in the SQLite database.
- data_folder (str): The filepath of a local folder where the SQLite database is stored.
- database_name (str): The name of the SQLite database, relative to the data_folder.
Returns (list): A list of all field names of the table in the SQLite database.
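One way to obtain field names in SQLite is the table_info pragma, whose second column is the field name. The sketch below uses a hypothetical table in an in-memory database and does not claim to be how the package does it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "t" ("id" INTEGER, "region" TEXT)')

# PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
field_names = [row[1] for row in conn.execute('PRAGMA table_info("t")')]
```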
get_row_count
Description: Returns the number of rows from a table.
csvw_functions_extra.get_row_count(
table_name,
data_folder,
database_name,
filter_by=None,
group_by=None,
verbose=False
)
Arguments:
- table_name (str): The name of the table in the SQLite database.
- data_folder (str): The filepath of a local folder where the SQLite database is stored.
- database_name (str): The name of the SQLite database, relative to the data_folder.
- filter_by (dict): A dictionary with information to filter the rows - see get_where_string.
- group_by (list): A list of field names to group by.
Returns (list): A list of result dictionaries.
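A grouped row count of this kind corresponds to a SELECT COUNT(*) ... GROUP BY query. The sketch below runs one on a hypothetical in-memory table and returns a list of dictionaries. The key name row_count is an assumption chosen for the example; the keys used by the package are not documented here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows can then be converted to dictionaries

conn.execute('CREATE TABLE "t" ("region" TEXT, "value" INTEGER)')
conn.executemany(
    'INSERT INTO "t" VALUES (?, ?)',
    [("North", 1), ("North", 2), ("South", 3)],
)

query = (
    'SELECT "region", COUNT(*) AS "row_count" '
    'FROM "t" GROUP BY "region" ORDER BY "region"'
)
counts = [dict(row) for row in conn.execute(query)]
```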
get_rows
Description: Returns one or more rows from a table in the database.
csvw_functions_extra.get_rows(
table_name,
data_folder,
database_name,
filter_by = None,
fields = None,
limit = None,
replace_codes = False,
metadata_filename = None,
verbose = False
)
Arguments:
- table_name (str): The name of the table in the SQLite database.
- data_folder (str): The filepath of a local folder where the SQLite database is stored.
- database_name (str): The name of the SQLite database, relative to the data_folder.
- filter_by (dict): A dictionary with information to filter the rows - see get_where_string.
- fields (list): A list of field names to return.
- limit (integer): The number of rows to return. If None, then all rows are returned.
- replace_codes (bool):
- metadata_filename (str): Required only if replace_codes is True.
Returns (list): A list of result dictionaries.
get_sql_table_names_in_database
Description: Returns a list of table names in the database which are also present in the CSVW metadata file.
csvw_functions_extra.get_sql_table_names_in_database(
data_folder,
database_name,
metadata_filename
)
Arguments:
- data_folder (str): The filepath of a local folder where the SQLite database is stored.
- database_name (str): The name of the SQLite database, relative to the data_folder.
- metadata_filename (str): The filename of a CSVW metadata file which has been created by the download_table_group method and is located in the data folder.
Returns (list): A list of table names in the database which are also present as https://purl.org/berg/csvw_functions_extra/vocab/sql_table_name values in the CSVW metadata file.
run_sql
Description: Runs an SQL query on the database and returns the result.
csvw_functions_extra.run_sql(
sql_query,
data_folder,
database_name,
verbose=False
)
Arguments:
- sql_query (str): A SQL query.
- data_folder (str): The filepath of a local folder where the SQLite database is stored.
- database_name (str): The name of the SQLite database, relative to the data_folder.
Returns (list): A list of dictionaries where each dictionary contains one set of results - keys are the field (column) names and values are the data values.
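The documented return shape (one dictionary per result row, keyed by field name) can be produced from a sqlite3 cursor by pairing cursor.description with each row. The helper name run_query and the table below are hypothetical; this is a sketch of the idea, not the package's code.

```python
import sqlite3


def run_query(conn, sql_query):
    """Run a SELECT query and return a list of {field_name: value} dictionaries."""
    cursor = conn.execute(sql_query)
    # cursor.description holds one 7-tuple per result column; item 0 is the name.
    names = [description[0] for description in cursor.description]
    return [dict(zip(names, row)) for row in cursor.fetchall()]


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "t" ("id" INTEGER, "region" TEXT)')
conn.executemany('INSERT INTO "t" VALUES (?, ?)', [(1, "North"), (2, "South")])

results = run_query(conn, 'SELECT "id", "region" FROM "t" ORDER BY "id"')
```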
convert_to_iterator
Description: Converts a value to a list.
csvw_functions_extra.convert_to_iterator(
x
)
Arguments:
- x (int, float, string, list, tuple, None): The value to be converted to an iterator.
Returns (list): A list of value(s), where:
- A number is converted to a list of the number, e.g. 2 -> [2]
- A string is converted to a list of the string, e.g. 'abc' -> ['abc']
- A list (or other iterable) remains the same, e.g. [1,2,3] -> [1,2,3]
- None is converted to an empty list, e.g. None -> []
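The conversion rules above can be sketched in a few lines. The function name to_list is hypothetical, and this version converts tuples and other iterables to lists, which is an assumption about how "remains the same" should be read.

```python
def to_list(x):
    """Convert a scalar, string, iterable or None to a list."""
    if x is None:
        return []
    # Strings are iterable, so they must be handled before the general case.
    if isinstance(x, (str, int, float)):
        return [x]
    return list(x)
```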
get_field_string
Description: Converts a list of field names into a string for use in a SQL query.
get_field_string(
fields = None
)
Arguments:
- fields (str, list or None): The field name(s)
Returns (str): A string of the field names, where:
- None is converted to *
- A string is converted to a quoted string, e.g. 'field1' -> ' "field1" '
- A list is converted to a series of quoted strings separated by commas, e.g. ['field1','field2'] -> ' "field1","field2" '
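A minimal sketch that reproduces the documented output strings is shown below. The function name field_string is hypothetical, and returning a bare * for None is an assumption about the exact spacing.

```python
def field_string(fields=None):
    """Build the field list portion of a SELECT statement."""
    if fields is None:
        return "*"
    if isinstance(fields, str):
        fields = [fields]
    # Double-quote each field name and pad with a space on both sides.
    return " " + ",".join('"{}"'.format(name) for name in fields) + " "
```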
get_group_by_string
Description: Converts a list of field names into two strings for use in GROUP BY clauses in a SQL query.
get_group_by_string(
group_by = None
)
Arguments:
- group_by (str, list or None): The field name(s) to group by
Returns (tuple): A two-item tuple of (group_by_fields, group_by_string), where:
- None is converted to ('', '')
- 'field1' is converted to (' "field1", ', 'GROUP BY "field1" ')
- ['field1','field2'] is converted to (' "field1","field2", ', 'GROUP BY "field1","field2" ')
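The two strings can be sketched as follows; the first is intended to sit at the start of the SELECT field list and the second after the WHERE clause. The function name group_by_strings is hypothetical, and the code only aims to reproduce the documented examples.

```python
def group_by_strings(group_by=None):
    """Return (fields for the SELECT list, GROUP BY clause) as two strings."""
    if group_by is None:
        return "", ""
    if isinstance(group_by, str):
        group_by = [group_by]
    quoted = ",".join('"{}"'.format(name) for name in group_by)
    return " " + quoted + ", ", "GROUP BY " + quoted + " "
```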
get_where_string
Description: Converts a dictionary of field names and values into a string for use in WHERE clauses in a SQL query.
get_where_string(
filter_by = None
)
Arguments:
- filter_by (dict or None): The field name(s) and values to filter by
Returns (str): A string to use in a WHERE clause, where:
- None is converted to ''
- {'field1': 1} is converted to ' WHERE ("field1" = 1)'
- {'field1': 1, 'field2': 'a', 'field3': None} is converted to ' WHERE ("field1" = 1) AND ("field2" = "a") AND ("field3" = Null)'
- {'field1': [1,2]} is converted to ' WHERE ("field1" IN (1,2))'
- {'field1': ['a','b']} is converted to ' WHERE ("field1" IN ("a","b"))'
- {'field1': {'BETWEEN':[1,2]}} is converted to ' WHERE ("field1" BETWEEN (1 AND 2))'
- {'field1': {'BETWEEN':['a','b']}} is converted to ' WHERE ("field1" BETWEEN ("a" AND "b"))'
TO DO: Include True and False
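The mapping from a filter dictionary to a WHERE string can be sketched as below. The function names sql_literal and where_string are hypothetical. The code only reproduces the documented example strings character for character; it does not check that each string is accepted by SQLite, and it does not handle True and False.

```python
def sql_literal(value):
    """Format one Python value the way the documented examples show it."""
    if value is None:
        return "Null"
    if isinstance(value, str):
        return '"{}"'.format(value)
    return str(value)


def where_string(filter_by=None):
    """Build a WHERE string from {field: value | list | {'BETWEEN': [low, high]}}."""
    if not filter_by:
        return ""
    conditions = []
    for field, value in filter_by.items():
        if isinstance(value, dict):
            low, high = value["BETWEEN"]
            condition = '"{}" BETWEEN ({} AND {})'.format(
                field, sql_literal(low), sql_literal(high)
            )
        elif isinstance(value, (list, tuple)):
            items = ",".join(sql_literal(item) for item in value)
            condition = '"{}" IN ({})'.format(field, items)
        else:
            condition = '"{}" = {}'.format(field, sql_literal(value))
        conditions.append("(" + condition + ")")
    return " WHERE " + " AND ".join(conditions)
```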
CSVW vocabulary
Vocabulary on CSVW column metadata objects
- https://purl.org/berg/csvw_functions/vocab/codes: (JSON object) An object (dictionary) relating any codes used in the column data to a string with the meaning of the codes.
- https://purl.org/berg/csvw_functions/vocab/column_notes: (string) A description of the contents of the column.
- https://purl.org/berg/csvw_functions_extra/vocab/sqlsetindex: (boolean) If true then an SQLite index is set up on this column when the data is imported into the database.
Vocabulary on CSVW table metadata objects
Required:
- https://purl.org/berg/csvw_functions_extra/vocab/csv_file_name: The name for the newly downloaded CSV file. The CSV file is saved in the data folder using this name.
- https://purl.org/berg/csvw_functions_extra/vocab/sql_table_name: The name to be used for the database table when importing the CSV file into a SQLite database.
For CSV file downloads:
- https://purl.org/berg/csvw_functions_extra/vocab/csv_download_url: The url where the remote CSV file can be downloaded from.
For ZIP downloads:
- https://purl.org/berg/csvw_functions_extra/vocab/zip_download_url: The url where the remote ZIP file can be downloaded from.
- https://purl.org/berg/csvw_functions_extra/vocab/zip_file_name: The name for the newly downloaded ZIP file. The ZIP file is saved in the data folder using this name.
- https://purl.org/berg/csvw_functions_extra/vocab/csv_zip_extract_path: The path to extract the CSV file from the ZIP file.
Optional:
- https://purl.org/berg/csvw_functions_extra/vocab/metadata_download_url: The url where an associated metadata file for the CSV file can be downloaded from.
- https://purl.org/berg/csvw_functions_extra/vocab/metadata_file_suffix: A suffix to use when saving the associated metadata file.