Converts TMX files to CSV-files and/or stores to HANA table
TMX Converter
tmxconverter reads TMX files from an input folder and saves the result
- as CSV files in an output folder and/or
- as rows in a HANA database table
Language codes are mapped to 2-character codes using the mapping file 'language_code_mapping.csv' (the file name is set in 'config.yaml').
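The lookup can be pictured with the minimal sketch below. The layout of 'language_code_mapping.csv' is not documented here, so the two-column layout (full code, 2-character code) without a header row is an assumption, and `load_language_map` and `map_language` are hypothetical helper names, not part of tmxconverter.

```python
import csv
import io


def load_language_map(csv_file):
    """Read rows of (language code, 2-character code) into a dict.

    Assumes two columns and no header row; the real file may differ.
    """
    reader = csv.reader(csv_file)
    return {row[0].strip(): row[1].strip() for row in reader if len(row) >= 2}


def map_language(code, lang_map):
    """Return the mapped 2-character code, or None if the code is unknown."""
    return lang_map.get(code)


# Hypothetical file content used only for illustration.
sample = io.StringIO("en-US,en\nde-DE,de\nfr-FR,fr\n")
lang_map = load_language_map(sample)
print(map_language("de-DE", lang_map))
```

How unknown codes are handled by the tool itself is not stated; the sketch simply returns None.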
The application reads the YAML configuration file config.yaml from the working directory to control its behaviour.
Command line options
`tmxconverter -log [loglevel]` where loglevel is one of 'warning', 'info' or 'debug'
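For illustration, an option of this shape could be parsed roughly as follows. This is a sketch with `argparse`, not the tool's actual code; the function name `parse_log_level` and the default of 'warning' are assumptions.

```python
import argparse
import logging

# The three level names listed above, mapped to the logging module's constants.
LOG_LEVELS = {
    "warning": logging.WARNING,
    "info": logging.INFO,
    "debug": logging.DEBUG,
}


def parse_log_level(argv):
    """Parse '-log <level>' and return the matching logging level."""
    parser = argparse.ArgumentParser(prog="tmxconverter")
    # Default of 'warning' is an assumption; the README does not state one.
    parser.add_argument("-log", dest="loglevel",
                        choices=sorted(LOG_LEVELS), default="warning")
    args = parser.parse_args(argv)
    return LOG_LEVELS[args.loglevel]


level = parse_log_level(["-log", "debug"])
logging.basicConfig(level=level)
```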
Mapping
- `<tmx><header srclang="en-US">`: source_lang
- `<body><tu creationdate`: created
- `<body><tu creationid`: creation_id
- `<body><tu changeid`: change_id
- `<body><tu changedate`: changed
- `<body><tu lastusagedate`: lastusage
- From filename substring until '_' : domain
- Filename : origin
- `<body><tu><tuv xml:lang`: target_lang if different from source_lang, using the language mapping
- `<body><tu><tuv><seg>`: source_text or target_text, depending on the lang attribute
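The mapping above can be sketched with the standard library's `xml.etree.ElementTree`. This is an illustration under assumptions, not the package's implementation: `parse_tmx` is a hypothetical name, the sample TMX is made up, and falling back to the unmapped code when a language is missing from the mapping is a choice made only for this sketch.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the predefined 'xml:' prefix under this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

SAMPLE_TMX = """<tmx version="1.4">
  <header srclang="en-US"/>
  <body>
    <tu creationdate="20200101T120000Z" creationid="alice"
        changeid="bob" changedate="20200202T120000Z">
      <tuv xml:lang="en-US"><seg>Hello</seg></tuv>
      <tuv xml:lang="de-DE"><seg>Hallo</seg></tuv>
    </tu>
  </body>
</tmx>"""


def parse_tmx(xml_text, filename, lang_map):
    """Turn each <tu> element into one record following the mapping list."""
    root = ET.fromstring(xml_text)
    srclang = root.find("header").get("srclang")
    records = []
    for tu in root.iter("tu"):
        rec = {
            "source_lang": lang_map.get(srclang, srclang),
            "created": tu.get("creationdate"),
            "creation_id": tu.get("creationid"),
            "change_id": tu.get("changeid"),
            "changed": tu.get("changedate"),
            "lastusage": tu.get("lastusagedate"),
            "domain": filename.split("_")[0],  # substring up to the first '_'
            "origin": filename,
        }
        for tuv in tu.findall("tuv"):
            lang = tuv.get(XML_LANG)
            text = tuv.findtext("seg")
            if lang == srclang:
                rec["source_text"] = text
            else:
                rec["target_lang"] = lang_map.get(lang, lang)
                rec["target_text"] = text
        records.append(rec)
    return records


records = parse_tmx(SAMPLE_TMX, "reviews_2020.tmx", {"en-US": "en", "de-DE": "de"})
print(records[0]["source_text"], "->", records[0]["target_text"])
```

Attributes that are absent in the file, such as `lastusagedate` in the sample, come back as None.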
Regular Expression
As a first basic filter, a list of regular expressions can be passed in a text file, with one expression per line.
Examples:
- `\s*$`
- `\s*\d+\s*$`
- `\s*\d*\.\d+\s*$`
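A minimal sketch of such a filter follows. It reads the example as three patterns (blank text, integers, decimal numbers), and it assumes that a segment matching any pattern from its start is dropped; the README does not spell out either point. `load_patterns` and `is_filtered` are hypothetical names.

```python
import re


def load_patterns(text):
    """Compile one regular expression per non-empty line."""
    return [re.compile(line) for line in text.splitlines() if line.strip()]


def is_filtered(segment, patterns):
    """True if any pattern matches the segment from its first character."""
    return any(p.match(segment) for p in patterns)


# Stand-in for the content of the regular-expression text file.
pattern_text = "\n".join([r"\s*$", r"\s*\d+\s*$", r"\s*\d*\.\d+\s*$"])
patterns = load_patterns(pattern_text)

segments = ["Hello world", "  ", "42", "3.14", "Version 2"]
kept = [s for s in segments if not is_filtered(s, patterns)]
print(kept)  # ['Hello world', 'Version 2']
```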
Files Output
If the parameter OUTPUT_FILES (named as in the example config) is true, every converted tmx-file is written to the OUTPUT_FOLDER under the same filename with the suffix replaced.
The output uses a comma as separator and double-quoted strings (written with pandas.to_csv).
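The naming and quoting rules can be sketched with the standard library alone. The tool itself uses pandas; the `csv` module below is only a stand-in, and both the '.csv' suffix and the choice of `csv.QUOTE_NONNUMERIC` (quote every string, leave numbers bare) are assumptions about what the output looks like. `output_path` and `to_csv_text` are hypothetical helpers.

```python
import csv
import io
from pathlib import Path


def output_path(tmx_file, output_folder):
    """Same file name as the input, suffix replaced, placed in the output folder."""
    return Path(output_folder) / Path(tmx_file).with_suffix(".csv").name


def to_csv_text(header, rows):
    """Write comma-separated text with all non-numeric fields in double quotes."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


print(output_path("/data/tmx/input/reviews.tmx", "/data/tmx/output"))
print(to_csv_text(["source_text", "usage_count"], [["Hello, world", 3]]))
```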
Database Output
If the parameter OUTPUT_HDB (named as in the example config) is true, the data is stored in the HANA database whose connection details are given in the config.yaml file.
The current table structure:
CREATE COLUMN TABLE "TMX"."DATA"(
"SOURCE_LANG" NVARCHAR(2),
"SOURCE_TEXT" NVARCHAR(5000),
"TARGET_LANG" NVARCHAR(2),
"TARGET_TEXT" NVARCHAR(5000),
"DOMAIN" NVARCHAR(15),
"ORIGIN" NVARCHAR(30),
"CREATION_ID" NVARCHAR(30),
"CREATED" LONGDATE,
"CHANGE_ID" NVARCHAR(30),
"CHANGED" LONGDATE,
"LAST_USAGE" LONGDATE,
"USAGE_COUNT" INTEGER
)
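As an illustration of how one record could be written to this table, the sketch below builds a parameterized INSERT statement and the matching value tuple. It does not open a connection; the `?` placeholders assume a DB-API driver with the qmark parameter style, and the record keys are assumed to be the lower-cased column names. `build_insert` and `to_row` are hypothetical helpers.

```python
# Column order follows the CREATE TABLE statement above.
COLUMNS = [
    "SOURCE_LANG", "SOURCE_TEXT", "TARGET_LANG", "TARGET_TEXT",
    "DOMAIN", "ORIGIN", "CREATION_ID", "CREATED",
    "CHANGE_ID", "CHANGED", "LAST_USAGE", "USAGE_COUNT",
]


def build_insert(schema="TMX", table="DATA"):
    """Return an INSERT statement with one '?' placeholder per column."""
    cols = ", ".join(f'"{c}"' for c in COLUMNS)
    marks = ", ".join("?" for _ in COLUMNS)
    return f'INSERT INTO "{schema}"."{table}" ({cols}) VALUES ({marks})'


def to_row(record):
    """Order a record dict by column; missing keys become None (NULL)."""
    return tuple(record.get(c.lower()) for c in COLUMNS)


statement = build_insert()
row = to_row({"source_lang": "en", "source_text": "Hello",
              "target_lang": "de", "target_text": "Hallo"})
print(statement)
print(row)
```

With a connected cursor, the pair would typically be passed as `cursor.execute(statement, row)`.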
Example Config.YAML
# input folder
input_folder : /Users/Shared/data/tmx/input
#language coding map
lang_map_file : language_code_mapping.csv
# output files
OUTPUT_FILES : true # save to output folder
OUTPUT_FOLDER : /Users/Shared/data/tmx/output
# HANA DB
OUTPUT_HDB : false # Save to db
HDB_HOST : 'xxx.com'
HDB_USER : 'TMXUSER'
HDB_PWD : 'PassWord'
HDB_PORT : 111
# Test Parameter
TEST : true
MAX_NUMBER_FILES : 100 # max number of files processed. NOT used when EXCLUSIVE_FILE given
EXCLUSIVE_FILE : reviews.tmx # If not used leave empty
#EXCLUSIVE_FILE :
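How the test parameters interact can be sketched as below. The config comments only say that MAX_NUMBER_FILES is ignored when EXCLUSIVE_FILE is given; that both limits apply only while TEST is true, and that files are taken in sorted order, are assumptions of this sketch. `select_files` is a hypothetical name and the config is a plain dict standing in for the parsed config.yaml.

```python
def select_files(filenames, config):
    """Pick the tmx-files to process according to the test parameters."""
    names = sorted(f for f in filenames if f.lower().endswith(".tmx"))
    if not config.get("TEST"):
        return names  # assumption: limits are active only in test mode
    exclusive = config.get("EXCLUSIVE_FILE")
    if exclusive:
        # EXCLUSIVE_FILE takes precedence; MAX_NUMBER_FILES is not used.
        return [f for f in names if f == exclusive]
    limit = config.get("MAX_NUMBER_FILES")
    return names[:limit] if limit else names


files = ["b.tmx", "reviews.tmx", "a.tmx", "notes.txt"]
print(select_files(files, {"TEST": True, "MAX_NUMBER_FILES": 2,
                           "EXCLUSIVE_FILE": None}))
print(select_files(files, {"TEST": True, "MAX_NUMBER_FILES": 2,
                           "EXCLUSIVE_FILE": "reviews.tmx"}))
```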