Content tagging and index generator with MaxCompute

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

Tagging and index generation for firm-level data with MaxCompute

initialize MaxCompute account

Install the Aliyun CLI (see the install guide).
Run the aliyun configure command to set up the account:

$ aliyun configure
Configuring profile 'default' ...
Aliyun Access Key ID [None]: <Your AccessKey ID>
Aliyun Access Key Secret [None]: <Your AccessKey Secret>
Default Region Id [None]: cn-zhangjiakou
Default output format [json]: json
Default Language [zh]: zh
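
To confirm that the profile was written, the saved configuration can be inspected from Python. This is a minimal sketch. It assumes the CLI stores profiles in ~/.aliyun/config.json with a top-level "profiles" list whose entries carry "name" and "region_id" fields. That layout and the helper name find_profile are assumptions, not something this package documents.

```python
import json
from pathlib import Path

# Assumed location of the Aliyun CLI config file (not confirmed by this package).
CONFIG_PATH = Path.home() / ".aliyun" / "config.json"


def find_profile(config: dict, name: str = "default"):
    """Return the profile dict with the given name, or None if it is absent."""
    for profile in config.get("profiles", []):
        if profile.get("name") == name:
            return profile
    return None


if __name__ == "__main__":
    if CONFIG_PATH.exists():
        config = json.loads(CONFIG_PATH.read_text(encoding="utf-8"))
        profile = find_profile(config)
        # Print only the region; never print the AccessKey secret.
        print(profile.get("region_id") if profile else "profile 'default' not found")
    else:
        print(f"{CONFIG_PATH} not found; run `aliyun configure` first")
```

If the file layout differs on your installation, adjust the keys accordingly.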

define tags

add new configs

by csv; the folder should include 3 files:
- tag_list.csv
- prefix.csv
- suffix.csv

from tagging_index.tag_processor import TagProcessor
import os
processor = TagProcessor()
tag_config_folder = os.path.join(os.getcwd(), "tag_config")
processor.append_new_config_csv(tag_config_folder)
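
Before calling append_new_config_csv, it may help to check that the folder really contains the three CSV files. The sketch below is one way to do that with the standard library only. The helper missing_config_files is hypothetical and not part of tagging_index.

```python
from pathlib import Path

# The three file names the config folder is expected to contain.
REQUIRED_FILES = ("tag_list.csv", "prefix.csv", "suffix.csv")


def missing_config_files(folder):
    """Return the required CSV file names that are not present in `folder`."""
    folder = Path(folder)
    return [name for name in REQUIRED_FILES if not (folder / name).is_file()]


if __name__ == "__main__":
    missing = missing_config_files(Path.cwd() / "tag_config")
    if missing:
        print("missing files:", ", ".join(missing))
    else:
        print("tag_config folder is complete")
```

An empty list means the folder is ready to pass to the processor.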

by json

from tagging_index.tag_processor import TagProcessor
import os
processor = TagProcessor()
tag_config_file = os.path.join(os.getcwd(), "tag_config.json")
processor.append_new_config_json(tag_config_file)

add current config

Load the existing config from MaxCompute or a local JSON file to compare it with the new config.

# load the latest version from MaxCompute
processor.load_current_config()
# load a specific version from MaxCompute
processor.load_current_config("202401010111")

validate config

Validate the config and print the tag tree.

from pprint import pprint

validate_result = processor.validate()
pprint(validate_result)
processor.show_tree(root_tag="tag_value", levels=1)
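
For intuition about what a depth-limited tag tree looks like, the sketch below prints a small hierarchy down to a given number of levels. It does not reproduce show_tree, whose exact output and level semantics may differ. The parent-to-children dictionary, the sample tag names, and the function render_tree are all hypothetical.

```python
def render_tree(children, root, levels):
    """Return indented lines for `root` and its descendants, `levels` deep.

    `children` maps a tag to the list of its direct child tags.
    levels=0 yields only the root; levels=1 adds its direct children.
    """
    lines = []

    def walk(tag, depth):
        lines.append("  " * depth + tag)
        if depth < levels:
            for child in children.get(tag, []):
                walk(child, depth + 1)

    walk(root, 0)
    return lines


# Hypothetical tag hierarchy, for illustration only.
sample = {
    "IT": ["software", "hardware"],
    "software": ["cloud", "database"],
}
print("\n".join(render_tree(sample, "IT", levels=1)))
```

With levels=1 only the root and its direct children are listed; deeper tags such as "cloud" appear once levels is raised to 2.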

save config

# save the config to a local JSON file
processor.save_to_json(os.path.join(os.getcwd(), "new_config.json"))
# create a new version and save it to MaxCompute
processor.save_to_version()

update tag config for UDF resource

from tagging_index.maxcompute.udf_release import UdfRelease

udf = UdfRelease()
# release the UDF only when the _udf module has been updated
udf.release_udf()
# defaults to the latest version
udf.update_dim_resource(version="")

index generation

Note: update the tagging result in MaxCompute before generating the index.

define index (refer to index_tag_schema.json)

from tagging_index.index_generator import DemandIndexGenerator, TalentIndexGenerator
DemandIndexGenerator.get_index_schema()

generate index

demand_index = DemandIndexGenerator("index_tag_definition.json")
talent_index = TalentIndexGenerator("index_tag_definition.json")
# list index codes, each with its index-type suffix
print(talent_index.index_codes)
# set the year range for the index
talent_index.start_year = 2018
talent_index.end_year = 2019
# data source version
talent_index.tag_udf_version = "20240604110353.8@6@6"
# inspect the generated SQL script
print(demand_index.generate_sql(['IT_total']).get('IT_total'))
# generate index data and return a dataframe
# talent_index.get_index_data('IT_total')
# generate index data and save it to MaxCompute; omit the index_codes parameter to generate all indexes
talent_index.generate_index()
# generate the firm-level total count in the data source
talent_index.generate_index_ttl()
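
Index codes carry an index-type suffix, as in 'IT_total_T' and 'IT_total_D', while generate_sql takes the bare code 'IT_total'. The sketch below splits a suffixed code into its base code and type. The mapping of "T" to talent and "D" to demand is inferred from the generator class names and is an assumption, and split_index_code is a hypothetical helper rather than part of the package.

```python
# Assumed mapping from suffix letter to index type (inferred, not documented).
SUFFIX_TO_TYPE = {"T": "talent", "D": "demand"}


def split_index_code(code):
    """Split 'IT_total_T' into ('IT_total', 'talent').

    Raises ValueError when the code has no recognised type suffix.
    """
    base, sep, suffix = code.rpartition("_")
    if not sep or not base or suffix not in SUFFIX_TO_TYPE:
        raise ValueError(f"no recognised index-type suffix in {code!r}")
    return base, SUFFIX_TO_TYPE[suffix]


print(split_index_code("IT_total_T"))  # ('IT_total', 'talent')
print(split_index_code("IT_total_D"))  # ('IT_total', 'demand')
```

A bare code such as 'IT_total' raises ValueError here, because "total" is not a known suffix.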

generate panel data from index data and matrix variables

from tagging_index.data_generator import PanelDataGenerator, VariableMapOther
from tagging_index.index_generator import DemandIndexGenerator

panel_data = PanelDataGenerator()
# set to an empty list to include all companies
panel_data.comp_ids = ['603893.SH', '300158.SZ', '000001.SZ']
panel_data.start_year = 2019
panel_data.end_year = 2020
panel_data.index_version = "<<index_version>>"
# add index
panel_data.add_index('IT_total_T')
panel_data.add_index('IT_total_D')
# add source base index (total count)
panel_data.add_source_base(DemandIndexGenerator,'demand_total')
# add performance matrix
panel_data.add_matrix(code='Y0601b',column_name='emp_no')
panel_data.add_matrix('F100801A', 'mkt_value')
# add an additional variable from an ODS table
basic_info = "(select * from ods_csmar_ipo_cobasic where pt=max_pt('ods_csmar_ipo_cobasic'))"
panel_data.add_other_var(
    VariableMapOther(
        basic_info,
        'estbdt',
        dim_comp_id='stock_id',
        col_comp_id='stkcd',
    )
)
print(panel_data.get_panel_sql())
panel_data.get_result_df().tail(500)
panel_data.save_to_csv("panel_data.csv")
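
A firm-level panel has one row per company and year. The sketch below builds that skeleton for the company ids and year range used above, as a way to sanity-check the expected row count of the result. It is not how PanelDataGenerator works internally. The helper panel_skeleton is hypothetical, and treating end_year as inclusive is an assumption.

```python
from itertools import product


def panel_skeleton(comp_ids, start_year, end_year):
    """Return one (comp_id, year) pair per firm and year.

    Assumes end_year is inclusive; the package may treat it differently.
    """
    years = range(start_year, end_year + 1)
    return list(product(comp_ids, years))


rows = panel_skeleton(["603893.SH", "300158.SZ", "000001.SZ"], 2019, 2020)
print(len(rows))  # 6 rows: 3 firms x 2 years
print(rows[:2])   # [('603893.SH', 2019), ('603893.SH', 2020)]
```

Under these assumptions, a balanced panel for three firms over 2019 and 2020 has six rows; fewer rows in the saved CSV would point to missing firm-years.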

Release history

0.0.5

Jul 11, 2024

0.0.4

Jun 19, 2024

0.0.4b5 pre-release

Jun 19, 2024

This version

0.0.3

Jun 13, 2024

0.0.2b0 pre-release

Jun 6, 2024

0.0.1a0 pre-release

Jun 19, 2024

Download files

Source Distribution

tagging_index-0.0.3.tar.gz (31.7 kB view details)

Uploaded Jun 13, 2024 Source

Built Distribution

tagging_index-0.0.3-py3-none-any.whl (38.9 kB view details)

Uploaded Jun 13, 2024 Python 3

File details

Details for the file tagging_index-0.0.3.tar.gz.

File metadata

Download URL: tagging_index-0.0.3.tar.gz
Upload date: Jun 13, 2024
Size: 31.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.0 CPython/3.12.3

File hashes

Hashes for tagging_index-0.0.3.tar.gz
Algorithm	Hash digest
SHA256	`8b7f0227db8a68a5e93bb0d8e2eee508f6a323fe5802bdf268f0b1dcc03a34c7`
MD5	`156a066c1dbf7aca2cd4ddafc39140ee`
BLAKE2b-256	`a20343b2a5ec4991d74456d5d1d8690b6141aa59793dc4ab9aadbfdf9277add6`

File details

Details for the file tagging_index-0.0.3-py3-none-any.whl.

File metadata

Download URL: tagging_index-0.0.3-py3-none-any.whl
Upload date: Jun 13, 2024
Size: 38.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.0 CPython/3.12.3

File hashes

Hashes for tagging_index-0.0.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`00441cc24dec269a7f0b29b82bd1739d4330ea4fb1dddc716fa8e1c651738a29`
MD5	`d3f64cfae14febebf857e3fd2f101681`
BLAKE2b-256	`99fceb4187d1ff013c4e8864ce14478aa6225a978a1f9a6947c470ef2d53e8e2`
