Skip to main content

Japanese multi-task CNN trained on UD-Japanese BCCWJ r2.8 + GSK2014-A(2019). Assigns word2vec token vectors. Components: tok2vec, parser, ner, morphologizer, atteribute_ruler, compound_splitter, bunsetu_recognizer.

Project description

Japanese multi-task CNN trained on UD-Japanese BCCWJ r2.8 + GSK2014-A(2019). Assigns word2vec token vectors. Components: tok2vec, parser, ner, morphologizer, atteribute_ruler, compound_splitter, bunsetu_recognizer.

Feature | Description |
— | — |
Name | ja_ginza |
Version | 5.1.0 |
spaCy | >=3.2.0,<3.3.0 |
Default Pipeline | tok2vec, parser, attribute_ruler, ner, morphologizer, compound_splitter, bunsetu_recognizer |
Components | tok2vec, parser, attribute_ruler, ner, morphologizer, compound_splitter, bunsetu_recognizer |
Vectors | 480443 keys, 20000 unique vectors (300 dimensions) |
Sources | [UD_Japanese-BCCWJ r2.8](https://github.com/UniversalDependencies/UD_Japanese-BCCWJ) (Asahara, M., Kanayama, H., Tanaka, T., Miyao, Y., Uematsu, S., Mori, S., Matsumoto, Y., Omura, M., & Murawaki, Y.)<br />[GSK2014-A(2019)](https://www.gsk.or.jp/catalog/gsk2014-a/) (Tokyo Institute of Technology)<br />[SudachiDict_core](https://github.com/WorksApplications/SudachiDict) (Works Applications Enterprise Co., Ltd.)<br />[chiVe](https://github.com/WorksApplications/chiVe) (Works Applications Enterprise Co., Ltd.) |
License | MIT License |
Author | [Megagon Labs Tokyo.](https://github.com/megagonlabs/ginza) |

### Label Scheme

<details>

<summary>View label scheme (241 labels for 3 components)</summary>

Component | Labels |
— | — |
`parser` | ROOT, acl, acl_bunsetu, advcl, advcl_bunsetu, advmod, advmod_bunsetu, amod, amod_bunsetu, aux, aux_bunsetu, case, case_bunsetu, cc, cc_bunsetu, ccomp_bunsetu, compound, compound_bunsetu, cop, csubj_bunsetu, dep, dep_bunsetu, det_bunsetu, discourse_bunsetu, dislocated_bunsetu, fixed, mark, nmod, nmod_bunsetu, nsubj_bunsetu, nummod, obj_bunsetu, obl_bunsetu, punct, punct_bunsetu |
`ner` | Academic, Age, Aircraft, Airport, Amphibia, Amusement_Park, Animal_Disease, Animal_Part, Archaeological_Place_Other, Art_Other, Astral_Body_Other, Award, Bay, Bird, Book, Bridge, Broadcast_Program, Cabinet, Calorie, Canal, Car, Car_Stop, Character, City, Class, Clothing, Color_Other, Company, Company_Group, Compound, Conference, Constellation, Continental_Region, Corporation_Other, Country, Countx_Other, County, Culture, Date, Day_Of_Week, Disease_Other, Dish, Doctrine_Method_Other, Domestic_Region, Drug, Earthquake, Element, Email, Era, Ethnic_Group_Other, Event_Other, Facility_Other, Facility_Part, Family, Fish, Flora, Flora_Part, Food_Other, Frequency, Fungus, GOE_Other, GPE_Other, Game, Geological_Region_Other, God, Government, ID_Number, Incident_Other, Insect, Intensity, International_Organization, Island, Lake, Language_Other, Latitude_Longtitude, Law, Line_Other, Living_Thing_Other, Living_Thing_Part_Other, Location_Other, Magazine, Mammal, Material, Measurement_Other, Military, Mineral, Mollusc_Arthropod, Money, Money_Form, Mountain, Movement, Movie, Multiplication, Museum, Music, N_Animal, N_Country, N_Event, N_Facility, N_Flora, N_Location_Other, N_Natural_Object_Other, N_Organization, N_Person, N_Product, Name_Other, National_Language, Nationality, Natural_Disaster, Natural_Object_Other, Natural_Phenomenon_Other, Nature_Color, Newspaper, Numex_Other, Occasion_Other, Offense, Ordinal_Number, Organization_Other, Park, Percent, Period_Day, Period_Month, Period_Time, Period_Week, Period_Year, Periodx_Other, Person, Phone_Number, Physical_Extent, Plan, Planet, Point, Political_Organization_Other, Political_Party, Port, Position_Vocation, Postal_Address, Printing_Other, Pro_Sports_Organization, Product_Other, Province, Public_Institution, Railroad, Rank, Region_Other, Religion, Religious_Festival, Reptile, Research_Institute, River, Road, Rule_Other, School, School_Age, Sea, Ship, Show, Show_Organization, Spa, Space, Spaceship, Speed, Sport, Sports_Facility, Sports_League, Sports_Organization_Other, Station, Style, Temperature, Theater, Theory, Time, Time_Top_Other, Timex_Other, Title_Other, Train, Treaty, Tumulus, Tunnel, URL, Unit_Other, Vehicle_Other, Volume, War, Water_Route, Weapon, Weight, Worship_Place, Zoo |
`morphologizer` | POS=PUNCT, POS=NUM, POS=NOUN, POS=ADP, POS=AUX, POS=VERB, POS=CCONJ, POS=PART, POS=SCONJ, POS=SYM, POS=ADJ, POS=DET, POS=PRON, POS=PROPN, POS=ADV, POS=X, POS=INTJ |

</details>

### Accuracy

Type | Score |
— | — |
DEP_UAS | 90.95 |
DEP_LAS | 89.07 |
SENTS_P | 83.03 |
SENTS_R | 83.03 |
SENTS_F | 83.03 |
ENTS_F | 55.40 |
ENTS_P | 58.37 |
ENTS_R | 52.72 |
POS_ACC | 97.44 |
MORPH_MICRO_F | 0.00 |
MORPH_PER_FEAT | 0.00 |
MORPH_ACC | 0.00 |
TAG_ACC | 0.00 |
TOK2VEC_LOSS | 20475934.74 |
PARSER_LOSS | 855491.84 |
NER_LOSS | 124114.99 |
MORPHOLOGIZER_LOSS | 26714.42 |

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

ja_ginza-5.2.0.tar.gz (59.0 MB view details)

Uploaded Source

Built Distribution

ja_ginza-5.2.0-py3-none-any.whl (59.1 MB view details)

Uploaded Python 3

File details

Details for the file ja_ginza-5.2.0.tar.gz.

File metadata

  • Download URL: ja_ginza-5.2.0.tar.gz
  • Upload date:
  • Size: 59.0 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.11.7

File hashes

Hashes for ja_ginza-5.2.0.tar.gz
Algorithm Hash digest
SHA256 c12e875fbf30144acd05ca0e17dd211372777e9b762a2167862456e1894c605d
MD5 7b246523a6a636498dce936caab1de02
BLAKE2b-256 c59d4b17a5b0e00f67fe69602af59d0779ac079076f0d82a84ffa8da0b18274f

See more details on using hashes here.

File details

Details for the file ja_ginza-5.2.0-py3-none-any.whl.

File metadata

  • Download URL: ja_ginza-5.2.0-py3-none-any.whl
  • Upload date:
  • Size: 59.1 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.11.7

File hashes

Hashes for ja_ginza-5.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 c31a2ac57a7c2a1440f7e58111e37306c20dfaf08fd5a3f6f065948ff126cb30
MD5 a8b22ef400475331cb0cbc3fd25115c5
BLAKE2b-256 f0f8d1bbc8bd545e61cacb7c0f0fedcfc041e2853e573ec4825136a8bd85e586

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page