Scrape Zhihu content and user social network information. The dataset currently contains 314,400 questions and 261,376 users.
Project description
Zhihu_Spider
File Structure
- ./zhihu/zhihu: files related to crawling zhihu.com
- ./zhihu/zhihu_dat/: structured data for baseline experiments on the Zhihu dataset
- ./zhihu/zhihu_dat/item.dat: the bag-of-words corpus of all questions, in Blei's LDA-C format. The line number is the question id (qid).
- ./zhihu/zhihu_dat/users.dat: the corpus of all users. Each user's features are the bag-of-words representation of all the questions they have answered.
- ./zhihu/zhihu_dat/vocab.dat: the vocabulary of the Zhihu dataset
- ./zhihu/zhihu_dat/item_adj.dat: the questions and the ids of their answerers. The first column is the number of answers; the line number is the question id.
- ./zhihu/zhihu_dat/user_adj.dat: the users and the ids of the questions they answered. The line number is the user id.
- ./zhihu/zhihu_dat/truth.dat: the questions and their answers; each answer has an associated score.
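
The file layouts above can be read with a few lines of Python. The sketch below is illustrative only and is not part of the package: the function names are hypothetical, fields are assumed to be whitespace-separated, and whether line numbers start at 0 or 1 is not stated here, so the caller must decide how to map list positions to ids. The LDA-C line layout (`<number of unique terms> <term id>:<count> ...`) follows Blei's lda-c convention; the adjacency layout (a count followed by that many ids) is taken from the description of item_adj.dat and is only assumed to hold for user_adj.dat as well.

```python
from typing import Callable, Dict, List, TypeVar

T = TypeVar("T")


def parse_ldac_line(line: str) -> Dict[int, int]:
    """Parse one LDA-C line into a {term_id: count} bag of words."""
    fields = line.split()
    if not fields:
        return {}
    n_unique = int(fields[0])  # leading field: number of unique terms
    bag: Dict[int, int] = {}
    for pair in fields[1:]:
        term_id, count = pair.split(":")
        bag[int(term_id)] = int(count)
    if len(bag) != n_unique:
        raise ValueError(f"expected {n_unique} terms, found {len(bag)}")
    return bag


def parse_adj_line(line: str) -> List[int]:
    """Parse one adjacency line: a count followed by that many ids (assumed layout)."""
    fields = line.split()
    if not fields:
        return []
    n_ids = int(fields[0])
    ids = [int(x) for x in fields[1:]]
    if len(ids) != n_ids:
        raise ValueError(f"expected {n_ids} ids, found {len(ids)}")
    return ids


def read_rows(path: str, parser: Callable[[str], T]) -> List[T]:
    """Apply a line parser to every line; list position follows line order."""
    with open(path, encoding="utf-8") as handle:
        return [parser(line) for line in handle]
```

For example, `read_rows("./zhihu/zhihu_dat/item.dat", parse_ldac_line)` would return one bag of words per question, in line order.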
Download files
Source Distribution
zhihu_spider-1.2.5.tar.gz (13.8 kB)
Built Distribution
Hashes for Zhihu_Spider-1.2.5-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 4d108952dd0af43979f7e7e839dee130329268aa48bae0d787a2358f6bc2c2ff
MD5 | 4d4fddebb773b8f2eb8f39f5d190cdd8
BLAKE2b-256 | 50752e6b7877edd34aba1dd3de452fd57659cb3c4bf417008e65721f03381479
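
A downloaded wheel can be checked against the SHA256 digest listed above. This is a minimal sketch using only Python's standard `hashlib`; the helper names and the local file path in the usage note are assumptions, not something the package provides.

```python
import hashlib

# SHA256 digest listed above for Zhihu_Spider-1.2.5-py3-none-any.whl
EXPECTED_SHA256 = "4d108952dd0af43979f7e7e839dee130329268aa48bae0d787a2358f6bc2c2ff"


def file_sha256(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """Compare a file's digest with the expected hex string, ignoring case."""
    return file_sha256(path) == expected.lower()
```

Calling `matches_expected("Zhihu_Spider-1.2.5-py3-none-any.whl")` on a local copy of the wheel (the path is hypothetical) returns `True` only if the file matches the listed digest.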