Zhihu_Spider
Scrape Zhihu content and user social network information. The dataset currently contains 314,400 questions and 261,376 users.
File Structure
- ./zhihu/zhihu: the files related to crawling zhihu.com
- ./zhihu/zhihu_dat/: the structured data for baseline experiments on the Zhihu dataset
- ./zhihu/zhihu_dat/item.dat: the bag-of-words corpus of all questions, in Blei's LDA-C format. The line number is the question id (qid).
- ./zhihu/zhihu_dat/users.dat: the corpus of all users. Each user's features are the bag-of-words representation of all the questions they have answered.
- ./zhihu/zhihu_dat/vocab.dat: the vocabulary of the Zhihu dataset
- ./zhihu/zhihu_dat/item_adj.dat: the questions and the ids of their answerers. The first column is the number of answers, and the line number is the question id.
- ./zhihu/zhihu_dat/user_adj.dat: the users and the ids of the questions they have answered. The line number is the user id.
- ./zhihu/zhihu_dat/truth.dat: the questions and their answers. Each answer has an associated score.
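The data files above can be read with a few lines of Python. The following is a minimal sketch, not part of the package: the function names are hypothetical, the LDA-C layout (`<number of unique terms> <term_id>:<count> ...`) follows Blei's LDA-C convention, and the adjacency layout (`<count> <id> <id> ...`) is assumed from the description of item_adj.dat. Whether user_adj.dat also starts with a count column, and whether ids are 0-based or 1-based, is not stated here, so check against the actual files.

```python
from typing import Callable, Dict, List, TypeVar

T = TypeVar("T")


def parse_ldac_line(line: str) -> Dict[int, int]:
    """Parse one LDA-C line: '<n_unique> <term_id>:<count> ...'."""
    fields = line.split()
    if not fields:
        return {}
    n_unique = int(fields[0])
    bag: Dict[int, int] = {}
    for pair in fields[1:]:
        term_id, count = pair.split(":")
        bag[int(term_id)] = int(count)
    if len(bag) != n_unique:
        raise ValueError(f"expected {n_unique} unique terms, found {len(bag)}")
    return bag


def parse_adj_line(line: str) -> List[int]:
    """Parse one adjacency line: '<count> <id> <id> ...' (assumed layout)."""
    fields = line.split()
    if not fields:
        return []
    expected = int(fields[0])
    ids = [int(x) for x in fields[1:]]
    if len(ids) != expected:
        raise ValueError(f"expected {expected} ids, found {len(ids)}")
    return ids


def load_lines(path: str, parser: Callable[[str], T]) -> List[T]:
    """Apply a line parser to every line; the list index stands in for the
    line number (0-based here, which is an assumption about the id scheme)."""
    with open(path, encoding="utf-8") as fh:
        return [parser(line) for line in fh]
```

For example, `load_lines("./zhihu/zhihu_dat/item.dat", parse_ldac_line)` would return one term-count dictionary per question, and `load_lines("./zhihu/zhihu_dat/item_adj.dat", parse_adj_line)` one list of answerer ids per question.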
Source Distribution
- zhihu_spider-1.2.5.tar.gz (13.8 kB)
Built Distribution
- Zhihu_Spider-1.2.5-py3-none-any.whl (18.5 kB)
File details
Details for the file zhihu_spider-1.2.5.tar.gz.
File metadata
- Download URL: zhihu_spider-1.2.5.tar.gz
- Upload date:
- Size: 13.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.11.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4bc21a87ac224b4531a5dcd8c27e10c8258664baa19d8bd41d4a92c249930aff |
| MD5 | 608021e35a7c6df9b4b7834052dfd53f |
| BLAKE2b-256 | ccd73792be7ce93cc5bfe478c3248fd36f9075b5613b342d3f158b520a520886 |
File details
Details for the file Zhihu_Spider-1.2.5-py3-none-any.whl.
File metadata
- Download URL: Zhihu_Spider-1.2.5-py3-none-any.whl
- Upload date:
- Size: 18.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.11.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4d108952dd0af43979f7e7e839dee130329268aa48bae0d787a2358f6bc2c2ff |
| MD5 | 4d4fddebb773b8f2eb8f39f5d190cdd8 |
| BLAKE2b-256 | 50752e6b7877edd34aba1dd3de452fd57659cb3c4bf417008e65721f03381479 |