[mongo] Runtime data #16

Open
1 of 3 tasks
Willian-Zhang opened this issue May 9, 2016 · 5 comments

Comments

@Willian-Zhang
Contributor

Willian-Zhang commented May 9, 2016

  • Crawl 3 levels from the web
  • Crawl 3 levels from the DB (with update validation)
  • Display 3 levels from the DB
@Willian-Zhang
Contributor Author

Willian-Zhang commented May 9, 2016

Crawl 3 levels from the DB (with update validation)

> db.users.find({followers:{ $exists: true }}).count()+" / "+ db.users.find().count()
2162 / 274732
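
The shell one-liner above reports how many user documents already carry a `followers` field against the total. As a minimal sketch, the same check could be wrapped in a function. It assumes a collection object that exposes `countDocuments` (the replacement for `count()` in newer MongoDB shells and drivers); `followerCoverage` is a hypothetical name and is not part of this project.

```javascript
// Hypothetical helper: returns "<users with followers> / <all users>".
// `users` is assumed to be a MongoDB collection exposing countDocuments().
async function followerCoverage(users) {
  // Same filter as the shell query: documents where `followers` exists.
  const withFollowers = await users.countDocuments({ followers: { $exists: true } });
  // An empty filter counts every document in the collection.
  const total = await users.countDocuments({});
  return `${withFollowers} / ${total}`;
}
```

`await` is used so the sketch works whether `countDocuments` returns a number directly or a Promise.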

capture ended with last level counting: 437509
fetch: 335150ms

statistics

(running) last level efficiency :1,305.4 user/sec
last level expansion rate: 127.1 /user
last level duplicity: 1.59 (1 for none)

after commit

Willian-Zhang@fb630af
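
The efficiency figure above can be reproduced from the reported count and fetch time. A quick check, with one assumption called out: the duplicity value happens to match the last-level count divided by the total number of stored users, but the crawler's actual formula is not shown in this thread.

```javascript
// Numbers copied from the report above.
const lastLevelCount = 437509; // "last level counting"
const fetchMs = 335150;        // "fetch", in milliseconds
const totalUsers = 274732;     // db.users.find().count()

// Throughput: users processed per second of fetch time.
const efficiency = lastLevelCount / (fetchMs / 1000);

// Assumption (not confirmed by the source): duplicity is the last-level
// count divided by the number of distinct users stored in the DB.
const duplicity = lastLevelCount / totalUsers;

console.log(efficiency.toFixed(1)); // 1305.4
console.log(duplicity.toFixed(2));  // 1.59
```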

@Willian-Zhang Willian-Zhang changed the title from "[mongo] About runtime data" to "[mongo] Runtime data" May 9, 2016
@Willian-Zhang
Contributor Author

Willian-Zhang commented May 9, 2016

Crawl 3 levels from the DB (with update validation)

> db.users.find({followers:{ $exists: true }}).count()+" / "+ db.users.find().count()
2162 / 274732

capture ended with last level counting: 437509
fetch: 245038ms (4mins 5secs)

statistics

(running) last level efficiency :1,785.47 user/sec
last level expansion rate: 127.1 /user
last level duplicity: 1.59 (1 for none)

after commit

Willian-Zhang@71cf0e3
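
For comparison, the two reported fetch times (335150 ms after fb630af, 245038 ms after 71cf0e3) cover the same last-level count, so the improvement can be worked out directly. This is plain arithmetic on the numbers posted in this thread, not output from the crawler.

```javascript
// Fetch times reported in this thread for the same 3-level DB crawl.
const beforeMs = 335150; // after commit fb630af
const afterMs = 245038;  // after commit 71cf0e3
const lastLevelCount = 437509;

// How many times faster the second run is.
const speedup = beforeMs / afterMs;

// Share of wall-clock time saved, in percent.
const timeSavedPct = (1 - afterMs / beforeMs) * 100;

// Throughput of the second run, in users per second.
const efficiency = lastLevelCount / (afterMs / 1000);

console.log(speedup.toFixed(2));      // 1.37
console.log(timeSavedPct.toFixed(1)); // 26.9
console.log(efficiency.toFixed(2));   // 1785.47
```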

@starkwang
Owner

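There are now three xxx.config.js files, which feels like a bit much. Could they be merged into a single js file?

Also, I took a look at the mongo feature you wrote, and it's very nicely done 0.0

Is mongo optional or required? I think making it optional would be better, because the crawler's server side and browser side are still basically run on the same machine. Once the browser side has been split off, it would make more sense to make mongo required.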
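
One way to picture the suggestion above is a single config object with an optional `mongo` section. This is only an illustrative sketch: every key, value, and function name below is hypothetical and none of it is taken from the project's actual config files.

```javascript
// Hypothetical merged config; none of these keys come from the project.
const config = {
  crawler: { depth: 3 },  // how many levels to crawl
  server: { port: 3000 }, // where the server side would listen
  // Leave `mongo` as null to run without a database,
  // e.g. { url: "mongodb://localhost:27017/crawler" } to enable it.
  mongo: null,
};

// Returns true only when a MongoDB connection URL is configured.
function mongoEnabled(cfg) {
  return Boolean(cfg.mongo && cfg.mongo.url);
}
```

With this shape, the crawler could check `mongoEnabled(config)` once at startup and fall back to in-memory storage when it returns false.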

@Willian-Zhang
Contributor Author

Willian-Zhang commented May 28, 2016

it's very nicely done 0.0

0.0

0.0

0.0

@Willian-Zhang
Contributor Author

Willian-Zhang commented Mar 21, 2017

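Is mongo optional or required? I think making it optional would be better, because the crawler's server side and browser side are still basically run on the same machine. Once the browser side has been split off, it would make more sense to make mongo required.

mongo is required: running out of memory can cause a run to fail, and repeated runs need the data to be stored.
Also, although I haven't tested it, having mongo should improve efficiency considerably.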
