🐛 Summary
While cyhy-data-extract.py is running, tickets that the CyHy commander is in the process of updating in the DB may be unintentionally excluded from the tickets query and therefore left out of the daily extract file. This behavior is not desired.
To reproduce
The following is theoretical. We have not reproduced it in a controlled environment, but we believe it happens in Production:
Run cyhy-data-extract.py while the CyHy commander is also running. If the tickets query in cyhy-data-extract.py attempts to read a ticket that the commander is updating at that moment, the ticket document will be locked by MongoDB and unreadable. That ticket will then be excluded from the query results and from the daily extract file.
See here for more info about Mongo concurrency limitations.
Expected behavior
Ideally, all tickets that were modified within the specified timeframe of the cyhy-data-extract.py query should be included in the daily extract file.
@mcdonnnj and I think that the easiest way to ensure this behavior is to stop the commander from making any DB updates while cyhy-data-extract.py is performing queries, though there may be other solutions to this problem.
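As a minimal sketch of what "all tickets modified within the timeframe" means in practice, the check below compares the tickets that were changed in a window against the ticket IDs that actually landed in an extract. It assumes, hypothetically, that each ticket document has an `_id` and a `last_change` timestamp and that the window is the half-open interval [start, end). The helper names are illustrative only and are not part of cyhy-feeds.

```python
from datetime import datetime, timezone


def tickets_filter(start, end):
    """Build a MongoDB-style filter for tickets whose (assumed)
    last_change field falls in the half-open window [start, end)."""
    return {"last_change": {"$gte": start, "$lt": end}}


def missing_from_extract(db_tickets, extract_ids, start, end):
    """Return the sorted IDs of tickets changed within [start, end)
    that do not appear in the extract.

    db_tickets: iterable of dicts with "_id" and "last_change" keys
    (hypothetical schema). extract_ids: ticket IDs found in the extract.
    """
    expected = {t["_id"] for t in db_tickets if start <= t["last_change"] < end}
    return sorted(expected - set(extract_ids))


if __name__ == "__main__":
    start = datetime(2023, 10, 26, tzinfo=timezone.utc)
    end = datetime(2023, 10, 27, tzinfo=timezone.utc)
    # Hypothetical tickets; only the first two fall inside the window.
    db = [
        {"_id": "ticket-a", "last_change": datetime(2023, 10, 26, 22, 49, 22, tzinfo=timezone.utc)},
        {"_id": "ticket-b", "last_change": datetime(2023, 10, 26, 8, 26, 58, tzinfo=timezone.utc)},
        {"_id": "ticket-c", "last_change": datetime(2023, 10, 27, 0, 30, tzinfo=timezone.utc)},
    ]
    print(missing_from_extract(db, ["ticket-b"], start, end))  # ['ticket-a']
```

An empty result means the extract satisfies the expected behavior for that window; any IDs returned are tickets that should have been exported but were not.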
Any helpful log output or screenshots
Here is some annotated log output from feeds.log-20231027.gz showing the ticket that alerted us to this issue:
```
2023-10-26T08:26:58Z - Ticket 653a30455de7cb1cd6740083 opened (initial vulnscan)
2023-10-26T22:49:22Z - Ticket 653a30455de7cb1cd6740083 verified (2nd vulnscan)
2023-10-27T00:00:00Z - cyhy-feeds script starts up
** START: POTENTIAL WINDOW 1 FOR TICKETS TO BE UPDATED IN DB, BUT NOT INCLUDED IN DAILY EXTRACT **
2023-10-27 00:00:01,994 INFO cyhy-feeds - Beginning data extraction process.
2023-10-27 00:00:02,027 INFO cyhy-feeds - Creating cursors for query results.
2023-10-27 00:00:02,028 INFO cyhy-feeds - Extracting data from database(s).
** END: POTENTIAL WINDOW 1 FOR TICKETS TO BE UPDATED IN DB, BUT NOT INCLUDED IN DAILY EXTRACT **
2023-10-26 00:00:02,124 INFO cyhy-feeds - Fetching from host_scans collection...
2023-10-26 00:00:13,221 INFO cyhy-feeds - Finished writing host_scans to file.
2023-10-26 00:00:26,779 INFO cyhy-feeds - Added host_scans_2023-10-26T000000+0000.json to cyhy_extract_2023-10-26T000000+0000.tbz
2023-10-26 00:00:26,797 INFO cyhy-feeds - Deleted host_scans_2023-10-26T000000+0000.json as part of cleanup.
2023-10-26 00:00:26,797 INFO cyhy-feeds - Fetching from hosts collection...
2023-10-26 00:04:22,116 INFO cyhy-feeds - Finished writing hosts to file.
2023-10-26 00:06:42,793 INFO cyhy-feeds - Added hosts_2023-10-26T000000+0000.json to cyhy_extract_2023-10-26T000000+0000.tbz
2023-10-26 00:06:42,919 INFO cyhy-feeds - Deleted hosts_2023-10-26T000000+0000.json as part of cleanup.
2023-10-26 00:06:42,919 INFO cyhy-feeds - Fetching from kevs collection...
2023-10-26 00:06:42,939 INFO cyhy-feeds - Finished writing kevs to file.
2023-10-26 00:06:42,940 INFO cyhy-feeds - Added kevs_2023-10-26T000000+0000.json to cyhy_extract_2023-10-26T000000+0000.tbz
2023-10-26 00:06:42,940 INFO cyhy-feeds - Deleted kevs_2023-10-26T000000+0000.json as part of cleanup.
2023-10-26 00:06:42,940 INFO cyhy-feeds - Fetching from port_scans collection...
2023-10-26 00:27:51,508 INFO cyhy-feeds - Finished writing port_scans to file.
2023-10-26 00:51:12,861 INFO cyhy-feeds - Added port_scans_2023-10-26T000000+0000.json to cyhy_extract_2023-10-26T000000+0000.tbz
2023-10-26 00:51:14,184 INFO cyhy-feeds - Deleted port_scans_2023-10-26T000000+0000.json as part of cleanup.
2023-10-26 00:51:14,187 INFO cyhy-feeds - Fetching from requests collection...
2023-10-26 00:51:14,707 INFO cyhy-feeds - Finished writing requests to file.
2023-10-26 00:51:15,574 INFO cyhy-feeds - Added requests_2023-10-26T000000+0000.json to cyhy_extract_2023-10-26T000000+0000.tbz
2023-10-26 00:51:15,594 INFO cyhy-feeds - Deleted requests_2023-10-26T000000+0000.json as part of cleanup.
** START: POTENTIAL WINDOW 2 FOR TICKETS TO BE UPDATED IN DB, BUT NOT INCLUDED IN DAILY EXTRACT **
2023-10-26 00:51:15,594 INFO cyhy-feeds - Fetching from tickets collection...
** END: POTENTIAL WINDOW 2 FOR TICKETS TO BE UPDATED IN DB, BUT NOT INCLUDED IN DAILY EXTRACT **
2023-10-26 01:13:11,320 INFO cyhy-feeds - Finished writing tickets to file.
...
```
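In the log, window 2 stays open for the entire tickets fetch. The sketch below measures how long each collection fetch took, assuming every line follows the exact `YYYY-MM-DD HH:MM:SS,mmm INFO cyhy-feeds - message` format shown in the log (other levels or formats are simply skipped). The function name is hypothetical. Applied to the two tickets lines, it reports about 1315.7 seconds, meaning window 2 was open for roughly 22 minutes.

```python
import re
from datetime import datetime

# Matches lines like "2023-10-26 00:51:15,594 INFO cyhy-feeds - <message>".
LOG_LINE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) INFO cyhy-feeds - (.*)$"
)
FETCH_PREFIX, FETCH_SUFFIX = "Fetching from ", " collection..."
DONE_PREFIX, DONE_SUFFIX = "Finished writing ", " to file."


def collection_fetch_durations(lines):
    """Map each collection name to the seconds between its
    'Fetching from X collection...' and 'Finished writing X to file.' lines."""
    started = {}
    durations = {}
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if not match:
            continue  # annotations and unrecognized lines are ignored
        ts = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
        msg = match.group(2)
        if msg.startswith(FETCH_PREFIX) and msg.endswith(FETCH_SUFFIX):
            started[msg[len(FETCH_PREFIX):-len(FETCH_SUFFIX)]] = ts
        elif msg.startswith(DONE_PREFIX) and msg.endswith(DONE_SUFFIX):
            name = msg[len(DONE_PREFIX):-len(DONE_SUFFIX)]
            if name in started:
                durations[name] = (ts - started[name]).total_seconds()
    return durations


if __name__ == "__main__":
    sample = [
        "2023-10-26 00:51:15,594 INFO cyhy-feeds - Fetching from tickets collection...",
        "2023-10-26 01:13:11,320 INFO cyhy-feeds - Finished writing tickets to file.",
    ]
    print(collection_fetch_durations(sample))  # {'tickets': 1315.726}
```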
Note
#49 partially helps with "POTENTIAL WINDOW 1" above, but not with "POTENTIAL WINDOW 2".
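Since #49 does not close window 2, the suggestion under Expected behavior (pausing commander DB updates while cyhy-data-extract.py queries) remains one option. A minimal sketch, assuming both programs run on the same Unix host and can share a lock file, is a readers-writer arrangement built on `fcntl.flock`: the commander takes a shared lock around each batch of updates, and the extract takes an exclusive lock around its queries. The lock path and helper names are hypothetical; if the commander and the extract run on different hosts, a different coordination mechanism would be needed.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager

# Hypothetical lock file shared by the extract and the commander (same host assumed).
LOCK_PATH = os.path.join(tempfile.gettempdir(), "cyhy-extract.lock")


@contextmanager
def _flock(path, mode):
    """Open (creating if needed) path and hold an flock of `mode` until exit."""
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        fcntl.flock(fd, mode)  # blocks until the lock is granted
        yield
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)


@contextmanager
def extract_lock(path=LOCK_PATH):
    """Exclusive lock held by the extract for the duration of its queries.
    Acquiring it waits until no commander holds a shared lock."""
    with _flock(path, fcntl.LOCK_EX):
        yield


@contextmanager
def commander_write_lock(path=LOCK_PATH):
    """Shared lock held by the commander around each batch of DB updates.
    Several writers may hold it at once, but none while the extract
    holds its exclusive lock."""
    with _flock(path, fcntl.LOCK_SH):
        yield
```

In cyhy-data-extract.py this would wrap the query phase (`with extract_lock(): ...`), or only the tickets query if window 2 is the main concern, while the commander would wrap each write in `with commander_write_lock(): ...`. `flock` makes no fairness promise between a waiting exclusive request and new shared requests, so a steady stream of overlapping commander writes could delay the extract.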
dav3r added the bug label (This issue or pull request addresses broken functionality) on Nov 17, 2023.