Add bulk replay capabilities to replay.py #730

Open
josehelps opened this issue Apr 11, 2022 · 4 comments

@josehelps
Contributor

Today a user cannot point to a folder and ingest all datasets with the tool.

@josehelps josehelps added the enhancement New feature or request label Apr 11, 2022
@fryguy04
Contributor

fryguy04 commented Apr 12, 2022

Initial Slack conversation over here

One idea to start the conversation is to split the current replay.yml into two parts ...

  1. config.yml, which contains the Splunk params (host/user/pass) + default index + update_timestamp
  2. dataset.yml, which would exist in each dataset's directory (adding info to the existing yml file) and contains name + source + sourcetype + index (if the user wants to override the default one in config.yml)

Propose we standardize the per-directory yml filename to dataset.yml so it can easily be found/recognized.
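
With a fixed filename, discovery becomes a simple recursive search. A minimal sketch, assuming the proposed dataset.yml name is adopted; the function name `find_dataset_files` is hypothetical and not part of the current replay.py:

```python
from pathlib import Path
from typing import Iterator

# Filename proposed in this issue; an assumption until the repo adopts it.
DATASET_FILENAME = "dataset.yml"


def find_dataset_files(root: str) -> Iterator[Path]:
    """Yield every dataset.yml found under root, recursively.

    Results are sorted so that repeated runs ingest directories
    in the same order.
    """
    yield from sorted(Path(root).rglob(DATASET_FILENAME))
```

Each yielded path's parent directory is the dataset directory to ingest, e.g. `for path in find_dataset_files("datasets"): print(path.parent)`.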

Calling replay.py could look like this ...

python replay.py -h
       -c config.yml      Splunk configuration (host/user/pass/index/override timestamp) (required)
       -d <directory>     Directory to recursively search for dataset.yml to start ingesting (required)

       -i <index>         Override index in config.yml (optional)
       -t                 Override config.yml and update timestamps (optional)
       -s <seconds>       Sleep seconds in between directory ingests (allow Splunk to catch up on indexing) (optional)
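
The flags above map fairly directly onto argparse. A rough sketch only: the short flags come from the proposal, while the long option names (`--config`, `--directory`, `--index`, `--update-timestamp`, `--sleep`) and the defaults are assumptions added for illustration:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the proposed bulk-replay flags (-h is argparse's built-in help)."""
    parser = argparse.ArgumentParser(
        prog="replay.py",
        description="Replay every dataset found under a directory into Splunk.",
    )
    parser.add_argument("-c", "--config", required=True,
                        help="Splunk configuration file (host/user/pass/index/timestamp)")
    parser.add_argument("-d", "--directory", required=True,
                        help="directory to search recursively for dataset.yml")
    parser.add_argument("-i", "--index", default=None,
                        help="override the index set in config.yml")
    parser.add_argument("-t", "--update-timestamp", action="store_true",
                        help="override config.yml and update timestamps")
    parser.add_argument("-s", "--sleep", type=float, default=0.0,
                        help="seconds to sleep between directory ingests")
    return parser
```

For example, `build_parser().parse_args(["-c", "config.yml", "-d", "datasets", "-s", "5"])` yields a namespace with `sleep == 5.0` and `index is None`.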

Each directory's *.yml currently seems to have the sourcetypes, but they are not linked to or ordered with the filenames. Here's an example:

author: Patrick Bareiss, Michael Haag
id: cc9b25d6-efc9-11eb-926b-550bf0943fbb
date: '2022-01-12'
description: 'Atomic Test Results: Successful Execution of test T1003.001-1 Windows
  Credential Editor Successful Execution of test T1003.001-2 Dump LSASS.exe Memory
  using ProcDump Return value unclear for test T1003.001-3 Dump LSASS.exe Memory using
  comsvcs.dll Successful Execution of test T1003.001-4 Dump LSASS.exe Memory using
  direct system calls and API unhooking Return value unclear for test T1003.001-6
  Offline Credential Theft With Mimikatz Return value unclear for test T1003.001-7
  LSASS read with pypykatz '
environment: attack_range
dataset:
- https://media.githubusercontent.com/media/splunk/attack_data/master/datasets/attack_techniques/T1003.001/atomic_red_team/windows-powershell.log
- https://media.githubusercontent.com/media/splunk/attack_data/master/datasets/attack_techniques/T1003.001/atomic_red_team/windows-security.log
- https://media.githubusercontent.com/media/splunk/attack_data/master/datasets/attack_techniques/T1003.001/atomic_red_team/windows-sysmon.log
- https://media.githubusercontent.com/media/splunk/attack_data/master/datasets/attack_techniques/T1003.001/atomic_red_team/windows-sysmon_creddump.log
- https://media.githubusercontent.com/media/splunk/attack_data/master/datasets/attack_techniques/T1003.001/atomic_red_team/windows-system.log
sourcetypes:
- XmlWinEventLog:Microsoft-Windows-Sysmon/Operational
- WinEventLog:Microsoft-Windows-PowerShell/Operational
- WinEventLog:System
- WinEventLog:Security
references:
- https://attack.mitre.org/techniques/T1003/001/
- https://github.com/redcanaryco/atomic-red-team/blob/master/atomics/T1003.001/T1003.001.md
- https://github.com/splunk/security-content/blob/develop/tests/T1003_001.yml

As you can see, the 'dataset' files are in a different order than the 'sourcetypes'. Propose we add a formal linkage from each filename to its source/sourcetype (basically moving the replay_parameters logic from replay.yml to each directory's dataset.yml file) so it can be documented per dataset capture and replayed:

author: Patrick Bareiss, Michael Haag
id: cc9b25d6-efc9-11eb-926b-550bf0943fbb
date: '2022-01-12'
description: 'Atomic Test Results: Successful Execution of test T1003.001-1 Windows
  Credential Editor Successful Execution of test T1003.001-2 Dump LSASS.exe Memory
  using ProcDump Return value unclear for test T1003.001-3 Dump LSASS.exe Memory using
  comsvcs.dll Successful Execution of test T1003.001-4 Dump LSASS.exe Memory using
  direct system calls and API unhooking Return value unclear for test T1003.001-6
  Offline Credential Theft With Mimikatz Return value unclear for test T1003.001-7
  LSASS read with pypykatz '
environment: attack_range
references:
- https://attack.mitre.org/techniques/T1003/001/
- https://github.com/redcanaryco/atomic-red-team/blob/master/atomics/T1003.001/T1003.001.md
- https://github.com/splunk/security-content/blob/develop/tests/T1003_001.yml

replay_parameters:
  - name: atomic_red_team/windows-powershell.log
    source: XmlWinEventLog:Microsoft-Windows-Sysmon/Operational
    sourcetype: xmlwineventlog
    notes: <optional>
  - name: windows-sysmon.log
    source: XmlWinEventLog:Microsoft-Windows-Sysmon/Operational
    sourcetype: xmlwineventlog

@josehelps
Contributor Author

I really dig this proposal, although it will require us to refactor a few aspects of our testing pipeline to read from the new yaml structures. With this approach we can/should also create a spec for dataset.yml and run CI/CD validation against it on every PR, similar to the security_content repo here. Let me bring this back to the team and think it through, but on the surface it looks absolutely doable 😄. Thank you so much for spending the time to write this up, super useful!
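
As a starting point for that validation, a check could look roughly like the sketch below. It is illustrative only: the required keys are taken from the proposal above (name, source, sourcetype), the function name is hypothetical, and a real spec would more likely be a schema file like the one used in security_content:

```python
# Keys every replay_parameters entry must carry, per the proposal in this issue.
REQUIRED_ENTRY_KEYS = ("name", "source", "sourcetype")


def validate_dataset(doc: dict) -> list:
    """Return a list of human-readable errors; an empty list means valid."""
    entries = doc.get("replay_parameters")
    if not isinstance(entries, list) or not entries:
        return ["replay_parameters must be a non-empty list"]

    errors = []
    for i, entry in enumerate(entries):
        if not isinstance(entry, dict):
            errors.append(f"replay_parameters[{i}] must be a mapping")
            continue
        for key in REQUIRED_ENTRY_KEYS:
            value = entry.get(key)
            if not isinstance(value, str) or not value.strip():
                errors.append(f"replay_parameters[{i}].{key} is missing or empty")
    return errors
```

A CI job could run this over every parsed dataset.yml in a PR and fail when any file returns errors.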

@josehelps josehelps self-assigned this Apr 15, 2022
@pong-rearc

Not sure if there is still movement on this. I tried using the tool today and found manually updating the replay.yml file extremely cumbersome, as we just want to ingest all the data. Can we just have this install an app into the etc/apps folder of the local Splunk instance, which generates/deploys an inputs.conf that ingests the associated files one time? The user can then simply enable/disable the inputs as they see fit.
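
A rough sketch of what generating such an inputs.conf might look like, assuming `monitor://` stanzas with the standard `index`, `sourcetype`, `source`, and `disabled` settings. The function name, the entry keys, and the choice to ship inputs disabled by default are all hypothetical:

```python
def render_inputs_conf(entries: list, default_index: str, disabled: bool = True) -> str:
    """Render one [monitor://...] stanza per dataset file.

    Each entry is a dict with 'path', 'source', 'sourcetype' and an
    optional 'index'. Inputs are written disabled by default so the
    user can enable only the datasets they want.
    """
    stanzas = []
    for entry in entries:
        stanzas.append("\n".join([
            f"[monitor://{entry['path']}]",
            f"index = {entry.get('index', default_index)}",
            f"sourcetype = {entry['sourcetype']}",
            f"source = {entry['source']}",
            f"disabled = {1 if disabled else 0}",
        ]))
    return "\n\n".join(stanzas) + "\n"
```

The returned text would be written to the app's inputs.conf; where that app lives and how it is deployed is left open here.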

@pong-rearc

Alternatively, using the same logic as the bots dataset may also work:
https://github.com/splunk/botsv3


3 participants