Malcolm's runtime settings are stored (with a few exceptions) as environment variables in configuration files ending with a `.env` suffix in the `./config` directory. The `./scripts/configure` script can help users configure and tune these settings.
Run `./scripts/configure` and answer the questions to configure Malcolm. For an in-depth treatment of these configuration questions, see the Configuration section in End-to-end Malcolm and Hedgehog Linux ISO Installation.
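
As a rough illustration of how these files are laid out, the following Python sketch reads `KEY=VALUE` lines into a dictionary. It is not part of Malcolm: the function names `parse_env_text` and `load_env_file` are hypothetical, and the sketch assumes simple one-line assignments with optional surrounding quotes and `#` comment lines.

```python
from pathlib import Path


def parse_env_text(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    settings: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        settings[key.strip()] = value
    return settings


def load_env_file(path: str) -> dict[str, str]:
    """Read a single .env file from disk (the path is supplied by the caller)."""
    return parse_env_text(Path(path).read_text(encoding="utf-8"))


# Sample text using two variables and defaults described in this document.
sample = "# Arkime settings\nARKIME_AUTO_ANALYZE_PCAP_THREADS=1\nARKIME_FREESPACEG=10%\n"
print(parse_env_text(sample))
```

A helper like this is only meant for inspecting settings; changes are better made through `./scripts/configure` or by editing the `.env` files directly.
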
Although the configuration script automates many of the following configuration and tuning parameters, some environment variables of particular interest are listed here for reference.
- `arkime.env` and `arkime-secret.env` - settings for Arkime
  - `ARKIME_AUTO_ANALYZE_PCAP_THREADS` - the number of threads available to Arkime for analyzing PCAP files (default `1`)
  - `ARKIME_PASSWORD_SECRET` - the password hash secret for the Arkime viewer cluster (see `passwordSecret` in Arkime INI Settings) used to secure the connection used when Arkime viewer retrieves a PCAP payload for display in its user interface
  - `ARKIME_ROTATE_INDEX` - how often (based on network traffic timestamp) to create a new index in OpenSearch
  - `ARKIME_QUERY_ALL_INDICES` - whether or not Arkime should query all indices instead of trying to calculate which ones pertain to the search time frame (default `false`)
  - `ARKIME_SPI_DATA_MAX_INDICES` - the maximum number of indices for querying SPI data, or set to `-1` to disable any max. The Arkime documentation warns "OpenSearch/Elasticsearch MAY blow up if we ... search too many indices." (default `7`)
  - `MANAGE_PCAP_FILES` and `ARKIME_FREESPACEG` - these variables deal with PCAP deletion by Arkime, see Managing disk usage below
  - `MAXMIND_GEOIP_DB_LICENSE_KEY` - Malcolm uses MaxMind's free GeoLite2 databases for GeoIP lookups. As of December 30, 2019, these databases are no longer available for download via a public URL. Instead, they must be downloaded using a MaxMind license key (available without charge from MaxMind). The license key can be specified here for GeoIP database downloads during build- and run-time.
  - `MAXMIND_GEOIP_DB_ALTERNATE_DOWNLOAD_URL` - As an alternative to (or fallback for) `MAXMIND_GEOIP_DB_LICENSE_KEY`, a URL prefix may be specified in this variable (e.g., `https://example.org/foo/bar`) which will be used as a fallback. This URL should serve up `.tar.gz` files in the same format as those provided by the official source (see the example here).
  - The following variables configure Arkime's use of OpenSearch Index State Management (ISM) or Elasticsearch Index Lifecycle Management (ILM):
    - `INDEX_MANAGEMENT_ENABLED` - if set to `true`, Malcolm's instance of Arkime will use these features when indexing data
    - `INDEX_MANAGEMENT_OPTIMIZATION_PERIOD` - the period in hours or days that Arkime will keep records in the hot state (default `30d`)
    - `INDEX_MANAGEMENT_RETENTION_TIME` - the period in hours or days that Arkime will keep records before deleting them (default `90d`)
    - `INDEX_MANAGEMENT_OLDER_SESSION_REPLICAS` - the number of replicas for older sessions indices (default `0`)
    - `INDEX_MANAGEMENT_HISTORY_RETENTION_WEEKS` - the retention time period (weeks) for Arkime history data (default `13`)
    - `INDEX_MANAGEMENT_SEGMENTS` - the number of segments Arkime will use to optimize sessions (default `1`)
    - `INDEX_MANAGEMENT_HOT_WARM_ENABLED` - whether or not Arkime should use a hot/warm design (storing non-session data in a warm index); setting up hot/warm index policies also requires configuration on the local nodes in accordance with the Arkime documentation
- `arkime-live.env` - settings for live traffic capture with Arkime
  - See Tuning Arkime for variables related to managing Arkime's performance and resource utilization during live capture.
- `auth-common.env` - authentication-related settings
  - `NGINX_BASIC_AUTH` - if set to `true`, use TLS-encrypted HTTP basic authentication (default); if set to `false`, use Lightweight Directory Access Protocol (LDAP) authentication
- `auth.env` - stores the Malcolm administrator's username and password hash for its nginx reverse proxy
- `beats-common.env` - settings for interactions between Logstash and Filebeat
  - `BEATS_SSL` - if set to `true`, Logstash will require encrypted communications for any external Beats-based forwarders from which it will accept logs (default `true`)
  - `LOGSTASH_HOST` - the host and port at which Beats-based forwarders will connect to Logstash (default `logstash:5044`); see `MALCOLM_PROFILE` below
- `dashboards.env` and `dashboards-helper.env` - settings for the containers that configure and maintain OpenSearch and OpenSearch Dashboards
  - `DASHBOARDS_URL` - used primarily when `OPENSEARCH_PRIMARY` is set to `elasticsearch-remote` (see OpenSearch and Elasticsearch instances), this variable stores the URL for the Kibana instance into which Malcolm's dashboards and index templates will be imported
  - `DASHBOARDS_PREFIX` - a string to prepend to the titles of Malcolm's prebuilt dashboards upon import during Malcolm's initialization (default is an empty string)
  - `DASHBOARDS_DARKMODE` - if set to `true`, OpenSearch Dashboards will be set to dark mode upon initialization (default `true`)
  - `OPENSEARCH_INDEX_SIZE_PRUNE_LIMIT` - the maximum cumulative size OpenSearch indices are allowed to consume before the oldest indices are deleted, see Managing disk usage below
- `filebeat.env` - settings specific to Filebeat, particularly for how Filebeat watches for new log files to parse and how it receives and stores third-party logs
  - `LOG_CLEANUP_MINUTES` and `ZIP_CLEANUP_MINUTES` - these variables deal with cleaning up already-processed log files, see Managing disk usage below
- `logstash.env` - settings specific to Logstash
  - `LOGSTASH_OUI_LOOKUP` - if set to `true`, Logstash will map MAC addresses to vendors for all source and destination MAC addresses when analyzing Zeek logs (default `true`)
  - `LOGSTASH_REVERSE_DNS` - if set to `true`, Logstash will perform a reverse DNS lookup for all external source and destination IP address values when analyzing Zeek logs (default `false`)
  - `LOGSTASH_SEVERITY_SCORING` - if set to `true`, Logstash will perform severity scoring when analyzing Zeek logs (default `true`)
  - `LS_JAVA_OPTS` - part of Logstash's JVM settings, the `-Xmx` and `-Xms` values set the size of Logstash's Java heap (we recommend somewhere between `1500m` and `4g`)
- `pipeline.workers`, `pipeline.batch.size` and `pipeline.batch.delay` - these settings are used to tune the performance and resource utilization of the `logstash` container; see Tuning and Profiling Logstash Performance, `logstash.yml` and Multiple Pipelines
- `lookup-common.env` - settings for enrichment lookups, including those used for customizing event severity scoring
  - `CONNECTION_SECONDS_SEVERITY_THRESHOLD` - when severity scoring is enabled, this variable indicates the duration threshold (in seconds) for assigning severity to long connections (default `3600`)
  - `FREQ_LOOKUP` - if set to `true`, domain names (from DNS queries and SSL server names) will be assigned entropy scores as calculated by `freq` (default `false`)
  - `FREQ_SEVERITY_THRESHOLD` - when severity scoring is enabled, this variable indicates the entropy threshold for assigning severity to events with entropy scores calculated by `freq`; a lower value assigns severity scores to fewer domain names, only those with higher entropy (e.g., `2.0` for `NQZHTFHRMYMTVBQJE.COM`), while a higher value assigns severity scores to more domain names, including those with lower entropy (e.g., `7.5` for `naturallanguagedomain.example.org`) (default `2.0`)
  - `SENSITIVE_COUNTRY_CODES` - when severity scoring is enabled, this variable defines a comma-separated list of sensitive countries (using ISO 3166-1 alpha-2 codes) (default `'AM,AZ,BY,CN,CU,DZ,GE,HK,IL,IN,IQ,IR,KG,KP,KZ,LY,MD,MO,PK,RU,SD,SS,SY,TJ,TM,TW,UA,UZ'`, taken from the U.S. Department of Energy Sensitive Country List)
  - `TOTAL_MEGABYTES_SEVERITY_THRESHOLD` - when severity scoring is enabled, this variable indicates the size threshold (in megabytes) for assigning severity to large connections or file transfers (default `1000`)
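
To make the threshold semantics above more concrete, here is a small Python sketch of how such checks might be combined. It is not Malcolm's actual scoring logic (which runs inside Logstash): the function names, the event field names (`duration_seconds`, `total_megabytes`, `country_code`), and the choice of `>=` for the comparisons are all assumptions made for illustration.

```python
def parse_country_codes(value: str) -> frozenset:
    """Split a comma-separated list such as 'CN,KP' into a set of codes."""
    cleaned = value.strip().strip("'\"")
    return frozenset(code.strip().upper() for code in cleaned.split(",") if code.strip())


def severity_reasons(
    event: dict,
    seconds_threshold: float = 3600,    # CONNECTION_SECONDS_SEVERITY_THRESHOLD default
    megabytes_threshold: float = 1000,  # TOTAL_MEGABYTES_SEVERITY_THRESHOLD default
    sensitive_countries: frozenset = frozenset(),
) -> list:
    """Return the (hypothetical) reasons an event would be assigned severity."""
    reasons = []
    if event.get("duration_seconds", 0) >= seconds_threshold:
        reasons.append("long connection")
    if event.get("total_megabytes", 0) >= megabytes_threshold:
        reasons.append("large transfer")
    if event.get("country_code", "").upper() in sensitive_countries:
        reasons.append("sensitive country")
    return reasons


codes = parse_country_codes("'CN,KP,RU'")
example = {"duration_seconds": 7200, "total_megabytes": 12, "country_code": "kp"}
print(severity_reasons(example, sensitive_countries=codes))
```
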
- `netbox-common.env`, `netbox.env`, `netbox-secret.env`, `netbox-postgres.env`, `netbox-redis-cache.env` and `netbox-redis.env` - settings related to NetBox and Asset Interaction Analysis
  - `NETBOX_DISABLED` - if set to `true`, Malcolm will not start and manage a NetBox instance (default `true`)
  - `NETBOX_ENRICHMENT` - if set to `true`, Logstash will enrich network traffic metadata via NetBox API calls
  - `NETBOX_DEFAULT_SITE` - specifies the default NetBox site name for use when enriching network traffic metadata via NetBox lookups if a specific site is not otherwise specified for the source of the data (default `Malcolm`)
  - `NETBOX_AUTO_POPULATE` - if set to `true`, Logstash will populate the NetBox inventory based on observed network traffic
  - `NETBOX_AUTO_CREATE_PREFIX` - if set to `true`, Logstash will automatically create private subnet prefixes in the NetBox inventory based on observed network traffic
  - `NETBOX_DEFAULT_AUTOCREATE_MANUFACTURER` - if set to `true`, new manufacturer entries will be created in the NetBox database when matching device manufacturers to OUIs (default `true`)
  - `NETBOX_DEFAULT_FUZZY_THRESHOLD` - fuzzy-matching threshold for matching device manufacturers to OUIs (default `0.95`)
- `nginx.env` - settings specific to Malcolm's nginx reverse proxy
  - `NGINX_LOG_ACCESS_AND_ERRORS` - if set to `true`, all access to Malcolm via its web interfaces will be logged to OpenSearch (default `false`)
  - `NGINX_SSL` - if set to `true`, require HTTPS connections to Malcolm's `nginx-proxy` container (default); if set to `false`, use unencrypted HTTP connections (using unsecured HTTP connections is NOT recommended unless you are running Malcolm behind another reverse proxy such as Traefik, Caddy, etc.)
- `opensearch.env` - settings specific to OpenSearch
  - `OPENSEARCH_JAVA_OPTS` - one of OpenSearch's most important settings, the `-Xmx` and `-Xms` values set the size of OpenSearch's Java heap (we recommend setting this value to half of system RAM, up to 32 gigabytes)
  - `OPENSEARCH_PRIMARY` - one of `opensearch-local`, `opensearch-remote`, or `elasticsearch-remote`, to determine the OpenSearch or Elasticsearch instance Malcolm will use (default `opensearch-local`)
  - `OPENSEARCH_URL` - when using Malcolm's internal OpenSearch instance (i.e., `OPENSEARCH_PRIMARY` is `opensearch-local`) this should be `http://opensearch:9200`, otherwise this value specifies the primary remote instance URL in the format `protocol://host:port` (default `http://opensearch:9200`)
  - `OPENSEARCH_SSL_CERTIFICATE_VERIFICATION` - if set to `true`, connections to the primary remote OpenSearch instance will require full TLS certificate validation (this may fail if using self-signed certificates) (default `false`)
  - `OPENSEARCH_SECONDARY` - one of `opensearch-local`, `opensearch-remote`, `elasticsearch-remote`, or blank (unset) to indicate that Malcolm should forward logs to a secondary remote OpenSearch instance in addition to the primary OpenSearch instance (default is unset)
  - `OPENSEARCH_SECONDARY_URL` - when forwarding to a secondary remote OpenSearch instance (i.e., `OPENSEARCH_SECONDARY` is set) this value specifies the secondary remote instance URL in the format `protocol://host:port`
  - `OPENSEARCH_SECONDARY_SSL_CERTIFICATE_VERIFICATION` - if set to `true`, connections to the secondary remote OpenSearch instance will require full TLS certificate validation (this may fail if using self-signed certificates) (default `false`)
  - The following variables control the OpenSearch indices to which network traffic metadata are written. Changing them from their defaults may cause logs from non-Arkime data sources (i.e., Zeek, Suricata) to not show up correctly in Arkime.
    - `MALCOLM_NETWORK_INDEX_PATTERN` - Index pattern for network traffic logs written via Logstash (default is `arkime_sessions3-*`)
    - `MALCOLM_NETWORK_INDEX_TIME_FIELD` - Default time field to use for network traffic logs in Logstash and Dashboards (default is `firstPacket`)
    - `MALCOLM_NETWORK_INDEX_SUFFIX` - Suffix used to create the index to which network traffic logs are written
      - supports Ruby `strftime` strings in `%{}` (e.g., hourly: `%{%y%m%dh%H}`, twice daily: `%{%P%y%m%d}`, daily (default): `%{%y%m%d}`, weekly: `%{%yw%U}`, monthly: `%{%ym%m}`)
      - supports expanding dot-delimited field names in `{{ }}` (e.g., `{{event.provider}}%{%y%m%d}`)
  - The following variables control the OpenSearch indices to which other logs (third-party logs, resource utilization reports from network sensors, etc.) are written.
    - `MALCOLM_OTHER_INDEX_PATTERN` - Index pattern for other logs written via Logstash (default is `malcolm_beats_*`)
    - `MALCOLM_OTHER_INDEX_TIME_FIELD` - Default time field to use for other logs in Logstash and Dashboards (default is `@timestamp`)
    - `MALCOLM_OTHER_INDEX_SUFFIX` - Suffix used to create the index to which other logs are written (with the same rules as `MALCOLM_NETWORK_INDEX_SUFFIX` above) (default is `%{%y%m%d}`)
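
The index suffix templates described for `MALCOLM_NETWORK_INDEX_SUFFIX` can be approximated in a few lines of Python. This is only a sketch of the idea, not Malcolm's implementation: Malcolm evaluates Ruby `strftime` strings inside Logstash, whereas the hypothetical `expand_suffix` function below uses Python's `datetime.strftime`, which agrees with Ruby for the directives shown here (`%y`, `%m`, `%d`, `%H`, `%U`) but not necessarily for others such as `%P`.

```python
import re
from datetime import datetime

TIME_TOKEN = re.compile(r"%\{([^}]*)\}")                  # e.g. %{%y%m%d}
FIELD_TOKEN = re.compile(r"\{\{\s*([^}\s]+)\s*\}\}")      # e.g. {{event.provider}}


def lookup_field(event: dict, dotted_name: str) -> str:
    """Follow a dot-delimited field name through nested dictionaries."""
    value = event
    for part in dotted_name.split("."):
        value = value[part]
    return str(value)


def expand_suffix(template: str, when: datetime, event: dict = None) -> str:
    """Expand {{field}} references first, then %{strftime} patterns."""
    event = event or {}
    with_fields = FIELD_TOKEN.sub(lambda m: lookup_field(event, m.group(1)), template)
    return TIME_TOKEN.sub(lambda m: when.strftime(m.group(1)), with_fields)


when = datetime(2025, 1, 15, 9, 30)
print(expand_suffix("%{%y%m%d}", when))       # daily suffix
print(expand_suffix("%{%y%m%dh%H}", when))    # hourly suffix
print(expand_suffix("{{event.provider}}%{%y%m%d}", when, {"event": {"provider": "zeek"}}))
```

The sample event and its `zeek` provider value are invented for the example; real field values depend on the logs being indexed.
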
- `pcap-capture.env` - settings specific to capturing traffic for live traffic analysis
  - `PCAP_ENABLE_NETSNIFF` - if set to `true`, Malcolm will capture network traffic on the local network interface(s) indicated in `PCAP_IFACE` using netsniff-ng
  - `PCAP_ENABLE_TCPDUMP` - if set to `true`, Malcolm will capture network traffic on the local network interface(s) indicated in `PCAP_IFACE` using tcpdump; there is no reason to enable both `PCAP_ENABLE_NETSNIFF` and `PCAP_ENABLE_TCPDUMP`
  - `PCAP_FILTER` - specifies a tcpdump-style filter expression for local packet capture; leave blank to capture all traffic
  - `PCAP_IFACE` - used to specify the network interface(s) for local packet capture if `PCAP_ENABLE_NETSNIFF`, `PCAP_ENABLE_TCPDUMP`, `ZEEK_LIVE_CAPTURE` or `SURICATA_LIVE_CAPTURE` are enabled; for multiple interfaces, separate the interface names with a comma (e.g., `'enp0s25'` or `'enp10s0,enp11s0'`)
  - `PCAP_IFACE_TWEAK` - if set to `true`, Malcolm will [use `ethtool`]({{ site.github.repository_url }}/blob/{{ site.github.build_revision }}/shared/bin/nic-capture-setup.sh) to disable NIC hardware offloading features and adjust ring buffer sizes for capture interface(s); this should be `true` if the interface(s) are being used for capture only, `false` if they are being used for management/communication
  - `PCAP_ROTATE_MEGABYTES` - used to specify how large a locally captured PCAP file can become (in megabytes) before it is closed for processing and a new PCAP file created
  - `PCAP_ROTATE_MINUTES` - used to specify a time interval (in minutes) after which a locally captured PCAP file will be closed for processing and a new PCAP file created
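
The interface list and rotation settings above can be read as follows. This Python sketch is purely illustrative: the capture tools themselves (netsniff-ng or tcpdump) perform the rotation, the helper names are hypothetical, and the rotation values `4096` and `10` are made-up sample values rather than Malcolm defaults.

```python
def parse_ifaces(value: str) -> list:
    """Split a PCAP_IFACE-style value such as 'enp10s0,enp11s0' into names."""
    cleaned = value.strip().strip("'\"")
    return [name.strip() for name in cleaned.split(",") if name.strip()]


def should_rotate(size_megabytes: float, age_minutes: float,
                  rotate_megabytes: float, rotate_minutes: float) -> bool:
    """A capture file is closed once either the size or the age limit is reached."""
    return size_megabytes >= rotate_megabytes or age_minutes >= rotate_minutes


print(parse_ifaces("'enp10s0,enp11s0'"))
# Sample limits (hypothetical): rotate at 4096 MB or after 10 minutes.
print(should_rotate(size_megabytes=512, age_minutes=11, rotate_megabytes=4096, rotate_minutes=10))
```
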
- `process.env` - settings for how the processes running inside Malcolm containers are executed
  - `PUID` and `PGID` - Docker runs all its containers as the privileged `root` user by default. For better security, Malcolm immediately drops to non-privileged user accounts for executing internal processes wherever possible. The `PUID` (process user ID) and `PGID` (process group ID) environment variables allow Malcolm to map internal non-privileged user accounts to a corresponding user account on the host. Note that a few containers (including the `logstash` and `netbox` containers) may take a few extra minutes during startup if `PUID` and `PGID` are set to values other than the default `1000`. This is expected and should not affect operation after the initial startup.
  - `MALCOLM_PROFILE` - Specifies the profile which determines the Malcolm containers to run (`malcolm` to run all containers, `hedgehog` to run only capture-related containers)
- `ssl.env` - TLS-related settings used by many containers
- `suricata.env`, `suricata-live.env` and `suricata-offline.env` - settings for Suricata
  - `SURICATA_AUTO_ANALYZE_PCAP_FILES` - if set to `true`, all PCAP files imported into Malcolm will automatically be analyzed by Suricata, and the resulting logs will also be imported (default `false`)
  - `SURICATA_AUTO_ANALYZE_PCAP_THREADS` - the number of threads available to Malcolm for analyzing Suricata logs (default `1`)
  - `SURICATA_CUSTOM_RULES_ONLY` - if set to `true`, Malcolm will bypass the default Suricata ruleset and use only user-defined rules (`./suricata/rules/*.rules`).
  - `SURICATA_UPDATE_RULES` - if set to `true`, Suricata signatures will periodically be updated (default `false`)
  - `SURICATA_LIVE_CAPTURE` - if set to `true`, Suricata will monitor live traffic on the local interface(s) defined by `PCAP_IFACE`
  - `SURICATA_ROTATED_PCAP` - if set to `true`, Suricata can analyze PCAP files captured by `netsniff-ng` or `tcpdump` (see `PCAP_ENABLE_NETSNIFF` and `PCAP_ENABLE_TCPDUMP`, as well as `SURICATA_AUTO_ANALYZE_PCAP_FILES`); if `SURICATA_LIVE_CAPTURE` is `true`, this should be `false`; otherwise Suricata will see duplicate traffic
  - `SURICATA_DISABLE_ICS_ALL` - if set to `true`, this variable can be used to disable Malcolm's [built-in Suricata rules for Operational Technology/Industrial Control Systems (OT/ICS) vulnerabilities and exploits]({{ site.github.repository_url }}/blob/{{ site.github.build_revision }}/suricata/rules-default/OT)
  - `SURICATA_STATS_ENABLED`, `SURICATA_STATS_EVE_ENABLED`, and `SURICATA_STATS_INTERVAL` - these variables control the generation of live traffic capture statistics for Suricata, which are used to populate the Packet Capture Statistics dashboard
  - See Tuning Suricata for other variables related to managing Suricata's performance and resource utilization.
- `upload-common.env` - settings for dealing with PCAP files uploaded to Malcolm for analysis
  - `AUTO_TAG` - if set to `true`, Malcolm will automatically create Arkime sessions and Zeek logs with tags based on the filename, as described in Tagging (default `true`)
  - `EXTRA_TAGS` - a comma-separated list of default tags for data generated by Malcolm (default is an empty string)
  - `PCAP_NODE_NAME` - specifies the node name to associate with network traffic metadata
- `zeek.env`, `zeek-secret.env`, `zeek-live.env` and `zeek-offline.env` - settings for Zeek and for scanning extracted files Zeek observes in network traffic
  - `EXTRACTED_FILE_CAPA_VERBOSE` - if set to `true`, all Capa rule hits will be logged; otherwise (`false`) only MITRE ATT&CK® technique classifications will be logged
  - `EXTRACTED_FILE_ENABLE_CAPA` - if set to `true`, Zeek-extracted files determined to be PE (portable executable) files will be scanned with Capa
  - `EXTRACTED_FILE_ENABLE_CLAMAV` - if set to `true`, Zeek-extracted files will be scanned with ClamAV
  - `EXTRACTED_FILE_ENABLE_YARA` - if set to `true`, Zeek-extracted files will be scanned with Yara
  - `EXTRACTED_FILE_HTTP_SERVER_ENABLE` - if set to `true`, the directory containing Zeek-extracted files will be served over HTTP at `./extracted-files/` (e.g., https://localhost/extracted-files/ if connecting locally)
  - `EXTRACTED_FILE_HTTP_SERVER_ZIP` - if set to `true`, the Zeek-extracted files will be archived in a ZIP file upon download
  - `EXTRACTED_FILE_HTTP_SERVER_KEY` - specifies the password for the ZIP archive if `EXTRACTED_FILE_HTTP_SERVER_ZIP` is `true`; otherwise, this specifies the decryption password for encrypted Zeek-extracted files in an `openssl enc`-compatible format (e.g., `openssl enc -aes-256-cbc -d -in example.exe.encrypted -out example.exe`)
  - `EXTRACTED_FILE_IGNORE_EXISTING` - if set to `true`, files extant in the `./zeek-logs/extract_files/` directory will be ignored on startup rather than scanned
  - `EXTRACTED_FILE_PRESERVATION` - determines behavior for preservation of Zeek-extracted files
  - `EXTRACTED_FILE_UPDATE_RULES` - if set to `true`, file scanner engines (e.g., ClamAV, Capa, Yara) will periodically update their rule definitions (default `false`)
  - `EXTRACTED_FILE_YARA_CUSTOM_ONLY` - if set to `true`, Malcolm will bypass the default Yara rulesets (Neo23x0/signature-base, reversinglabs/reversinglabs-yara-rules, and bartblaze/Yara-rules) and use only user-defined rules in `./yara/rules`
  - `VTOT_API2_KEY` - used to specify a VirusTotal Public API v2.0 key, which, if specified, will be used to submit hashes of Zeek-extracted files to VirusTotal
  - `ZEEK_AUTO_ANALYZE_PCAP_FILES` - if set to `true`, all PCAP files imported into Malcolm will automatically be analyzed by Zeek, and the resulting logs will also be imported (default `false`)
  - `ZEEK_AUTO_ANALYZE_PCAP_THREADS` - the number of threads available to Malcolm for analyzing Zeek logs (default `1`)
  - `ZEEK_JSON` - whether Zeek should generate JSON format logs (`true`) or TSV format logs (`false`)
  - `ZEEK_DISABLE_…` - if set to `true`, each of these variables can be used to disable a certain Zeek function when it analyzes PCAP files (for example, setting `ZEEK_DISABLE_LOG_PASSWORDS` to `true` to disable logging of cleartext passwords)
  - `ZEEK_…_PORTS` - used to specify non-default ports to register certain Zeek analyzers (e.g., `ZEEK_SYNCHROPHASOR_PORTS` for the ICSNPP-Synchrophasor analyzer, `ZEEK_GENISYS_PORTS` for the ICSNPP-Genisys analyzer, and `ZEEK_ENIP_PORTS` for the ICSNPP-Ethernet/IP analyzer) formatted as a comma-separated list of Zeek ports (e.g., `12345/tcp` or `4041/tcp,4042/udp`)
  - `ZEEK_DISABLE_ICS_ALL` and `ZEEK_DISABLE_ICS_…` - if set to `true`, these variables can be used to disable Zeek's protocol analyzers for Operational Technology/Industrial Control Systems (OT/ICS) protocols
  - `ZEEK_DISABLE_BEST_GUESS_ICS` - see "Best Guess" Fingerprinting for ICS Protocols
  - `ZEEK_EXTRACTOR_MODE` - determines the file extraction behavior for file transfers detected by Zeek; see Automatic file extraction and scanning for more details
  - `ZEEK_INTEL_FEED_SINCE` - when querying a TAXII, MISP, or Mandiant threat intelligence feed, only process threat indicators created or modified since the time represented by this value; it may be either a fixed date/time (`01/01/2025`) or relative interval (`7 days ago`)
  - `ZEEK_INTEL_ITEM_EXPIRATION` - specifies the value for Zeek's `Intel::item_expiration` timeout as used by the Zeek Intelligence Framework (default `-1min`, which disables item expiration)
  - `ZEEK_INTEL_REFRESH_CRON_EXPRESSION` - specifies a cron expression (using `cronexpr`-compatible syntax) indicating the refresh interval for generating the Zeek Intelligence Framework files (defaults to empty, which disables automatic refresh)
  - `ZEEK_JA4SSH_PACKET_COUNT` - the Zeek JA4+ plugin calculates the JA4SSH value once for every x SSH packets; x is set here (default `200`)
  - `ZEEK_LIVE_CAPTURE` - if set to `true`, Zeek will monitor live traffic on the local interface(s) defined by `PCAP_IFACE`
    - See Tuning Zeek for other variables related to managing Zeek's performance and resource utilization.
  - `ZEEK_DISABLE_STATS` - if `ZEEK_LIVE_CAPTURE` is `true` and this variable is set to `false` or blank, Malcolm will enable capture statistics for Zeek, which are used to populate the Packet Capture Statistics dashboard
  - `ZEEK_LOCAL_NETS` - specifies the value for Zeek's `Site::local_nets` variable (and `networks.cfg` for live capture) (e.g., `1.2.3.0/24,5.6.7.0/24`); note that by default, Zeek considers IANA-registered private address space such as `10.0.0.0/8` and `192.168.0.0/16` site-local
  - `ZEEK_ROTATED_PCAP` - if set to `true`, Zeek can analyze PCAP files captured by `netsniff-ng` or `tcpdump` (see `PCAP_ENABLE_NETSNIFF` and `PCAP_ENABLE_TCPDUMP`, as well as `ZEEK_AUTO_ANALYZE_PCAP_FILES`); if `ZEEK_LIVE_CAPTURE` is `true`, this should be `false`; otherwise Zeek will see duplicate traffic
  - See Managing disk usage below for a discussion of the variables that control automatic threshold-based deletion of the oldest Zeek-extracted files.
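
Because enabling both live capture and rotated-PCAP analysis for the same tool causes duplicate traffic, a quick sanity check over the settings can be useful. The Python sketch below is a hypothetical helper, not something shipped with Malcolm; it assumes the settings have already been loaded into a dictionary of strings and that only the literal value `true` (case-insensitive) counts as enabled.

```python
def is_true(settings: dict, name: str) -> bool:
    """Treat only the string 'true' (any case) as enabled."""
    return settings.get(name, "").strip().lower() == "true"


def duplicate_traffic_warnings(settings: dict) -> list:
    """Flag tools configured for both live capture and rotated-PCAP analysis."""
    warnings = []
    for tool in ("ZEEK", "SURICATA"):
        live, rotated = f"{tool}_LIVE_CAPTURE", f"{tool}_ROTATED_PCAP"
        if is_true(settings, live) and is_true(settings, rotated):
            warnings.append(f"{live} and {rotated} are both true")
    return warnings


settings = {
    "ZEEK_LIVE_CAPTURE": "true",
    "ZEEK_ROTATED_PCAP": "true",
    "SURICATA_LIVE_CAPTURE": "true",
    "SURICATA_ROTATED_PCAP": "false",
}
print(duplicate_traffic_warnings(settings))
```
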
The `./scripts/configure` script can also be run noninteractively, which can be useful for scripting Malcolm setup. This behavior can be selected by supplying the `-d` or `--defaults` option on the command line. Running with the `--help` option will list the arguments accepted by the script:

    $ ./scripts/configure --help
    usage: configure <arguments>

    Malcolm install script

    options:
      -v [true|false], --verbose [true|false]
                            Verbose output
      -d [true|false], --defaults [true|false]
                            Accept defaults to prompts without user interaction
      -c [true|false], --configure [true|false]
                            Only do configuration (not installation)
    …

Note that the value for any argument not specified on the command line will be reset to its default (as if for a new Malcolm installation) regardless of the setting's current value in the corresponding `.env` file. In other words, users who want to use the `--defaults` option should carefully review all available command-line options and choose all that apply.
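
When scripting a noninteractive run, it can help to assemble the full argument list in one place so that nothing is left to fall back to its default unintentionally. The Python sketch below only builds and prints a command line; it does not run the script. The helper name `build_configure_command` is hypothetical, and only the `--defaults`, `--configure`, and `--verbose` options shown in the help output above are used.

```python
import shlex


def build_configure_command(options: dict) -> list:
    """Turn {"--flag": bool} pairs into an argument list for ./scripts/configure."""
    command = ["./scripts/configure"]
    for flag, enabled in options.items():
        command += [flag, "true" if enabled else "false"]
    return command


command = build_configure_command({"--defaults": True, "--configure": True, "--verbose": False})
# Print the command for review rather than executing it.
print(shlex.join(command))
```

A real invocation would also need every other option whose value should differ from its default, as listed by `./scripts/configure --help`.
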
Similarly, authentication-related settings can also be set noninteractively by using the command-line arguments for `./scripts/auth_setup`.
In instances where Malcolm is deployed with the intention of running indefinitely, eventually the question arises of what to do when the file systems used for storing Malcolm's artifacts (e.g., PCAP files, raw logs, OpenSearch indices, extracted files, etc.) begin to fill up. Malcolm provides options for tuning the "aging out" (deletion) of old artifacts to make room for newer data.
- PCAP deletion is configured by environment variables in `arkime.env`:
  - `MANAGE_PCAP_FILES` - if set to `true`, all PCAP files imported into Malcolm will be marked as available for deletion by Arkime if available storage space becomes too low (default `false`)
  - `ARKIME_FREESPACEG` - when `MANAGE_PCAP_FILES` is `true`, this value is used by Arkime to determine when to delete the oldest PCAP files. Note that this variable represents the amount of free/unused/available space desired on the file system: e.g., a value of `5%` means "delete PCAP files if the amount of unused storage on the file system falls below 5%" (default `10%`).
- Zeek logs and Suricata logs are temporarily stored on disk as they are parsed, enriched, and indexed, and afterwards are periodically [pruned]({{ site.github.repository_url }}/blob/{{ site.github.build_revision }}/filebeat/scripts/clean-processed-folder.py) from the file system as they age, based on these variables in `filebeat.env`:
  - `LOG_CLEANUP_MINUTES` - specifies the age, in minutes, at which already-processed log files should be deleted
  - `ZIP_CLEANUP_MINUTES` - specifies the age, in minutes, at which the compressed archives containing already-processed log files should be deleted
- Files extracted by Zeek and stored in the `./zeek-logs/extract_files/` directory can be periodically [pruned]({{ site.github.repository_url }}/blob/{{ site.github.build_revision }}/shared/bin/prune_files.sh) based on the following variables in `zeek.env`. If either of the two threshold limits defined here is met, the oldest extracted files will be deleted until the limit is no longer met. Setting either of the threshold limits to `0` disables that check.
  - `EXTRACTED_FILE_PRUNE_THRESHOLD_MAX_SIZE` - specifies the maximum size, specified either in gigabytes or as a human-readable data size (e.g., `250G`), that the `./zeek-logs/extract_files/` directory is allowed to contain before the prune condition triggers
  - `EXTRACTED_FILE_PRUNE_THRESHOLD_TOTAL_DISK_USAGE_PERCENT` - specifies a maximum fill percentage for the file system containing the `./zeek-logs/extract_files/` directory; in other words, if the disk is more than this percentage utilized, the prune condition triggers
  - `EXTRACTED_FILE_PRUNE_INTERVAL_SECONDS` - the interval between checking the prune conditions, in seconds (default `300`)
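
The two extracted-file prune thresholds combine as an "either/or" test in which `0` switches a check off. The Python sketch below restates that rule; it is not the logic of Malcolm's `prune_files.sh` script, the function name `prune_needed` is hypothetical, and the inputs are assumed to be already converted to bytes and percentages.

```python
def prune_needed(dir_size_bytes: int, disk_used_percent: float,
                 max_size_bytes: int, max_used_percent: float) -> bool:
    """Return True if either enabled threshold is exceeded (0 disables a check)."""
    size_exceeded = max_size_bytes > 0 and dir_size_bytes > max_size_bytes
    usage_exceeded = max_used_percent > 0 and disk_used_percent > max_used_percent
    return size_exceeded or usage_exceeded


GIB = 1024 ** 3  # assumption: binary gigabytes; the real unit convention may differ
# Directory holds 300 GiB against a 250 GiB limit; the disk-usage check is disabled.
print(prune_needed(300 * GIB, 40.0, 250 * GIB, 0))
```
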
- Index management policies can be handled via plugins provided as part of the OpenSearch and Elasticsearch platforms, respectively. In addition to those tools, the `OPENSEARCH_INDEX_SIZE_PRUNE_LIMIT` variable in `dashboards-helper.env` defines a maximum cumulative size that OpenSearch indices are allowed to consume before the oldest indices [are deleted]({{ site.github.repository_url }}/blob/{{ site.github.build_revision }}/dashboards/scripts/opensearch_index_size_prune.py), specified either as a human-readable data size (e.g., `250G`) or as a percentage of the total disk size (e.g., `70%`): e.g., a value of `500G` means "delete the oldest OpenSearch indices if the total space consumed by Malcolm's indices exceeds five hundred gigabytes."
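
As an illustration of the `OPENSEARCH_INDEX_SIZE_PRUNE_LIMIT` formats, the following Python sketch converts a limit such as `250G` or `70%` into bytes and picks the oldest indices to remove until the total fits. It is not the code in `opensearch_index_size_prune.py`: the function names are hypothetical, the suffixes are assumed to be binary multiples (1G = 1024³ bytes), and the index list is assumed to be sorted oldest first.

```python
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3, "T": 1024 ** 4}


def limit_in_bytes(limit: str, total_disk_bytes: int) -> int:
    """Convert '250G' or '70%' (of total disk size) into a byte count."""
    text = limit.strip().upper()
    if text.endswith("%"):
        return int(total_disk_bytes * float(text[:-1]) / 100)
    if text and text[-1] in UNITS:
        return int(float(text[:-1]) * UNITS[text[-1]])
    return int(text)


def indices_to_delete(indices: list, limit: str, total_disk_bytes: int) -> list:
    """indices: (name, size_bytes) pairs, oldest first. Returns names to delete."""
    allowed = limit_in_bytes(limit, total_disk_bytes)
    consumed = sum(size for _, size in indices)
    doomed = []
    for name, size in indices:
        if consumed <= allowed:
            break
        doomed.append(name)
        consumed -= size
    return doomed


# Made-up index names and sizes on a 1000-byte "disk", limited to 70% of it.
sample = [("idx-day1", 400), ("idx-day2", 300), ("idx-day3", 200)]
print(indices_to_delete(sample, "70%", 1000))
```
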
Similar settings exist for managing disk usage on Hedgehog Linux.