The simulator is written in Rust and replays an input pcap trace, feeding it to the monitoring pipeline.
- src/simulation_fse_lpc.rs: source file for DUMBO system simulation (flow size estimation use case)
- src/simulation_iat_lpc.rs: source file for DUMBO system simulation (IAT use case)
- src/simulation_baseline.rs: source file for Baseline CMS simulation
- src/packet: defines the Packet and PacketReceiver interfaces
- src/parser: PCAP trace parser
- src/flow_manager: flow manager implementation for aggregated feature retrieval
- src/model_wrapper: object wrapping various model implementations for inference (ONNX, oracle, simulated metrics, etc.)
- src/hash_table: implements the Elephant Tracker
- src/control_plane: dummy control plane that registers keys inserted in the CMS
- src/cms: Count-Min Sketch for Mice Tracking
- src/ddsketch: DDSketch used both for Mice and Elephant Tracking
- src/bloom_filter: a BF implementation, optionally used for mice inference cache
Details on the simulation architecture can be found in README_DEV.md.
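For readers unfamiliar with the Count-Min Sketch used for Mice Tracking, here is a minimal Python sketch of the data structure. It is only an illustration, not the code in src/cms: the class name, the method names and the use of salted BLAKE2b hashes from the Python standard library are assumptions made for this example, and the flow key layout is hypothetical.

```python
import hashlib


class CountMinSketch:
    """Minimal Count-Min Sketch: `rows` hash rows, each holding `cols` counters."""

    def __init__(self, rows: int, cols: int) -> None:
        self.rows = rows
        self.cols = cols
        self.table = [[0] * cols for _ in range(rows)]

    def _index(self, row: int, key: bytes) -> int:
        # One hash function per row, obtained by salting BLAKE2b with the row id
        # (illustrative choice, not necessarily the hash used by the simulator).
        salt = row.to_bytes(8, "little")
        digest = hashlib.blake2b(key, digest_size=8, salt=salt).digest()
        return int.from_bytes(digest, "little") % self.cols

    def update(self, key: bytes, count: int = 1) -> None:
        # Increment one counter per row.
        for r in range(self.rows):
            self.table[r][self._index(r, key)] += count

    def estimate(self, key: bytes) -> int:
        # Collisions can only inflate counters, so the minimum across rows
        # is the tightest available estimate (never below the true count).
        return min(self.table[r][self._index(r, key)] for r in range(self.rows))


cms = CountMinSketch(rows=3, cols=64)
flow = b"10.0.0.1,10.0.0.2,1234,80,6"  # hypothetical 5-tuple key
for _ in range(5):
    cms.update(flow)
print(cms.estimate(flow))
```

The estimate never undercounts a flow; more columns reduce the chance of collisions, while more rows reduce the chance that every row collides at once.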
If you run an old version of Ubuntu, you might get some errors while building the blake3 library. Install a newer version of binutils:
$ git clone https://github.com/bminor/binutils-gdb
$ cd binutils-gdb/
$ sudo apt install libmpc-dev libmpfr-dev
$ ./configure
$ make
$ sudo make install
NOTE: Actually, just updating ar is sufficient: sudo cp binutils/ar /usr/local/bin/ar (in place of sudo make install).
TODO: The simulator currently requires the nightly toolchain. Switch back to stable once the Rust stable release includes pull request 118133 (it will be included in release 1.76.0).
NB: Any Conda environment should be deactivated first (using conda deactivate).
Build the repository with the command cargo build -r. Then, to launch a simulation:
./target/release/simulation_fse_lpc [OPTIONS] <MODEL_TYPE> <PCAP_FILE> <MODEL_FILE> <FEATURES_FILE> <OUTPUT_DIR> <OUTPUT_FILE_NAME> <HT_ROWS> <HT_SLOTS> <HT_COUNT> <CMS_ROWS> <CMS_COLS> <FLOW_MANAGER_ROWS> <FLOW_MANAGER_SLOTS> <FLOW_MANAGER_PACKET_LIMIT> <BF_SIZE> <BF_HASH_COUNT> <MODEL_THRESHOLD> <MODEL_MEMORY> <MODEL_AP> <MODEL_EMR> <MODEL_MMR>
- MODEL_TYPE: five-tuple-aggr | five-tuple | oracle | random | synth-ap | synth-rates
- PCAP_FILE: network trace file in pcap format
- MODEL_FILE: the file storing the model data (.onnx if running a pretrained model, ground truth if running an oracle or a synth model)
- FEATURES_FILE: pickle file specifying the features actually used by the model (only for .onnx models)
- OUTPUT_DIR: output folder
- OUTPUT_FILE_NAME: all output files will feature this name to distinguish them from other runs
- HT_ROWS: number of rows for the Elephant Tracker
- HT_SLOTS: number of entries per row in the Elephant Tracker (4 -> 95% load factor)
- CMS_ROWS: number of rows for the CMS
- CMS_COLS: number of cols for the CMS
- FLOW_MANAGER_ROWS: number of rows for the Flow Manager
- FLOW_MANAGER_SLOTS: number of entries per row in the Flow Manager (implemented as a hash table)
- FLOW_MANAGER_PACKET_LIMIT: first k packets collected for each flow to extract features from
- BF_SIZE: number of buckets for the Bloom Filter
- BF_HASH_COUNT: number of hash functions used in the Bloom Filter
- MODEL_THRESHOLD: threshold above which a flow is considered as an Elephant (only used when simulating synth models, please provide 0.5 for .onnx models as the threshold is now hardcoded)
- MODEL_AP: the ap-score to simulate (only used if MODEL_TYPE=synth-ap)
- MODEL_EMR: the Elephants misprediction rate (FNR) to simulate (only used if MODEL_TYPE=synth-rates)
- MODEL_MMR: the Mice misprediction rate (FPR) to simulate (only used if MODEL_TYPE=synth-rates)
- [OPTIONAL] --tcp-only: ignore all non-tcp traffic
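Since the executable takes a long list of positional arguments, it can help to assemble the command line programmatically. The Python sketch below only mirrors the usage line above; the helper name build_command and all example values are hypothetical placeholders, not recommended settings.

```python
# Positional argument names, in the order given by the usage line above.
POSITIONAL = [
    "MODEL_TYPE", "PCAP_FILE", "MODEL_FILE", "FEATURES_FILE", "OUTPUT_DIR",
    "OUTPUT_FILE_NAME", "HT_ROWS", "HT_SLOTS", "HT_COUNT", "CMS_ROWS", "CMS_COLS",
    "FLOW_MANAGER_ROWS", "FLOW_MANAGER_SLOTS", "FLOW_MANAGER_PACKET_LIMIT",
    "BF_SIZE", "BF_HASH_COUNT", "MODEL_THRESHOLD", "MODEL_MEMORY",
    "MODEL_AP", "MODEL_EMR", "MODEL_MMR",
]


def build_command(values: dict, tcp_only: bool = False) -> list:
    """Return the argv list for simulation_fse_lpc (hypothetical helper)."""
    missing = [name for name in POSITIONAL if name not in values]
    if missing:
        raise ValueError(f"missing arguments: {missing}")
    cmd = ["./target/release/simulation_fse_lpc"]
    if tcp_only:
        cmd.append("--tcp-only")  # options go before the positional arguments
    cmd += [str(values[name]) for name in POSITIONAL]
    return cmd


# Placeholder values only: replace every entry with a real setting.
values = {name: 0 for name in POSITIONAL}
values.update(MODEL_TYPE="oracle", PCAP_FILE="trace.pcap",
              OUTPUT_DIR="output", OUTPUT_FILE_NAME="run1")
cmd = build_command(values, tcp_only=True)
print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) once all values are real.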
The simulator generates a series of sub-folders in the output directory:
- gt: ground-truth sizes for all the flows in the trace
- cms: size estimate of all the flows sent to the control plane
- ht: dump of Elephant Tracker content
- pc: dump of flows found in Flow Manager at the end of the simulation
- pc_evicted: flows evicted from the Flow Manager

Inside each of these directories, the simulator outputs a file named <OUTPUT_FILE_NAME>.csv (the same folder contains results for all past simulations run with a different OUTPUT_FILE_NAME parameter).
All files are .csv: each row starts with the comma-separated flow ID (5-tuple) and ends with the flow size. To compare the true flow size with the estimate, import the data into a database management framework and join the tables on the flow ID :)
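If a full database is overkill, the same join can be done in a few lines of Python. This is a minimal sketch assuming the files have no header row, the first five columns form the flow ID and the last column is the size; the field order and the sample rows below are made up for illustration.

```python
import csv
import io


def load_sizes(csv_file) -> dict:
    """Map each flow ID (first five columns) to its size (last column)."""
    sizes = {}
    for row in csv.reader(csv_file):
        if len(row) < 6:
            continue  # skip blank or malformed rows
        sizes[tuple(row[:5])] = float(row[-1])
    return sizes


def join_sizes(truth: dict, estimate: dict) -> list:
    """Inner join on the flow ID: (flow_id, true_size, estimated_size)."""
    return [(flow, truth[flow], estimate[flow]) for flow in truth if flow in estimate]


# In-memory stand-ins for gt/<OUTPUT_FILE_NAME>.csv and cms/<OUTPUT_FILE_NAME>.csv.
gt_csv = io.StringIO(
    "10.0.0.1,10.0.0.2,1234,80,6,12\n"
    "10.0.0.3,10.0.0.4,5555,443,6,3\n"
)
cms_csv = io.StringIO("10.0.0.3,10.0.0.4,5555,443,6,4\n")

joined = join_sizes(load_sizes(gt_csv), load_sizes(cms_csv))
for flow, true_size, estimated_size in joined:
    print(flow, true_size, estimated_size)
```

With real outputs, replace the StringIO objects with open(path, newline="") on the two .csv files.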
You can rely on some python wrappers to run the simulation with certain configurations based on the use case, and also ensure that all the needed files are correctly preprocessed. Python scripts are located under the python folder. Please refer to the following workflow to run them.
NB: Only needed to run synthetic model simulations (i.e., simulating a model with a certain confusion matrix or AP score).
Before running a synth model simulation (i.e., a simulation where we run a fake model simulating a certain AP score or confusion matrix), you need to pre-process the trace to extract ground truth and other parameters:
$ ./scripts/preprocessing/init_trace_dir.sh <DATA_DIR> <TRACE_SET>
$ ./scripts/preprocessing/preprocess_synth.sh <TRACE_SET> <tcp|tcp_udp>
Output files are generated under .trace/.
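To give an intuition of what a synthetic model with fixed misprediction rates does with the ground truth, here is a conceptual Python sketch. It is not the simulator's implementation: the function name and the per-flow coin-flip approach are assumptions, with emr and mmr playing the role of the Elephants and Mice misprediction rates.

```python
import random


def synth_predict(is_elephant: bool, emr: float, mmr: float, rng: random.Random) -> bool:
    """Return the simulated prediction (True = elephant) for one flow."""
    if is_elephant:
        # A true elephant is missed with probability emr (false negative).
        return rng.random() >= emr
    # A true mouse is flagged as an elephant with probability mmr (false positive).
    return rng.random() < mmr


rng = random.Random(42)  # fixed seed so runs are repeatable
ground_truth = [True] * 1000 + [False] * 1000
predictions = [synth_predict(label, emr=0.1, mmr=0.05, rng=rng) for label in ground_truth]

missed = sum(1 for label, pred in zip(ground_truth, predictions) if label and not pred)
print(f"elephants missed: {missed} / 1000")
```

Over many flows, the observed false negative and false positive rates converge to the requested emr and mmr.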
Wrapper Python scripts configure the memory allocated to each component and then run the simulation accordingly:
$ python ./simulations/run_simulation_fse.py --type <MODEL_TYPE> --pcap <PCAP_FILE> --hh_perc <HH_PERCENTAGE> --proba_threshold <THRESHOLD> --ap <AP_SCORE> --fnr <FNR> --fpr <FPR> --ms <MODEL_SIZE>
- MODEL_TYPE: onnx-pre-bins | five-tuple | oracle | random | synth-ap | synth-rates
- PCAP_FILE: network trace file in pcap format
- HH_PERCENTAGE: percentage of flows we want to send to the Elephant Tracker
- THRESHOLD: classification threshold, only used when MODEL_TYPE=synth-ap
- AP_SCORE: desired ap-score, only used when MODEL_TYPE=synth-ap
- FNR: desired false negative rate, only used when MODEL_TYPE=synth-rates
- FPR: desired false positive rate, only used when MODEL_TYPE=synth-rates
- MODEL_SIZE: model size in KB
- [OPTIONAL] --tcp_only: ignore non-tcp traffic
The simulation generates output files under the output/<TRACE>/<PROTOCOL>/top_<HH_PERCENTAGE>_pct/fse/<STAT_FOLDER> folders. In particular:
- fm: flows left in the Flow Manager at the end of the simulation
- fm_evicted: flows evicted from the Flow Manager
- gt: flow size ground truth for all flows
- cms: flows added to the CMS (Mice Tracker) and their size estimation
- ht: flows added to the Elephant Tracker and their size
- simulation_configuration: a dump of the simulation parameters (e.g., pcap file, model file, component sizes, etc.)
In every <STAT_FOLDER>, the output files will be named identically based on the <MODEL_TYPE> (e.g., coda.csv for type onnx-pre-bins, which is our solution).
The same applies to the IAT use case, using ./simulations/run_simulation_iat.py instead (ddsketch stats are given in place of cms and ht).
After running the simulation:
$ python ./simulations/error_fse.py --trace=<PCAP_FILE> --perc=<HH_PERCENTAGE> --memory=<FSE_MEMORY> --packet_limit=<FLOW_MANAGER_PACKET_LIMIT> --model=<OUTPUT_FILE_NAME>
- PCAP_FILE: network trace file in pcap format
- HH_PERCENTAGE: percentage of flows we send to the Elephant Tracker
- FSE_MEMORY: base memory dedicated to the use case structures, currently hardcoded to 1.0MB
- FLOW_MANAGER_PACKET_LIMIT: matches the one passed to the main executable, currently always 5 if run through the python wrapper
- OUTPUT_FILE_NAME: this has to match the model suffix used to name output files concerning the simulation (see above)
- [OPTIONAL] --tcp_only: ignore non-tcp traffic
This generates a file with the same name as the previous outputs (OUTPUT_FILE_NAME) under the error folder. For the FSE use case, it computes the Average Weighted Absolute Estimation error (AWAE), while for IAT (python/error_fse.py), the script computes the Mean Relative Error on the 50th, 75th, 90th, 95th and 99th quantiles.
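As a rough illustration of the IAT metric, the Python sketch below computes a relative error per quantile for each flow and then averages across flows. The exact formula used by the error script may differ: the nearest-rank quantile definition, the function names and the sample numbers are all assumptions made for this example.

```python
PERCENTILES = (50, 75, 90, 95, 99)


def nearest_rank(sorted_values, percentile):
    """Nearest-rank percentile of an ascending, non-empty list."""
    rank = max(1, (percentile * len(sorted_values) + 99) // 100)  # integer ceiling
    return sorted_values[rank - 1]


def relative_errors(true_samples, estimates):
    """Relative error per percentile for one flow.

    `estimates` maps each percentile to the value reported by the sketch.
    True quantiles are assumed to be strictly positive (as IATs are).
    """
    ordered = sorted(true_samples)
    errors = {}
    for p in PERCENTILES:
        truth = nearest_rank(ordered, p)
        errors[p] = abs(estimates[p] - truth) / truth
    return errors


def mean_relative_error(per_flow_errors):
    """Average the per-flow relative errors, percentile by percentile."""
    count = len(per_flow_errors)
    return {p: sum(e[p] for e in per_flow_errors) / count for p in PERCENTILES}


# Made-up example: one flow with IATs 1..100 and a sketch that is off on two quantiles.
flow_errors = relative_errors(range(1, 101), {50: 55, 75: 75, 90: 99, 95: 95, 99: 99})
mre = mean_relative_error([flow_errors])
print(mre)
```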