This is the official repository of our VLDB 2024 paper: Optimal Matrix Sketching over Sliding Windows

  • 2024/08/29: This paper was nominated for the Best Research Paper Award at VLDB 2024! Thanks to the editors and reviewers for their recognition and support.
  • 2024/11/15: We added a simple max-heap optimization that can significantly improve performance further, with no extra space overhead and no loss of precision. See dump_fd.py for details.

Install dependencies

pip install -r requirements.txt

Prepare datasets

mkdir dataset

# Generate SYNTHETIC data
python generate_synthetic.py    

# Download BIBD data
wget https://www.cise.ufl.edu/research/sparse/mat/JGD_BIBD/bibd_22_8.mat --directory-prefix ./dataset 

# Download PAMAP2 data
wget https://archive.ics.uci.edu/static/public/231/pamap2+physical+activity+monitoring.zip -O dataset/PAMAP2_Dataset.zip
unzip dataset/PAMAP2_Dataset.zip -d dataset

# Download RAIL data
wget https://www.cise.ufl.edu/research/sparse/mat/Mittelmann/rail2586.mat --directory-prefix ./dataset 

# Download YEAR data
wget https://archive.ics.uci.edu/static/public/203/yearpredictionmsd.zip --directory-prefix ./dataset 
unzip dataset/yearpredictionmsd.zip -d dataset
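
Before running experiments, it can help to confirm that the downloads above actually landed in ./dataset. The following is a minimal sketch, not part of the repository: the helper name `missing_downloads` is hypothetical, and it only checks the four file names that appear in the wget commands above (the output of generate_synthetic.py and the unzipped contents are not checked, since their names are not stated here).

```python
from pathlib import Path

# File names taken from the wget commands above.
EXPECTED_DOWNLOADS = (
    "bibd_22_8.mat",
    "PAMAP2_Dataset.zip",
    "rail2586.mat",
    "yearpredictionmsd.zip",
)


def missing_downloads(dataset_dir="dataset"):
    """Return the expected download names that are absent from dataset_dir.

    Hypothetical helper for illustration; it is not provided by this repository.
    """
    root = Path(dataset_dir)
    return [name for name in EXPECTED_DOWNLOADS if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_downloads()
    if missing:
        print("Missing downloads:", ", ".join(missing))
    else:
        print("All expected downloads are present.")
```

An empty list means every archive and .mat file named above is present; it does not verify that the files are complete or uncorrupted.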

Run experimental scripts

The experimental scripts are the files whose names start with test_ (that is, they match test_*.py).
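
As a convenience, the test_*.py naming pattern can be used to discover and run the scripts in one go. This is a rough sketch under two assumptions that are not confirmed here: the scripts sit in the repository root, and each one runs without command-line arguments. The function names are hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def find_experiment_scripts(root="."):
    """Return the test_*.py files directly under root, sorted by name."""
    return sorted(Path(root).glob("test_*.py"))


def run_experiment_scripts(root="."):
    """Run each discovered script with the current Python interpreter.

    Assumes the scripts need no arguments; stops at the first failure.
    """
    for script in find_experiment_scripts(root):
        print(f"Running {script.name}")
        subprocess.run([sys.executable, script.name], cwd=root, check=True)
```

Calling `run_experiment_scripts()` from the repository root would run the scripts one after another; to run a single experiment, invoke that script directly with python.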

Installation & Usage

In your project folder, clone this repository into a directory named swfd:

git clone https://github.com/yinhanyan/DS-FD.git swfd --depth 1

Then you can import it in your project folder:

from swfd import SeqBasedSWFD
from swfd import RobustFrequentDirections
from swfd import FrequentDirectionsWithDump

import numpy as np

if __name__ == "__main__":
    l = 16
    N = 64
    d = 32
    R = 2**16 - 1
    epochs = 2048
    sw_fd = SeqBasedSWFD(N, R, d, l)

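    # Draw one random row and rescale it so that its squared L2 norm equals R.
    random_vector = np.random.randn(1, d)
    a = np.sqrt(R) * random_vector / np.linalg.norm(random_vector, ord=2)
    # Feed the row to the sketch, then query it.
    sw_fd.fit(a)
    B_t, _, _, delta = sw_fd.get()
    print(B_t)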
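
A common way to judge a matrix sketch is its covariance error: the spectral norm of AᵀA − BᵀB, where A holds the rows currently in the window and B is the sketch. The snippet below is a minimal NumPy sketch of that check and is not part of the swfd package. The function names are hypothetical, and it assumes the sketch (for example the `B_t` returned above) is a 2-D array with d columns. A truncated SVD stands in for a real sketch so the example runs on its own.

```python
import numpy as np


def covariance_error(window, sketch):
    """Spectral norm of window^T window - sketch^T sketch."""
    window = np.atleast_2d(window)
    sketch = np.atleast_2d(sketch)
    diff = window.T @ window - sketch.T @ sketch
    return float(np.linalg.norm(diff, ord=2))


def relative_covariance_error(window, sketch):
    """Covariance error divided by the squared Frobenius norm of the window."""
    window = np.atleast_2d(window)
    return covariance_error(window, sketch) / float(np.linalg.norm(window, "fro") ** 2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    window = rng.standard_normal((64, 32))  # N = 64 rows, d = 32 columns
    # Stand-in sketch (an assumption for this demo): the top 16 singular directions.
    _, s, vt = np.linalg.svd(window, full_matrices=False)
    sketch = s[:16, None] * vt[:16]
    print(relative_covariance_error(window, sketch))
```

To evaluate the real sketch, keep the last N rows that were fed in as `window` and pass the returned `B_t` as `sketch`.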