data-profiler

data-profiler is a Go project that transforms a set of datasets based on a set of characteristics (distribution similarity, correlation, etc.) in order to model, using Machine Learning techniques, the behavior of an operator applied on top of them.

Screenshots

Similarity Matrix

Dataset Space

SVM Modeling

SVM Residuals Distribution

Installation

There are two ways to install data-profiler:

  1. Through Go:
# GOPATH must be set
~> go get github.com/giagiannis/data-profiler
  2. Using Docker:
~> docker pull ggian/data-profiler

Usage

data-profiler can be used both through a CLI and a Web interface.

  1. CLI

You can access the CLI client through the data-profiler-utils binary.

~> $GOPATH/bin/data-profiler-utils

Running this command without arguments prints an overview of the available actions.

Note: use this client only if you know how data-profiler works.

  2. Web UI

First, run the Docker container, mounting a directory that contains the dataset files.

~> docker run -v /src/datasets:/datasets -p 8080:8080 -d ggian/data-profiler

This command mounts the host's /src/datasets directory at /datasets inside the container and forwards the host's port 8080 to the container's port 8080. Once the container has started, open http://dockerhost:8080 (where dockerhost is the machine running Docker) and add the first set of datasets for analysis.
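
If the container is started from a Go program rather than by hand, the volume and port mapping can be parameterized. The following is only a minimal sketch: `dockerRunArgs`, `hostDir`, and `hostPort` are hypothetical names introduced for illustration, and only the container path (/datasets), the container port (8080), and the image name are taken from the command above.

```go
package main

import (
	"fmt"
	"strings"
)

// dockerRunArgs builds the argument list for the `docker run` command shown
// in this README. hostDir and hostPort are hypothetical parameters; the
// container-side path (/datasets), the container port (8080), and the image
// name (ggian/data-profiler) come from the README's own command.
func dockerRunArgs(hostDir string, hostPort int) []string {
	return []string{
		"run",
		"-v", hostDir + ":/datasets", // mount the host dataset directory
		"-p", fmt.Sprintf("%d:8080", hostPort), // forward the host port to the container
		"-d", // run detached
		"ggian/data-profiler",
	}
}

func main() {
	// Reproduce the exact command from the README and print it for inspection.
	args := dockerRunArgs("/src/datasets", 8080)
	fmt.Println("docker " + strings.Join(args, " "))
}
```

The slice can be passed to `exec.Command("docker", args...)` from the standard `os/exec` package to actually start the container. Printing the command first makes it easy to check the mapping before running it.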

License

Apache License v2.0 (see the LICENSE file for details)

Contact

Giannis Giannakopoulos [email protected]