Implementation of Nearest Neighbor Clustering algorithm.

luisincrespo/nn-clustering

Nearest Neighbor Clustering

Requirements

  • Python 2.7.x

Usage

$ python main.py --datafile <path_to_data> --datainfofile <path_to_data_info> --threshold <threshold>
  • The values given to --datafile and --datainfofile can be either absolute or relative paths.
  • The value of --threshold must be a number between 0 and 1.
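
To illustrate what a threshold of this kind typically controls, here is a minimal sketch of the textbook nearest neighbor clustering rule. It is not taken from main.py. The function names, the use of `<=` for the comparison, and the range-normalized distance for a single numeric attribute are all assumptions made for this example.

```python
def nearest_neighbor_clusters(points, threshold, distance):
    """Textbook nearest neighbor clustering (illustrative sketch only).

    Each point joins the cluster of its nearest previously seen point when
    that distance is at most `threshold`; otherwise it starts a new cluster.
    Returns a list of clusters, each a list of indices into `points`.
    """
    clusters = []  # clusters[k] holds the indices of the points in cluster k
    labels = []    # labels[i] is the cluster index assigned to points[i]
    for i, point in enumerate(points):
        best_j, best_d = None, None
        # Find the nearest point among those already assigned to a cluster.
        for j in range(i):
            d = distance(point, points[j])
            if best_d is None or d < best_d:
                best_j, best_d = j, d
        if best_j is not None and best_d <= threshold:
            labels.append(labels[best_j])
            clusters[labels[best_j]].append(i)
        else:
            labels.append(len(clusters))
            clusters.append([i])
    return clusters


def range_normalized_distance(lo, hi):
    """Hypothetical distance for one numeric attribute, scaled into [0, 1]."""
    span = float(hi - lo)

    def distance(a, b):
        return abs(a - b) / span

    return distance


values = [1.0, 1.5, 9.0, 2.0, 8.5]
dist = range_normalized_distance(min(values), max(values))
clusters = nearest_neighbor_clusters(values, 0.2, dist)
# clusters == [[0, 1, 3], [2, 4]]
```

Under this rule, a threshold near 0 yields many small clusters and a threshold near 1 merges almost everything into one, which is consistent with a dissimilarity measure normalized to the 0 to 1 range.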

The resulting clusters are printed to stdout by default. To write them to a file, pass its path with the --outputfile option:

$ python main.py --datafile <path_to_data> --datainfofile <path_to_data_info> --threshold <threshold> --outputfile <path_to_outputfile>
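
For readers who want to wrap or reimplement this command line, the following sketch shows one way the four options could be declared with the standard argparse module. Only the option names come from the usage above. The helper names (`unit_interval`, `build_parser`, `write_results`), the choice to make the first three options required, and the inclusive range check are assumptions, not the contents of main.py.

```python
import argparse
import sys


def unit_interval(text):
    """argparse type: a float that must lie in [0, 1] (inclusive is assumed)."""
    value = float(text)
    if not 0.0 <= value <= 1.0:
        raise argparse.ArgumentTypeError(
            "threshold must be between 0 and 1, got %r" % text)
    return value


def build_parser():
    parser = argparse.ArgumentParser(description="Nearest Neighbor Clustering")
    parser.add_argument("--datafile", required=True,
                        help="path to the data set (CSV)")
    parser.add_argument("--datainfofile", required=True,
                        help="path to the attribute metadata (CSV)")
    parser.add_argument("--threshold", required=True, type=unit_interval,
                        help="value between 0 and 1")
    parser.add_argument("--outputfile", default=None,
                        help="optional path; results go to stdout if omitted")
    return parser


def write_results(text, outputfile=None):
    """Write to stdout by default, or to the given file path."""
    if outputfile is None:
        sys.stdout.write(text)
    else:
        with open(outputfile, "w") as handle:
            handle.write(text)


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(
    ["--datafile", "data.csv", "--datainfofile", "info.csv",
     "--threshold", "0.3"])
```

Validating the threshold in the parser means an out-of-range value is rejected with a usage message before any data is read.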

Expected data format

As shown in the usage above, the script expects two input files:

  • --datafile - the data set, which must be a CSV with each line of the form:

    <row_identifier>,<attr1_value>,<attr2_value>,...

    Missing attribute values must be represented with the value ?.

  • --datainfofile - metadata that describes the data set, which must be a CSV with one line describing each attribute available in the data set:

    <attr_name>,<attr_type>,<attr_possible_values>,...

    The supported values for <attr_type> are nominal, ordinal, binary_symmetric, binary_asymmetric, and numeric. The value for <attr_possible_values> may be omitted.
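
The two formats above can be read with the standard csv module. The sketch below is an assumption-laden illustration rather than the script's own parser: the function names and the sample rows are hypothetical, and it assumes that any columns after <attr_type> list the possible values one per column.

```python
import csv

MISSING = "?"
SUPPORTED_TYPES = ("nominal", "ordinal", "binary_symmetric",
                   "binary_asymmetric", "numeric")


def parse_data(text):
    """Map each row identifier to its attribute values; '?' becomes None."""
    rows = {}
    for record in csv.reader(text.splitlines()):
        if not record:
            continue  # skip blank lines
        row_id, values = record[0].strip(), record[1:]
        rows[row_id] = [None if v.strip() == MISSING else v.strip()
                        for v in values]
    return rows


def parse_data_info(text):
    """Return one dict per attribute, in the order the lines appear."""
    attributes = []
    for record in csv.reader(text.splitlines()):
        if not record:
            continue  # skip blank lines
        name, attr_type = record[0].strip(), record[1].strip()
        if attr_type not in SUPPORTED_TYPES:
            raise ValueError("unsupported attribute type: %r" % attr_type)
        attributes.append({
            "name": name,
            "type": attr_type,
            # Possible values may be omitted, giving an empty list.
            "possible_values": [v.strip() for v in record[2:] if v.strip()],
        })
    return attributes


# Hypothetical sample content, not taken from the example_data folder.
data_text = "s1,red,3.5,?\ns2,blue,?,yes\n"
info_text = ("color,nominal,red,blue\n"
             "size,numeric\n"
             "has_spikes,binary_symmetric,yes,no\n")

rows = parse_data(data_text)
attributes = parse_data_info(info_text)
```

Values are kept as strings here; converting numeric attributes to floats would be a natural next step once the attribute types are known.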

Example data, retrieved from http://archive.ics.uci.edu/ml/datasets/Sponge, can be found in the example_data folder.
