GitHub - eehsan/libPLUMP: Library for the Sequence Memoizer and related models

libPLUMP -- Library for implementing the Sequence Memoizer and related models
===============================================================================
Jan Gasthaus <[email protected]>


0. Getting started
===============================================================================


0.1 Prerequisites
-------------------------------------------------------------------------------
To get started with the code you will need a few additional things:

  - CMake

  - Boost C++ libraries (a recent version; 1.37 and 1.44 are known to work,
    but 1.33 is known not to work due to changes in Boost.Serialization)
    www.boost.org

  - GNU Scientific Library (tested with 1.12 and 1.14)
    www.gnu.org/software/gsl

  - SWIG

  - Python 2.5 or higher (but not 3.x)
    www.python.org


0.2 Compiling
-------------------------------------------------------------------------------
If you have the dependencies listed above installed in default locations then
simply doing

  # mkdir build && cd build
  # cmake ..
  # make

should configure libPLUMP (including the Python binding) and build the
library. 



0.3 Check whether the build was successful
-------------------------------------------------------------------------------
LibPLUMP comes with an example program that computes the perplexity of a test
file. To see how well the SM model predicts, for example, this file, do
  
  # src/score_file README

This will incrementally build the context tree for this file and at the same
time incrementally compute the probability of the next symbol given the 
previous ones. The result should be around 3 bits/symbol. You can run 

  # src/score_file --help

to find out about various options that change the behaviour of the model.
More interestingly, have a look at src/utils/score_file.cc, as it shows how
to use most of the high-level interface of libPLUMP.
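
To make the bits/symbol figure concrete, here is a minimal standalone Python
sketch of incremental scoring: each symbol is predicted from the symbols seen
so far, and only then added to the model. It does not use libPLUMP. The model
is a toy add-one smoothed symbol counter, not the Sequence Memoizer, and the
names score_sequence, perplexity and alphabet_size are made up for this
illustration.

```python
import math
from collections import defaultdict


def score_sequence(symbols, alphabet_size=256):
    """Return the average log loss in bits/symbol under a toy adaptive model.

    The model is an add-one (Laplace) smoothed symbol counter. It is a
    stand-in for illustration only, NOT the Sequence Memoizer.
    """
    counts = defaultdict(int)
    seen = 0
    total_bits = 0.0
    for s in symbols:
        # Predict s from the symbols seen so far, before updating the model.
        p = (counts[s] + 1.0) / (seen + alphabet_size)
        total_bits += -math.log(p, 2)
        # Only now add s to the model, as in incremental (online) scoring.
        counts[s] += 1
        seen += 1
    return total_bits / seen if seen else 0.0


def perplexity(bits_per_symbol):
    """Convert an average log loss in bits/symbol to per-symbol perplexity."""
    return 2.0 ** bits_per_symbol
```

For example, score_sequence(bytearray(open("README", "rb").read())) scores the
bytes of a file with the toy model. A score of 3 bits/symbol corresponds to a
per-symbol perplexity of 2^3 = 8, i.e. on average the model is as uncertain
as a uniform choice among 8 symbols.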


0.4 Testing the Python wrapper
-------------------------------------------------------------------------------
To see if the Python wrapper is working, try the following:

  # cd examples
  # source init.sh
  # python python_examples.py

where init.sh sets up the environment so that libPLUMP and the Python bindings
can be found, and python_example.py is a simple example calling libPLUMP.

There are some further examples in the examples/ directory that you can
look at to find out how to use the Python interface.

The Python bindings are generated by SWIG, so it should be relatively easy
to generate bindings for other languages that are supported by SWIG (e.g. 
Ruby, Octave, R, Java, Lua). 


0.5 Where to go from here
-------------------------------------------------------------------------------
Currently, the only documentation is in the form of source code comments. So, 
have a look at the source if you want to know more about the internal workings
of the library. 

If you care about performance you may want to enable optimizations and
disable all assertions in the code by using

  # export CXXFLAGS="-O3 -march=native -DNDEBUG"

and then rerunning cmake/make, which should yield significant speedups.


If you have problems, find bugs or want to contribute, let me know at <[email protected]>