Skip to content

Latest commit

 

History

History
376 lines (274 loc) · 9.45 KB

README.md

File metadata and controls

376 lines (274 loc) · 9.45 KB

HorseIR

  1. Getting started
  2. Prerequisites
  3. Database configuration
  4. Experiments
  5. Results
  6. Contact

1. Getting started

The title of our paper submitted to DLS18 is

HorseIR: Bringing Array Programming Languages together with Database Query Processing

This repository is created for showing the reproducibility of our experiments in this paper. We provide the details of scripts and original data used in the experiments. There are mainly two systems: HorseIR and RDBMS MonetDB. We supply step-by-step instructions to configure and deploy both systems in the experiments.

In this repository, you will find:

  • the software you need to install (Sec. 2);
  • how to configure databases (Sec. 3);
  • how to run experiments (Sec. 4);
  • the results used in the paper (Sec. 5);
  • the contacts if you have any further questions (Sec. 6).

The logo was based on an anonymous horse and Laurie's dragon.

Back to top

2. Prerequisites

Let DIR_EXP be the directory where the experiments take place. In our local tests, we used

DIR_EXP=/mnt/local

Please install the following systems before you go to next sections.

- TPC-H: version v2.17.0
- GCC-7: version v7.2.0
- MonetDB: version Jul2017-SP2
- HorseIR

TPC-H

    # TPC-H (found in ./exp/query/)
    unzip tpc-h-tool.zip
    cd tpch_2_17_0/dbgen
    make
    # Generate data with scale factor 1.
    ./dbgen -s 1

In our local tests, we selected 8 queries which were saved in

<DIR_EXP>/query/chf/

Moreover, 5 databases with 5 different scale factors (SF 1/2/4/8/16) were created as follows.

    # SF 1
    mkdir <DIR_EXP>/tpch-tbl/db1
    ./dbgen -s 1
    mv *.tbl <DIR_EXP>/tpch-tbl/db1
    # ... repeat for SF 2, 4, 8, 16

Then, after you check the directory tpch-tbl, you can see 5 folders. In each folder, it contains different sizes of data.

    ls <DIR_EXP>/tpch-tbl
db1  db16  db2  db4  db8

Reference: TPC - Current specifications

GCC-7

Note: about 5 hours spent on compiling GCC from source code.

    # Download GCC-7
    wget https://ftp.gnu.org/gnu/gcc/gcc-7.2.0/gcc-7.2.0.tar.gz
    tar -xf gcc-7.2.0.tar.gz
    cd gcc-7.2.0
    ./contrib/download_prerequisites
    ./configure --prefix=<DIR_EXP>/gcc
    make
    make install
    # set alias
    ln -s <DIR_EXP>/gcc/bin/gcc <DIR_BIN>/gcc-7

The directory DIR_BIN is the directory which has been included in PATH.

MonetDB

    # Download MonetDB 2017-SP2
    wget https://www.monetdb.org/downloads/sources/Jul2017-SP2/MonetDB-11.27.9.tar.bz2
    tar -xf MonetDB-11.27.9.tar.bz2
    cd MonetDB-11.27.9
    # configure with performance flags enabled
    ./configure --prefix=<DIR_EXP>/monetdb17 --enable-debug=no --enable-assert=no --enable-optimize=yes
    make
    make install
    # set alias
    ln -s <DIR_EXP>/monetdb17/bin/mclient  <DIR_BIN>/mclient
    ln -s <DIR_EXP>/monetdb17/bin/monetdb  <DIR_BIN>/monetdb
    ln -s <DIR_EXP>/monetdb17/bin/monetdbd <DIR_BIN>/monetdbd
    ln -s <DIR_EXP>/monetdb17/bin/mserver5 <DIR_BIN>/mserver5

The directory DIR_BIN is the directory which has been included in PATH.

Create a configuration file ~/.monetdb with the following content:

user=monetdb
password=monetdb
language=sql

Then, type mclient to login MonetDB. Please check if you see the version number Jul2017-SP2 on the top of the welcome message. Moreover, try the following command to see if the database has been installed successfully.

sql>SELECT 'Hello world';
+-------------+
| L2          |
+=============+
| Hello world |
+-------------+
1 tuple (1.328ms)

Reference: How to install MonetDB and the instroduction of server and client programs.

HorseIR

Download HorseIR project from GitHub

    git clone [email protected]:Sable/HorsePower.git
    cd HorsePower
    git checkout 34ca12ac4e725e8e1c8e9d5061904c752130bd76

Download and install libraries (about 7 minutes)

    (cd libs && sh deploy_linux.sh)

Setup experiments

    # build HorseIR system
    (cd exp-dls18 && ./setup)

    # setup and generate HorseIR programs from execution plans
    (cd exp-dls18 && ./setup_translator)

Troubleshooting

  • Remove an empty import statement in genIR.py
    • Line from analysis import *

External links

  • A LL-based grammar for the HorseIR translator: click here

Back to top

3. Database Configuration

3.1 MonetDB

We need to create

  • 1 data farm, TPCH; and
  • 5 TPC-H related databases, tpch1/2/4/8/16

Note: In each TPC-H related database, there are 8 tables: part, supplier, partsupp, lineitem, customer, nation, region, and orders.

Create a datafarm

    monetdbd create TPCH
    monetdbd start TPCH

Create and release database 'tpch1' (similar for database tpch2/4/8/16)

    monetdb create tpch1
    monetdb release tpch1
    monetdb start tpch1

Login a database

    mclient -d tpch1

Load data to TPCH (inside mclient)

    # create empty tables
    \< <DIR_EXP>/tpch_2_17_0/dbgen/dss.ddl
    # import data (should replace /mnt/local with <DIR_EXP> in initTPCH1.txt
    \< <DIR_EXP>/script-tpch/initTPCH1.txt

3.2 HorseIR

Remember to set soft aliases for your data created in TPC-H, so that data copies can be avoided.

    cd src/HorseIR/data
    ln -s <DIR_EXP>/tpch-tbl tpch

Back to top

4. Experiments

The overview of the machine used in our local test is

  • Server named, Sable-Intel
  • 4 Intel Xeon E7-4850 2.00 GHz
  • Total 40 cores with 80 threads
  • 128GB RAM
  • Ubuntu 16.04.02 LTS

4.1 Run MonetDB with TPC-H queries

Note: by default, MonetDB uses all 80 threads with mclient. Therefore, we need to use mserver5 instead to set the number of threads to, for example, 40.

# SF 1
mserver5 --dbpath=<DIR_EXP>/datafarm/TPCHDB/tpch1 --set monet_vault_key=<DIR_EXP>/datafarm/TPCHDB/tpch1/.vaultkey --set gdk_nr_threads=40

Then, a new terminal should be opened and run MonetDB with the following command:

# SF 1
(cd <DIR_EXP>/script-tpch && time ./runtest | mclient -d tpch1)  &> summary1.log

Finally, exit mserver5 by typing \q or CTRL+D. Repeat the process for SF 2, 4, 8, and 16.

In each log file (e.g. <DIR_EXP>/script-tpch/sf1/\*.log), search for the keyword avg_query to identify the average of the last 10 runs. The details of the 10 runs can be found just above it with a single-column table (i.e. query_time).

4.2 Run HorseIR with TPC-H queries

Run HorseIR with 8 queries and 15 times on SF 1/2/4/8/16, and report the average execution time (ms)

    # base dir is 'HorsePower'
    (cd exp-dls18 && time ./run_all.sh)

The script run_all.sh sets different scales and invokes the script test-thread.sh for each scale. The script test-thread.sh computes each query 15 times and returns the average. As a result, all output information is saved into a log file, for example, sf1/log_thread_1.log contains the information of all queries for SF 1.

Fetch brief information from a log file

    cat sf1/log_thread_1.log | grep -E 'Run with 15 times|HorseIR Optimization Level'

All log files can be found in the folder exp/script-tpch

5. Results

Note: You can click a figure on the left side to see its original figure.

Figures

Table overview Description
An example HorseIR module
See details in data and scripts for Figure 7.
See details in data and scripts for Figure 8.

Tables

Figures Description
See details in the selected 8 queries.
See details in MonetDB-thread-1-log, MonetDB-thread-2-log, MonetDB-thread-4-log, MonetDB-thread-8-log, and MonetDB-thread-16-log,

Back to top

6. Contact

If you have any questions regarding MonetDB or HorseIR, such as database configuration problems, please contact Hanfeng Chen ([email protected]).