New Result Serialization #26

Merged
merged 11 commits into from
Dec 23, 2024
1 change: 1 addition & 0 deletions .clang-format
1 change: 1 addition & 0 deletions .clang-tidy
2 changes: 2 additions & 0 deletions .gitignore
@@ -6,3 +6,5 @@ duckdb_unittest_tempdir/
testext
test/python/__pycache__/
.Rhistory
__pycache__
venv
25 changes: 11 additions & 14 deletions CMakeLists.txt
@@ -6,24 +6,20 @@
set(LOADABLE_EXTENSION_NAME ${TARGET_NAME}_loadable_extension)

project(${TARGET_NAME})
include_directories(
src/include
${CMAKE_CURRENT_BINARY_DIR}
duckdb/third_party/httplib
duckdb/parquet/include
)
include_directories(src/include ${CMAKE_CURRENT_BINARY_DIR}
duckdb/third_party/httplib duckdb/parquet/include)

# Embed ./src/assets/index.html as a C++ header
add_custom_command(
OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/playground.hpp
COMMAND ${CMAKE_COMMAND} -P ${PROJECT_SOURCE_DIR}/embed.cmake ${PROJECT_SOURCE_DIR}/src/assets/index.html ${CMAKE_CURRENT_BINARY_DIR}/playground.hpp playgroundContent
DEPENDS ${PROJECT_SOURCE_DIR}/src/assets/index.html
)
COMMAND
${CMAKE_COMMAND} -P ${PROJECT_SOURCE_DIR}/embed.cmake
${PROJECT_SOURCE_DIR}/src/assets/index.html
${CMAKE_CURRENT_BINARY_DIR}/playground.hpp playgroundContent
DEPENDS ${PROJECT_SOURCE_DIR}/src/assets/index.html)

set(EXTENSION_SOURCES
src/httpserver_extension.cpp
${CMAKE_CURRENT_BINARY_DIR}/playground.hpp
)
set(EXTENSION_SOURCES src/httpserver_extension.cpp src/result_serializer.cpp
${CMAKE_CURRENT_BINARY_DIR}/playground.hpp)

if(MINGW)
set(OPENSSL_USE_STATIC_LIBS TRUE)
@@ -36,7 +32,8 @@
build_loadable_extension(${TARGET_NAME} " " ${EXTENSION_SOURCES})

include_directories(${OPENSSL_INCLUDE_DIR})
target_link_libraries(${LOADABLE_EXTENSION_NAME} duckdb_mbedtls ${OPENSSL_LIBRARIES})
target_link_libraries(${LOADABLE_EXTENSION_NAME} duckdb_mbedtls
${OPENSSL_LIBRARIES})
target_link_libraries(${EXTENSION_NAME} duckdb_mbedtls ${OPENSSL_LIBRARIES})

if(MINGW)
Expand Down
37 changes: 37 additions & 0 deletions docs/README.md
@@ -203,6 +203,43 @@ Check out this flocking macro from fellow _Italo-Amsterdammer_ @carlopi @ DuckDB

<br>

## Development

### Cloning the Repository

Clone the repository and all its submodules:

```bash
git clone --recurse-submodules <your-fork-url>
```

### Setting up CLion
**Opening project:**
Configuring CLion with the extension template requires a little work. First, make sure that the DuckDB submodule is available.
Then open `./duckdb/CMakeLists.txt` (not the top-level `CMakeLists.txt` file from this repo) as a project in CLion.
To fix the project path, go to `Tools -> CMake -> Change Project Root` ([docs](https://www.jetbrains.com/help/clion/change-project-root-directory.html)) and set the project root to the root directory of this repo.

**Debugging:**
To set up debugging in CLion, two steps are required. First, in `CLion -> Settings / Preferences -> Build, Execution, Deployment -> CMake`, add the desired builds (e.g. Debug, Release, RelDebug). There are different ways to configure this, but the easiest is to leave all fields empty except the `build path`, which needs to be set to `../build/{build type}`. On a clean repository, first run `make {build type}` to initialize the CMake build directory. After running make, you can (re)build from CLion using the build target you just created. Alternatively, you can create CLion CMake profiles that match the CMake variables described in the Makefile, in which case you don't need to invoke the Makefile at all.

The second step is to configure the unittest runner as a run/debug configuration. Go to `Run -> Edit Configurations` and click `+ -> CMake Application`. The target and executable should both be `unittest`. This runs all the DuckDB tests. To run only the extension-specific tests, add `--test-dir ../../.. [sql]` to the `Program Arguments`. Using the `unittest` executable is recommended for testing and development within CLion; the DuckDB CLI itself currently does not work reliably as a run target in CLion.


### Testing

To run the end-to-end (E2E) tests, first install the required packages:

```bash
pip install -r requirements.txt
```

Then run the test suite:

```bash
pytest test_http_api
```

##### :black_joker: Disclaimers

[^1]: DuckDB ® is a trademark of DuckDB Foundation. All rights reserved by their respective owners. [^1]
Expand Down
2 changes: 2 additions & 0 deletions requirements.txt
@@ -0,0 +1,2 @@
httpx==0.28.1
pytest==8.3.4
127 changes: 17 additions & 110 deletions src/httpserver_extension.cpp
@@ -1,32 +1,30 @@
#define DUCKDB_EXTENSION_MAIN
#define CPPHTTPLIB_OPENSSL_SUPPORT

#include <chrono>
#include <cstdlib>
#include <thread>
#include "httpserver_extension.hpp"
#include "query_stats.hpp"
#include "duckdb.hpp"
#include "duckdb/common/exception.hpp"
#include "duckdb/common/string_util.hpp"
#include "duckdb/function/scalar_function.hpp"
#include "duckdb/main/extension_util.hpp"
#include "duckdb/common/atomic.hpp"
#include "duckdb/common/exception/http_exception.hpp"
#include "duckdb/common/allocator.hpp"
#include <chrono>
#include <thread>
#include <memory>
#include <cstdlib>

#ifndef _WIN32
#include <syslog.h>
#endif

#define CPPHTTPLIB_OPENSSL_SUPPORT
#include "result_serializer.hpp"
#include "result_serializer_compact_json.hpp"
#include "httplib.hpp"
#include "yyjson.hpp"

#include "playground.hpp"

using namespace duckdb_yyjson; // NOLINT
#ifndef _WIN32
#include <syslog.h>
#endif

namespace duckdb {

using namespace duckdb_yyjson; // NOLINT(*-build-using-namespace)

struct HttpServerState {
std::unique_ptr<duckdb_httplib_openssl::Server> server;
std::unique_ptr<std::thread> server_thread;
@@ -40,98 +38,6 @@ struct HttpServerState {

static HttpServerState global_state;

std::string GetColumnType(MaterializedQueryResult &result, idx_t column) {
if (result.RowCount() == 0) {
return "String";
}
switch (result.types[column].id()) {
case LogicalTypeId::FLOAT:
return "Float";
case LogicalTypeId::DOUBLE:
return "Double";
case LogicalTypeId::INTEGER:
return "Int32";
case LogicalTypeId::BIGINT:
return "Int64";
case LogicalTypeId::UINTEGER:
return "UInt32";
case LogicalTypeId::UBIGINT:
return "UInt64";
case LogicalTypeId::VARCHAR:
return "String";
case LogicalTypeId::TIME:
return "DateTime";
case LogicalTypeId::DATE:
return "Date";
case LogicalTypeId::TIMESTAMP:
return "DateTime";
case LogicalTypeId::BOOLEAN:
return "Int8";
default:
return "String";
}
return "String";
}

struct ReqStats {
float elapsed_sec;
int64_t read_bytes;
int64_t read_rows;
};

// Convert the query result to JSON format
static std::string ConvertResultToJSON(MaterializedQueryResult &result, ReqStats &req_stats) {
auto doc = yyjson_mut_doc_new(nullptr);
auto root = yyjson_mut_obj(doc);
yyjson_mut_doc_set_root(doc, root);
// Add meta information
auto meta_array = yyjson_mut_arr(doc);
for (idx_t col = 0; col < result.ColumnCount(); ++col) {
auto column_obj = yyjson_mut_obj(doc);
yyjson_mut_obj_add_str(doc, column_obj, "name", result.ColumnName(col).c_str());
yyjson_mut_arr_append(meta_array, column_obj);
std::string tp(GetColumnType(result, col));
yyjson_mut_obj_add_strcpy(doc, column_obj, "type", tp.c_str());
}
yyjson_mut_obj_add_val(doc, root, "meta", meta_array);

// Add data
auto data_array = yyjson_mut_arr(doc);
for (idx_t row = 0; row < result.RowCount(); ++row) {
auto row_array = yyjson_mut_arr(doc);
for (idx_t col = 0; col < result.ColumnCount(); ++col) {
Value value = result.GetValue(col, row);
if (value.IsNull()) {
yyjson_mut_arr_append(row_array, yyjson_mut_null(doc));
} else {
std::string value_str = value.ToString();
yyjson_mut_arr_append(row_array, yyjson_mut_strncpy(doc, value_str.c_str(), value_str.length()));
}
}
yyjson_mut_arr_append(data_array, row_array);
}
yyjson_mut_obj_add_val(doc, root, "data", data_array);

// Add row count
yyjson_mut_obj_add_int(doc, root, "rows", result.RowCount());
//"statistics":{"elapsed":0.00031403,"rows_read":1,"bytes_read":0}}
auto stat_obj = yyjson_mut_obj_add_obj(doc, root, "statistics");
yyjson_mut_obj_add_real(doc, stat_obj, "elapsed", req_stats.elapsed_sec);
yyjson_mut_obj_add_int(doc, stat_obj, "rows_read", req_stats.read_rows);
yyjson_mut_obj_add_int(doc, stat_obj, "bytes_read", req_stats.read_bytes);
// Write to string
auto data = yyjson_mut_write(doc, 0, nullptr);
if (!data) {
yyjson_mut_doc_free(doc);
throw InternalException("Failed to render the result as JSON, yyjson failed");
}

std::string json_output(data);
free(data);
yyjson_mut_doc_free(doc);
return json_output;
}

// New: Base64 decoding function
std::string base64_decode(const std::string &in) {
std::string out;
@@ -300,7 +206,8 @@ void HandleHttpRequest(const duckdb_httplib_openssl::Request& req, duckdb_httpli
std::string json_output = ConvertResultToNDJSON(*result);
res.set_content(json_output, "application/x-ndjson");
} else if (format == "JSONCompact") {
std::string json_output = ConvertResultToJSON(*result, stats);
ResultSerializerCompactJson serializer;
std::string json_output = serializer.Serialize(*result, stats);
res.set_content(json_output, "application/json");
} else {
// Default to NDJSON for DuckDB's own queries
@@ -325,9 +232,9 @@ void HttpServerStart(DatabaseInstance& db, string_t host, int32_t port, string_t
global_state.is_running = true;
global_state.auth_token = auth.GetString();

// Custom basepath, defaults to root /
// Custom basepath, defaults to root /
const char* base_path_env = std::getenv("DUCKDB_HTTPSERVER_BASEPATH");
std::string base_path = "/";
std::string base_path = "/";

if (base_path_env && base_path_env[0] == '/' && strlen(base_path_env) > 1) {
base_path = std::string(base_path_env);
Expand Down
1 change: 0 additions & 1 deletion src/include/httpserver_extension.hpp
@@ -1,7 +1,6 @@
#pragma once

#include "duckdb.hpp"
#include "duckdb/common/file_system.hpp"

namespace duckdb {

Expand Down
12 changes: 12 additions & 0 deletions src/include/query_stats.hpp
@@ -0,0 +1,12 @@
#pragma once
#include <cstdint>

namespace duckdb {

struct ReqStats {
float elapsed_sec;
uint64_t read_bytes;

Review comments on `read_bytes` / `read_rows`:

@NiclasHaderer (Collaborator), Dec 23, 2024:
Do we perhaps want to remove read_bytes and read_rows until they are used, @lmangani? Otherwise people will just be confused.

@lmangani (Collaborator), Dec 23, 2024:
FYI, read_bytes and read_rows are only used by the UI to display the results and emulate ClickHouse query stats.

Collaborator:
But aren't they always 0?

Collaborator:
They shouldn't be when using the JSONCompact format, but I'll re-check.

uint64_t read_rows;
};

} // namespace duckdb
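
The review thread on `read_bytes` and `read_rows` turns on whether these fields are ever filled in. As a minimal sketch of how a `ReqStats` value might be populated per request, assuming the caller already knows the row and byte counts: `MakeReqStats` is a hypothetical helper that does not appear in this PR, and the struct is repeated (outside the `duckdb` namespace) only to keep the example self-contained.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Mirrors the ReqStats struct added in src/include/query_stats.hpp.
struct ReqStats {
	float elapsed_sec;
	uint64_t read_bytes;
	uint64_t read_rows;
};

// Hypothetical helper (not part of the PR): builds a ReqStats from two
// steady_clock time points and counters supplied by the caller.
inline ReqStats MakeReqStats(std::chrono::steady_clock::time_point start,
                             std::chrono::steady_clock::time_point end,
                             uint64_t rows, uint64_t bytes) {
	assert(end >= start); // elapsed time must not be negative
	// duration<float> defaults to seconds, matching the elapsed_sec field.
	std::chrono::duration<float> elapsed = end - start;
	return ReqStats{elapsed.count(), bytes, rows};
}
```

A handler could take `std::chrono::steady_clock::now()` before and after executing the query and pass both values in. Where the row and byte counts would come from is left open here, since the diff does not show it.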
46 changes: 46 additions & 0 deletions src/include/result_serializer.hpp
@@ -0,0 +1,46 @@
#pragma once

#include "duckdb/main/query_result.hpp"
#include "yyjson.hpp"

namespace duckdb {
using namespace duckdb_yyjson; // NOLINT(*-build-using-namespace)

class ResultSerializer {
public:
explicit ResultSerializer(const bool _set_invalid_values_to_null = false)
: set_invalid_values_to_null(_set_invalid_values_to_null) {
doc = yyjson_mut_doc_new(nullptr);
}

virtual ~ResultSerializer() {
yyjson_mut_doc_free(doc);
}

std::string YY_ToString() {
auto data = yyjson_mut_write(doc, 0, nullptr);
if (!data) {
throw SerializationException("Could not render yyjson document");
}
std::string json_output(data);
free(data);
return json_output;
}

protected:
void SerializeInternal(QueryResult &query_result, yyjson_mut_val *append_root, bool values_as_array);

void SerializeChunk(const DataChunk &chunk, vector<string> &names, vector<LogicalType> &types,
yyjson_mut_val *append_root, bool values_as_array);

yyjson_mut_val *SerializeRowAsArray(const DataChunk &chunk, idx_t row_idx, vector<LogicalType> &types);

yyjson_mut_val *SerializeRowAsObject(const DataChunk &chunk, idx_t row_idx, vector<string> &names,
vector<LogicalType> &types);

void SerializeValue(yyjson_mut_val *parent, const Value &value, optional_ptr<string> name, const LogicalType &type);

yyjson_mut_doc *doc;
bool set_invalid_values_to_null;
};
} // namespace duckdb
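
`ResultSerializer` holds a raw `yyjson_mut_doc *` that its destructor frees, so an accidental copy of a serializer would free the same document twice. The sketch below shows one possible alternative ownership scheme, a `std::unique_ptr` with a custom deleter, which makes the class non-copyable by construction. It is only an illustration: `FakeDoc`, `FakeDocNew`, `FakeDocFree`, `SerializerSketch` and `CompactSketch` are hypothetical stand-ins so the example compiles without yyjson or DuckDB, and none of them appear in the PR.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <type_traits>

// Hypothetical stand-in for yyjson_mut_doc, used only to keep this self-contained.
struct FakeDoc {
	std::string buffer;
};

static int g_live_docs = 0; // counts live documents so cleanup is observable

FakeDoc *FakeDocNew() {
	++g_live_docs;
	return new FakeDoc();
}

void FakeDocFree(FakeDoc *doc) {
	--g_live_docs;
	delete doc;
}

// Deleter that routes unique_ptr cleanup through the C-style free function.
struct FakeDocDeleter {
	void operator()(FakeDoc *doc) const {
		FakeDocFree(doc);
	}
};

// Base class: owns the document and renders it to a string.
class SerializerSketch {
public:
	SerializerSketch() : doc(FakeDocNew()) {
	}
	virtual ~SerializerSketch() = default;

	std::string ToString() const {
		assert(doc); // the document is created in the constructor
		return doc->buffer;
	}

protected:
	std::unique_ptr<FakeDoc, FakeDocDeleter> doc;
};

// Derived class: decides what goes into the document.
class CompactSketch : public SerializerSketch {
public:
	std::string Serialize(int rows) {
		doc->buffer = "{\"rows\":" + std::to_string(rows) + "}";
		return ToString();
	}
};

// The unique_ptr member deletes the implicit copy constructor.
static_assert(!std::is_copy_constructible<CompactSketch>::value,
              "serializers that own a document should not be copyable");
```

With this layout the document is released exactly once when the serializer goes out of scope, and copying is rejected at compile time. Whether the extra indirection is worth it for a serializer that is created and destroyed within a single request handler is a judgment call.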