Fleshed out the README.
LTLA committed Apr 4, 2024
1 parent 5acd161 commit 7934756
Showing 1 changed file with 103 additions and 2 deletions.
105 changes: 103 additions & 2 deletions README.md

[![Project generated with PyScaffold](https://img.shields.io/badge/-PyScaffold-005CA0?logo=pyscaffold)](https://pyscaffold.org/)

# Python interface to the SewerRat API

Python interface to the [SewerRat](https://github.com/ArtifactDB/SewerRat) API.
## Overview

The **sewerrat** package implements a Python client for the [API of the same name](https://github.com/ArtifactDB/SewerRat).
It allows users to query the SewerRat search index, register or deregister their own directories in the index, and save and load Bioconductor objects for registration.
It is assumed that the users of the **sewerrat** client and the SewerRat API itself are accessing the same shared filesystem;
this is typically the case for high-performance computing (HPC) clusters in scientific institutions.

## Registering directories

Let's mock up a directory of metadata files:

```python
import tempfile
import os

mydir = tempfile.mkdtemp()
with open(os.path.join(mydir, "metadata.json"), "w") as handle:
    handle.write('{ "first": "foo", "last": "bar" }')

os.mkdir(os.path.join(mydir, "diet"))
with open(os.path.join(mydir, "diet", "metadata.json"), "w") as handle:
    handle.write('{ "fish": "barramundi" }')
```

We can then easily register it:

```python
import sewerrat

# Only indexing metadata files named 'metadata.json'.
sewerrat.register(mydir, names=["metadata.json"])
```

Similarly, we can deregister this directory with `sewerrat.deregister(mydir)`.
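
Before registering a large directory, it can help to preview which files the `names=` filter would pick up.
The sketch below is a local approximation only: `preview_indexable()` is a hypothetical helper that is not part of **sewerrat**, and it assumes that the index walks subdirectories and matches on the file's base name.

```python
import os
import tempfile

def preview_indexable(directory, names=("metadata.json",)):
    """List files under 'directory' whose base name is in 'names'.

    Hypothetical helper; it only approximates what registration might index.
    """
    wanted = set(names)
    hits = []
    for root, _dirs, files in os.walk(directory):
        for f in files:
            if f in wanted:
                full = os.path.join(root, f)
                hits.append(os.path.relpath(full, directory))
    return sorted(hits)

# Rebuild a small mock directory so that this block runs on its own.
demo = tempfile.mkdtemp()
with open(os.path.join(demo, "metadata.json"), "w") as handle:
    handle.write('{ "first": "foo", "last": "bar" }')
os.mkdir(os.path.join(demo, "diet"))
with open(os.path.join(demo, "diet", "metadata.json"), "w") as handle:
    handle.write('{ "fish": "barramundi" }')
with open(os.path.join(demo, "notes.txt"), "w") as handle:
    handle.write("not a metadata file")

found = preview_indexable(demo)
print(found)  # 'notes.txt' is not listed because its name is not in 'names'
```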

## Searching the index

Use the `query()` function to perform free-text searches:

```python
sewerrat.query("foo")
sewerrat.query("bar%") # partial match to 'bar...'
sewerrat.query("bar% AND foo") # boolean operations
sewerrat.query("fish:bar%") # match in the 'fish' field
```

We can also search on the user, path components, and time of creation:

```python
sewerrat.query(user="LTLA") # created by myself
sewerrat.query(path="diet/") # path has 'diet/' in it

import time
sewerrat.query(after=time.time() - 3600) # created less than 1 hour ago
```
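
The `after=` value in the example above is a Unix timestamp in seconds.
As a minimal sketch, the standard `datetime` module can build such timestamps for other windows; the variable names here are illustrative, and the `query()` calls are left as comments because they need a running SewerRat instance.

```python
from datetime import datetime, timedelta, timezone

now = datetime.now(timezone.utc)

# Seconds since the Unix epoch for "one day ago".
one_day_ago = (now - timedelta(days=1)).timestamp()

# Seconds since the Unix epoch for a fixed calendar date (midnight UTC).
start_of_april = datetime(2024, 4, 1, tzinfo=timezone.utc).timestamp()

# With a SewerRat instance available, these could be passed on as:
# sewerrat.query(after=one_day_ago)
# sewerrat.query(after=start_of_april)
```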

## Saving Bioconductor objects for registration

We provide some convenience functions to save Bioconductor objects alongside their metadata, ready for registration.
For example, if we have the following objects:

```python
import biocframe
df1 = biocframe.BiocFrame({ "X": [1,2,3,4,5] })
df2 = biocframe.BiocFrame({ "Y": ["x", "y", "z"] })
df3 = biocframe.BiocFrame({ "Z": [ 1.2, 3.4, 5.6, 7.8, 9.0 ] })
```

We use the `quick_save()` function to deposit them into a directory.
(This uses the **alabaster.base** package under the hood to create the on-disk representations.)

```python
biocdir = tempfile.mkdtemp()
sewerrat.quick_save(df1, { "description": "This has integers" }, os.path.join(biocdir, "int"))
sewerrat.quick_save(df2, { "description": "This has characters" }, os.path.join(biocdir, "char"))
sewerrat.quick_save(df3, { "description": "This has reals" }, os.path.join(biocdir, "real"))
```

Then we can just register this directory with our SewerRat API.

```python
sewerrat.register(biocdir)
```

We can now query for these objects:

```python
res = sewerrat.query("integers")
```

And once we find something we like, we can load it back in quickly:

```python
x, meta = sewerrat.quick_read(res[0]["path"])
```
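
Indexing `res[0]` fails when a query returns no hits, so it may be worth guarding the lookup.
The helper below is a hypothetical sketch, not part of **sewerrat**; it assumes only what the example above shows, namely that the result is a sequence of dictionaries with a `"path"` key. The mock paths are made up for illustration.

```python
def first_path(results):
    """Return the 'path' of the first hit, or None if there were no hits."""
    if not results:
        return None
    return results[0]["path"]

# Made-up results standing in for the output of sewerrat.query().
mock_hits = [{"path": "/tmp/biocdir/int"}, {"path": "/tmp/biocdir/real"}]
mock_empty = []

chosen = first_path(mock_hits)
missing = first_path(mock_empty)

# With a SewerRat instance available, this could be used as:
# path = first_path(sewerrat.query("integers"))
# if path is not None:
#     x, meta = sewerrat.quick_read(path)
```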

## Administrator instructions

The URL to the SewerRat REST API depends on the instance and needs to be specified correctly before **sewerrat** functions can be used.
Administrators of a Python installation can achieve this by setting the `SEWERRAT_REST_URL` environment variable before the **sewerrat** package is loaded.
Developers of packages that call **sewerrat** can either pass in a URL to the `url=` argument in various **sewerrat** functions,
or they can globally set the URL via the `rest_url()` function.
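
For instance, the environment variable can be set from within Python itself, as long as this happens before the import.
This is a minimal sketch: the URL is a placeholder rather than a real instance, and the import is left as a comment so that the block runs without the package installed.

```python
import os

# Placeholder URL; replace it with the address of your own SewerRat instance.
example_url = "http://sewerrat.example.org:8080"

# Must be set before the package is loaded.
os.environ["SEWERRAT_REST_URL"] = example_url

# import sewerrat  # load the package only after the variable is set
```

Setting the variable in a shell profile or a site-wide startup file achieves the same thing for every user of the installation.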
