Gh-543: Improve Cypher docs #544

Merged
merged 5 commits into from
Dec 11, 2024
Changes from 2 commits
13 changes: 7 additions & 6 deletions docs/administration-guide/gaffer-deployment/gremlin.md
Original file line number Diff line number Diff line change
Expand Up @@ -35,7 +35,7 @@ outline how to obtain this reference using the REST API.

As mentioned previously the recommended way to use Gremlin queries is via the
Websocket in the Gaffer REST API. To do this you will need to provide a config
file that sets up the Gaffer Tinkerpop library (a.k.a 'GafferPop'). The file can
file that sets up the Gaffer TinkerPop library (a.k.a 'GafferPop'). The file can
either be added to `/gaffer/gafferpop.properties` in the container, or at a
custom path by setting the `gaffer.gafferpop.properties` key in the
`store.properties` file. This file can be blank but it is still recommended to
Expand Down Expand Up @@ -63,11 +63,11 @@ The `gafferpop.properties`, file is the configuration for GafferPop. If using
the REST API there are no mandatory properties you need to set since you will
already have configured the Graph in the existing `store.properties` file. However,
adding some default values in for operation modifiers, such as a limit for
`GetAllElement` operations, is good practice.
`GetElements` and `GetAllElements` operations, is good practice.

```properties
# Default operation config
gaffer.elements.getalllimit=5000
gaffer.elements.getlimit=20000
gaffer.elements.hasstepfilterstage=PRE_AGGREGATION
```
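
As a quick sanity check outside Gaffer, the defaults in such a file can be read back with a few lines of Python. This is only a minimal sketch: `parse_properties` is a hypothetical helper, not part of Gaffer or gafferpy, and it assumes plain `key=value` lines with `#` comments rather than the full Java properties syntax.

```python
def parse_properties(text: str) -> dict[str, str]:
    """Parse simple `key=value` lines, skipping blank lines and `#` comments.

    Simplified on purpose: no line continuations, escapes or `:` separators.
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


# The example configuration from the docs above.
EXAMPLE = """\
# Default operation config
gaffer.elements.getlimit=20000
gaffer.elements.hasstepfilterstage=PRE_AGGREGATION
"""

props = parse_properties(EXAMPLE)
get_limit = int(props["gaffer.elements.getlimit"])
```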

Expand All @@ -79,13 +79,14 @@ A full breakdown of the available properties is as follows:

| Property Key | Description | Used in REST API |
| --- | --- | --- |
| `gremlin.graph` | The Tinkerpop graph class we should use for construction. | No |
| `gremlin.graph` | The TinkerPop graph class we should use for construction. | No |
| `gaffer.graphId` | The graph ID of the TinkerPop graph. | No |
| `gaffer.storeproperties` | The path to the store properties file. | No |
| `gaffer.schemas` | The path to the directory containing the graph schema files. | No |
| `gaffer.userId` | The default user ID for the Tinkerpop graph. | No (User is always set via the [`UserFactory`](../security/user-control.md).) |
| `gaffer.userId` | The default user ID for the TinkerPop graph. | No (User is always set via the [`UserFactory`](../security/user-control.md).) |
| `gaffer.dataAuths` | The default data auths for the user to specify what operations can be performed | No |
| `gaffer.rest.timeout` | The timeout for gremlin queries submitted to the REST API in ms. Default is 2 mins if not specified. | Yes |
| `gaffer.operation.options` | Default `Operation` options in the form `key:value` (this can be overridden per query see [here](../../user-guide/query/gremlin/custom-features.md)) | Yes |
| `gaffer.elements.getalllimit` | The default limit for unseeded queries e.g. `g.V()`. | Yes |
| `gaffer.elements.getlimit` | The default limit applied to get element operations called by TinkerPop e.g. `GetElements` or `GetAllElements`. | Yes |
| `gaffer.elements.hasstepfilterstage` | The default stage to apply any `has()` steps e.g. `PRE_AGGREGATION` | Yes |
| `gaffer.includeOrphanedVertices` | Whether orphaned vertices should be returned by default in a result. These are vertices on an edge that have no associated entity in the Gaffer graph. Queries will likely be slower if enabled. | Yes |
tb06904 marked this conversation as resolved.
5 changes: 4 additions & 1 deletion docs/reference/glossary.md
Original file line number Diff line number Diff line change
Expand Up @@ -13,11 +13,14 @@ hide:
| Node | A node is what Gaffer calls an entity |
| Properties | A property is a key/value pair that stores data on both edges and entities |
| Element | The word is used to describe edges or entities |
| Stores | A Gaffer store represents the backing database responsbile for storing or facilitating access to a graph |
| Stores | A Gaffer store represents the backing database responsible for storing or facilitating access to a graph |
| Operations | An operation is an instruction / function that you send to the API to manipulate and query a graph |
| View | Used in Gaffer like a filter it lets you view the data differently in a query, often used to filter the data you get back from a given operation |
| Matched vertex | `matchedVertex` is a field added to Edges which are returned by Gaffer queries, stating whether your seeds matched the source or destination |
| Python | A programming language that is used to build applications. Gaffer uses Python to interact with the API |
| Java | An object-oriented programming language used to build software. Gaffer is primarily built in Java |
| Database | A database is a collection of organised structured information or data typically stored in a computer system |
| API | Application Programming Interface. An API is for one or more services / systems to communicate with each other |
| JSON | JavaScript Object Notation is a text-based format for representing structured data based on JavaScript object syntax |
| GafferPop | The library used to translate Gremlin queries to Gaffer operations using the TinkerPop framework |
| Orphaned Vertices | Vertices on an edge without any associated entity in the Graph |
26 changes: 22 additions & 4 deletions docs/reference/gremlin-guide/gaffer-options.md
Original file line number Diff line number Diff line change
Expand Up @@ -33,13 +33,14 @@ Note that any options should be passed as a list or dictionary.
g.with_("operationOptions", {"gaffer.federatedstore.operation.graphIds": "graphA"}).V().to_list()
```

## GetElements Limit
## Get Elements Limit

Key `getElementsLimit`

Limits the amount of elements returned if performing a query which returns a large amount of elements e.g. a
`GetAllElements` operation. This will override the default for the current
query, see the [admin guide](../../administration-guide/gaffer-deployment/gremlin.md#configuring-the-gafferpop-library)
Limits the number of elements that can be returned for each `GetElements` or
`GetAllElements` query run by TinkerPop. This applies a Gaffer `Limit`
operation in the translated operation chain. This will override the default for
the current query; see the [admin guide](../../administration-guide/gaffer-deployment/gremlin.md#configuring-the-gafferpop-library)
for more detail on setting up defaults.
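
The override order described above (per-query option first, then the GafferPop properties default, then the built-in default of 20000 mentioned on the limitations page) can be sketched as follows. This only illustrates the precedence and is not GafferPop's actual implementation; `resolve_get_elements_limit` is a hypothetical name.

```python
DEFAULT_GET_ELEMENTS_LIMIT = 20000  # default stated in the GafferPop limitations page


def resolve_get_elements_limit(query_options: dict, properties: dict) -> int:
    """Return the limit that would apply to a query (illustrative only)."""
    # 1. A per-query option passed via the with() step wins.
    if "getElementsLimit" in query_options:
        return int(query_options["getElementsLimit"])
    # 2. Otherwise fall back to the default set in gafferpop.properties.
    if "gaffer.elements.getlimit" in properties:
        return int(properties["gaffer.elements.getlimit"])
    # 3. Otherwise use the built-in default.
    return DEFAULT_GET_ELEMENTS_LIMIT


configured = {"gaffer.elements.getlimit": "5000"}
per_query = resolve_get_elements_limit({"getElementsLimit": 100}, configured)
from_properties = resolve_get_elements_limit({}, configured)
built_in = resolve_get_elements_limit({}, {})
```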

!!! example
Expand Down Expand Up @@ -76,3 +77,20 @@ from translation.
```groovy
g.with("cypher", "MATCH (p:person) RETURN p").call().toList()
```

## Include Orphaned Vertices

Key: `includeOrphanedVertices`

Sets whether orphaned vertices should be included in the result.
Orphaned vertices are vertices on an edge that have no associated Gaffer
entity. Enabling this will likely result in slower query performance, as each
vertex on an edge needs to be checked. The orphaned vertices returned will be
in a special `id` group. This will override the default for
the current query; see the [admin guide](../../administration-guide/gaffer-deployment/gremlin.md#configuring-the-gafferpop-library)
for more detail on setting up defaults.

!!! example
```groovy
g.with("includeOrphanedVertices", "true").V().toList()
```
12 changes: 4 additions & 8 deletions docs/user-guide/apis/gremlin-api.md
Original file line number Diff line number Diff line change
Expand Up @@ -68,17 +68,16 @@ and get results back.
### REST API Endpoints

The Gremlin endpoints provide a similar interface to running Gaffer Operations.
They accept a plaintext Gremlin Groovy or OpenCypher query and will return
the results in [GraphSONv3](https://tinkerpop.apache.org/docs/current/dev/io/#graphson-3d0)
They accept a plaintext Gremlin Groovy query and will return the results in
[GraphSONv3](https://tinkerpop.apache.org/docs/current/dev/io/#graphson-3d0)
format.

The two endpoints are:

- `/rest/gremlin/execute` - Runs a Gremlin Groovy script and outputs the result
as GraphSONv3 JSON.
- `/rest/gremlin/cypher/execute` - Translates a Cypher query to Gremlin and
executes it returning a GraphSONv3 JSON result. Note will always append a
`.toList()` to the translation.
- `/rest/gremlin/explain` - Runs a Gremlin Groovy script and returns an
explanation of what Gaffer operations it ran.

A query can be submitted via the Swagger UI or simple POST request such as:

Expand All @@ -100,7 +99,4 @@ gc = gaffer_connector.GafferConnector("http://localhost:8080/rest")

# Execute and return gremlin
gremlin_result = gc.execute_gremlin("g.V('1').toList()")

# Execute and return cypher
cypher_result = gc.execute_cypher("MATCH (n) WHERE ID(n) = '1' RETURN n")
```
83 changes: 83 additions & 0 deletions docs/user-guide/apis/opencypher.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,83 @@
# openCypher API

!!! warning
The openCypher API is still experimental. It is provided by a
translation layer to Gremlin from the [OpenCypher project](https://github.com/opencypher/cypher-for-gremlin).
Because of this, the implementation may experience the same [limitations](../query/gremlin/gremlin-limits.md)
as the Gremlin API. Its performance is unknown but likely slower than
Gremlin or standard Gaffer Operations.

## What is openCypher?

openCypher is an open source implementation of Cypher - the most widely
adopted, fully-specified, and open query language for property graph databases.
The original Cypher language was developed by Neo4j®.

Cypher is a declarative graph query language that allows for expressive and
efficient querying and updating of the graph store. Cypher is a relatively
simple but still very powerful language. Complicated database queries can easily
be expressed through Cypher.

!!! tip
Please see the [full reference guide](https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf)
from the openCypher organisation for more details.

## How to Query a Graph

There are two main methods of using openCypher in Gaffer: via Gremlin over a
websocket, by wrapping the query in a Gremlin `with()` step, or by submitting
queries via the REST endpoints like standard Gaffer Operations. More details on
setting up the Gremlin side can be found on its [respective page](./gremlin-api.md).

!!! note
Both methods require a running [Gaffer REST API](./rest-api.md) instance.

### Using the `with()` Step

The most fully featured way to use openCypher is to simply add it into a Gremlin
query. This is done using the options interface, known in Gremlin as a `with()`
step. More information on how to run a Gaffer option in Gremlin is available in
the [reference guide](../../reference/gremlin-guide/gaffer-options.md) but
general usage is outlined below:

```groovy
g.with("cypher", "MATCH (n) WHERE ID(n) = '1' RETURN n").call().toList()
```

### REST API Endpoints

The endpoints provide a similar interface to running Gaffer Operations. They
accept a plaintext openCypher query and will return the results in
tb06904 marked this conversation as resolved.
[GraphSONv3](https://tinkerpop.apache.org/docs/current/dev/io/#graphson-3d0)
format.

The two endpoints for openCypher are:

- `/rest/gremlin/cypher/execute` - Translates a Cypher query to Gremlin and
executes it, returning a GraphSONv3 JSON result. Note that it will always append a
`.toList()` to the translation.
- `/rest/gremlin/cypher/explain` - Translates a Cypher query to Gremlin,
executes it and returns an explanation of what Gremlin query and Gaffer
operations it ran.

A query can be submitted via the Swagger UI or simple POST request such as:

```bash
curl -X 'POST' \
'http://localhost:8080/rest/gremlin/cypher/execute' \
-H 'accept: application/x-ndjson' \
-H 'Content-Type: text/plain' \
-d 'MATCH (n:'\''something'\'') RETURN n'
```
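
The same POST request can be put together with only the Python standard library. This is a rough equivalent of the curl call above, assuming the same local URL and headers; `build_cypher_request` is a hypothetical helper name. The request is only constructed here, and the commented lines show how it could be sent to a running REST API.

```python
from urllib.request import Request


def build_cypher_request(query: str, base_url: str = "http://localhost:8080/rest") -> Request:
    """Build (but do not send) a POST request for the Cypher execute endpoint."""
    return Request(
        url=f"{base_url}/gremlin/cypher/execute",
        data=query.encode("utf-8"),  # plaintext query as the request body
        headers={
            "accept": "application/x-ndjson",
            "Content-Type": "text/plain",
        },
        method="POST",
    )


request = build_cypher_request("MATCH (n:'something') RETURN n")

# To send it against a running Gaffer REST API:
# from urllib.request import urlopen
# with urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```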

You can also utilise [Gafferpy](./python-api.md) to connect and run queries
using the endpoints.

```python
from gafferpy import gaffer_connector

gc = gaffer_connector.GafferConnector("http://localhost:8080/rest")

# Execute and return cypher
cypher_result = gc.execute_cypher("MATCH (n) WHERE ID(n) = '1' RETURN n")
```
8 changes: 5 additions & 3 deletions docs/user-guide/query/gremlin/gremlin-limits.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,8 +6,10 @@ but some features may also be yet to be implemented.

Current TinkerPop features not present in the GafferPop implementation:

- Unseeded queries run a `GetAllElements` with a configured limit applied,
this limit can be configured per query or will default to 5000.
- Each `GetElements` or `GetAllElements` query run by TinkerPop will have a
Gaffer `Limit` operation also applied. This limit can be configured via the
[GafferPop properties](../../../administration-guide/gaffer-deployment/gremlin.md)
or per query, but will default to 20000 if not otherwise specified.
- Gaffer graphs are readonly to Gremlin queries.
- TinkerPop Graph Computer is not supported.
- TinkerPop Transactions are not supported.
Expand All @@ -30,7 +32,7 @@ Current known limitations or bugs:
may get results back when you realistically shouldn't.
- Input seeds to Gaffer operations are deduplicated.
Therefore, the results of a query against a GafferPop graph may be different than a standard Gremlin graph.
For example, for the Tinkerpop Modern graph:
For example, for the TinkerPop Modern graph:
```text
(Gremlin) g.V().out() = [v2, v3, v3, v3, v4, v5]
(GafferPop) g.V().out() = [v2, v3, v4, v5]
Expand Down
92 changes: 92 additions & 0 deletions docs/user-guide/query/opencypher.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,92 @@
# openCypher in Gaffer

!!! warning
The openCypher API is still experimental. It is provided by a
translation layer to Gremlin from the [OpenCypher project](https://github.com/opencypher/cypher-for-gremlin).
Because of this, the implementation may experience the same [limitations](../query/gremlin/gremlin-limits.md)
as the Gremlin API. Its performance is unknown but likely slower than
Gremlin or standard Gaffer Operations.

## openCypher Querying

Generally the syntax and features of using openCypher in Gaffer are the same as
using Cypher in other graph databases. Most of the features you will have used
in standard Cypher should be available. The layer in Gaffer targets
[openCypher v9](https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf)
meaning any features outside of that version cannot be guaranteed.

Full guides on querying using Cypher are available elsewhere; however, a few useful
queries to get you started are provided here. Each is shown alongside the exact Gremlin
query it translates to.

!!! example ""
Seeded vertex query using string IDs.

=== "cypher"
```cypher
MATCH (n) WHERE ID(n) IN ['0', '1', '2', '3'] RETURN n
```

=== "Gremlin"
```groovy
g.V().has('~id', within('0', '1', '2', '3')).project('n').by(__.valueMap().with('~tinkerpop.valueMap.tokens')).toList()
```

!!! example ""
Seeded edge query using string IDs.

=== "cypher"
```cypher
MATCH (s)-[r]->(d) WHERE ID(r) IN ['[0, 1]', '[2, 3]'] RETURN r
```

=== "Gremlin"
```groovy
g.E().has('~id', within('[0, 1]', '[2, 3]')).project('r').by(__.project(' cypher.element', ' cypher.inv', ' cypher.outv').by(__.valueMap().with('~tinkerpop.valueMap.tokens')).by(__.inV().id()).by(__.outV().id())).toList()
```

!!! example ""
Filtering on group and properties.

=== "cypher"
```cypher
MATCH (n:person) WHERE n.age > toInteger(25) AND n.`full-name` CONTAINS 'John' RETURN n
```

=== "Gremlin"
```groovy
g.V().as('n').hasLabel('person').has('full-name', containing('John')).where(__.constant(25d).map(cypherToInteger()).is(neq(' cypher.null')).as(' GENERATED1').select('n').values('age').where(gt(' GENERATED1'))).select('n').project('n').by(__.choose(neq(' cypher.null'), __.valueMap().with('~tinkerpop.valueMap.tokens'))).toList()
```

!!! example ""
Transform and project on properties.

=== "cypher"
```cypher
MATCH (n) RETURN (n.age * 1000), reverse(n.name)
```

=== "Gremlin"
```groovy
g.V().as('n').project('(n.age * 1000)', 'reverse(n.name)').by(__.constant(1000).as('__GENERATED1').select('n').choose(neq(' cypher.null'), __.choose(__.values('age'), __.values('age'), __.constant(' cypher.null'))).choose(__.or(__.is(eq(' cypher.null')), __.select('__GENERATED1').is(eq(' cypher.null'))), __.constant(' cypher.null'), __.math('_ * __GENERATED1'))).by(__.choose(neq(' cypher.null'), __.choose(__.values('name'), __.values('name'), __.constant(' cypher.null'))).map(cypherReverse())).toList()
```

## Limitations and Considerations

There are a few limitations you need to be aware of using this API. Generally
these stem from the translation layer but are also due to a fundamental
difference in the way Gaffer and Cypher are intended to be used.

- If using the `with()` step, all numbers are longs by default. You need to
explicitly convert them to integers if required, e.g. `toInteger(1)`. You can
also convert them to floats with `toFloat(1.2)`.
- Data is returned differently from normal Gremlin. It will be returned as
key-value maps where each `RETURN` in the Cypher query is a key.
- Currently the version of the openCypher translator is stuck at v1.0.0 due to
the Gaffer Scala version. This means not all features of openCypher are
available, e.g. there is no `replace()` function. The [reference guide](../../reference/gremlin-guide/custom-functions.md)
attempts to document all custom Cypher functions available.
- Be mindful of how a query maps to Gremlin and Gaffer. For example,
`MATCH (n) WHERE (n:person OR n:software)` will do a `GetAllElements`
with nothing in the Gaffer View. In this case you should use an `OPTIONAL MATCH` instead.
- Gaffer groups or properties containing a `-` must be wrapped in back ticks.
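
Two of the points above (back ticks around names containing a `-`, and explicit `toInteger()` conversion) can be combined when building the `with()` step string from Python. This is a minimal sketch: `quote_name` and `cypher_with_step` are hypothetical helpers, not part of gafferpy, and the escaping assumes the query is embedded in a double-quoted Groovy string.

```python
def quote_name(name: str) -> str:
    """Wrap a group or property name in back ticks when it contains a '-'."""
    return f"`{name}`" if "-" in name else name


def cypher_with_step(cypher: str) -> str:
    """Wrap a Cypher query in the Gremlin with() step form used in these docs."""
    # Escape characters that are special inside a double-quoted Groovy string.
    escaped = (
        cypher.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("$", "\\$")
    )
    return f'g.with("cypher", "{escaped}").call().toList()'


cypher = (
    "MATCH (n:person) WHERE n.age > toInteger(25) "
    f"AND n.{quote_name('full-name')} CONTAINS 'John' RETURN n"
)
gremlin = cypher_with_step(cypher)
```

The resulting string can then be passed to `gc.execute_gremlin(...)` in the same way as any other Gremlin query.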
10 changes: 6 additions & 4 deletions mkdocs.yml
Original file line number Diff line number Diff line change
Expand Up @@ -86,23 +86,25 @@ nav:
- 'What is Python?': 'user-guide/gaffer-basics/what-is-python.md'
- 'What is Cardinality?': 'user-guide/gaffer-basics/what-is-cardinality.md'
- 'What is Aggregation?': 'user-guide/gaffer-basics/what-is-aggregation.md'
- 'Graph Schema': 'user-guide/schema.md'
- Available APIs:
- 'Spring REST': 'user-guide/apis/rest-api.md'
- 'Python (gafferpy)': 'user-guide/apis/python-api.md'
- 'Java': 'user-guide/apis/java-api.md'
- 'Gremlin (GafferPop)': 'user-guide/apis/gremlin-api.md'
- 'openCypher': 'user-guide/apis/opencypher.md'
- Querying:
- Gaffer Query Syntax:
- 'Operations': 'user-guide/query/gaffer-syntax/operations.md'
- 'Filtering Data': 'user-guide/query/gaffer-syntax/filtering.md'
- 'FAQs': 'user-guide/query/gaffer-syntax/faqs.md'
- Import/Export:
- 'Using CSV Data': 'user-guide/query/gaffer-syntax/import-export/csv.md'
- Apache Gremlin:
- Gremlin:
- 'Gremlin in Gaffer': 'user-guide/query/gremlin/gremlin.md'
- 'GafferPop Features': 'user-guide/query/gremlin/custom-features.md'
- 'GafferPop Limitations': 'user-guide/query/gremlin/gremlin-limits.md'
- 'Graph Schemas': 'user-guide/schema.md'
- 'Features': 'user-guide/query/gremlin/custom-features.md'
- 'Limitations': 'user-guide/query/gremlin/gremlin-limits.md'
- 'openCypher': 'user-guide/query/opencypher.md'
- Developer Guide:
- 'Introduction': 'development-guide/introduction.md'
- 'Ways of Working': 'development-guide/ways-of-working.md'