This page describes differences between Cypher and Gremlin, most of which could be addressed by installing Gremlin Extensions for Cypher Support.
- 75% of the TCK scenarios are supported with translation to common Gremlin steps
- Enabling fuller Cypher support
- Cypher features that cannot be supported
You are very welcome to suggest a better translation or workaround.
Some functionality is exclusive to Gremlin Servers with Gremlin Extensions for Cypher Support, commonly provided by the Cypher Gremlin Server plugin: for example, functions and predicates that are not available in Gremlin. Note that, with or without extensions, translation does not use lambdas, as this is considered bad practice.
For examples, search for tests with category org.opencypher.gremlin.groups.SkipExtensions.
⚠ Note: Gremlin Extensions for Cypher Support are not available for cloud implementations like AWS Neptune or Cosmos DB.
The easiest way to use this module is by installing the Gremlin Server Cypher plugin on the target Gremlin Server. The plugin includes all of the extensions and registers them on the Server.
Alternatively, add CustomPredicate.java and CustomFunctions.java to the Gremlin Groovy script engine.
If extensions are installed on the target server and translation happens on the client side, extensions must be explicitly enabled on the client so that the resulting query can use them. For example:
- Java API:
Translator<String, GroovyPredicate> translator = Translator.builder()
    .gremlinGroovy()
    .enableCypherExtensions()
    .build(TranslatorFlavor.gremlinServer());
- Console:
:remote connect opencypher.gremlin conf/remote-objects.yaml translate gremlin+cfog_server_extensions
⚠ Note: To include Gremlin in a Cypher query, see Gremlin Function.
Functions that are present in Cypher but not in Gremlin:
- Type conversion functions: toString, toBoolean, toInteger, toFloat
- Type predicates: isString, isRelationship, isNode
- String functions: reverse, substring, trim, toUpper, toLower ...
- Regex predicate: regex
- Percentile functions: percentileCont, percentileDisc
- round function
There are no functions or predicates in Gremlin to get the type of an object. However, the Gremlin steps required to achieve certain functionality may differ depending on the type of the object. For example, when accessing an element by index:
WITH $p AS unknown
RETURN unknown[$k] AS r
Because p and k are non-constant values (parameters), their types are unknown to the parser. Depending on the type, the translation to Gremlin could be:
- values($k) for vertices and edges
- select($k) for maps
- range(Scope.local, $k, $k + 1) for lists
- constant(null) if k or p is null
As Gremlin will throw an exception if the step does not match object type, custom functions are used for the unknown type when:
- Accessing element by index
- properties function
- size function
If neither type information nor extensions are available, CfoG relies on a "best guess" approach.
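The runtime dispatch described above can be sketched in plain Java. This is only an illustration of the idea, not the code in CustomFunctions.java: the class name IndexAccessSketch and the method elementAt are hypothetical, vertices and edges are left out to avoid a TinkerPop dependency, and the handling of negative and out-of-range list indexes is an assumption based on Cypher list semantics rather than on anything stated on this page.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: resolve container[key] when the type is only known at runtime. */
final class IndexAccessSketch {

    static Object elementAt(Object container, Object key) {
        if (container == null || key == null) {
            return null; // plays the role of constant(null)
        }
        if (container instanceof Map) {
            // plays the role of select($k): look the key up in the map
            return ((Map<?, ?>) container).get(key);
        }
        if (container instanceof List && key instanceof Number) {
            // plays the role of range(Scope.local, $k, $k + 1): pick one list element
            List<?> list = (List<?>) container;
            int index = ((Number) key).intValue();
            if (index < 0) {
                index += list.size(); // assumption: negative indexes count from the end
            }
            return index >= 0 && index < list.size() ? list.get(index) : null;
        }
        // vertices and edges (values($k)) are intentionally not modelled here
        throw new IllegalArgumentException(
            "Unsupported combination: " + container.getClass() + "[" + key.getClass() + "]");
    }

    public static void main(String[] args) {
        System.out.println(elementAt(List.of(10, 20, 30), 1));
        System.out.println(elementAt(Map.of("name", "lop"), "name"));
    }
}
```

A single function that inspects the value at runtime avoids picking the wrong Gremlin step at translation time, which is why this case needs the extensions.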
In Cypher, the plus operator works with numbers, strings and lists:
RETURN 1 + 2; // 3
RETURN 'a' + 'b'; // "ab"
RETURN [1, 2]+[3,4]; // [1, 2, 3, 4]
RETURN [1, 2]+3; // [1, 2, 3]
In Gremlin, different steps are required for each type:
- Math step for numbers
- .union(select('list1').unfold(), select('list2').unfold()).fold() for collections
- String concatenation is not supported in Gremlin
If type information is unknown, or for string concatenation, a custom function is used. If Gremlin Extensions for Cypher Support are not installed, translation falls back to the numeric operator.
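As a rough illustration of what such a custom function has to decide at runtime, the sketch below covers only the four cases from the Cypher examples above. The names CypherPlusSketch and plus are hypothetical, null operands and mixed string/number operands are deliberately not handled, and returning a long for integral operands is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a type-dispatching plus operator. */
final class CypherPlusSketch {

    static Object plus(Object left, Object right) {
        if (left instanceof List || right instanceof List) {
            // list + list and list + element both produce a new list
            List<Object> result = new ArrayList<>();
            appendTo(result, left);
            appendTo(result, right);
            return result;
        }
        if (left instanceof String && right instanceof String) {
            return (String) left + (String) right; // string concatenation
        }
        if (left instanceof Number && right instanceof Number) {
            Number a = (Number) left;
            Number b = (Number) right;
            if (isIntegral(a) && isIntegral(b)) {
                return a.longValue() + b.longValue();
            }
            return a.doubleValue() + b.doubleValue();
        }
        throw new IllegalArgumentException("Unsupported operands for +");
    }

    private static void appendTo(List<Object> target, Object value) {
        if (value instanceof List) {
            target.addAll((List<?>) value);
        } else {
            target.add(value);
        }
    }

    private static boolean isIntegral(Number n) {
        return n instanceof Integer || n instanceof Long
            || n instanceof Short || n instanceof Byte;
    }

    public static void main(String[] args) {
        System.out.println(plus(1, 2));
        System.out.println(plus("a", "b"));
        System.out.println(plus(List.of(1, 2), List.of(3, 4)));
        System.out.println(plus(List.of(1, 2), 3));
    }
}
```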
- Gremlin has no concept of null:
  g.V().has('name', 'lop').values('notExising') // empty results
  g.V().has('name', 'lop').project('p').by(values('notExising')) // The provided traverser does not map to a value: v[3]->[PropertiesStep([notExising],value)]
- A null value produces a NullPointerException: g.inject(null)
- To represent the Cypher null value, the string token " cypher.null" is used
- To produce null values: .choose(traversal, traversal, " cypher.null")
- Null guards are added to translation: choose(" cypher.null", traversal)
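The token approach can be pictured with a small plain-Java model. This is a sketch only: NullTokenSketch and its methods are hypothetical names, the token value is copied as it appears in the text above, and converting the token back to null on the client side is an assumption of this example rather than something this page states.

```java
import java.util.Map;
import java.util.function.Function;

/** Hypothetical model of representing Cypher null with a string token. */
final class NullTokenSketch {

    /** Token value as written on this page. */
    static final String NULL_TOKEN = " cypher.null";

    /** Producing null values: fall back to the token when a property is missing. */
    static Object valueOrToken(Map<String, Object> properties, String key) {
        Object value = properties.get(key);
        return value == null ? NULL_TOKEN : value;
    }

    /** Null guard: skip the step when the incoming value is the token. */
    static Object guarded(Object value, Function<Object, Object> step) {
        return NULL_TOKEN.equals(value) ? NULL_TOKEN : step.apply(value);
    }

    /** Assumed client-side clean-up: turn the token back into a real null. */
    static Object fromToken(Object value) {
        return NULL_TOKEN.equals(value) ? null : value;
    }

    public static void main(String[] args) {
        Object missing = valueOrToken(Map.of("name", "lop"), "notExising");
        Object upper = guarded(missing, v -> v.toString().toUpperCase());
        System.out.println(fromToken(upper));
    }
}
```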
There is no known way to throw a custom exception from a Gremlin traversal. To achieve runtime validation (for example, when deleting nodes that still have relationships), a custom function is used.
Variable-length path matching is currently implemented using a combination of the repeat, emit and times steps. This works in generic cases, but fails when the graph contains loops.
For example, take nodes a and b, where b has a self-loop:
CREATE (a:a)-[:knows]->(b:b)
CREATE (b)-[:knows]->(b)
To get all paths with length from 1 to 4:
MATCH p = (a:a)-[:knows*1..4]->(b) RETURN p
Expected result is:
["a", "knows", "b"],
["a", "knows", "b", "knows", "b"]
Current translation (simplified):
g.V().as('a').hasLabel('a').
emit(__.loops().is(gte(1))).
repeat(__.outE('knows').inV()).
times(4)
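The actual result contains two extra paths, because the traversal follows the self-loop on every iteration, whereas Cypher does not traverse the same relationship twice within a single path: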
["a", "knows", "b"],
["a", "knows", "b", "knows", "b"],
["a", "knows", "b", "knows", "b", "knows", "b"],
["a", "knows", "b", "knows", "b", "knows", "b", "knows", "b"]
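The difference between the two results comes down to whether a relationship may be reused within one path. The plain-Java sketch below enumerates paths over the example graph with and without that restriction. All names (VarLengthPathSketch, Rel, paths, exampleGraph) are hypothetical, and the code models the semantics only; it is not how the translation works.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: variable-length paths with optional relationship uniqueness. */
final class VarLengthPathSketch {

    static final class Rel {
        final int id; final String from; final String type; final String to;
        Rel(int id, String from, String type, String to) {
            this.id = id; this.from = from; this.type = type; this.to = to;
        }
    }

    /** (a)-[:knows]->(b) and (b)-[:knows]->(b), as in the example above. */
    static List<Rel> exampleGraph() {
        return List.of(new Rel(1, "a", "knows", "b"), new Rel(2, "b", "knows", "b"));
    }

    static List<List<String>> paths(List<Rel> graph, String start, int min, int max,
                                    boolean uniqueRels) {
        List<List<String>> out = new ArrayList<>();
        List<String> path = new ArrayList<>();
        path.add(start);
        walk(graph, start, min, max, uniqueRels, path, new HashSet<>(), 0, out);
        return out;
    }

    private static void walk(List<Rel> graph, String node, int min, int max,
                             boolean uniqueRels, List<String> path, Set<Integer> used,
                             int depth, List<List<String>> out) {
        if (depth >= min) {
            out.add(new ArrayList<>(path)); // emit every path of sufficient length
        }
        if (depth == max) {
            return;
        }
        for (Rel rel : graph) {
            if (!rel.from.equals(node)) {
                continue;
            }
            boolean firstUse = used.add(rel.id);
            if (uniqueRels && !firstUse) {
                continue; // Cypher-style: a relationship appears at most once per path
            }
            path.add(rel.type);
            path.add(rel.to);
            walk(graph, rel.to, min, max, uniqueRels, path, used, depth + 1, out);
            path.remove(path.size() - 1);
            path.remove(path.size() - 1);
            if (firstUse) {
                used.remove(rel.id);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(paths(exampleGraph(), "a", 1, 4, true));  // expected result
        System.out.println(paths(exampleGraph(), "a", 1, 4, false)); // current translation
    }
}
```

With uniqueRels set to true the sketch yields the two expected paths; with false it yields the four paths produced by the current translation.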
These are queries and scenarios in the Cypher TCK without a known translation to Gremlin. You are very welcome to suggest a translation or workaround.
The TinkerPop3 documentation states:
Vertices are allowed a single immutable string label
In Cypher, it is possible to modify the labels on a node.
The following TCK scenarios rely on that feature:
- LabelsAcceptance,"Adding a single label"
- LabelsAcceptance,"Ignore space before colon"
- LabelsAcceptance,"Ignoring intermediate whitespace 1"
- LabelsAcceptance,"Removing a label"
- LabelsAcceptance,"Removing a non-existent label"
- MergeNodeAcceptance,"Merge node with label add label on create"
- MergeNodeAcceptance,"Merge node with label add label on match when it exists"
- MergeNodeAcceptance,"Should be able to set labels on match and on create"
- MergeNodeAcceptance,"Should be able to set labels on match"
- NullAcceptance,"Ignore null when removing label"
- NullAcceptance,"Ignore null when setting label"
- RemoveAcceptance,"Remove a single label"
- SetAcceptance,"Add a label to a node"
- RemoveAcceptance,"Remove multiple labels"
- LabelsAcceptance,"Adding multiple labels"
In Cypher, it is possible to compare values with different types:
UNWIND [1, 'string'] AS x RETURN max(x) // returns 1
A Gremlin traversal fails when comparing values of different types:
g.inject(1).inject('string').max() // fails java.lang.Integer cannot be cast to java.lang.String
The following TCK scenarios rely on that feature:
- Aggregation,"max() over list values"
- Aggregation,"max() over mixed numeric values"
- Aggregation,"max() over mixed values"
- Aggregation,"max() over strings"
- Aggregation,"min() over list values"
- Aggregation,"min() over mixed values"
- Aggregation,"min() over strings"
- Comparability,"Comparing strings and integers using > in a OR'd predicate"
- Comparability,"Comparing strings and integers using > in an AND'd predicate"
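The failure shown above is the usual Java behaviour when natural ordering is applied to values of unrelated types. The sketch below reproduces it and contrasts it with a comparator that first orders by type. The names MixedTypeOrderSketch, ORDER, max and naturalOrderFails are hypothetical, and the type ranking (strings before numbers) is an assumption chosen to match the max(x) example above; only strings and numbers are covered.

```java
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch: ordering values of different types. */
final class MixedTypeOrderSketch {

    /** Assumed rank: strings sort before numbers, so max([1, 'string']) is 1. */
    private static int rank(Object value) {
        if (value instanceof String) {
            return 0;
        }
        if (value instanceof Number) {
            return 1;
        }
        throw new IllegalArgumentException("Unsupported type: " + value.getClass());
    }

    /** Compare by type rank first, then by value within the same type. */
    static final Comparator<Object> ORDER = (a, b) -> {
        int byType = Integer.compare(rank(a), rank(b));
        if (byType != 0) {
            return byType;
        }
        if (a instanceof String) {
            return ((String) a).compareTo((String) b);
        }
        return Double.compare(((Number) a).doubleValue(), ((Number) b).doubleValue());
    };

    static Object max(List<?> values) {
        return values.stream().max(ORDER).orElse(null);
    }

    /** Natural ordering across types fails with a ClassCastException. */
    @SuppressWarnings({"unchecked", "rawtypes"})
    static boolean naturalOrderFails(Object a, Object b) {
        try {
            ((Comparable) a).compareTo(b);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(naturalOrderFails(1, "string"));
        System.out.println(max(List.of(1, "string")));
    }
}
```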
There is no common support for Temporal Types in Gremlin, so each implementation may handle them differently. It is possible to use java.util.Date to represent them in JanusGraph and in-memory TinkerGraph. However, this would require a lot of custom code for:
- Creation and parsing of Temporal Types
- Property access
- Arithmetic
- Comparison
- Custom predicates for all comparisons to account for Temporal Types
- Additional functions like truncate
In a query translated by Cypher for Gremlin, returned elements are normalized depending on their type. When the element type is unknown, normalization is not possible:
MATCH (n)-[r]->(m)
RETURN [n, r, m] AS r
The following TCK scenarios rely on that feature:
- MatchAcceptance2.Projecting a list of nodes and relationships
- MatchAcceptance2.Projecting a map of nodes and relationships
In Cypher, the direction of a traversed relationship is not significant for path equality. Path comparison works differently in Gremlin and takes direction into account.
The following TCK scenario relies on that feature:
- PathEquality.Direction of traversed relationship is not significant for path equality, simple
There is no known simple way to create a numeric range in Gremlin (the Range step has a different use).
The current implementation of numeric range in Cypher for Gremlin times out when creating a big range (over 10000 elements):
// range(1000000, 2000000)
g.inject('start').
repeat(sideEffect(loops().
is(gte(1000000)).
aggregate('range'))).
until(loops().is(gt(2000000))).
select('range')
The following TCK scenario relies on that feature:
- AggregationAcceptance.No overflow during summation
  - Expects range from 1000000 to 2000000
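One way to see why this approach scales poorly is to model the loop counter in plain Java. The sketch below is a simplified model, not a description of how Gremlin executes the traversal: it assumes the counter starts at zero, that values are collected once the counter reaches the start of the range, and that the loop stops once the counter exceeds the end. The names LoopRangeSketch, Result and range are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of building a range from a loop counter. */
final class LoopRangeSketch {

    static final class Result {
        final List<Long> values = new ArrayList<>();
        long iterations;
    }

    static Result range(long start, long end) {
        Result result = new Result();
        long loops = 0; // the counter always starts at zero, not at start
        do {
            if (loops >= start) {
                result.values.add(loops); // models is(gte(start)).aggregate('range')
            }
            loops++;
            result.iterations++;
        } while (loops <= end); // models until(loops().is(gt(end)))
        return result;
    }

    public static void main(String[] args) {
        Result r = range(3, 5);
        System.out.println(r.values + " after " + r.iterations + " iterations");
    }
}
```

In this model the number of iterations depends on the upper bound of the range rather than on its size, so a range starting at 1000000 still pays for the first million iterations.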
Some Gremlin steps, such as the Skip step, support only constant values. There is no known way to pass dynamic values or expressions:
MATCH (n) RETURN n SKIP toInteger(rand()*9)
The following TCK scenario relies on that feature:
- SkipLimitAcceptanceTest.SKIP with an expression that does not depend on variables