Releases: neo4j/graph-data-science
Graph Data Science 2.5.4
Other changes
- Updated Netty dependencies to 4.1.100.Final to fix CVE-2023-44487.
Graph Data Science 2.5.3
New features
- Add support for Neo4j 4.4.27
Bug fixes
- Fixed a bug that led to an unresponsive DBMS, or even an unresponsive OS, when running GDS projections in Neo4j on MacBooks (x86 or ARM).
- Avoid a possible race condition between random walks and training in Node2Vec. This also removes the need for a timeout in Node2Vec.
Improvements
- Improved state synchronization during Arrow graph import to avoid errors due to out-of-sync messages.
Graph Data Science 2.5.1
New features
- Add support for Neo4j 5.13
Graph Data Science 2.5.0
Breaking changes
- Dropped support for earlier versions of Neo4j 5: versions 5.1, 5.2, 5.3, 5.4, and 5.5 are no longer supported, and GDS is no longer compatible with them.
New features
Major
- Added new algorithms for directed acyclic graphs:
  - gds.dag.topologicalSort.stream
  - gds.dag.longestPath.stream
- Deprecating the alpha and beta namespaces for procedures and algorithms, and promoting many of them to production grade; see the full list of procedures being promoted below.
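The two new DAG procedures can be illustrated with a small, self-contained Python sketch. It does not call GDS and is not its implementation: it assumes an in-memory edge list of (source, target, weight) tuples, uses Kahn's algorithm for the topological order, and then relaxes edges in that order to get the longest weighted path ending at each node. The function names topological_sort and longest_path_distances are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, List, Tuple

Edge = Tuple[int, int, float]  # (source, target, weight); hypothetical input format


def topological_sort(nodes: List[int], edges: List[Edge]) -> List[int]:
    """Kahn's algorithm: repeatedly emit a node that has no remaining incoming edges."""
    indegree = {n: 0 for n in nodes}
    out = defaultdict(list)
    for s, t, _ in edges:
        out[s].append(t)
        indegree[t] += 1
    queue = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for t in out[n]:
            indegree[t] -= 1
            if indegree[t] == 0:
                queue.append(t)
    if len(order) != len(nodes):
        raise ValueError("graph contains a cycle")
    return order


def longest_path_distances(nodes: List[int], edges: List[Edge]) -> Dict[int, float]:
    """Longest weighted path ending at each node, relaxing edges in topological order."""
    dist = {n: 0.0 for n in nodes}  # a single node is a path of length 0
    out = defaultdict(list)
    for s, t, w in edges:
        out[s].append((t, w))
    for n in topological_sort(nodes, edges):
        for t, w in out[n]:
            dist[t] = max(dist[t], dist[n] + w)
    return dist


if __name__ == "__main__":
    nodes = [0, 1, 2, 3]
    edges = [(0, 1, 1.0), (0, 2, 4.0), (1, 3, 1.0), (2, 3, 1.0)]
    print(topological_sort(nodes, edges))        # [0, 1, 2, 3]
    print(longest_path_distances(nodes, edges))  # {0: 0.0, 1: 1.0, 2: 4.0, 3: 5.0}
```

Relaxing edges in topological order works because every predecessor of a node is finalized before the node itself is processed; this is also why the input must be acyclic, and the sketch raises an error otherwise.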
Minor
- Added a procedure to retrieve the version of the installed GDS: CALL gds.version
- Added a new procedure, gds.license.state, to verify the license state of the Graph Data Science library, and an analogous new function, gds.isLicensed().
- Added memory estimation for modularity calculation via the procedures gds.modularity.[stream|stats].estimate
- Added memory estimation for filtered KNN via the procedures gds.knn.filtered.[mutate|stream|stats|write].estimate
- Added Stats and Write modes for Harmonic Closeness Centrality
- Added new procedures for SCC:
  - gds.scc.mutate
  - gds.scc.stats
- Added memory estimation to SCC:
  - gds.scc.stream.estimate
  - gds.scc.stats.estimate
  - gds.scc.mutate.estimate
  - gds.scc.write.estimate
- Added the consecutiveIds parameter to the gds.scc procedures to output the components in a consecutive id space.
- Added memory estimation for Steiner Tree via the procedures gds.steinerTree.[mutate|stream|stats|write].estimate
- Added stats mode for gds.modularityOptimization
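To illustrate what a consecutive id space means for the consecutiveIds option, here is a minimal Python sketch of the relabelling step alone. It assumes an algorithm has already produced one arbitrary component id per node and maps those ids onto 0..k-1 in first-seen order. The function name and the ordering rule are assumptions, and GDS may assign the consecutive ids differently.

```python
from typing import Dict, List


def to_consecutive_ids(component_ids: List[int]) -> List[int]:
    """Map arbitrary component ids onto 0..k-1, numbering components in first-seen order."""
    mapping: Dict[int, int] = {}
    result = []
    for cid in component_ids:
        if cid not in mapping:
            mapping[cid] = len(mapping)  # next unused consecutive id
        result.append(mapping[cid])
    return result


if __name__ == "__main__":
    # Raw ids as an algorithm might emit them, e.g. the id of a representative node.
    raw = [7, 7, 42, 3, 42, 7]
    print(to_consecutive_ids(raw))  # [0, 0, 1, 2, 1, 0]
```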
Bug fixes
- Fixed a bug in logging the progress of Prepare Batches in GraphSAGE training.
- Fixed a bug where KNN would compute incorrect EUCLIDEAN similarity.
- Fixed a bug where limits validation could potentially not be triggered for configuration settings taken from specified defaults.
- Fixed a bug where gds.graph.filter would list a relationship type of __ALL__ even if all relationships were filtered out.
- Fixed a bug where Triangle Count could compute an incorrect number of triangles when the maxDegree parameter was specified.
- Fixed a bug where Triangle Count could compute an incorrect number of triangles when multiple relationship types are specified.
Improvements
- The random graph generation procedure gds.beta.graph.generate now returns a different graph each time it is called without specifying a random seed. Furthermore, when the seed is specified, the resulting graph will always have the same topology.
- It is now possible to specify common node labels when importing nodes via Arrow.
- A better error message is thrown when encountering null values in the nodeLabels column when importing nodes via Arrow.
- Added the configuration option listNodeLabels for the node property stream procedures, which triggers listing all node labels for the respective node.
- Added the configuration option list_node_labels for the node property stream Arrow endpoints, which triggers listing all node labels for the respective node.
- The Cypher projection now returns the query it executes as part of the projection result as well as part of the gds.graph.list output.
- Support passing startNodes to gds.graph.sample.cnarw as node objects instead of only node ids.
- Support passing nodeId to gds.util.nodeProperty as a node object instead of only a node id.
- Improved validation for relationship projections: if a global SUM, MIN, MAX or COUNT aggregation is defined, there needs to be at least one property mapping.
- HITS algorithm procedures have a default hitsIterations value of 20.
- More accurate progress tracking for the gds.scc algorithm.
- The componentDistribution and communityDistribution result fields now also include the p1, p5, p10 and p25 percentiles. This affects algorithms in the Community Detection category.
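The new p1, p5, p10 and p25 percentiles summarise the sizes of the communities found. The following hypothetical Python sketch computes such percentiles from a node-to-community assignment using the nearest-rank method. The percentile definition GDS uses is not stated in these notes, so its values may differ, and the function names are illustrative only.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List


def nearest_rank(sorted_values: List[int], p: float) -> int:
    """Nearest-rank percentile: smallest value with at least p% of the values at or below it."""
    rank = max(1, math.ceil(p / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def community_size_distribution(
    community_ids: Iterable[int], percentiles=(1, 5, 10, 25)
) -> Dict[str, int]:
    """Percentiles of community sizes for a node -> community assignment."""
    sizes = sorted(Counter(community_ids).values())  # one entry per community
    if not sizes:
        return {}
    dist = {f"p{p}": nearest_rank(sizes, p) for p in percentiles}
    dist["min"], dist["max"] = sizes[0], sizes[-1]
    return dist


if __name__ == "__main__":
    # Four communities with sizes 3, 2, 1 and 4.
    assignment = [0, 0, 0, 1, 1, 2, 3, 3, 3, 3]
    print(community_size_distribution(assignment, percentiles=(1, 5, 10, 25, 50, 75)))
    # {'p1': 1, 'p5': 1, 'p10': 1, 'p25': 1, 'p50': 2, 'p75': 3, 'min': 1, 'max': 4}
```

With only a handful of communities, the low percentiles all collapse onto the smallest community size; they become informative on results with many communities.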
Full list of procedures being promoted
- Promoting Model Catalog procedures:
  - gds.beta.model.drop, deprecated by gds.model.drop
    - Return column shared renamed to published
    - modelName, modelType extracted to separate return columns
  - gds.beta.model.exists, deprecated by gds.model.exists
  - gds.beta.model.list, deprecated by gds.model.list
    - Return column shared renamed to published
    - modelName, modelType extracted to separate return columns
  - gds.alpha.model.delete, deprecated by gds.model.delete
  - gds.alpha.model.load, deprecated by gds.model.load
  - gds.alpha.model.publish, deprecated by gds.model.publish
    - Return column shared renamed to published
    - modelName, modelType extracted to separate return columns
  - gds.alpha.model.store, deprecated by gds.model.store
- Promoting Pipeline Catalog procedures:
  - gds.beta.pipeline.drop, deprecated by gds.pipeline.drop
  - gds.beta.pipeline.exists, deprecated by gds.pipeline.exists
  - gds.beta.pipeline.list, deprecated by gds.pipeline.list
- Procedure gds.alpha.systemMonitor is deprecated by gds.systemMonitor
- Procedure gds.beta.listProgress is deprecated by gds.listProgress
- Procedure gds.alpha.triangles is deprecated by gds.triangles
- Deprecating gds.beta.steinerTree procedures
  - gds.beta.steinerTree.mutate, deprecated by gds.steinerTree.mutate
  - gds.beta.steinerTree.stats, deprecated by gds.steinerTree.stats
  - gds.beta.steinerTree.stream, deprecated by gds.steinerTree.stream
  - gds.beta.steinerTree.write, deprecated by gds.steinerTree.write
- Deprecating gds.beta.spanningTree procedures
  - gds.beta.spanningTree.mutate[.estimate], deprecated by gds.spanningTree.mutate[.estimate]
  - gds.beta.spanningTree.stats[.estimate], deprecated by gds.spanningTree.stats[.estimate]
  - gds.beta.spanningTree.stream[.estimate], deprecated by gds.spanningTree.stream[.estimate]
  - gds.beta.spanningTree.write[.estimate], deprecated by gds.spanningTree.write[.estimate]
- Deprecating gds.alpha.maxkcut procedures
  - gds.alpha.maxkcut.mutate[.estimate], deprecated by gds.maxkcut.mutate[.estimate]
  - gds.alpha.maxkcut.stream[.estimate], deprecated by gds.maxkcut.stream[.estimate]
- Deprecating gds.beta.closeness procedures
  - gds.beta.closeness.mutate, deprecated by gds.closeness.mutate
    - The mutateProperty field has been removed; it can be accessed via the configuration.
  - gds.beta.closeness.stats, deprecated by gds.closeness.stats
  - gds.beta.closeness.stream, deprecated by gds.closeness.stream
  - gds.beta.closeness.write, deprecated by gds.closeness.write
    - The writeProperty field has been removed; it can be accessed via the configuration.
- Deprecating gds.beta.leiden procedures
  - gds.beta.leiden.mutate[.estimate], deprecated by gds.leiden.mutate[.estimate]
  - gds.beta.leiden.stats[.estimate], deprecated by gds.leiden.stats[.estimate]
  - gds.beta.leiden.stream[.estimate], deprecated by gds.leiden.stream[.estimate]
  - gds.beta.leiden.write[.estimate], deprecated by gds.leiden.write[.estimate]
- Deprecating gds.alpha.conductance procedures
  - gds.alpha.conductance.stream, deprecated by gds.conductance.stream
- Deprecating gds.alpha.modularity procedures
  - gds.alpha.modularity.stream, deprecated by gds.modularity.stream
  - gds.alpha.modularity.stats, deprecated by gds.modularity.stats
- Deprecating gds.beta.modularityOptimization procedures
  - gds.beta.modularityOptimization.stream[.estimate], deprecated by gds.modularityOptimization.stream[.estimate]
  - gds.beta.modularityOptimization.stats[.estimate], deprecated by gds.modularityOptimization.stats[.estimate]
- Deprecating gds.beta.influenceMaximization.celf procedures
  - gds.beta.influenceMaximization.celf.mutate[.estimate], deprecated by gds.influenceMaximization.celf.mutate[.estimate]
  - gds.beta.influenceMaximization.celf.stats[.estimate], deprecated by gds.influenceMaximization.celf.stats[.estimate]
  - gds.beta.influenceMaximization.celf.stream[.estimate], deprecated by gds.influenceMaximization.celf.stream[.estimate]
  - gds.beta.influenceMaximization.celf.write[.estimate], deprecated by gds.influenceMaximization.celf.write[.estimate]
- Deprecating gds.alpha.knn.filtered procedures
  - gds.alpha.knn.filtered.mutate, deprecated by gds.knn.filtered.mutate
  - gds.alpha.knn.filtered.stats, deprecated by gds.knn.filtered.stats
  - gds.alpha.knn.filtered.stream, deprecated by gds.knn.filtered.stream
  - gds.alpha.knn.filtered.write, deprecated by gds.knn.filtered.write
- Deprecating gds.alpha.nodeSimilarity.filtered procedures
  - gds.alpha.nodeSimilarity.filtered.mutate[.estimate], deprecated by gds.nodeSimilarity.filtered.mutate[.estimate]
  - gds.alpha.nodeSimilarity.filtered.stats[.estimate], deprecated by gds.nodeSimilarity.filtered.stats[.estimate]
  - gds.alpha.nodeSimilarity.filtered.stream[.estimate], deprecated by gds.nodeSimilarity.filtered.stream[.estimate]
  - gds.alpha.nodeSimilarity.filtered.write[.estimate], deprecated by gds.nodeSimilarity.filtered.write[.estimate]
- Deprecating gds.alpha.closeness.harmonic procedures
  - gds.alpha.closeness.harmonic.stream, deprecated by gds.closeness.harmonic.stream
  - gds.alpha.closeness.harmonic.write, deprecated by gds.closeness.harmonic.write
- Deprecating gds.beta.graph.relationships procedures
  - `gds.beta.graph.relati...
Graph Data Science 2.4.6
neo4j-graph-data-science-2.4.6
New features
- Added compatibility with Neo4j database 5.12.0.
Bug fixes
- Fix a bug where the HITS write and mutate procedures failed to parse their configuration.
2.4.5
neo4j-graph-data-science-2.4.5
Bug fixes
- Fix a bug in the triangle-related procedures on graphs with multiple relationship types, where triangles could be computed incorrectly. The following procedures are affected:
  - gds.triangleCount.[stream|mutate|write|stats]
  - gds.localClusteringCoefficient.[stream|mutate|write|stats]
  - gds.alpha.triangles
Graph Data Science 2.4.4
Bug fixes
- Fixed a bug where Arrow processes that are automatically removed after being aborted were not properly cleaned up
Graph Data Science 2.4.3
Improvements
- Added COSINE as an available similarityMetric for the gds.nodeSimilarity procedure
- When exporting graphs to CSV or using backup and restore, a wider variety of node label names is now supported through label mapping
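For unweighted neighbour sets, the difference between the existing Jaccard metric and the newly added COSINE metric can be sketched in a few lines of Python. This assumes each node is represented by the set of its neighbours and ignores relationship weights; it illustrates the two formulas and is not the gds.nodeSimilarity implementation. The function names are hypothetical.

```python
import math
from typing import Set


def jaccard(a: Set[int], b: Set[int]) -> float:
    """|A ∩ B| / |A ∪ B|; defined as 0.0 when both sets are empty."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def cosine(a: Set[int], b: Set[int]) -> float:
    """Cosine of the 0/1 neighbour vectors: |A ∩ B| / sqrt(|A| * |B|)."""
    if not a or not b:
        return 0.0
    # The dot product of two 0/1 vectors is the size of the intersection,
    # and the Euclidean norm of a 0/1 vector is the square root of its set size.
    return len(a & b) / math.sqrt(len(a) * len(b))


if __name__ == "__main__":
    a, b = {1, 2, 3}, {2, 3, 4, 5}
    print(jaccard(a, b))           # 0.4
    print(round(cosine(a, b), 4))  # 0.5774
```

Cosine divides by the geometric mean of the set sizes rather than by the union, so it penalises a size mismatch between the two neighbourhoods less than Jaccard does.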
Bug fixes
- Fixed a bug where array default values would not be correctly serialized to or deserialized from CSV
- Fixed an issue where Speaker-Listener LabelPropagation and other Pregel procedures wouldn’t stream or mutate on graphs that are not persisted in a Neo4j database
- Fixed a bug in graph restore on AuraDS that caused it to fail after shutdown when a node label name contained special characters or underscores
Graph Data Science 2.4.1
Bug fixes
- Fix a bug in K-Core decomposition that can return invalid values if core values are not consecutive.
- Fix a bug where using mutateProperty with the same name as an existing node property could fail. Affected procedures include:
  - gds.alpha.knn.filtered.mutate
  - gds.alpha.nodeSimilarity.filtered.mutate
  - gds.beta.pipeline.linkPrediction.predict.mutate
  - gds.beta.steinerTree.mutate
  - gds.beta.spanningTree.mutate
  - gds.knn.mutate
  - gds.nodeSimilarity.mutate
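K-Core decomposition assigns each node its core value: the largest k such that the node belongs to a subgraph in which every node has degree at least k. The following Python sketch uses the classic peeling approach (repeatedly removing a node of minimum remaining degree) on an undirected edge list. It is not the GDS implementation, the function name is hypothetical, and nodes without relationships are simply omitted.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def core_values(edges: List[Tuple[int, int]]) -> Dict[int, int]:
    """Core value of each node in an undirected simple graph, by min-degree peeling."""
    adj: Dict[int, Set[int]] = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    degree = {n: len(nbrs) for n, nbrs in adj.items()}
    remaining = set(adj)
    core: Dict[int, int] = {}
    k = 0
    while remaining:
        # Peel a node with the smallest remaining degree; k never decreases.
        n = min(remaining, key=lambda x: degree[x])
        k = max(k, degree[n])
        core[n] = k
        remaining.remove(n)
        for m in adj[n]:
            if m in remaining:
                degree[m] -= 1
    return core


if __name__ == "__main__":
    # A 4-clique (nodes 0-3) plus a pendant node 4 attached to node 0.
    edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (0, 4)]
    print(sorted(core_values(edges).items()))  # [(0, 3), (1, 3), (2, 3), (3, 3), (4, 1)]
```

In the example, core values 1 and 3 occur but 2 does not, which is the kind of non-consecutive result mentioned in the K-Core fix in this release.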
Improvements
- Improved error handling when negative node ids are used as input in the sourceNode, targetNode, sourceNodes, and targetNodes fields.
- Improved performance when projecting larger in-memory graphs.
Graph Data Science 2.4.0
Breaking changes
- Pass concurrency to the node property steps when training a pipeline. Previously, they were executed with the default concurrency of 4 if not overridden. This affects:
  - gds.beta.pipeline.linkPrediction.train
  - gds.beta.pipeline.nodeClassification.train
  - gds.alpha.pipeline.nodeClassification.train
New features
Major
- Added Bellman-Ford algorithm
- Added K-Core Decomposition algorithm
- Added new Common Neighbour Aware Random Walk graph sampling algorithm
- Add Random Forest and MLP classifier serialization support. This makes all node classification and link prediction models serializable
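Bellman-Ford computes single-source shortest paths even when some relationship weights are negative, and it can detect negative cycles reachable from the source. Below is a minimal Python sketch of the textbook algorithm over a hypothetical (source, target, weight) edge list; it is illustrative only and does not reflect the GDS implementation or its output format.

```python
import math
from typing import Dict, List, Tuple


def bellman_ford(
    nodes: List[int], edges: List[Tuple[int, int, float]], source: int
) -> Tuple[Dict[int, float], bool]:
    """Return (distances from source, whether a negative cycle is reachable)."""
    dist: Dict[int, float] = {n: math.inf for n in nodes}
    dist[source] = 0.0
    # Shortest paths without cycles have at most |V| - 1 edges.
    for _ in range(len(nodes) - 1):
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            break  # distances are final
    # One extra pass: any further improvement implies a reachable negative cycle.
    has_negative_cycle = any(dist[u] + w < dist[v] for u, v, w in edges)
    return dist, has_negative_cycle


if __name__ == "__main__":
    edges = [(0, 1, 4.0), (0, 2, 5.0), (2, 1, -3.0), (1, 3, 2.0)]
    print(bellman_ford([0, 1, 2, 3], edges, 0))
    # ({0: 0.0, 1: 2.0, 2: 5.0, 3: 4.0}, False)
```

Nodes unreachable from the source keep a distance of infinity, and adding a finite weight to infinity stays infinite, so they never trigger a relaxation.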
Minor
- You can rename node properties when writing them back to the Neo4j database using gds.nodeProperties.write by placing them inside a map in the form nodeProperty: 'renamedProperty'.
- Added the minCommunitySize|minComponentSize parameter to more procedures to allow filtering the result. (Contributed by @airtyon)
- Added a new procedure, gds.alpha.drop.cypherdb, to drop created in-memory databases.
- Added the upperDegreeCutoff parameter to the Node Similarity and filtered Node Similarity algorithms, which allows skipping nodes whose degree is higher than the provided value.
- Added aggregation to gds.beta.toUndirected to allow aggregating the new undirected relationships.
- Added a new optional parameter, storeModelToDisk, that automatically saves serializable models after training for licensed users. This affects gds.beta.pipeline.[linkPrediction|nodeClassification].train and gds.beta.graphSage.train.
- Added the procedure gds.graph.relationshipProperties.write, which allows writing relationships with multiple properties to Neo4j.
- Cypher Aggregation has graduated, which comes with a new name and API changes:
  - The method of projection is now generally called "Cypher projection", possibly with an additional "new" or "v2" qualifier.
    - The existing 'Cypher projection' (gds.graph.project.cypher) is now called "Legacy Cypher projection".
  - The procedure name is losing the alpha qualifier and is now called gds.graph.project.
  - The old name gds.alpha.graph.project is deprecated, and usages will forward to the new name while also adapting to the new API.
  - The 4th and 5th parameters, nodeConfig and relationshipConfig, have been merged into a single dataConfig parameter.
  - The properties configuration key in this merged dataConfig parameter has been renamed to relationshipProperties.
  - The overall projection configuration (e.g. readConcurrency) has moved from the 6th parameter to the 5th parameter.
- Graph data retrieved via the GDS Arrow endpoint can now be partitioned via the FlightInfo endpoint.
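The new aggregation option for gds.beta.toUndirected can be pictured as grouping relationships by unordered node pair and combining their property values. The Python sketch below assumes that semantics and borrows the SUM, MIN, MAX and COUNT names from the aggregations mentioned elsewhere in these notes. Whether toUndirected accepts exactly these values, and how it treats node pairs when no aggregation is set, are assumptions to check against the GDS documentation. The function name is hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Assumed aggregation names; only these four are modelled here.
AGGREGATIONS: Dict[str, Callable[[List[float]], float]] = {
    "SUM": sum,
    "MIN": min,
    "MAX": max,
    "COUNT": lambda values: float(len(values)),
}


def to_undirected(
    rels: List[Tuple[int, int, float]], aggregation: str
) -> Dict[Tuple[int, int], float]:
    """Collapse directed (source, target, value) relationships into one value per node pair."""
    groups: Dict[Tuple[int, int], List[float]] = {}
    for s, t, value in rels:
        key = (min(s, t), max(s, t))  # a -> b and b -> a share the same undirected key
        groups.setdefault(key, []).append(value)
    combine = AGGREGATIONS[aggregation]
    return {key: combine(values) for key, values in groups.items()}


if __name__ == "__main__":
    rels = [(0, 1, 2.0), (1, 0, 3.0), (1, 2, 1.0)]
    print(to_undirected(rels, "SUM"))  # {(0, 1): 5.0, (1, 2): 1.0}
    print(to_undirected(rels, "MAX"))  # {(0, 1): 3.0, (1, 2): 1.0}
```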
Bug fixes
- Fixed: the Arrow server no longer allows projecting graphs with blank names
- Fixed: Arrow validates dangling relationships when creating an in-memory graph
- Fixed: if an Arrow process is aborted, creating a new process with the same name is now possible
- Fixed a bug where gds.graph.export could fail when exporting larger graphs.
- Fixed a bug where gds.alpha.kSpanningTree returned incorrect results when called with the nodeLabels parameter.
- Fixed a bug where gds.triangleCount would throw an ArrayIndexOutOfBoundsException when called with the nodeLabels parameter.
- Fixed a bug where link prediction mutate could fail when the predicted probability is extremely close to zero.
Improvements
Major
- Improve the parallel runtime of several algorithms through improvements to our degree-based partitioning. Note that this is highly dataset-dependent and may not be visible for all datasets. Affected algorithms are:
  - FastRP
  - HashGNN
  - Leiden
  - Approxmaxkcut
  - Conductance
  - LinkPrediction training
  - ToUndirected
- Improved partitioning. This affects the parallel runtime of gds.alpha.hits, gds.beta.graph.project.subgraph and gds.beta.pipeline.linkPrediction.predict if sampleRate = 0
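Degree-based partitioning, which the improvements in this release build on, splits the node id space into ranges that carry roughly equal numbers of relationships rather than equal numbers of nodes, so that a few high-degree nodes do not overload one thread. The Python sketch below shows that idea with a greedy pass over node degrees. It is an illustration under stated assumptions, not the GDS partitioner, and the function name and target-size rule are hypothetical.

```python
from typing import List, Tuple


def degree_partitions(degrees: List[int], concurrency: int) -> List[Tuple[int, int]]:
    """Split node ids 0..n-1 into contiguous [start, end) ranges of roughly equal total degree."""
    total = sum(degrees)
    target = max(1, -(-total // concurrency))  # ceil(total / concurrency), at least 1
    partitions = []
    start, acc = 0, 0
    for node, deg in enumerate(degrees):
        acc += deg
        if acc >= target:  # close the current range once it holds its share of relationships
            partitions.append((start, node + 1))
            start, acc = node + 1, 0
    if start < len(degrees):  # leftover nodes form a final, lighter range
        partitions.append((start, len(degrees)))
    return partitions


if __name__ == "__main__":
    degrees = [10, 1, 1, 1, 1, 1, 1, 1, 1, 2]  # one hub node, many low-degree nodes
    print(degree_partitions(degrees, 2))  # [(0, 1), (1, 10)]
```

Splitting the same ten nodes into two equal-sized ranges would give total degrees of 14 and 6, whereas the degree-based split yields 10 and 10.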
Minor
- Improve progress tracking for gds.beta.graphSage.train. This enables progress bars in the Python client.
- Improve the error message for invalid nodeLabels and relationshipTypes in procedures supporting memory estimation.
- Allow gds.debug.sysInfo and gds.debug.arrow to run against the system database.
- Improve automatic conversion of array property values during graph projection.
- Yen's algorithm can now be run in parallel.
- Node regression now verifies upfront that all provided targetProperty values are valid when calling gds.alpha.pipeline.nodeRegression.train.
- The scale properties algorithm has been promoted:
  - Added new procedures gds.scaleProperties.[stream,mutate], which replace the now-deprecated gds.alpha.scaleProperties.[stream,mutate].
    - The scalers L1Norm and L2Norm are not supported in the new procedures.
  - Added new procedures gds.scaleProperties.[stats,write] to return statistics from a scale properties computation and to write scaled properties back to a database, respectively.
  - Procedures gds.scaleProperties.[mutate,stats,stream,write] support progress tracking with volumes. This enables progress bars in the Python client.
  - Procedures gds.scaleProperties.[mutate,stats,write] return statistics from the performed scale computation.
  - Added a new parameter, offset, to the log scaler. This also affects the procedures:
    - gds.pageRank
    - gds.eigenvector
    - gds.articleRank
  - Added new procedures gds.scaleProperties.[mutate|stats|stream|write].estimate for estimating the memory requirements of running the scale properties algorithm.
  - Nodes with missing properties (null or NaN) are now omitted from the scale computation. Their scale value is set to NaN in the output.
- Reduce the memory footprint of the binary embeddings saved by gds.beta.hashgnn.mutate.
- Promote the random forest classifier to the beta tier. Added gds.beta.pipeline.[nodeClassification,linkPrediction].addRandomForest, which replace the now-deprecated gds.alpha.pipeline.[nodeClassification,linkPrediction].addRandomForest.
- Reduced memory allocation for the Spanning Tree algorithm.
- A more effective rerouting algorithm is applied for the minimum Directed Steiner-Tree algorithm when the inverted index is present.
- Improve memory usage when projecting very large graphs with very high degree nodes.
- Additional validation for Cypher projection configuration to guide migration and avoid common mistakes.
- Importing nodes with negative ids via Arrow into a database is now forbidden.
- Graph restore now attempts to use the same id map implementation that has been used for the original graph.
- Setting the useBadCollector option to true for the Arrow database import will now actually trigger errors if the collector encountered a problem.
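The scale properties changes in this release (NaN output for missing values and the new offset for the log scaler) can be illustrated with two small Python functions. The sketch assumes a min-max scaler that maps constant inputs to 0 and a log scaler that computes the natural logarithm of x + offset; both are assumptions about the exact formulas, and the function names are hypothetical.

```python
import math
from typing import List, Optional


def _is_missing(x: Optional[float]) -> bool:
    """Treat both null (None) and NaN as missing values."""
    return x is None or math.isnan(x)


def min_max_scale(values: List[Optional[float]]) -> List[float]:
    """Min-max scaling whose statistics ignore missing values; missing inputs map to NaN."""
    present = [x for x in values if not _is_missing(x)]
    if not present:
        return [math.nan] * len(values)
    lo, hi = min(present), max(present)
    span = hi - lo
    return [
        math.nan if _is_missing(x) else (0.0 if span == 0 else (x - lo) / span)
        for x in values
    ]


def log_scale(values: List[Optional[float]], offset: float = 0.0) -> List[float]:
    """Natural log of (x + offset); missing inputs map to NaN."""
    return [math.nan if _is_missing(x) else math.log(x + offset) for x in values]


if __name__ == "__main__":
    print(min_max_scale([1.0, None, 3.0, float("nan"), 2.0]))  # [0.0, nan, 1.0, nan, 0.5]
    print(log_scale([0.0, 1.0], offset=1.0))
```

An offset is useful when the property contains zeros: without it, math.log(0.0) raises a ValueError in this sketch.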