Releases: neo4j/graph-data-science
Graph Data Science 2.2.5
Neo4j Graph Data Science 2.2.5 is compatible with Neo4j Database 5 versions & 4.4 versions ≥ 4.4.9 & 4.3 versions ≥ 4.3.15.
For Neo4j Graph Data Science compatibility, please use the Neo4j Compatibility Matrix.
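The compatibility statement above can be expressed as a small check. This is a minimal sketch; the function name `is_compatible` and the version-string parsing are illustrative assumptions and not part of GDS.

```python
import re

# Minimum supported patch per Neo4j 4.x minor line, as stated in the note above.
# Any Neo4j 5 version is treated as compatible.
MIN_PATCH = {(4, 3): 15, (4, 4): 9}


def is_compatible(neo4j_version: str) -> bool:
    """Return True if the Neo4j version is listed as compatible with GDS 2.2.5."""
    match = re.match(r"^(\d+)\.(\d+)(?:\.(\d+))?", neo4j_version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {neo4j_version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    patch = int(match.group(3) or 0)
    if major == 5:
        return True
    minimum = MIN_PATCH.get((major, minor))
    return minimum is not None and patch >= minimum
```

For example, `is_compatible("4.4.8")` is rejected because 4.4 requires patch 9 or later, while any `5.x` version is accepted.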
Bug Fixes
- Some functions would not work as expected with Neo4j 5.x versions:
  - `gds.alpha.linkprediction.adamicAdar`
  - `gds.alpha.linkprediction.commonNeighbors`
  - `gds.alpha.linkprediction.resourceAllocation`
  - `gds.alpha.linkprediction.totalNeighbors`
Graph Data Science 2.3.0-Alpha02
GDS 2.3.0-alpha02 is compatible with Neo4j 5 versions & 4.4 versions ≥ 4.4.9 & 4.3 versions ≥ 4.3.15.
For GDS compatibility with previous releases, please use GDS Compatibility Table.
Breaking changes
- Leiden is promoted to the beta tier. It is now called via the `gds.beta.leiden` command instead of the `gds.alpha.leiden` command.
- K-means is promoted to the beta tier. It is now called via the `gds.beta.kmeans` command instead of the `gds.alpha.kmeans` command.
- The parameter `startNodeId` in Spanning Tree algorithms has been replaced with `sourceNode`.
- The procedures `gds.alpha.spanningTree.minimum` and `gds.alpha.spanningTree.maximum` have been removed. You can get the same behavior by specifying the new parameter `objective` in `gds.alpha.spanningTree`.
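When updating client code, the renames above can be applied mechanically. The following is a minimal sketch: the helper names are hypothetical, the `weightProperty` key in the usage note is only a placeholder, and the `objective` parameter is left out because its accepted values are not stated here.

```python
# Procedure prefixes renamed in this release, taken from the list above.
RENAMED_PREFIXES = {
    "gds.alpha.leiden": "gds.beta.leiden",
    "gds.alpha.kmeans": "gds.beta.kmeans",
}


def migrate_procedure(name: str) -> str:
    """Rewrite an old procedure name to its new tier; other names pass through."""
    for old, new in RENAMED_PREFIXES.items():
        if name == old or name.startswith(old + "."):
            return new + name[len(old):]
    return name


def migrate_spanning_tree_config(config: dict) -> dict:
    """Rename `startNodeId` to `sourceNode`, leaving all other keys untouched."""
    migrated = dict(config)  # copy so the caller's dict is not modified
    if "startNodeId" in migrated:
        migrated["sourceNode"] = migrated.pop("startNodeId")
    return migrated
```

For instance, `migrate_procedure("gds.alpha.leiden.stream")` yields `gds.beta.leiden.stream`, and `migrate_spanning_tree_config({"startNodeId": 42, "weightProperty": "cost"})` moves the value `42` to `sourceNode`.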
New features
Leiden
- New parameter `consecutiveIds` that assigns consecutive ids for the discovered communities.
- New parameter `seedProperty` to seed initial communities for nodes.
- New parameter `tolerance` to enable convergence criteria based on the difference in modularity from one iteration to another.
- Now available in progress tracking: `gds.list.progress()`
- Added memory estimation mode:
  - `gds.beta.leiden.mutate.estimate`
  - `gds.beta.leiden.stats.estimate`
  - `gds.beta.leiden.stream.estimate`
  - `gds.beta.leiden.write.estimate`
Logistic Regression & MLP
- New configuration parameters `classWeights` and `focusWeight` for training methods, supported by procedures:
  - `gds.beta.pipeline.nodeClassification.addLogisticRegression`
  - `gds.beta.pipeline.nodeClassification.addMLP`
  - `gds.beta.pipeline.linkPrediction.addLogisticRegression`
  - `gds.beta.pipeline.linkPrediction.addMLP`
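These notes do not say how the two parameters enter the training loss. As a sketch only, assuming `classWeights` scales the per-class cross-entropy term and `focusWeight` acts as a focal-loss exponent, the loss for a single example could look like the following. The function and argument names are hypothetical.

```python
import math


def weighted_focal_loss(probabilities, true_class, class_weights, focus_weight=0.0):
    """Loss for one example: -w_c * (1 - p_c) ** focus_weight * log(p_c).

    Assumption: `class_weights` plays the role of classWeights and
    `focus_weight` the role of focusWeight. With focus_weight = 0 the
    expression reduces to class-weighted cross-entropy.
    """
    p = probabilities[true_class]  # predicted probability of the true class
    return -class_weights[true_class] * (1.0 - p) ** focus_weight * math.log(p)
```

Under this assumption, a larger exponent shrinks the loss of examples that are already classified confidently, so training concentrates on the hard ones, while the class weights counteract class imbalance.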
HashGNN
- New algorithm `gds.alpha.hashgnn.{mutate,stream}` to create HashGNN node embeddings.
- New procedures `gds.alpha.hashgnn.{mutate,stream}.estimate` to estimate the memory required to run HashGNN.
Spanning Tree
- New modes supported: `gds.alpha.spanningTree.(stats, stream, mutate)`
- New yield output for `gds.alpha.spanningTree` that outputs the sum of weights in the discovered spanning tree.
- New yield output for `gds.alpha.spanningTree` that outputs the number of relationships written or added for write and mutate mode respectively.
- Added memory estimation mode:
  - `gds.alpha.spanningTree.stream.estimate`
  - `gds.alpha.spanningTree.mutate.estimate`
  - `gds.alpha.spanningTree.stats.estimate`
  - `gds.alpha.spanningTree.write.estimate`
Bug fixes
- Fixed a bug in Minimum Weighted Spanning Tree on graphs with parallel edges where the discovered tree could have wrong weights.
Improvements
Arrow
- graph import now fully supports external node ids in the 64 Bit space.
- graph import now supports 16, 32 or 64 Bit node identifiers.
Leiden
- Better parallelization and improved overall performance.
Other changes
- Histograms returned, such as `degreeDistribution` in `gds.graph.list`, can have slightly different values for specific percentiles due to changes in floating point operations.
- Progress tracking in the Spanning Tree algorithm has been reworked. Progress reporting may differ from earlier versions.
Graph Data Science 2.2.4
GDS 2.2.4 is compatible with Neo4j 5 versions & 4.4 versions ≥ 4.4.9 & 4.3 versions ≥ 4.3.15.
For GDS compatibility with previous releases, please use GDS Compatibility Table.
Bug fixes
- `gds.alpha.nodeSimilarity.filtered` would give incorrect node IDs.
- Pregel framework: the computation would not stop after terminating the underlying transaction. This affects `gds.pageRank`, `gds.articleRank`, and `gds.eigenvector`.
- `alpha.hits` and `gds.alpha.sllpa` could not be used as a nodeProperty step inside ML pipelines, including `gds.beta.pipeline.linkPrediction`, `gds.beta.pipeline.nodeClassification`, and `gds.alpha.pipeline.nodeRegression`.
- nodeProperty steps could not be added to ML pipelines when running against Neo4j 5.x. This affected `gds.beta.pipeline.linkPrediction`, `gds.beta.pipeline.nodeClassification`, and `gds.alpha.pipeline.nodeRegression`.
Improvements
- `gds.graph.list` will only calculate the graph size when the procedure is called without any `YIELD` or if the fields `memoryUsage` or `sizeInBytes` are explicitly `YIELD`-ed. Using `YIELD` to return other fields but not one of `memoryUsage` or `sizeInBytes` can speed up the execution time of `gds.graph.list`.
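The rule above can be summarized in a few lines. This is a sketch of the documented behavior from the caller's side, not the GDS implementation, and the helper names are hypothetical.

```python
# Fields whose presence in YIELD triggers the graph size calculation.
SIZE_FIELDS = {"memoryUsage", "sizeInBytes"}


def computes_graph_size(yield_fields=None) -> bool:
    """Size is computed with no YIELD at all, or when a size field is yielded."""
    if not yield_fields:
        return True
    return any(field in SIZE_FIELDS for field in yield_fields)


def graph_list_query(yield_fields=None) -> str:
    """Build a `gds.graph.list` call with an optional YIELD clause."""
    query = "CALL gds.graph.list()"
    if yield_fields:
        query += " YIELD " + ", ".join(yield_fields)
    return query
```

For example, `graph_list_query(["graphName", "nodeCount"])` produces `CALL gds.graph.list() YIELD graphName, nodeCount`, which skips the size calculation according to the rule above.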
Graph Data Science 2.2.3
GDS 2.2.3 is compatible with Neo4j 5 versions & 4.4 versions ≥ 4.4.9 & 4.3 versions ≥ 4.3.15.
For GDS compatibility with previous releases, please use GDS Compatibility Table.
Bug fixes
- `gds.graph.export` failed to run on Neo4j 5.X.
- `gds.graph.export` failed with InvalidRecordException when `writeConcurrency` is set > 1.
- Enterprise users were unable to load models trained with concurrency > 4.
Improvements
- Arrow graph import now fully supports external node ids in the 64 Bit space.
Graph Data Science 2.3.0-alpha01
GDS 2.3.0-alpha01 is compatible with Neo4j 5 versions & 4.4 versions ≥ 4.4.9 & 4.3 versions ≥ 4.3.15.
For GDS compatibility with previous releases, please use GDS Compatibility Table.
New features
- Added a parameter `consecutiveIds` to Leiden to assign consecutive ids for the discovered communities.
- Added a parameter `seedProperty` to Leiden to seed initial communities for nodes.
- Added new configuration parameter `focusWeight` for Logistic Regression training method, supported by procedures:
  - `gds.beta.pipeline.nodeClassification.addLogisticRegression`
  - `gds.beta.pipeline.linkPrediction.addLogisticRegression`
Bug fixes
- Fixed a bug in Minimum Weighted Spanning Tree on graphs with parallel edges where the discovered tree could have wrong weights.
Improvements
- Arrow graph import now fully supports external node ids in the 64 Bit space.
- Arrow graph import now supports 16, 32 or 64 Bit node identifiers.
Other changes
- Histograms returned, such as `degreeDistribution` in `gds.graph.list`, can have slightly different values for specific percentiles due to changes in floating point operations.
Graph Data Science 2.2.2
GDS 2.2.2 is compatible with Neo4j 5 versions & 4.4 versions ≥ 4.4.9 & 4.3 versions ≥ 4.3.15.
For GDS compatibility with previous releases, please use GDS Compatibility Table.
Improvements
- Graph Data Science ≥2.2.2 now supports Neo4j 5
Graph Data Science 2.2.1
GDS 2.2.1 is compatible with Neo4j 4.3 versions ≥ 4.3.15 and 4.4 versions ≥ 4.4.9.
For GDS compatibility with previous releases of 4.3 and 4.4, please see GDS 2.1.6. The 2.1 series is also incompatible with Neo4j 3.5.x, 4.0, 4.1, and 4.2. For a 3.5 compatible release, please see GDS 1.1.7. For a 4.0 compatible release, please see GDS 1.6.5. For a 4.1 or 4.2 compatible release, please see GDS 1.8.8.
Breaking changes
- Change the content of some fields from the output of `gds.debug.arrow`:
  - `listenAddress` now always returns the same content as `advertisedListenAddress`
  - `serverLocation` always returns `NULL`
Graph Data Science 2.2.0
GDS 2.2.0 is compatible with Neo4j 4.3 versions ≥ 4.3.15 and 4.4 versions ≥ 4.4.9.
For GDS compatibility with previous releases of 4.3 and 4.4, please see GDS 2.1.6. The 2.1 series is also incompatible with Neo4j 3.5.x, 4.0, 4.1, and 4.2. For a 3.5 compatible release, please see GDS 1.1.7. For a 4.0 compatible release, please see GDS 1.6.5. For a 4.1 or 4.2 compatible release, please see GDS 1.8.8.
Breaking changes
- Link Prediction filtering:
  - Change graph filtering in `gds.beta.pipeline.linkPrediction.train`
    - Replace parameter `nodeLabels` with `sourceNodeLabel` and `targetNodeLabel`.
    - Replace parameter `relationshipTypes` with `targetRelationshipType`.
  - Change graph filtering in `gds.beta.pipeline.linkPrediction.predict`
    - Replace parameter `nodeLabels` with optional `sourceNodeLabels` and `targetNodeLabels`. By default, they will be derived from the model's train configuration.
    - Change the default value for `relationshipTypes` to the `targetRelationshipType` from the model's train configuration.
- Node Classification & Regression filtering:
  - Change graph filtering in `gds.beta.pipeline.nodeClassification.train` and `gds.beta.pipeline.nodeRegression.train`
    - Replace parameter `nodeLabels` with `targetNodeLabels`.
  - Change graph filtering in `gds.beta.pipeline.nodeClassification.predict` and `gds.beta.pipeline.nodeRegression.predict`
    - Replace parameter `nodeLabels` with `targetNodeLabels`. By default, they will be derived from the model's train configuration.
- Promoting Collapse Path to beta tier
  - Changed the procedure name to `gds.beta.collapsePath.mutate`
  - Use parameter `pathTemplates` to now specify multiple _path templates_.
- Promoting CELF to `beta` tier
  - Moved `gds.alpha.influenceMaximization.celf.stream` to `gds.beta.influenceMaximization.celf.stream`
- For graphs created with `gds.graph.project.cypher`, reduce output of `gds.graph.list` to only print the names of `parameters`. This will avoid printing the parameter values, which potentially leads to long procedure execution times.
- RandomWalk algorithm promoted to product tier
  - `gds.beta.randomWalk.stats` => `gds.randomWalk.stats`
  - `gds.beta.randomWalk.stats.estimate` => `gds.randomWalk.stats.estimate`
  - `gds.beta.randomWalk.stream` => `gds.randomWalk.stream`
  - `gds.beta.randomWalk.stream.estimate` => `gds.randomWalk.stream.estimate`
- Removed `debug_log` config field from Arrow Create Database action.
- Node2Vec uses new embedding initializer `NORMALIZED` as default.
- Dropped support for older patches:
  - for 4.3, only 4.3.15 and later is supported
  - for 4.4, only 4.4.9 and later is supported
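Because several of the filtering parameters above are replaced by more than one new parameter, they cannot always be renamed automatically. A minimal sketch of a pre-upgrade check follows; the lookup table restates the Link Prediction and Node Classification entries from this list, and the function name and the example configuration are hypothetical.

```python
# Old parameter -> replacement parameter(s), per procedure, from the list above.
REPLACED_PARAMETERS = {
    "gds.beta.pipeline.linkPrediction.train": {
        "nodeLabels": ["sourceNodeLabel", "targetNodeLabel"],
        "relationshipTypes": ["targetRelationshipType"],
    },
    "gds.beta.pipeline.linkPrediction.predict": {
        "nodeLabels": ["sourceNodeLabels", "targetNodeLabels"],
    },
    "gds.beta.pipeline.nodeClassification.train": {
        "nodeLabels": ["targetNodeLabels"],
    },
    "gds.beta.pipeline.nodeClassification.predict": {
        "nodeLabels": ["targetNodeLabels"],
    },
}


def find_replaced_parameters(procedure: str, config: dict) -> dict:
    """Return the keys of `config` that were replaced, with their successors."""
    replaced = REPLACED_PARAMETERS.get(procedure, {})
    return {key: replaced[key] for key in config if key in replaced}
```

Calling it with `"gds.beta.pipeline.linkPrediction.train"` and a configuration that still contains `nodeLabels` reports that `sourceNodeLabel` and `targetNodeLabel` should be used instead.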
New features
- Link Prediction filtering:
  - Supports heterogeneous LinkPrediction pipelines by allowing configuring which node labels and relationship type to train and predict for.
  - See Breaking changes above for more details.
- K-means:
  - Added centroids and average node-centroid distance to result for Mutate, Stats, and Write modes.
  - Added distance to centroid per node result in Stream mode.
  - Introduced a parameter `numberOfRestarts` that runs K-Means multiple times and picks the one with the minimum node-centroid distance.
  - Introduced a parameter `computeSilhouette` that if enabled will compute silhouette related metrics.
  - Introduced a parameter `initialSampler` which can select different sampling strategies for picking the first centroids.
    - Added the `K-means++` initialization algorithm which can be enabled by setting `initialSampler=kmeans++`.
  - Introduced the parameter `seedCentroids` which seeds input centroids to k-means (in negation of the above).
- Introduced a new scaler `Center` for `ScaleProperties` that subtracts the mean from each value.
- Expose `penaltyL2` to configure the l2 regularization term to the loss function in `gds.beta.graphSage.train`.
- Add Multilayer Perceptron as a training method for node classification (`gds.alpha.pipeline.nodeClassification.addMLP`) and link prediction (`gds.alpha.pipeline.linkPrediction.addMLP`).
- Add `SAME_CATEGORY` feature type to `gds.beta.pipeline.linkPrediction.addFeature`.
- Added new procedure `gds.beta.graph.relationships.stream` that streams relationship topology.
- Added arrow export endpoint `gds.beta.graph.relationships.stream` that streams relationship topology.
- Added new procedure `gds.alpha.graph.sample.rwr` that creates a new graph projection by sampling using random walk with restarts.
- Added the ability to collapse multiple paths using `gds.beta.collapsePath.mutate`.
- Promoting CELF algorithm to `beta` tier.
  - Added `gds.beta.influenceMaximization.celf.stats`
  - Added `gds.beta.influenceMaximization.celf.mutate`
  - Added `gds.beta.influenceMaximization.celf.write`
  - Added progress tracking capabilities.
  - Added memory estimation.
- Progress tracking for KMeans algorithm.
- Memory estimation for KMeans.
  - added `gds.alpha.kmeans.mutate.estimate`
  - added `gds.alpha.kmeans.stats.estimate`
  - added `gds.alpha.kmeans.stream.estimate`
  - added `gds.alpha.kmeans.write.estimate`
- Added procedure to compute modularity for pre-computed communities.
  - `gds.alpha.modularity.stats`
  - `gds.alpha.modularity.stream`
- Added new config options to the GDS Flight server.
  - `gds.arrow.encryption.never` deactivates the server encryption even if it would otherwise be enabled.
  - `gds.arrow.advertised_listen_address` sets the server location that clients should connect to.
- Added support for importing `String` node identifiers for the Arrow `CREATE_DATABASE` action.
- Added capability to run BetweennessCentrality using relationship weights.
  - Added `relationshipWeightProperty` optional configuration parameter.
- Added `stats` mode procedures for RandomWalk.
  - `gds.beta.randomWalk.stats`
  - `gds.beta.randomWalk.stats.estimate`
- Introduced the ability to configure defaults and limits for configuration parameters.
  - `gds.alpha.config.defaults.list`
  - `gds.alpha.config.defaults.set`
  - `gds.alpha.config.limits.list`
  - `gds.alpha.config.limits.set`
- Introduce new configuration parameters `contextNodeLabels` and `contextRelationshipTypes` in nodePropertySteps.
  - `gds.beta.pipeline.linkPrediction.addNodeProperty`
  - `gds.beta.pipeline.nodeClassification.addNodeProperty`
  - `gds.alpha.pipeline.nodeRegression.addNodeProperty`
  - The context is used to enlarge the input graph to the node property steps when running `gds.beta.pipeline.linkPrediction.addNodeProperty.[train|predict]`, `gds.beta.pipeline.nodeClassification.[train|predict]` and `gds.alpha.pipeline.nodeRegression.[train|predict]`.
- Leiden
  - Add capability to mutate `intermediateCommunities` when `includeIntermediateCommunities` is set to `true`.
  - Add capability to write `intermediateCommunities` when `includeIntermediateCommunities` is set to `true`.
- Node2Vec adds new embedding initializer `NORMALIZED` configured with the parameter `embeddingInitializer`.
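To illustrate what the modularity procedures listed above measure, here is a minimal sketch of the textbook modularity score for a given community assignment on an undirected, unweighted graph. It is not the GDS implementation, which may differ in details such as relationship weights; the function name and input shapes are assumptions made for this example.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Modularity of a partition of an undirected, unweighted graph.

    Q = sum over communities c of L_c / m - (d_c / (2 * m)) ** 2, where m is
    the number of edges, L_c the number of edges inside c, and d_c the sum of
    the degrees of the nodes in c.
    """
    m = len(edges)
    if m == 0:
        raise ValueError("modularity is undefined for a graph without edges")
    internal_edges = defaultdict(int)
    degree_sum = defaultdict(int)
    for u, v in edges:
        # Each edge contributes one degree to the community of each endpoint.
        degree_sum[community_of[u]] += 1
        degree_sum[community_of[v]] += 1
        if community_of[u] == community_of[v]:
            internal_edges[community_of[u]] += 1
    return sum(
        internal_edges[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )
```

For two triangles joined by a single edge, with each triangle as its own community, this gives 5/14. Placing all nodes in one community gives 0.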
Bug fixes
- Fixed a bug where eager checking for business rules around GDS on a Neo4j cluster would cause the cluster to fail to start.
- Fixed a bug where Neo4j users with `admin` role could not see all graphs in the catalog on GDS enterprise.
- Fixed a bug in random graph generation where the resulting graph can end up with an incorrect relationship schema.
- Fixed a bug where a schema filter would not create a deep copy of the property schema map.
- Fixed a bug where modularity could have been incorrectly updated in ModularityOptimization. This may affect the number of performed iterations for ModularityOptimization or number of levels for Louvain.
- Fixed a bug where restoring from csv could not read values wrapped in quotes.
- Fixed a bug where KNN did not use the expected search space. This will improve the result but also increase the runtime.
- Fixed a bug in ML autotuning where `maxTrials` included model evaluations with concrete configs.
- Fixed a bug where `gds.triangleCount` and `gds.localClusteringCoefficient` were allowed to run on directed graphs.
- Fixed a bug in `gds.graph.export` and Arrow DB import where the `writeConcurrency` was not respected.
- Fixed a bug with Node Operations where `gds.graph.nodeProperties.write`, `gds.graph.nodeProperties.drop` and `gds.graph.nodeProperties/y.stream` would not accept `String` input for parameters `nodeLabels` and/or `nodeProperties`.
- Fixed a bug where Node2Vec would report negative losses.
- Fixed a bug with `gds.graph.nodeProperties/y.stream`, where the wrong nodes were returned when specifying a `nodeLabels` filter and using Arrow.
- Fixed a bug in the Louvain algorithm, where aggregating dense communities could potentially lead to an exception.
- Fixed a bug where model loading is attempted even for unlicensed users, which might fail database startup.
Improvements
- Better error handling in K-means
- Improve memory estimation for `gds.beta.pipeline.linkPrediction.train` when the nodePropertySteps used a weighted graph.
- Improve runtime of feature generation in `gds.beta.linkPrediction.[train|predict]`.
- Improve performance of `gds.graph.project.cypher` by using the subscriber API.
- Improve convergence criteria for `LogisticRegression` and `LinearRegression` trainers, by making it independent of the number of batches. This affects `gds.alpha.pipeline.nodeRegression.train` and `gds.beta.pipeline.[linkPrediction|nodeClassification].train`.
. - Improve error handling on invalid user input.
- Cypher on GDS projections is now capable of setting labels on nodes.
- Promoting CELF algorithm to `bet...
Graph Data Science 2.1.13
GDS 2.1.13 is compatible with Neo4j 4.3 versions ≥ 4.3.15 and 4.4 versions ≥ 4.4.9.
For GDS compatibility with previous releases of 4.3 and 4.4, please see GDS 2.1.6. The 2.1 series is also incompatible with Neo4j 3.5.x, 4.0, 4.1, and 4.2. For a 3.5 compatible release, please see GDS 1.1.7. For a 4.0 compatible release, please see GDS 1.6.5. For a 4.1 or 4.2 compatible release, please see GDS 1.8.8.
Bug Fixes
- `gds.graph.nodeProperties.write`, `gds.graph.nodeProperties.drop`, `gds.graph.nodeProperty.stream` and `gds.graph.nodeProperties.stream` now accept `String` input for parameters `nodeLabels` and/or `nodeProperties`.
- `gds.graph.nodeProperty.stream` and `gds.graph.nodeProperties.stream` would return the wrong nodes when specifying a `nodeLabels` filter when using Arrow.
- Louvain algorithm would throw an exception when aggregating dense communities.
Improvements
- Export to CSV now enabled when GDS is running on a Causal Cluster Read Replica
  - `gds.beta.graph.export.csv`
  - `gds.beta.graph.export.csv.estimate`
Graph Data Science 2.1.12
GDS 2.1.12 is compatible with Neo4j 4.3 versions ≥ 4.3.15 and 4.4 versions ≥ 4.4.9.
For GDS compatibility with previous releases of 4.3 and 4.4, please see GDS 2.1.6. The 2.1 series is also incompatible with Neo4j 3.5.x, 4.0, 4.1, and 4.2. For a 3.5 compatible release, please see GDS 1.1.7. For a 4.0 compatible release, please see GDS 1.6.5. For a 4.1 or 4.2 compatible release, please see GDS 1.8.8.
Improvements
- New procedures for enabling and disabling Arrow database import (default: enabled)
  - `gds.features.enableArrowDatabaseImport`
  - `gds.features.enableArrowDatabaseImport.reset`