Releases: neo4j/graph-data-science

Graph Data Science 2.5.4

16 Nov 07:52

Other changes

Graph Data Science 2.5.3

13 Nov 15:31

New features

  • Add support for Neo4j 4.4.27

Bug fixes

  • Fixed a bug that led to an unresponsive DBMS or even OS when running GDS projections in Neo4j on MacBooks (x86 or ARM).
  • Avoid a possible race condition between random walking and training in Node2Vec. This also removes the need for a timeout in Node2Vec.

Improvements

  • Improved state synchronization during Arrow graph import to avoid errors due to out-of-sync messages.

Graph Data Science 2.5.1

23 Oct 09:05

New features

  • Add support for Neo4j 5.13

Graph Data Science 2.5.0

13 Oct 09:43

Breaking changes

  • Dropped support for earlier versions of Neo4j 5: versions 5.1, 5.2, 5.3, 5.4, and 5.5 are no longer supported, and GDS is no longer compatible with them.

New features

Major

  • Added new algorithms for directed acyclic graphs:
    • gds.dag.topologicalSort.stream
    • gds.dag.longestPath.stream
  • Deprecated the alpha and beta namespaces for many procedures and algorithms and promoted them to production grade; see details in ‘Full list of procedure being promoted’
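
To illustrate what the two new DAG procedures compute, here is a minimal Python sketch of a topological sort (Kahn's algorithm) followed by a longest-path pass over that order. It is not the GDS implementation; the function names and the (source, target, weight) edge-list format are assumptions made for this example.

```python
from collections import defaultdict, deque


def topological_sort(edges):
    """Order the nodes of a DAG so that every edge points forward (Kahn's algorithm)."""
    successors = defaultdict(list)
    in_degree = defaultdict(int)
    nodes = set()
    for source, target, _weight in edges:
        successors[source].append(target)
        in_degree[target] += 1
        nodes.update((source, target))

    # Start from nodes without incoming edges; sorted() keeps the output deterministic.
    queue = deque(sorted(n for n in nodes if in_degree[n] == 0))
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for target in successors[node]:
            in_degree[target] -= 1
            if in_degree[target] == 0:
                queue.append(target)

    if len(order) != len(nodes):
        raise ValueError("graph contains a cycle, so no topological order exists")
    return order


def longest_path_lengths(edges):
    """For each node, return the weight of the heaviest path ending at that node."""
    incoming = defaultdict(list)
    for source, target, weight in edges:
        incoming[target].append((source, weight))

    distance = {}
    for node in topological_sort(edges):
        # Every predecessor is already processed because of the topological order.
        distance[node] = max(
            (distance[source] + weight for source, weight in incoming[node]),
            default=0.0,
        )
    return distance


# Hypothetical example graph: a -> b -> d and a -> c -> d.
edges = [("a", "b", 1.0), ("a", "c", 2.0), ("b", "d", 4.0), ("c", "d", 1.0)]
order = topological_sort(edges)
lengths = longest_path_lengths(edges)
```

In this sketch the longest-path pass reuses the topological order, which is why both computations are only defined on graphs without cycles.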

Minor

  • Added procedure to retrieve the version of the installed GDS
    • CALL gds.version
  • Added a new procedure, gds.license.state, to verify the license state of the Graph Data Science library, and an analogous new function, gds.isLicensed().
  • Added memory estimation for modularity calculation via procedures gds.modularity.[stream|stats].estimate
  • Added memory estimation for filtered KNN via procedures gds.knn.filtered.[mutate|stream|stats|write].estimate
  • Added Stats and Write modes for Harmonic Closeness Centrality
  • Added new procedures for SCC:
    • gds.scc.mutate
    • gds.scc.stats
  • Added memory estimation to SCC:
    • gds.scc.stream.estimate
    • gds.scc.stats.estimate
    • gds.scc.mutate.estimate
    • gds.scc.write.estimate
  • Added consecutiveIds parameter to gds.scc procedures to output the components in a consecutive id space.
  • Added memory estimation for Steiner Tree via procedures gds.steinerTree.[mutate|stream|stats|write].estimate
  • Added stats mode for gds.modularityOptimization
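
As a rough illustration of what a consecutive id space means for the consecutiveIds parameter mentioned above, the following Python sketch remaps arbitrary component ids to 0, 1, 2, and so on. The function name and the first-appearance ordering are assumptions for this example; the release notes do not state which order GDS uses.

```python
def to_consecutive_ids(component_of):
    """Remap arbitrary component ids to 0, 1, 2, ... in order of first appearance."""
    mapping = {}  # original component id -> consecutive id
    result = {}
    for node, component in component_of.items():
        if component not in mapping:
            mapping[component] = len(mapping)
        result[node] = mapping[component]
    return result


# Hypothetical SCC output: node id -> component id, with gaps in the id space.
raw = {0: 17, 1: 17, 2: 42, 3: 5, 4: 42}
consecutive = to_consecutive_ids(raw)
```

Nodes that shared a component before the remapping still share one afterwards; only the labels change.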

Bug fixes

  • Fixed a bug in logging the progress of Prepare Batches in GraphSAGE training.
  • Fixed a bug where KNN would compute incorrect EUCLIDEAN similarity.
  • Fixed a bug where limits validation might not be triggered for configuration settings passed in from specified defaults.
  • Fixed a bug where gds.graph.filter would list a relationship type of __ALL__ even if all relationships were filtered out.
  • Fixed a bug where Triangle Count could compute an incorrect number of triangles when the maxDegree parameter was specified.
  • Fixed a bug where Triangle Count could compute an incorrect number of triangles when multiple relationship types are specified.

Improvements

  • The random graph generation procedure now returns a different graph each time gds.beta.graph.generate is called without a random seed. When a seed is specified, the resulting graph always has the same topology.
  • It is now possible to specify common node labels when importing nodes via arrow.
  • A better error message is thrown when encountering null values in the nodeLabels column when importing nodes via arrow.
  • Added the configuration option listNodeLabels for the node property stream procedures that will trigger listing all node labels for the respective node.
  • Added the configuration option list_node_labels for the node property stream arrow endpoints that will trigger listing all node labels for the respective node.
  • The Cypher projection now returns the executing query as part of the projection result as well as part of the gds.graph.list output.
  • Support passing startNodes to gds.graph.sample.cnarw as node objects instead of only node ids.
  • Support passing nodeId to gds.util.nodeProperty as node objects instead of only node id.
  • Improved validation for relationship projections: If a global SUM, MIN, MAX or COUNT aggregation is defined, there needs to be at least one property mapping.
  • HITS algorithm procedures have a default hitsIterations value of 20
  • More accurate progress tracking for the gds.scc algorithm.
  • The componentDistribution and communityDistribution parameters now also include the p1, p5, p10, and p25 percentiles. This affects algorithms in the Community Detection category.
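
For readers unsure what the added low percentiles describe, the sketch below computes p1, p5, p10, and p25 over community sizes in Python using the nearest-rank method. The function name is hypothetical, and nearest-rank is an assumption chosen for simplicity; GDS may use a different estimator, so its numbers can differ slightly.

```python
from collections import Counter


def low_percentiles(community_of, percentiles=(1, 5, 10, 25)):
    """Nearest-rank percentiles of community sizes (an illustrative estimator)."""
    sizes = sorted(Counter(community_of.values()).values())
    summary = {}
    for p in percentiles:
        # Integer ceiling of p * n / 100, clamped to at least rank 1.
        rank = max(1, (p * len(sizes) + 99) // 100)
        summary[f"p{p}"] = sizes[rank - 1]
    return summary


# Hypothetical result: community i (1..20) contains exactly i nodes.
community_of = {}
for community in range(1, 21):
    for _ in range(community):
        community_of[len(community_of)] = community

summary = low_percentiles(community_of)
```

Low percentiles are useful for spotting many tiny communities, which the higher percentiles alone would hide.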

Full list of procedure being promoted

  • Promoting Model Catalog procedures:
    • gds.beta.model.drop, deprecated by gds.model.drop
      • Return column shared renamed to published
      • modelName, modelType extracted to separate return columns
    • gds.beta.model.exists, deprecated by gds.model.exists
    • gds.beta.model.list, deprecated by gds.model.list
      • Return column shared renamed to published
      • modelName, modelType extracted to separate return columns
    • gds.alpha.model.delete, deprecated by gds.model.delete
    • gds.alpha.model.load, deprecated by gds.model.load
    • gds.alpha.model.publish, deprecated by gds.model.publish
      • Return column shared renamed to published
      • modelName, modelType extracted to separate return columns
    • gds.alpha.model.store, deprecated by gds.model.store
  • Promoting Pipeline Catalog procedures:
    • gds.beta.pipeline.drop, deprecated by gds.pipeline.drop
    • gds.beta.pipeline.exists, deprecated by gds.pipeline.exists
    • gds.beta.pipeline.list, deprecated by gds.pipeline.list
    • Procedure gds.alpha.systemMonitor is deprecated by gds.systemMonitor
    • Procedure gds.beta.listProgress is deprecated by gds.listProgress
    • Procedure gds.alpha.triangles is deprecated by gds.triangles
  • Deprecating gds.beta.steinerTree procedures
    • gds.beta.steinerTree.mutate, deprecated by gds.steinerTree.mutate
    • gds.beta.steinerTree.stats, deprecated by gds.steinerTree.stats
    • gds.beta.steinerTree.stream, deprecated by gds.steinerTree.stream
    • gds.beta.steinerTree.write, deprecated by gds.steinerTree.write
  • Deprecating gds.beta.spanningTree procedures
    • gds.beta.spanningTree.mutate[.estimate], deprecated by gds.spanningTree.mutate[.estimate]
    • gds.beta.spanningTree.stats[.estimate], deprecated by gds.spanningTree.stats[.estimate]
    • gds.beta.spanningTree.stream[.estimate], deprecated by gds.spanningTree.stream[.estimate]
    • gds.beta.spanningTree.write[.estimate], deprecated by gds.spanningTree.write[.estimate]
  • Deprecating gds.alpha.maxkcut procedures
    • gds.alpha.maxkcut.mutate[.estimate], deprecated by gds.maxkcut.mutate[.estimate]
    • gds.alpha.maxkcut.stream[.estimate], deprecated by gds.maxkcut.stream[.estimate]
  • Deprecating gds.beta.closeness procedures
    • gds.beta.closeness.mutate, deprecated by gds.closeness.mutate
      • The mutateProperty field has been removed; it can be accessed via the configuration.
    • gds.beta.closeness.stats, deprecated by gds.closeness.stats
    • gds.beta.closeness.stream, deprecated by gds.closeness.stream
    • gds.beta.closeness.write, deprecated by gds.closeness.write
      • The writeProperty field has been removed; it can be accessed via the configuration.
  • Deprecating gds.beta.leiden procedures
    • gds.beta.leiden.mutate[.estimate], deprecated by gds.leiden.mutate[.estimate]
    • gds.beta.leiden.stats[.estimate], deprecated by gds.leiden.stats[.estimate]
    • gds.beta.leiden.stream[.estimate], deprecated by gds.leiden.stream[.estimate]
    • gds.beta.leiden.write[.estimate], deprecated by gds.leiden.write[.estimate]
  • Deprecating gds.alpha.conductance procedures
    • gds.alpha.conductance.stream, deprecated by gds.conductance.stream
  • Deprecating gds.alpha.modularity procedures
    • gds.alpha.modularity.stream, deprecated by gds.modularity.stream
    • gds.alpha.modularity.stats, deprecated by gds.modularity.stats
  • Deprecating gds.beta.modularityOptimization procedures
    • gds.beta.modularityOptimization.stream[.estimate], deprecated by gds.modularityOptimization.stream[.estimate]
    • gds.beta.modularityOptimization.stats[.estimate], deprecated by gds.modularityOptimization.stats[.estimate]
  • Deprecating gds.beta.influenceMaximization.celf procedures
    • gds.beta.influenceMaximization.celf.mutate[.estimate], deprecated by gds.influenceMaximization.celf.mutate[.estimate]
    • gds.beta.influenceMaximization.celf.stats[.estimate], deprecated by gds.influenceMaximization.celf.stats[.estimate]
    • gds.beta.influenceMaximization.celf.stream[.estimate], deprecated by gds.influenceMaximization.celf.stream[.estimate]
    • gds.beta.influenceMaximization.celf.write[.estimate], deprecated by gds.influenceMaximization.celf.write[.estimate]
  • Deprecating gds.alpha.knn.filtered procedures
    • gds.alpha.knn.filtered.mutate, deprecated by gds.knn.filtered.mutate
    • gds.alpha.knn.filtered.stats, deprecated by gds.knn.filtered.stats
    • gds.alpha.knn.filtered.stream, deprecated by gds.knn.filtered.stream
    • gds.alpha.knn.filtered.write, deprecated by gds.knn.filtered.write
  • Deprecating gds.alpha.nodeSimilarity.filtered procedures
    • gds.alpha.nodeSimilarity.filtered.mutate[.estimate], deprecated by gds.nodeSimilarity.filtered.mutate[.estimate]
    • gds.alpha.nodeSimilarity.filtered.stats[.estimate], deprecated by gds.nodeSimilarity.filtered.stats[.estimate]
    • gds.alpha.nodeSimilarity.filtered.stream[.estimate], deprecated by gds.nodeSimilarity.filtered.stream[.estimate]
    • gds.alpha.nodeSimilarity.filtered.write[.estimate], deprecated by gds.nodeSimilarity.filtered.write[.estimate]
  • Deprecating gds.alpha.closeness.harmonic procedures
    • gds.alpha.closeness.harmonic.stream, deprecated by gds.closeness.harmonic.stream
    • gds.alpha.closeness.harmonic.write, deprecated by gds.closeness.harmonic.write
  • Deprecating gds.beta.graph.relationships procedures
    • `gds.beta.graph.relati...
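
Every promotion visible in the list above follows the same pattern: the alpha or beta tier segment is dropped from the procedure name. A small Python helper along these lines could assist when updating queries. The helper name is hypothetical, and it should only be applied to procedures that actually appear in the list, since other alpha and beta procedures have not been promoted.

```python
import re

# Matches the tier segment directly after the "gds." prefix.
_TIER_PREFIX = re.compile(r"^gds\.(alpha|beta)\.")


def promoted_name(procedure):
    """Drop the alpha/beta tier segment from a GDS procedure name.

    Assumption: the procedure is one of those listed as promoted in 2.5.0.
    """
    return _TIER_PREFIX.sub("gds.", procedure, count=1)


renamed = [
    promoted_name("gds.beta.model.drop"),
    promoted_name("gds.alpha.knn.filtered.mutate"),
    promoted_name("gds.beta.spanningTree.write.estimate"),
]
```

Renaming alone is not always sufficient: as noted above, some promoted procedures also changed their return columns (for example, shared became published).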

Graph Data Science 2.4.6

15 Sep 14:53

neo4j-graph-data-science-2.4.6

New features

  • Added compatibility with Neo4j database 5.12.0.

Bug fixes

  • Fix a bug where HITS write and mutate procedures failed to parse configuration.

Graph Data Science 2.4.5

24 Aug 15:20

neo4j-graph-data-science-2.4.5

Bug fixes

  • Fixed a bug in the triangle-related procedures where triangles could be computed incorrectly on graphs with multiple relationship types. The following procedures are affected:
    • gds.triangleCount.[stream|mutate|write|stats]
    • gds.localClusteringCoefficient.[stream|mutate|write|stats]
    • gds.alpha.triangles

Graph Data Science 2.4.4

17 Aug 13:01

Bug fixes

  • Fixed a bug where Arrow processes that are automatically removed after being aborted were not properly cleaned up

Graph Data Science 2.4.3

27 Jul 12:30

Improvements

  • Added COSINE as an available similarityMetric for the gds.nodeSimilarity procedure
  • When exporting graphs to CSV or using backup and restore, more diverse node label names are now possible through label mapping

Bug fixes

  • Fixed a bug where array default values would not be serialized to or deserialized from CSV correctly
  • Fixed an issue where Speaker-Listener LabelPropagation and other Pregel procedures wouldn’t stream or mutate on graphs that are not persisted in a Neo4j database
  • Fixed a bug in graph restore on AuraDS, which was failing after shutdown when node label name contained special characters or underscores

Graph Data Science 2.4.1

27 Jun 11:41

Bug fixes

  • Fix a bug in K-Core decomposition that can return invalid values if core values are not consecutive.
  • Fix a bug when using mutateProperty where using the same name as an existing node property could fail. Affected procedures include:
    • gds.alpha.knn.filtered.mutate
    • gds.alpha.nodeSimilarity.filtered.mutate
    • gds.beta.pipeline.linkPrediction.predict.mutate
    • gds.beta.steinerTree.mutate
    • gds.beta.spanningTree.mutate
    • gds.knn.mutate
    • gds.nodeSimilarity.mutate

Improvements

  • Improved error handling when negative node ids are used as input in the sourceNode, targetNode, sourceNodes, and targetNodes fields.
  • Improved performance when projecting larger in-memory graphs.

Graph Data Science 2.4.0

14 Jun 15:39

Breaking changes

  • The concurrency setting is now passed to the node property steps when training a pipeline. Previously, these steps were executed with the default concurrency of 4 unless overridden. This affects:
    • gds.beta.pipeline.linkPrediction.train
    • gds.beta.pipeline.nodeClassification.train
    • gds.alpha.pipeline.nodeClassification.train

New features

Major

  • Added Bellman-Ford algorithm
  • Added K-Core Decomposition algorithm
  • Added new Common Neighbour Aware Random Walk graph sampling algorithm
  • Add Random Forest and MLP classifier serialization support. This makes all node classification and link prediction models serializable
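
For readers unfamiliar with Bellman-Ford, the following Python sketch shows the textbook algorithm: single-source shortest paths that tolerate negative weights, plus a check for a negative cycle reachable from the source. It is not the GDS implementation; the function name and the integer node ids are assumptions for this example.

```python
def bellman_ford(node_count, edges, source):
    """Textbook Bellman-Ford over edges given as (source, target, weight) tuples."""
    distance = [float("inf")] * node_count
    distance[source] = 0.0

    # At most node_count - 1 relaxation rounds are needed without negative cycles.
    for _ in range(node_count - 1):
        changed = False
        for u, v, weight in edges:
            if distance[u] + weight < distance[v]:
                distance[v] = distance[u] + weight
                changed = True
        if not changed:
            break  # converged early

    # If any edge can still be relaxed, a negative cycle is reachable from the source.
    has_negative_cycle = any(
        distance[u] + weight < distance[v] for u, v, weight in edges
    )
    return distance, has_negative_cycle


# Hypothetical example graph with one negative edge but no negative cycle.
edges = [(0, 1, 4.0), (0, 2, 1.0), (2, 1, -2.0), (1, 3, 3.0)]
distances, negative_cycle = bellman_ford(4, edges, source=0)
```

Unlike Dijkstra's algorithm, this approach remains correct when some relationship weights are negative, at the cost of more relaxation rounds.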

Minor

  • You can rename node properties when writing them back to the Neo4j database using gds.nodeProperties.write by placing them inside a map of the form nodeProperty: 'renamedProperty'.

  • Added minCommunitySize|minComponentSize parameter to more procedures to allow filtering the result. (Contributed by @airtyon)

  • Added new procedure gds.alpha.drop.cypherdb to drop created in-memory databases

  • Added upperDegreeCutoff parameter to the Node Similarity and filtered Node Similarity algorithms, which allows skipping nodes whose degree is higher than the provided value.

  • Added aggregation to gds.beta.toUndirected to allow the aggregation of the new undirected relationships.

  • Added new optional parameter storeModelToDisk that automatically saves serializable models after training for licensed users. This affects gds.beta.pipeline.[linkPrediction|nodeClassification].train and gds.beta.graphSage.train.

  • Added procedure gds.graph.relationshipProperties.write that allows writing relationships with multiple properties to Neo4j.

  • Cypher Aggregation has graduated, which comes with a new name and API changes:

    • The method of projection is now generally called "Cypher projection", possibly with an additional "new" or "v2" qualifier.
      • The existing 'Cypher projection' (gds.graph.project.cypher) is now called "Legacy Cypher projection"
    • The procedure name is losing the alpha qualifier and is now called gds.graph.project.
    • The old name gds.alpha.graph.project is deprecated and usages will forward to the new name while also adapting to the new API.
    • The 4th and 5th parameters nodeConfig and relationshipConfig have been merged into a single dataConfig parameter.
    • The properties configuration key in this merged dataConfig parameter has been renamed to relationshipProperties.
    • The overall projection configuration (e.g. readConcurrency) has moved from the 6th parameter to the 5th parameter.
  • Graph data retrieved via the GDS Arrow endpoint can now be partitioned via the FlightInfo endpoint.
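
The Cypher projection parameter changes described above can be summarized as a small argument migration. The Python sketch below models the argument list of gds.alpha.graph.project as a plain list and rearranges it into the gds.graph.project layout. The helper name is hypothetical, Cypher expressions are represented as strings purely for illustration, and the sourceNodeLabels and relationshipType keys in the example are assumptions; only the properties to relationshipProperties rename is taken from the notes.

```python
def migrate_projection_args(old_args):
    """Rearrange gds.alpha.graph.project arguments into the gds.graph.project layout."""
    graph_name, source_node, target_node, *rest = old_args
    node_config = rest[0] if len(rest) > 0 else {}          # old 4th parameter
    relationship_config = rest[1] if len(rest) > 1 else {}  # old 5th parameter
    configuration = rest[2] if len(rest) > 2 else {}        # old 6th parameter

    # Merge both maps into dataConfig, renaming "properties" on the way.
    data_config = dict(node_config)
    for key, value in relationship_config.items():
        new_key = "relationshipProperties" if key == "properties" else key
        data_config[new_key] = value

    # The overall configuration moves from the 6th to the 5th position.
    return [graph_name, source_node, target_node, data_config, configuration]


# Hypothetical old-style argument list.
old_args = [
    "g", "n", "m",
    {"sourceNodeLabels": "labels(n)"},
    {"relationshipType": "type(r)", "properties": "r { .weight }"},
    {"readConcurrency": 2},
]
new_args = migrate_projection_args(old_args)
```

According to the notes, calls to the deprecated gds.alpha.graph.project are forwarded and adapted automatically, so a manual migration like this matters mainly when switching to the new procedure name.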

Bug fixes

  • Fixed: the Arrow server no longer allows projecting graphs with blank names
  • Fixed: Arrow validates dangling relationships when creating an in-memory graph
  • Fixed: if an arrow process is aborted, creating a new process with the same name is now possible
  • Fixed a bug where gds.graph.export could fail when exporting larger graphs
  • Fixed a bug where gds.alpha.kSpanningTree returned incorrect results when called with the nodeLabels parameter.
  • Fixed a bug where gds.triangleCount would throw an ArrayIndexOutOfBoundsException when called with the nodeLabels parameter.
  • Fixed a bug where link prediction mutate results could fail when predicted probability is extremely close to zero.

Improvements

Major

  • Improve parallel runtime of several algorithms due to improvements of our degree-based partitioning. Note that this is highly dataset dependent and may not be visible for all datasets. Affected algorithms are:
    • FastRP
    • HashGNN
    • Leiden
    • Approxmaxkcut
    • Conductance
    • LinkPrediction training
    • ToUndirected
  • Improved partitioning. This affects the parallel runtime of gds.alpha.hits, gds.beta.graph.project.subgraph and gds.beta.pipeline.linkPrediction.predict if sampleRate = 0

Minor

  • Improve progress tracking for gds.beta.graphSage.train. This will enable progress bars on the python client.
  • Improve error message for invalid nodeLabels and relationshipTypes for procedures supporting memory estimation.
  • Allow running gds.debug.sysInfo and gds.debug.arrow against the system database.
  • Improve automatic conversion of array property values during graph projection.
  • The Yens algorithm can now be run in parallel.
  • Node regression now verifies upfront that all provided targetProperty values are valid when calling gds.alpha.pipeline.nodeRegression.train.
  • The scale properties algorithm has been promoted:
    • Added new procedures gds.scaleProperties.[stream,mutate] which replace gds.alpha.scaleProperties.[stream,mutate] that are now deprecated
      • The scalers L1Norm and L2Norm are not supported in the new procedures.
    • Added new procedures gds.scaleProperties.[stats,write] to return statistics from a scale properties computation and write scaled properties back to a database respectively
    • Procedures gds.scaleProperties.[mutate,stats,stream,write] support progress tracking with volumes. This will enable progress bars on the python client
    • Procedures gds.scaleProperties.[mutate,stats,write] return statistics from the performed scale computation
    • Added new parameter offset to the log scaler. This also affects procedures:
      • gds.pageRank
      • gds.eigenvector
      • gds.articleRank
    • Added new procedures gds.scaleProperties.[mutate|stats|stream|write].estimate for estimating the memory requirements of running the scale properties algorithm
    • Nodes with missing properties (null or NaN) are now omitted in the scale computation. Their scale value is set to NaN in the output.
  • Reduce the memory footprint of the binary embeddings saved by gds.beta.hashgnn.mutate.
  • Promote random forest classifier to beta tier. Added gds.beta.pipeline.[nodeClassification,linkPrediction].addRandomForest which replace gds.alpha.pipeline.[nodeClassification,linkPrediction].addRandomForest that are now deprecated.
  • Reduced memory allocation for the Spanning Tree algorithm.
  • A more effective rerouting algorithm is applied for the minimum Directed Steiner-Tree algorithm when the inverted index is present.
  • Improve memory usage when projecting very large graphs with very high degree nodes.
  • Additional validation for Cypher projection configuration to guide migration and avoid common mistakes.
  • The import of nodes with negative id via arrow into a database is now forbidden.
  • Graph restore now attempts to use the same id map implementation that has been used for the original graph.
  • Setting the useBadCollector option to true for the arrow database import will now actually trigger errors if the collector encountered a problem.
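
The scale properties change above, where nodes with missing values are omitted from the computation and receive NaN in the output, can be pictured with a min-max scaler. The following Python sketch is an illustration under assumptions, not the GDS code: the function name is hypothetical, and returning 0.0 when all present values are equal is a choice made for this example.

```python
import math


def min_max_scale(values):
    """Min-max scale a list of values, ignoring None/NaN entries in the statistics."""

    def is_missing(value):
        return value is None or math.isnan(value)

    present = [v for v in values if not is_missing(v)]
    if not present:
        return [math.nan] * len(values)

    low, high = min(present), max(present)
    span = high - low
    scaled = []
    for value in values:
        if is_missing(value):
            scaled.append(math.nan)  # missing input stays missing in the output
        elif span == 0:
            scaled.append(0.0)  # assumption: constant properties scale to 0.0
        else:
            scaled.append((value - low) / span)
    return scaled


# Hypothetical property values for five nodes, two of them missing.
scaled = min_max_scale([2.0, None, 4.0, math.nan, 6.0])
```

Because missing entries are excluded before computing the minimum and maximum, a single NaN no longer distorts the scaled values of the other nodes.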