Store node attributes as floats or strings #87

Open
wants to merge 6 commits into
base: dev

rkpamegan
Contributor

Issue #74

  • C++ Graph objects now have a second attribute map called node_float_attr_map which stores mappings between attribute names and NodeFloatAttributeValueMaps. It only stores mappings where every value in the mapping is a float (a 'float attribute'), while the original node_attr_map should only store NodeAttributeValueMaps for attributes where at least one string value has been added (a 'string attribute').
  • Adding floats to float attributes or strings to string attributes preserves the attribute type. Adding a float to a string attribute converts only that single float value to a string before storing it, while adding a string to a float attribute irrevocably converts the entire float attribute into a string attribute.
  • Additional C++ functions AddNodeAttributeFloat (+ plural) which take in float/float vectors, and GetNodeAttributesFloat (+ by ID) which return float vectors.
  • CInterface functions AddNodeAttributesFloat, GetNodeAttributesFloat, GetNodeAttributesByIDFloat which call the C++ functions and store result in a float array.
  • Python/C c_get_node_attributes and c_add_node_attributes call the new CInterface functions if the attribute is a float attribute or if the input scores are floats.
  • Additional CInterface/C++ function IsFloatAttribute created for use by c_add_node_attributes (and AttrToCost to avoid trying to StringToFloat a float array)
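
The conversion rules in the list above can be pictured with a small Python model. This is only a sketch of the described behaviour, not the C++ implementation: the class name `NodeAttributeStore`, its method names, and the use of `str()` for float-to-string formatting are all assumptions made for illustration.

```python
from typing import Dict, List, Union

Value = Union[float, str]


class NodeAttributeStore:
    """Toy model of the two-map scheme (hypothetical, not the C++ Graph)."""

    def __init__(self) -> None:
        # attribute name -> {node id -> value}; a name lives in only one map
        self.float_attrs: Dict[str, Dict[int, float]] = {}
        self.string_attrs: Dict[str, Dict[int, str]] = {}

    def is_float_attribute(self, name: str) -> bool:
        return name in self.float_attrs

    def add(self, node_id: int, name: str, value: Value) -> None:
        if isinstance(value, str):
            if name in self.float_attrs:
                # A string added to a float attribute converts the whole
                # attribute to strings; it never becomes a float attribute again.
                old = self.float_attrs.pop(name)
                self.string_attrs[name] = {i: str(v) for i, v in old.items()}
            self.string_attrs.setdefault(name, {})[node_id] = value
        elif name in self.string_attrs:
            # A float added to a string attribute: only this value is converted.
            self.string_attrs[name][node_id] = str(float(value))
        else:
            self.float_attrs.setdefault(name, {})[node_id] = float(value)

    def get(self, name: str) -> List[Value]:
        """Values of one attribute, ordered by node ID."""
        source = self.float_attrs.get(name) or self.string_attrs.get(name) or {}
        return [source[i] for i in sorted(source)]


store = NodeAttributeStore()
store.add(0, "cost", 1.5)
store.add(1, "cost", 2.0)      # "cost" is still a float attribute
store.add(0, "label", "door")
store.add(1, "label", 3.0)     # stored as a string in the string attribute
store.add(2, "cost", "high")   # converts all of "cost" to strings
```

After the last call, `store.is_float_attribute("cost")` is false and every value of `"cost"` is a string, which mirrors the one-way conversion described in the PR.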

New and old NodeAttribute tests pass:
image

Owner

@cadop cadop left a comment

Overall this looks good. Please see the comments.

I am a little concerned/interested in the performance implications of needing to check the input attributes for str/float. It seems there is an opportunity to improve the speed by making this an attribute map that points to an array the size of the graph, and when using id's it simply indexes. What are your thoughts?

src/Cpp/spatialstructures/src/graph.cpp (outdated, resolved)
src/Cpp/spatialstructures/src/graph.cpp (outdated, resolved)
if (score_is_floating_pt) {
// Ok - data type matched.
node_attr_value_map_it->second = score;
if (node_float_attr_map_it == node_float_attr_map.end())
Owner

Is there a unit test that covers the case where multiple node attribute maps exist?

Contributor Author

Part of the AddNodeAttribute unittest checks that a float attribute is turned into a string attribute if a string value is added to the float attribute, but there aren't any other checks for duplicates currently.

Owner

I think it's good to have a check relating to this mapping iterator.

Contributor Author

So there should be a test to check that no duplicates are created after any node attribute call?

Owner

		// Retrieve an iterator to the [node attribute : NodeFloatAttributeValueMap]
		// that corresponds with attribute

is my focus. We want to check that adding multiple attributes with different names works, and that retrieving these attributes and creating edge costs from them also works, i.e., testing that the mapping iterator can find the attribute name when several different ones exist.

src/Cpp/spatialstructures/src/graph.cpp (outdated, resolved)
"""
# Just send it to C++
spatial_structures_native_functions.c_add_node_attributes(
self.graph_ptr, attribute, ids, scores
)

def get_node_attributes(self, attribute: str, ids : List[int] | None = None) -> List[str]:
def get_node_attributes(
Owner

These seem to be the Python APIs for getting node attributes, but the PR also covers adding node attributes as floats.

Contributor Author

Forgot to update the function signatures for these functions, but add_node_attributes does allow for adding node attributes with floats since the native function handles the conversion to C type, and get_node_attributes can return the attributes as floats.
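
For readers unfamiliar with the conversion step mentioned above, here is a rough sketch of how a Python wrapper could decide between a float array and a string array before calling into native code. The helper names `scores_are_floats` and `to_c_scores` are hypothetical and are not taken from the PR; only the `ctypes` calls are standard library API.

```python
import ctypes
from typing import Sequence, Union


def scores_are_floats(scores: Sequence[Union[float, str]]) -> bool:
    """True only if there is at least one score and every score is a number."""
    return len(scores) > 0 and all(
        isinstance(s, (int, float)) and not isinstance(s, bool) for s in scores
    )


def to_c_scores(scores: Sequence[Union[float, str]]):
    """Pack scores into a C float array, falling back to a C string array."""
    n = len(scores)
    if scores_are_floats(scores):
        # Every score is numeric, so a float array can be passed as-is.
        return (ctypes.c_float * n)(*[float(s) for s in scores])
    # At least one string is present: send everything as UTF-8 strings.
    encoded = [str(s).encode("utf-8") for s in scores]
    return (ctypes.c_char_p * n)(*encoded)


float_scores = to_c_scores([1.5, 2, 0.25])   # array of c_float
mixed_scores = to_c_scores([1.5, "high"])    # array of c_char_p
```

A single string in the input is enough to select the string path, which matches the rule that a string value turns the whole attribute into a string attribute.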

@cadop cadop linked an issue Jan 10, 2025 that may be closed by this pull request
@rkpamegan
Contributor Author

> I am a little concerned/interested in the performance implications of needing to check the input attributes for str/float. It seems there is an opportunity to improve the speed by making this an attribute map that points to an array the size of the graph, and when using id's it simply indexes. What are your thoughts?

Would this mean replacing each NodeAttributeValueMap or NodeFloatAttributeValueMap with a string or float array respectively? I do think that would be much faster and easier, but I'm unsure how we'd handle changing an ID. Would you have to keep expanding the length of the array to match the maxID?
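
To make the trade-off in this question concrete, the following is a minimal sketch of array-backed storage for one float attribute, assuming node IDs are non-negative integers and that NaN is an acceptable marker for "no value". The class `DenseFloatAttribute` is hypothetical and does not exist in the codebase.

```python
import math
from typing import List


class DenseFloatAttribute:
    """One float attribute stored as a flat list indexed by node ID (sketch)."""

    def __init__(self, node_count: int = 0) -> None:
        # NaN marks nodes that have no value for this attribute.
        self.values: List[float] = [math.nan] * node_count

    def set(self, node_id: int, value: float) -> None:
        if node_id < 0:
            raise ValueError("node IDs must be non-negative")
        if node_id >= len(self.values):
            # Grow the list so it covers the new maximum ID.
            self.values.extend([math.nan] * (node_id + 1 - len(self.values)))
        self.values[node_id] = float(value)

    def get(self, ids: List[int]) -> List[float]:
        # Lookup by ID is plain indexing; no per-value type check is needed.
        return [self.values[i] for i in ids]


attr = DenseFloatAttribute(node_count=3)
attr.set(1, 0.5)
attr.set(6, 2.0)   # ID beyond the current size: the list grows to 7 entries
```

In this sketch the list does have to grow to the largest ID seen, so sparse or frequently changing IDs would waste memory; whether that is acceptable depends on how the graph assigns IDs.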

@cadop
Owner

cadop commented Jan 10, 2025

I think the edge and node graph already needs to get rebuilt. There is a 'compress to csr' type function that might be related to this.

If we can get through this PR, maybe it makes more sense to make a new issue specifically for optimizing the backend rather than trying to change all this now.

@rkpamegan
Contributor Author

image

Added a test, AttributeValueMapsCheck, which adds and retrieves multiple attributes of each type. The Python function headers and naming conventions were also fixed.
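
The kind of check requested in the review (several differently named attributes coexisting and each being found by name) can be sketched in Python as follows. This is not the AttributeValueMapsCheck test itself; the helper functions and the attribute names are made up for illustration, and plain dictionaries stand in for the graph.

```python
from typing import Dict, List, Union

Value = Union[float, str]
AttrMaps = Dict[str, Dict[int, Value]]


def add_attributes(maps: AttrMaps, name: str, ids: List[int], scores: List[Value]) -> None:
    """Store scores[i] for node ids[i] under the attribute `name`."""
    if len(ids) != len(scores):
        raise ValueError("ids and scores must have the same length")
    maps.setdefault(name, {}).update(zip(ids, scores))


def get_attributes(maps: AttrMaps, name: str) -> List[Value]:
    """Return the values of `name` ordered by node ID; empty if unknown."""
    values = maps.get(name, {})
    return [values[i] for i in sorted(values)]


# Several attributes of each type, added in interleaved order.
maps: AttrMaps = {}
add_attributes(maps, "cross_slope", [0, 1, 2], [1.4, 2.0, 2.8])
add_attributes(maps, "surface", [0, 1, 2], ["tile", "carpet", "tile"])
add_attributes(maps, "daylight", [0, 2], [0.3, 0.9])
add_attributes(maps, "zone", [1], ["lobby"])

# Each name must resolve to its own values, untouched by the others.
assert get_attributes(maps, "cross_slope") == [1.4, 2.0, 2.8]
assert get_attributes(maps, "surface") == ["tile", "carpet", "tile"]
assert get_attributes(maps, "daylight") == [0.3, 0.9]
assert get_attributes(maps, "zone") == ["lobby"]
```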

@rkpamegan rkpamegan requested a review from cadop January 10, 2025 22:32
Linked issue (may be closed by merging this pull request): Node attributes are only stored as strings