-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Store node attributes as floats or strings #87
base: dev
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall is good. Please see comments.
I am a little concerned/interested in the performance implications of needing to check the input attributes for str/float. It seems there is an opportunity to improve the speed by making this an attribute map that points to an array the size of the graph, and when using id's it simply indexes. What are your thoughts?
if (score_is_floating_pt) { | ||
// Ok - data type matched. | ||
node_attr_value_map_it->second = score; | ||
if (node_float_attr_map_it == node_float_attr_map.end()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a unittest for ensuring when multiple node attribute maps exist?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Part of the AddNodeAttribute unittest checks that a float attribute is turned into a string attribute if a string value is added to the float attribute, but there aren't any other checks for duplicates currently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's good to have a check relating to this mapping iterator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So there should be a test to check that no duplicates are created after any node attribute call?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Retrieve an iterator to the [node attribute : NodeFloatAttributeValueMap]
// that corresponds with attribute
is my focus. We want to check that adding multiple different attributes with names works. And then being able to get these attributes and create edge costs also works. I.E. testing that the mapping iterator is able to find the attribute name when there are multiple different ones.
""" | ||
# Just send it to C++ | ||
spatial_structures_native_functions.c_add_node_attributes( | ||
self.graph_ptr, attribute, ids, scores | ||
) | ||
|
||
def get_node_attributes(self, attribute: str, ids : List[int] | None = None) -> List[str]: | ||
def get_node_attributes( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These seem to be python APIs for the get nodes, but the PR is also for adding node attributes with floats.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Forgot to update the function signatures for these functions, but add_node_attributes
does allow for adding node attributes with floats since the native function handles the conversion to C type, and get_node_attributes
can return the attributes as floats.
Would this mean replacing each NodeAttributeValueMap or NodeFloatAttributeValueMap with a string or float array respectively? I do think that would be much faster and easier, but I'm unsure how we'd handle changing an ID. Would you have to keep expanding the length of the array to match the maxID? |
I think the edge and node graph already needs to get rebuilt. There is a 'compress to csr' type function that might be related to this. If we can get through this PR, maybe it makes more sense to make a new issue specifically for optimizing the backend rather than trying to change all this now. |
Issue #74
node_float_attr_map
which stores mappings between attribute names andNodeFloatAttributeValueMap
s. It only stores mappings where every value in the mapping is a float (a 'float attribute'), while the originalnode_attr_map
should only storeNodeAttributeValueMap
s for attributes where at least one string value has been added (a 'string attribute').AddNodeAttributeFloat
(+ plural) which take in float/float vectors, andGetNodeAttributesFloat
(+ by ID) which return float vectors.AddNodeAttributesFloat
,GetNodeAttributesFloat
,GetNodeAttributesByIDFloat
which call the C++ functions and store result in a float array.c_get_node_attributes
andc_add_node_attributes
call the new CInterface functions if the attribute is a float attribute or if the input scores are floats.IsFloatAttribute
created for use byc_add_node_attributes
(andAttrToCost
to avoid trying to StringToFloat a float array)New and old NodeAttribute tests pass: