Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cycle detection removes same edge multiple times #8657

Open
1 task done
Willenbrink opened this issue Dec 18, 2024 · 5 comments · May be fixed by #8677, #8679 or #8707
Open
1 task done

Cycle detection removes same edge multiple times #8657

Willenbrink opened this issue Dec 18, 2024 · 5 comments · May be fixed by #8677, #8679 or #8707
Assignees
Labels
P1 High priority, add to the next sprint

Comments

@Willenbrink
Copy link

Describe the bug
_break_supported_cycles_in_graph might remove same edge multiple times.

Error message
'NoneType' object has no attribute 'keys' in https://github.com/deepset-ai/haystack/blob/ea3602643aa52c27f3bea7bf5bc90b97f568dcdc/haystack/core/pipeline/base.py#L1218C1-L1218C94

Expected behavior
If one edge is part of two cycles, I expect the algorithm to only break the edge once. After checking the second cycle, it shouldn't attempt to break the edge.

Additional context
I believe there are two other bugs occurring in my project, possibly related:

  • A cycle is deleted and not executed at all. The pipeline terminates too early as a result. I haven't been able to determine whether this really is a bug or what the cause is.
  • pipeline.draw() does not show user-provided value to variadic input #8656 This maybe messes with the topological sort of the graph though I'm not sure if that would affect the cycle detection.

As the cycle handling seems quite complicated, I'm wondering why Haystack even does that. Why is the pipeline not based on a queue of components that have all their inputs, executing them one at a time and adding their connected components once they've got their inputs. Something like:

for component in ready_comps:
  component.run()
  for connected_comp in component.connected_components:
    if connected_comp.has_all_inputs():
      ready_comps.append(connected_comp)

To Reproduce
Probably:

  • Have one edge as part of two cycles.
  • Run the pipeline
  • Observe the edge being removed first
  • Observe .get_edge_data failing for the second cycle as the edge no longer exists.

I can reliably reproduce the issue in my project.

FAQ Check

System:

  • OS: Linux
  • Haystack version (commit or version number): 2.7.0
@julian-risch julian-risch added the P1 High priority, add to the next sprint label Dec 20, 2024
etirelli added a commit to etirelli/haystack that referenced this issue Dec 29, 2024
…ops, due to the cycle detection removing the same edge multiple times (ref deepset-ai#8657)
@etirelli
Copy link

I submitted a proposal for the fix and a unit test to reproduce the problem: PR: #8677 . See my comments in there for details.

etirelli added a commit to etirelli/haystack that referenced this issue Dec 29, 2024
…ops, due to the cycle detection removing the same edge multiple times (ref deepset-ai#8657)
@Willenbrink Willenbrink linked a pull request Jan 2, 2025 that will close this issue
@mathislucka
Copy link
Member

Thanks @Willenbrink and @etirelli for reporting the issue and working on a fix!

We noticed other potential issues in the way we handle pipelines with cycles and we would like to consider different use cases in-depth before merging a fix.

We'll proceed with collecting more (e2e) test cases to have a comprehensive test suite for realistic use cases and then work on a fix that runs pipelines with cycles robustly and deterministically.

@etirelli your test case already helps with building the test suite. We'd appreciate any other test cases for complex cyclic pipelines that you might have come across when working on your use cases.

@Willenbrink you mentioned that you have a few very complex pipelines that you used to test your PR. Can you share these pipelines (even just conceptually) so that we can add them to our test suite for realistic use cases?

@Willenbrink
Copy link
Author

Here is an example of a pipeline with some elements redacted. I hope it is helpful anyway. In short: I have some none-llm steps, then, if an error occurs, I parse it into a json via an llm and use this for another llm call. If no error occurs, I expect the pipeline to terminate early (see the star)

pipeline_edited

@mathislucka mathislucka linked a pull request Jan 11, 2025 that will close this issue
@mathislucka
Copy link
Member

@Willenbrink, could you check out this branch and test it with your use cases: #8707?

We ran into a few more issues and this should be a comprehensive fix. We're currently testing simple and complex use cases in-depth to make sure that everything works as expected.

Your feedback would be much appreciated.

@Willenbrink
Copy link
Author

I'm sorry but I moved on to another project. That said, the PR looks very promising!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P1 High priority, add to the next sprint
Projects
None yet
5 participants