Conditional Routing for service-map data is not working. #5280

Open
arundanegoudar opened this issue Dec 27, 2024 · 5 comments
Labels
bug Something isn't working

Comments

@arundanegoudar

arundanegoudar commented Dec 27, 2024

Issue:
We are implementing conditional routing for trace data so that documents go to dynamic, per-tenant indexes (multiple sinks) based on a field in the data (tenant_name). Conditional routing works for traces, but not for service-map data. The service-map template contains no additional fields, and custom fields cannot be added. As a result, service-map data does not reach the tenant-specific indexes the way trace data does, which causes problems when stitching the data together at the user interface level.

Reproduce:
Configure multiple sinks for trace and service-map data, selected by the tenant_name field (tenant1, tenant2, tenant3 ... tenantN).
The service-map data does not follow the same routing as the trace data.

Expected behavior
Trace and service-map data should follow the same dynamic routing.

Config: pipelines.yaml for trace & service-map data

################################################################################

traces-entry-pipeline:
  workers: 1
  delay: "100"
  source:
    otel_trace_source:
      ssl: false
  buffer:
    bounded_blocking:
  route:
    - tenant1_traces_svcmaps: '/attributes/span.attributes.tenant_name == "tenant1"'                            
    - tenant2_traces_svcmaps: '/attributes/span.attributes.tenant_name == "tenant2"'  
    - tenant3_traces_svcmaps: '/attributes/span.attributes.tenant_name == "tenant3"'
  sink:
    - stdout:
    - pipeline:
        name: "traces-svcmaps-pipeline-t1"
        routes: [tenant1_traces_svcmaps]
    - pipeline:
        name: "traces-svcmaps-pipeline-t2"
        routes: [tenant2_traces_svcmaps]
    - pipeline:                                                                                         
        name: "traces-svcmaps-pipeline-t3"                                                              
        routes: [tenant3_traces_svcmaps]   

traces-svcmaps-pipeline-t1:                                                                                  
  workers: 1                                                                                            
  delay: "100"                                                                                          
  source:
    pipeline:                                                                                           
      name: "traces-entry-pipeline"                                                                                               
  buffer:                                                                                               
    bounded_blocking:                                                                               
  sink:                                                                                                 
    - stdout:                                                                                           
    - pipeline:
        name: "trace-raws-pipeline-t1"
    - pipeline:
        name: "service-maps-pipeline-t1"

traces-svcmaps-pipeline-t2:                                                                             
  workers: 1                                                                                            
  delay: "100"                                                                                          
  source:                                                                                               
    pipeline:                                                                                           
      name: "traces-entry-pipeline"                                                                     
  buffer:                                                                                               
    bounded_blocking:                                                                            
  sink:                                                                                                 
    - stdout:                                                                                           
    - pipeline:                                                                                         
        name: "trace-raws-pipeline-t2"                                                                  
    - pipeline:                                                                                         
        name: "service-maps-pipeline-t2"

traces-svcmaps-pipeline-t3:                                                                             
  workers: 1                                                                                            
  delay: "100"                                                                                          
  source:                                                                                               
    pipeline:                                                                                           
      name: "traces-entry-pipeline"                                                                     
  buffer:                                                                                               
    bounded_blocking:                                                                             
  sink:                                                                                                 
    - stdout:                                                                                           
    - pipeline:                                                                                         
        name: "trace-raws-pipeline-t3"                                                                  
    - pipeline:                                                                                         
        name: "service-maps-pipeline-t3"

trace-raws-pipeline-t1:
  workers: 1
  delay: "100"
  source:
    pipeline:
      name: "traces-svcmaps-pipeline-t1"
  buffer:
    bounded_blocking:
  processor:
    - otel_traces:
  
  sink:
    - opensearch:
        hosts: ["http://opensearch:9200"]
        # cert: "/usr/share/data-prepper/opensearch.crt"
        ssl_verification_enabled: false
        insecure: true
        username: 
        password: 
        index: acn_tenant1_traces-%{yyyy.MM.dd} # Dynamically creates index based on route

service-maps-pipeline-t1:
  workers: 1
  delay: "100"
  source:
    pipeline:
      name: "traces-svcmaps-pipeline-t1"
  buffer:
    bounded_blocking:
  processor:
    - service_map:
    
  sink:
    - opensearch:
        hosts: ["http://opensearch:9200"]
        # cert: "/usr/share/data-prepper/opensearch.crt"
        ssl_verification_enabled: false
        insecure: true
        username: 
        password: 
        index: acn_tenant1_servicemap-%{yyyy.MM.dd} # Dynamically creates index based on route

trace-raws-pipeline-t2:                                                                                 
  workers: 1                                                                                            
  delay: "100"                                                                                          
  source:                                                                                               
    pipeline:                                                                                           
      name: "traces-svcmaps-pipeline-t2"                                                                     
  buffer:                                                                                               
    bounded_blocking:                                                                                   
      # buffer_size: 128 # 10240                                                                        
      # batch_size: 8 # 160                                                                             
  processor:                                                                                 
    - otel_traces:                                                                                                                                                      
                                                                                                        
  sink:                                                                                                 
    - opensearch:                                                                                       
        hosts: ["http://opensearch:9200"]                                                               
        # cert: "/usr/share/data-prepper/opensearch.crt"                                                
        ssl_verification_enabled: false                                                                 
        insecure: true                                                                         
        username:                                                                                  
        password:                                                                                 
        index: acn_tenant2_traces-%{yyyy.MM.dd} # Dynamically creates index based on route 

service-maps-pipeline-t2:                                                                               
  workers: 1                                                                                            
  delay: "100"                                                                                          
  source:                                                                                               
    pipeline:                                                                                           
      name: "traces-svcmaps-pipeline-t2"                                                                     
  buffer:                                                                                               
    bounded_blocking:                                                                                
  processor:                                                                                   
    - service_map:                                                                                      
                                                                                                        
  sink:                                                                                                 
    - opensearch:                                                                                       
        hosts: ["http://opensearch:9200"]                                                               
        # cert: "/usr/share/data-prepper/opensearch.crt"                                                
        ssl_verification_enabled: false                                                                 
        insecure: true                                                                           
        username:                                                                                  
        password:                                                                              
        index: acn_tenant2_servicemap-%{yyyy.MM.dd} # Dynamically creates index based on route

trace-raws-pipeline-t3:                                                                                 
  workers: 1                                                                                            
  delay: "100"                                                                                          
  source:                                                                                               
    pipeline:                                                                                           
      name: "traces-svcmaps-pipeline-t3"                                                                
  buffer:                                                                                               
    bounded_blocking:                                                                               
  processor:                                                                                      
    - otel_traces:                                                                       
                                                                                                        
  sink:                                                                                                 
    - opensearch:                                                                                       
        hosts: ["http://opensearch:9200"]                                                               
        # cert: "/usr/share/data-prepper/opensearch.crt"                                                
        ssl_verification_enabled: false                                                                 
        insecure: true                                                                    
        username:                                                                                  
        password:                                                                             
        index: acn_tenant3_traces-%{yyyy.MM.dd} # Dynamically creates index based on route 

service-maps-pipeline-t3:                                                                               
  workers: 1                                                                                            
  delay: "100"                                                                                          
  source:                                                                                               
    pipeline:                                                                                           
      name: "traces-svcmaps-pipeline-t3"                                                                
  buffer:                                                                                               
    bounded_blocking:                                                                            
  processor:                                                                                    
    - service_map:                                                                                      
                                                                                                        
  sink:                                                                                                 
    - opensearch:                                                                                       
        hosts: ["http://opensearch:9200"]                                                               
        # cert: "/usr/share/data-prepper/opensearch.crt"                                                
        ssl_verification_enabled: false                                                                 
        insecure: true                                                                  
        username:                                                                                  
        password:                                                                          
        index: acn_tenant3_servicemap-%{yyyy.MM.dd} # Dynamically creates index based on route

################################################################################

@KarstenSchnitter
Collaborator

There was a similar question on the OpenSearch Slack ingest channel recently. I posted the following reply indicating some workarounds:

There are several ways this might be achieved. Unfortunately, I cannot see an easy one:
The service map is generated by the Service Map Processor. The relations/edges are modelled by the class ServiceMapRelationship. These relations do not contain any field that indicates a tenant. Hence, with the current implementation, you cannot use Document-level Security for tenant separation. If such a field were added, Document-level Security should work for presenting tenant-specific service maps, since the Dashboards Observability plugin uses OpenSearch queries. This should be verified before implementing the change, though. You can raise an issue with the Data Prepper GitHub project for this extension.
Similar to the span data, you need to route the service map data into tenant-specific indices. This requires you to manage the indices manually using a common alias. I guess you have a similar setup for the span indices. Because the Service Map Processor does not include tenant information in the relations of the service map, you cannot easily route the data with a single OpenSearchSink. As a workaround, you can set up tenant-specific pipelines in Data Prepper. For this, you take the raw trace stream and filter it by tenant id. Then you can direct it to the Service Map Processor and a tenant-specific OpenSearchSink. Unfortunately, this is only feasible for a rather small and relatively constant number of tenants.

Following this post, I suggest extending Data Prepper to copy a configurable list of fields from a span to a service map edge. There is an ambiguity problem: different spans might contain different values for the same edge/connection. To enable the multi-tenancy routing for @arundanegoudar, Data Prepper would need to index different versions of the same edge, one for each field value. I am not sure about the implications of that approach for memory consumption. I have also not investigated whether an implementation is feasible.

@sb2k16 removed the untriaged label Jan 7, 2025
@dlvenable
Member

@arundanegoudar,

I think the problem here is that the service_map processor does not currently support running multiple instances. It makes use of a number of static variables, which may produce various bugs.

Could you route all of your spans to a single service map and then use that one service map? You could at the same time route the spans to different OpenSearch indexes.
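
A rough, untested sketch of that layout, reusing the plugin names and route expressions from the config in the issue. The pipeline names trace-raws-pipeline and service-map-pipeline are made up for illustration, only two tenants are shown, and authentication settings are left out:

```yaml
# Sketch only: a single service_map processor for all tenants,
# with spans routed to tenant-specific OpenSearch indexes.
traces-entry-pipeline:
  source:
    otel_trace_source:
      ssl: false
  sink:
    - pipeline:
        name: "trace-raws-pipeline"    # hypothetical name
    - pipeline:
        name: "service-map-pipeline"   # hypothetical name

trace-raws-pipeline:
  source:
    pipeline:
      name: "traces-entry-pipeline"
  processor:
    - otel_traces:
  route:
    - tenant1_traces: '/attributes/span.attributes.tenant_name == "tenant1"'
    - tenant2_traces: '/attributes/span.attributes.tenant_name == "tenant2"'
  sink:
    # One sink per tenant; each sink only receives events matching its route.
    - opensearch:
        hosts: ["http://opensearch:9200"]
        insecure: true
        index: acn_tenant1_traces-%{yyyy.MM.dd}
        routes: [tenant1_traces]
    - opensearch:
        hosts: ["http://opensearch:9200"]
        insecure: true
        index: acn_tenant2_traces-%{yyyy.MM.dd}
        routes: [tenant2_traces]

service-map-pipeline:
  source:
    pipeline:
      name: "traces-entry-pipeline"
  processor:
    # The only service_map instance in the whole configuration.
    - service_map:
  sink:
    - opensearch:
        hosts: ["http://opensearch:9200"]
        insecure: true
        index_type: trace-analytics-service-map
```

With this layout the service map still holds edges from all tenants in one index, so it avoids the multiple-processor problem but does not by itself give per-tenant service maps.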

@arundanegoudar
Author

Hello @dlvenable, thanks for the kind and informative response.

We want to index data in OpenSearch by tenant_name, but the service_map data does not follow that pattern. We already index the span (trace) data by tenant_name, and we need the same for the service_map data; it is a requirement for the next step of our application development.

Could you please suggest ways to index service_map data by tenant_name?

@KarstenSchnitter
Collaborator

@dlvenable: Thanks for confirming my suspicion on the service map processor. I have started creating a manual test setup to work on the multi-tenant service map routing.

@KarstenSchnitter
Collaborator

The OpenTelemetry Collector Contrib repository contains a service graph connector to create service maps from traces. It allows the definition of additional dimensions for grouping the requests. This feature would help the OP. Maybe that connector can serve as an inspiration on how to evolve the service map processor in Data Prepper.
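
For reference, a minimal sketch of what that looks like on the Collector side, assuming tenant_name is available as a span or resource attribute. The otlp receiver and debug exporter are placeholders picked for illustration, not part of the OP's setup, and the connector emits metrics rather than Data Prepper service-map documents:

```yaml
# Sketch only: OpenTelemetry Collector (contrib) config using the
# servicegraph connector with tenant_name as an additional dimension.
receivers:
  otlp:
    protocols:
      grpc:

connectors:
  servicegraph:
    # Attributes listed here are added as extra labels on the
    # generated service graph metrics, alongside the default ones.
    dimensions:
      - tenant_name

exporters:
  debug:

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [servicegraph]   # spans go into the connector
    metrics/servicegraph:
      receivers: [servicegraph]   # edge metrics come out of the connector
      exporters: [debug]
```

The point of interest is the dimensions list: edges are grouped per distinct tenant_name value, which is the behavior the service map processor would need in order to support tenant-specific routing.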


4 participants