Conditional Routing for service-map data is not working. #5280

arundanegoudar · 2024-12-27T16:25:33Z

Issue:
We are implementing conditional routing for trace data so that it is written to dynamic indexes (multiple sinks) based on a field present in the data (tenant_name). Conditional routing works for traces, but not for service-map data. The service-map template contains only a few fields, and custom fields cannot be added. As a result, the service-map data does not reach the tenant-specific indexes the way the trace data does, which causes problems when stitching the data together at the user-interface level.

Reproduce:
Configure multiple sinks for trace and service-map data, selected by the tenant_name field (tenant1, tenant2, tenant3 ... tenantN).
The service-map data does not follow the same routing as the trace data.

Expected behavior:
Trace and service-map data should follow the same dynamic routing.

Config: pipelines.yaml for trace & service-map data

################################################################################

traces-entry-pipeline:
  workers: 1
  delay: "100"
  source:
    otel_trace_source:
      ssl: false
  buffer:
    bounded_blocking:
  route:
    - tenant1_traces_svcmaps: '/attributes/span.attributes.tenant_name == "tenant1"'                            
    - tenant2_traces_svcmaps: '/attributes/span.attributes.tenant_name == "tenant2"'  
    - tenant3_traces_svcmaps: '/attributes/span.attributes.tenant_name == "tenant3"'
  sink:
    - stdout:
    - pipeline:
        name: "traces-svcmaps-pipeline-t1"
        routes: [tenant1_traces_svcmaps]
    - pipeline:
        name: "traces-svcmaps-pipeline-t2"
        routes: [tenant2_traces_svcmaps]
    - pipeline:                                                                                         
        name: "traces-svcmaps-pipeline-t3"                                                              
        routes: [tenant3_traces_svcmaps]   

traces-svcmaps-pipeline-t1:                                                                                  
  workers: 1                                                                                            
  delay: "100"                                                                                          
  source:
    pipeline:                                                                                           
      name: "traces-entry-pipeline"                                                                                               
  buffer:                                                                                               
    bounded_blocking:                                                                               
  sink:                                                                                                 
    - stdout:                                                                                           
    - pipeline:
        name: "trace-raws-pipeline-t1"
    - pipeline:
        name: "service-maps-pipeline-t1"

traces-svcmaps-pipeline-t2:                                                                             
  workers: 1                                                                                            
  delay: "100"                                                                                          
  source:                                                                                               
    pipeline:                                                                                           
      name: "traces-entry-pipeline"                                                                     
  buffer:                                                                                               
    bounded_blocking:                                                                            
  sink:                                                                                                 
    - stdout:                                                                                           
    - pipeline:                                                                                         
        name: "trace-raws-pipeline-t2"                                                                  
    - pipeline:                                                                                         
        name: "service-maps-pipeline-t2"

traces-svcmaps-pipeline-t3:                                                                             
  workers: 1                                                                                            
  delay: "100"                                                                                          
  source:                                                                                               
    pipeline:                                                                                           
      name: "traces-entry-pipeline"                                                                     
  buffer:                                                                                               
    bounded_blocking:                                                                             
  sink:                                                                                                 
    - stdout:                                                                                           
    - pipeline:                                                                                         
        name: "trace-raws-pipeline-t3"                                                                  
    - pipeline:                                                                                         
        name: "service-maps-pipeline-t3"

trace-raws-pipeline-t1:
  workers: 1
  delay: "100"
  source:
    pipeline:
      name: "traces-svcmaps-pipeline-t1"
  buffer:
    bounded_blocking:
  processor:
    - otel_traces:
  
  sink:
    - opensearch:
        hosts: ["http://opensearch:9200"]
        # cert: "/usr/share/data-prepper/opensearch.crt"
        ssl_verification_enabled: false
        insecure: true
        username: 
        password: 
        index: acn_tenant1_traces-%{yyyy.MM.dd} # Dynamically creates index based on route

service-maps-pipeline-t1:
  workers: 1
  delay: "100"
  source:
    pipeline:
      name: "traces-svcmaps-pipeline-t1"
  buffer:
    bounded_blocking:
  processor:
    - service_map:
    
  sink:
    - opensearch:
        hosts: ["http://opensearch:9200"]
        # cert: "/usr/share/data-prepper/opensearch.crt"
        ssl_verification_enabled: false
        insecure: true
        username: 
        password: 
        index: acn_tenant1_servicemap-%{yyyy.MM.dd} # Dynamically creates index based on route

trace-raws-pipeline-t2:                                                                                 
  workers: 1                                                                                            
  delay: "100"                                                                                          
  source:                                                                                               
    pipeline:                                                                                           
      name: "traces-svcmaps-pipeline-t2"                                                                     
  buffer:                                                                                               
    bounded_blocking:                                                                                   
      # buffer_size: 128 # 10240                                                                        
      # batch_size: 8 # 160                                                                             
  processor:                                                                                 
    - otel_traces:                                                                                                                                                      
                                                                                                        
  sink:                                                                                                 
    - opensearch:                                                                                       
        hosts: ["http://opensearch:9200"]                                                               
        # cert: "/usr/share/data-prepper/opensearch.crt"                                                
        ssl_verification_enabled: false                                                                 
        insecure: true                                                                         
        username:                                                                                  
        password:                                                                                 
        index: acn_tenant2_traces-%{yyyy.MM.dd} # Dynamically creates index based on route 

service-maps-pipeline-t2:                                                                               
  workers: 1                                                                                            
  delay: "100"                                                                                          
  source:                                                                                               
    pipeline:                                                                                           
      name: "traces-svcmaps-pipeline-t2"                                                                     
  buffer:                                                                                               
    bounded_blocking:                                                                                
  processor:                                                                                   
    - service_map:                                                                                      
                                                                                                        
  sink:                                                                                                 
    - opensearch:                                                                                       
        hosts: ["http://opensearch:9200"]                                                               
        # cert: "/usr/share/data-prepper/opensearch.crt"                                                
        ssl_verification_enabled: false                                                                 
        insecure: true                                                                           
        username:                                                                                  
        password:                                                                              
        index: acn_tenant2_servicemap-%{yyyy.MM.dd} # Dynamically creates index based on route

trace-raws-pipeline-t3:                                                                                 
  workers: 1                                                                                            
  delay: "100"                                                                                          
  source:                                                                                               
    pipeline:                                                                                           
      name: "traces-svcmaps-pipeline-t3"                                                                
  buffer:                                                                                               
    bounded_blocking:                                                                               
  processor:                                                                                      
    - otel_traces:                                                                       
                                                                                                        
  sink:                                                                                                 
    - opensearch:                                                                                       
        hosts: ["http://opensearch:9200"]                                                               
        # cert: "/usr/share/data-prepper/opensearch.crt"                                                
        ssl_verification_enabled: false                                                                 
        insecure: true                                                                    
        username:                                                                                  
        password:                                                                             
        index: acn_tenant3_traces-%{yyyy.MM.dd} # Dynamically creates index based on route 

service-maps-pipeline-t3:                                                                               
  workers: 1                                                                                            
  delay: "100"                                                                                          
  source:                                                                                               
    pipeline:                                                                                           
      name: "traces-svcmaps-pipeline-t3"                                                                
  buffer:                                                                                               
    bounded_blocking:                                                                            
  processor:                                                                                    
    - service_map:                                                                                      
                                                                                                        
  sink:                                                                                                 
    - opensearch:                                                                                       
        hosts: ["http://opensearch:9200"]                                                               
        # cert: "/usr/share/data-prepper/opensearch.crt"                                                
        ssl_verification_enabled: false                                                                 
        insecure: true                                                                  
        username:                                                                                  
        password:                                                                          
        index: acn_tenant3_servicemap-%{yyyy.MM.dd} # Dynamically creates index based on route

################################################################################


KarstenSchnitter · 2025-01-02T21:38:14Z

There was a similar question on the OpenSearch Slack ingest channel recently. I posted the following reply, which describes some workarounds:

There are several ways this might be achieved. Unfortunately, I cannot see an easy one:
The service map is generated by the Service Map Processor. The relations (edges) are modelled by the class ServiceMapRelationship. These relations do not contain any field that indicates a tenant. Hence, with the current implementation, you cannot use document-level security for tenant separation. If the relations carried a tenant field, this approach should work for presenting tenant-specific service maps, since the Dashboards Observability plugin uses OpenSearch queries. This should be verified before implementing the change, though. You can raise an issue with the Data Prepper GitHub project for this extension.
Similar to the span data, you need to route the service map data into tenant-specific indices. This requires you to manage the indices manually using a common alias. I guess you have a similar setup for the span indices. Because the Service Map Processor does not include tenant information in the relations of the service map, you cannot easily route the data with a single OpenSearchSink. As a workaround, you can set up tenant-specific pipelines in Data Prepper. For this, you take the raw trace stream and filter it by tenant ID. Then you direct it to the Service Map Processor and a tenant-specific OpenSearchSink. Unfortunately, this is only feasible for a rather small and relatively constant number of tenants.

Following this post, I suggest extending Data Prepper to copy a configurable list of fields from a span to a service map edge. There is an issue with ambiguity: different spans might contain different values for the same edge/connection. To enable the multi-tenancy routing for @arundanegoudar, Data Prepper would need to index different versions of the same edge, one for each field value. I am not sure about the implications of that approach for memory consumption. I have also not investigated whether an implementation is feasible.

dlvenable · 2025-01-07T21:02:52Z

@arundanegoudar ,

I think the problem here is that service_map does not currently support running multiple instances of the processor. It makes use of a number of static variables, which may produce various bugs.

Could you route all of your spans to a single service map and then use that one service map? You could at the same time route the spans to different OpenSearch indexes.
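A minimal sketch of what this suggestion could look like, reusing the pipeline names, route expressions, and sink settings from the original config. It assumes that a sink without a `routes` entry receives every event, and the shared service-map index name `acn_all_tenants_servicemap` is a placeholder that does not appear in the thread. Credentials are omitted, and only two tenants are shown. Note that this yields one shared service map, so it does not by itself separate service-map data per tenant.

```yaml
# Sketch only, not a tested configuration.
traces-entry-pipeline:
  source:
    otel_trace_source:
      ssl: false
  route:
    - tenant1_traces: '/attributes/span.attributes.tenant_name == "tenant1"'
    - tenant2_traces: '/attributes/span.attributes.tenant_name == "tenant2"'
  sink:
    # Routed sinks: each tenant's spans go to that tenant's raw-trace pipeline.
    - pipeline:
        name: "trace-raws-pipeline-t1"
        routes: [tenant1_traces]
    - pipeline:
        name: "trace-raws-pipeline-t2"
        routes: [tenant2_traces]
    # No routes: all spans reach the single service-map pipeline.
    - pipeline:
        name: "service-maps-pipeline"

trace-raws-pipeline-t1:
  source:
    pipeline:
      name: "traces-entry-pipeline"
  processor:
    - otel_traces:
  sink:
    - opensearch:
        hosts: ["http://opensearch:9200"]
        insecure: true
        index: acn_tenant1_traces-%{yyyy.MM.dd}

trace-raws-pipeline-t2:
  source:
    pipeline:
      name: "traces-entry-pipeline"
  processor:
    - otel_traces:
  sink:
    - opensearch:
        hosts: ["http://opensearch:9200"]
        insecure: true
        index: acn_tenant2_traces-%{yyyy.MM.dd}

# Exactly one service_map processor in the whole configuration.
service-maps-pipeline:
  source:
    pipeline:
      name: "traces-entry-pipeline"
  processor:
    - service_map:
  sink:
    - opensearch:
        hosts: ["http://opensearch:9200"]
        insecure: true
        index: acn_all_tenants_servicemap  # placeholder name
```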

arundanegoudar · 2025-01-08T07:48:34Z

Hello @dlvenable , Thanks for the kind & informative response ...

We want to index data in OpenSearch by tenant_name, but the service_map data does not follow that scheme. We already index the span/trace data by tenant_name, and we need the same for the service_map data; it is a necessity for the next step in our application development.

Could you please suggest ways to index service_map data by tenant_name?

KarstenSchnitter · 2025-01-10T21:14:30Z

@dlvenable: Thanks for confirming my suspicion on the service map processor. I have started creating a manual test setup to work on the multi-tenant service map routing.

KarstenSchnitter · 2025-01-10T21:24:41Z

The OpenTelemetry Collector Contrib repository contains a service graph connector to create service maps from traces. It allows the definition of additional dimensions for grouping the requests. This feature would help the OP. Maybe that connector can serve as an inspiration on how to evolve the service map processor in Data Prepper.
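For illustration, a minimal OpenTelemetry Collector configuration using that connector might look like the sketch below. It assumes the Contrib distribution, that the tenant is available as an attribute named `tenant_name` (taken from the routing expressions in this issue), and a `debug` exporter as a stand-in for a real metrics backend. The pipeline name `metrics/servicegraph` is arbitrary.

```yaml
# Sketch only: receiver and exporter choices are placeholders.
receivers:
  otlp:
    protocols:
      grpc:

connectors:
  servicegraph:
    # Extra attributes to group service-graph metrics by,
    # in addition to the connector's default dimensions.
    dimensions:
      - tenant_name

exporters:
  debug:

service:
  pipelines:
    # Spans go into the connector ...
    traces:
      receivers: [otlp]
      exporters: [servicegraph]
    # ... and the connector emits service-graph metrics.
    metrics/servicegraph:
      receivers: [servicegraph]
      exporters: [debug]
```

With a configuration along these lines, the generated service-graph metrics would carry the tenant as an additional dimension, which is the kind of grouping the service map processor in Data Prepper currently lacks.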

arundanegoudar added the bug and untriaged labels Dec 27, 2024

github-project-automation bot moved this to Unplanned in Data Prepper Tracking Board Dec 27, 2024

github-project-automation bot added this to Data Prepper Tracking Board Dec 27, 2024

sb2k16 removed the untriaged label Jan 7, 2025
