Holoscan not running multiple instances of operator simultaneously #41

Open
CameronDevine opened this issue Jan 10, 2025 · 7 comments
Labels: help wanted (Extra attention is needed)

@CameronDevine

Please describe your question

I have an application where I would like to use Holoscan for an image processing pipeline. The problem I am having is that Holoscan seems to only run a single instance of an operator at a time. This is in contrast to Intel's Threading Building Blocks, which will run multiple copies of an operator at once. Since much of the processing I need to do is serial, the performance of the application is dominated by the runtime of the slowest operator. Is there a different scheduler I can use to enable multiple copies of an operator to run at once, or could I implement a custom scheduler?
 

Please specify what Holoscan SDK version you are using

2.8.0

Please add any details about your platform or use case

I am working on building a pipeline for real time processing of mouse brain images from transmission electron microscopes.

@tbirdso (Contributor) commented Jan 10, 2025

Hi there @CameronDevine, thanks for using Holoscan SDK. Yes, by default Holoscan SDK uses greedy scheduling, but it also provides a polling-based scheduler and an event-based scheduler for multithreaded operation. We recommend visiting the Holoscan Schedulers documentation or the multithreaded scheduler example for more on multithreaded approaches.
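
For example, switching an application to the event-based scheduler is just a matter of passing a scheduler instance to the application before running it; a minimal sketch (where MyApp is a placeholder for your own Application subclass) looks like:

from holoscan.schedulers import EventBasedScheduler

# MyApp stands in for your own holoscan.core.Application subclass.
app = MyApp()
# Use 4 worker threads instead of the default single-threaded greedy scheduler.
app.scheduler(EventBasedScheduler(app, worker_thread_number=4))
app.run()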

@CameronDevine (Author)

Hi @tbirdso, here's some code showing the issue I am encountering. When run with -s greedy, I get a rate of approximately 0.5 Hz, which is what I expect. However, when I switch to either the event-based (-s event) or multi-threaded (-s multi) scheduler, I still get a rate of approximately 0.5 Hz. I would expect this to increase to about 2 Hz with either of those schedulers, since there are 4 worker threads (and more than 4 cores on my machine) which could each run the Wait operator simultaneously.

from holoscan.core import Application, Operator
import holoscan.schedulers
import argparse
from time import sleep, monotonic


class Count(Operator):
    def setup(self, spec):
        spec.output("out")

    def start(self):
        self.i = 0

    def compute(self, input, output, context):
        self.i += 1
        output.emit(self.i, "out")


class Wait(Operator):
    def setup(self, spec):
        spec.input("in")
        spec.output("out")
        spec.param("duration")

    def compute(self, input, output, context):
        sleep(self.duration)
        output.emit(input.receive("in"), "out")


class Rate(Operator):
    def setup(self, spec):
        spec.input("in")
        spec.param("window")

    def start(self):
        self.last_call = None
        self.durations = []

    def compute(self, input, output, context):
        if self.last_call is None:
            self.last_call = monotonic()
            print(f"Received {input.receive('in')}")
        else:
            now = monotonic()
            self.durations.append(now - self.last_call)
            self.last_call = now
            self.durations = self.durations[-self.window :]
            rate = len(self.durations) / sum(self.durations)
            print(f"Received {input.receive('in')} (rate {rate} Hz)")


class Test(Application):
    def __init__(self, scheduler):
        super().__init__()

        schedulers = {
            "greedy": lambda: holoscan.schedulers.GreedyScheduler(self),
            "event": lambda: holoscan.schedulers.EventBasedScheduler(
                self, worker_thread_number=4
            ),
            "multi": lambda: holoscan.schedulers.MultiThreadScheduler(
                self, worker_thread_number=4
            ),
        }

        assert scheduler in schedulers

        self.scheduler(schedulers[scheduler]())

    def compose(self):
        count = Count(self)
        wait = Wait(self, duration=2)
        rate = Rate(self, window=8)

        self.add_flow(count, wait)
        self.add_flow(wait, rate)


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--scheduler",
        "-s",
        type=str,
        default="greedy",
        help='The scheduler to use, either "greedy", "event", or "multi".',
    )

    args = parser.parse_args()

    app = Test(args.scheduler)
    app.run()


if __name__ == "__main__":
    main()

@grlee77 (Contributor) commented Jan 10, 2025

Hi @CameronDevine,

Thanks for providing the concrete code example. I think there is a misconception about how the scheduling works. Looking at your compose method:

    def compose(self):
        count = Count(self)
        wait = Wait(self, duration=2)
        rate = Rate(self, window=8)

        self.add_flow(count, wait)
        self.add_flow(wait, rate)

There are only three operators in total, connected in a single linear chain, so there is nothing that can run in parallel. In other words, the scheduler does not launch multiple copies of the application (or of any operator); only these three operator instances are executed repeatedly.

For a concrete example of running multiple wait/delay operators in parallel see this one included in the examples folder:
https://github.com/nvidia-holoscan/holoscan-sdk/blob/main/examples/multithread/python/multithread.py (README)

In that case, there is a single source operator that connects to N "delay" operators in parallel, which then converge on a single sink operator. It is not required to have a common source/sink; that is just what we happened to choose for the example. You could also build the same type of example with a separate source/sink for each "delay" operator.
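
For instance, using the Count/Wait/Rate operators from your snippet, a rough sketch of a compose method with a separate source and sink per branch might look like the following (this is not the linked example verbatim, and the name= keyword is only used here to keep the operator names unique):

    def compose(self):
        # Four independent Count -> Wait -> Rate chains; with an event-based or
        # multi-threaded scheduler the four Wait operators can run concurrently.
        for i in range(4):
            count = Count(self, name=f"count{i}")
            wait = Wait(self, name=f"wait{i}", duration=2)
            rate = Rate(self, name=f"rate{i}", window=8)
            self.add_flow(count, wait)
            self.add_flow(wait, rate)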

@CameronDevine (Author)

Hi @grlee77,

Thank you (and @tbirdso) for confirming what I had found experimentally. Unfortunately, this means that Holoscan will not work for my application as I have a pipeline that is primarily serial and the total computation time is dominated by a small number of operators.

@whom3 (Contributor) commented Jan 15, 2025

@CameronDevine how many of these operators will you have connected serially in your use case?

In this simple example, there is only one operator that takes a long time and the runtimes of the other operators are trivial, so having multiple workers won't help.

If we modify your example to include multiple operators connected serially, we can see a difference in the reported rate. Here I have set worker_thread_number to 2.

def compose(self):
    count = Count(self)
    wait1 = Wait(self, duration=2)
    wait2 = Wait(self, duration=2)
    wait3 = Wait(self, duration=2)
    wait4 = Wait(self, duration=2)
    rate = Rate(self, window=8)

    self.add_flow(count, wait1)
    self.add_flow(wait1, wait2)
    self.add_flow(wait2, wait3)
    self.add_flow(wait3, wait4)
    self.add_flow(wait4, rate)

With greedy scheduler:

python3 example.py 
Received 1
Received 2 (rate 0.12488370303508864 Hz)
Received 3 (rate 0.1248758491955746 Hz)
Received 4 (rate 0.12486922475752466 Hz)
Received 5 (rate 0.12486592252718796 Hz)

With event based scheduler:

python3 example.py -s event
Received 1
Received 2 (rate 0.24969815600480916 Hz)
Received 3 (rate 0.24969767448312177 Hz)
Received 4 (rate 0.24969736471333656 Hz)
Received 5 (rate 0.24969612976670194 Hz)

@CameronDevine (Author)

Hi @whom3, I expect I will have about 10 operators, which is still fewer than the number of cores on the systems I will be running the code on.

@whom3 (Contributor) commented Jan 15, 2025

@CameronDevine You could try the following, which modifies your original example so that:

  • there are now 10 Wait operators connected serially, each waiting 1 second
  • the number of workers can be specified via -n
  • the input port of the Wait operator has a queue size of 2, which allows an upstream operator to do work while the downstream operator is still busy; with the default queue size of 1 we still get some benefit from multiple workers, but not as much
from holoscan.core import Application, Operator, IOSpec
import holoscan.schedulers
import argparse
from time import sleep, monotonic

class Count(Operator):
    def setup(self, spec):
        spec.output("out")

    def start(self):
        self.i = 0

    def compute(self, input, output, context):
        self.i += 1
        output.emit(self.i, "out")


class Wait(Operator):
    def setup(self, spec):
        spec.input("in").connector(IOSpec.ConnectorType.DOUBLE_BUFFER, capacity=2)
        spec.output("out")
        spec.param("duration")

    def compute(self, input, output, context):
        sleep(self.duration)
        output.emit(input.receive("in"), "out")


class Rate(Operator):
    def setup(self, spec):
        spec.input("in")
        spec.param("window")

    def start(self):
        self.last_call = None
        self.durations = []

    def compute(self, input, output, context):
        if self.last_call is None:
            self.last_call = monotonic()
            print(f"Received {input.receive('in')}")
        else:
            now = monotonic()
            self.durations.append(now - self.last_call)
            self.last_call = now
            self.durations = self.durations[-self.window :]
            rate = len(self.durations) / sum(self.durations)
            print(f"Received {input.receive('in')} (rate {rate} Hz)")


class Test(Application):
    def __init__(self, scheduler, num_workers):
        super().__init__()

        schedulers = {
            "greedy": lambda: holoscan.schedulers.GreedyScheduler(self),
            "event": lambda: holoscan.schedulers.EventBasedScheduler(
                self, worker_thread_number=num_workers
            ),
            "multi": lambda: holoscan.schedulers.MultiThreadScheduler(
                self, worker_thread_number=num_workers
            ),
        }

        assert scheduler in schedulers

        self.scheduler(schedulers[scheduler]())

    def compose(self):
        count = Count(self)
        wait1 = Wait(self, duration=1)
        wait2 = Wait(self, duration=1)
        wait3 = Wait(self, duration=1)
        wait4 = Wait(self, duration=1)
        wait5 = Wait(self, duration=1)
        wait6 = Wait(self, duration=1)
        wait7 = Wait(self, duration=1)
        wait8 = Wait(self, duration=1)
        wait9 = Wait(self, duration=1)
        wait10 = Wait(self, duration=1)
        rate = Rate(self, window=8)

        self.add_flow(count, wait1)
        self.add_flow(wait1, wait2)
        self.add_flow(wait2, wait3)
        self.add_flow(wait3, wait4)
        self.add_flow(wait4, wait5)
        self.add_flow(wait5, wait6)
        self.add_flow(wait6, wait7)
        self.add_flow(wait7, wait8)
        self.add_flow(wait8, wait9)
        self.add_flow(wait9, wait10)
        self.add_flow(wait10, rate)


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--scheduler",
        "-s",
        type=str,
        default="greedy",
        help='The scheduler to use, either "greedy", "event", or "multi".',
    )
    parser.add_argument(
        "--num_workers",
        "-n",
        type=int,
        default=4,
        help='Number of workers for multi-thread and event based scheduler',
    )


    args = parser.parse_args()

    app = Test(args.scheduler, args.num_workers)
    app.run()


if __name__ == "__main__":
    main()

For the above, I am seeing much better scaling when increasing the number of workers:

  • greedy: ~0.1 Hz
  • event with 4 workers: ~0.3 Hz
  • event with 5 workers: ~0.43 Hz
  • event with 8 workers: ~0.85 Hz
  • event with 10 workers: ~0.996 Hz
