grycap · gmolto · Jan 21, 2025 · Jan 20, 2025
diff --git a/docs/invoking-async.md b/docs/invoking-async.md
@@ -9,4 +9,6 @@ using asynchronous requests for every file uploaded to the bucket, which generat
 
 These jobs will be queued up in the Kubernetes scheduler and will be processed whenever there are resources available. If OSCAR cluster has been deployed as an elastic Kubernetes cluster (see [Deployment with IM](https://docs.oscar.grycap.net/deploy-im-dashboard/)), then new Virtual Machines will be provisioned (up to the maximum number of nodes defined) in the underlying Cloud platform and seamlessly integrated in the Kubernetes clusters to proceed with the execution of jobs. These nodes will be terminated as the worload is reduced. Notice that the output files can be stores in MinIO or in any other storage back-end supported by the [FaaS supervisor](oscar-service.md#faas-supervisor). 
 
+ Note that if your OSCAR service runs an AI model for inference, each job will load the AI model weights before performing the inference. You can mitigate this penalty by adjusting the inference code to process a compressed file with several images.
+
 If you want to process a large number of data files, consider using [OSCAR Batch](https://github.com/grycap/oscar-batch), a tool designed to perform batch-based processing in OSCAR clusters. It includes a coordinator tool where the user provides a MinIO bucket containing files for processing. This service calculates the optimal number of parallel service invocations that can be accommodated within the cluster, according to its current status, and distributes the image processing workload accordingly among the service invocations. This is mainly intended to process large amounts of files, for example, historical data.
diff --git a/docs/invoking.md b/docs/invoking.md
@@ -9,9 +9,11 @@ OSCAR services can be executed:
 
 After reading the different service execution types, take into account the following considerations to better decide the most appropriate execution type for your use case:
 
-* **Scalability**: Asynchronous invocations provide the best throughput when dealing with multiple concurrent data processing requests, since these are processed by independent jobs which are managed by the Kubernetes scheduler. A two-level elasticity approach is used (increase in the number of pods and increase in the number of Virtual Machines, if the OSCAR cluster was configured to be elastic). This is the recommended approach when each processing request exceeds the order of tens of seconds. 
+* **Scalability**: Asynchronous invocations provide the best throughput when dealing with multiple concurrent data processing requests, since these are processed by independent jobs which are managed by the Kubernetes scheduler. A two-level elasticity approach is used (increase in the number of pods and increase in the number of Virtual Machines, if the OSCAR cluster was configured to be elastic). This is the recommended approach when each processing request exceeds the order of tens of seconds.
 
 * **Reduced Latency** Synchronous invocations are oriented for short-lived (< tens of seconds) bursty requests. A certain number of containers can be configured to be kept alive to avoid the performance penalty of spawning new ones while providing an upper bound limit (see [`min_scale` and `max_scale` in the FDL](fdl.md#synchronoussettings), at the expense of always consuming resources in the OSCAR cluster. If the processing file is in the order of several MBytes it may not fit in the payload of the HTTP request.
 
 * **Easy Access** For services that provide their own user interface or their own API, exposed services provide the ability to execute them in OSCAR and benefit for an auto-scaled configuration in case they are [stateless](https://en.wikipedia.org/wiki/Service_statelessness_principle). This way, users can directly access the service using its well-known interfaces by the users. 
 
+If you want to process a large number of files, please consider using [OSCAR-batch](https://github.com/grycap/oscar-batch).
+
Original file line number	Diff line number	Diff line change
Expand Up		@@ -9,4 +9,6 @@ using asynchronous requests for every file uploaded to the bucket, which generat

		These jobs will be queued up in the Kubernetes scheduler and will be processed whenever there are resources available. If OSCAR cluster has been deployed as an elastic Kubernetes cluster (see [Deployment with IM](https://docs.oscar.grycap.net/deploy-im-dashboard/)), then new Virtual Machines will be provisioned (up to the maximum number of nodes defined) in the underlying Cloud platform and seamlessly integrated in the Kubernetes clusters to proceed with the execution of jobs. These nodes will be terminated as the worload is reduced. Notice that the output files can be stores in MinIO or in any other storage back-end supported by the [FaaS supervisor](oscar-service.md#faas-supervisor).

		Note that if your OSCAR service runs an AI model for inference, each job will load the AI model weights before performing the inference. You can mitigate this penalty by adjusting the inference code to process a compressed file with several images.

		If you want to process a large number of data files, consider using [OSCAR Batch](https://github.com/grycap/oscar-batch), a tool designed to perform batch-based processing in OSCAR clusters. It includes a coordinator tool where the user provides a MinIO bucket containing files for processing. This service calculates the optimal number of parallel service invocations that can be accommodated within the cluster, according to its current status, and distributes the image processing workload accordingly among the service invocations. This is mainly intended to process large amounts of files, for example, historical data.