This repository has been archived by the owner on Nov 11, 2022. It is now read-only.

Version 1.6.0

@dhalperi released this 12 Jun 19:11
  • Added InProcessPipelineRunner, an improvement over the DirectPipelineRunner that better implements the Dataflow model. InProcessPipelineRunner runs on a user's local machine and supports multithreaded execution, unbounded PCollections, and triggers for speculative and late outputs.
  • Added display data, which allows annotating user functions (DoFn, CombineFn, and WindowFn), Sources, and Sinks with static metadata to be displayed in the Dataflow Monitoring Interface. Display data has been implemented for core components and is automatically applied to all PipelineOptions.
  • Added the ability to compose multiple CombineFns into a single CombineFn using CombineFns.compose or CombineFns.composeKeyed.
  • Added the methods getSplitPointsConsumed and getSplitPointsRemaining to the BoundedReader API to improve Dataflow's ability to automatically scale a job reading from these sources. Default implementations of these methods are provided, but reader implementers should override them to provide better information when available.
  • Improved performance of side inputs when using workers with many cores.
  • Improved efficiency when using CombineFnWithContext.
  • Fixed several stability issues in streaming mode.
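The idea behind composing CombineFns can be illustrated without the SDK. The sketch below is a plain-Java approximation that assumes a hypothetical `SimpleCombineFn` interface (not the SDK's `CombineFn`) with the usual create/add/merge/extract lifecycle. The composed function keeps one accumulator per component, feeds every input to each component, and merges component-wise, so a single pass yields both a sum and a maximum. It does not reproduce the actual `CombineFns.compose` signature.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical combine-function shape for illustration only. It mirrors the
// createAccumulator / addInput / mergeAccumulators / extractOutput lifecycle,
// but it is NOT the Dataflow SDK's CombineFn class.
interface SimpleCombineFn<InputT, AccumT, OutputT> {
  AccumT createAccumulator();
  AccumT addInput(AccumT accum, InputT input);
  AccumT mergeAccumulators(List<AccumT> accums);
  OutputT extractOutput(AccumT accum);
}

public class ComposeCombineSketch {
  /** Minimal immutable pair used for the composite accumulator and output. */
  static final class Pair<A, B> {
    final A first;
    final B second;
    Pair(A first, B second) { this.first = first; this.second = second; }
  }

  /** Sums integers; the accumulator is a one-element mutable array. */
  static final class SumFn implements SimpleCombineFn<Integer, int[], Integer> {
    public int[] createAccumulator() { return new int[] {0}; }
    public int[] addInput(int[] acc, Integer x) { acc[0] += x; return acc; }
    public int[] mergeAccumulators(List<int[]> accs) {
      int[] merged = createAccumulator();
      for (int[] a : accs) merged[0] += a[0];
      return merged;
    }
    public Integer extractOutput(int[] acc) { return acc[0]; }
  }

  /** Tracks the maximum integer seen (Integer.MIN_VALUE if the input is empty). */
  static final class MaxFn implements SimpleCombineFn<Integer, int[], Integer> {
    public int[] createAccumulator() { return new int[] {Integer.MIN_VALUE}; }
    public int[] addInput(int[] acc, Integer x) { acc[0] = Math.max(acc[0], x); return acc; }
    public int[] mergeAccumulators(List<int[]> accs) {
      int[] merged = createAccumulator();
      for (int[] a : accs) merged[0] = Math.max(merged[0], a[0]);
      return merged;
    }
    public Integer extractOutput(int[] acc) { return acc[0]; }
  }

  /** Runs two combine functions side by side, so one pass yields both results. */
  static final class ComposedFn<InputT, A1, O1, A2, O2>
      implements SimpleCombineFn<InputT, Pair<A1, A2>, Pair<O1, O2>> {
    private final SimpleCombineFn<InputT, A1, O1> fn1;
    private final SimpleCombineFn<InputT, A2, O2> fn2;
    ComposedFn(SimpleCombineFn<InputT, A1, O1> fn1, SimpleCombineFn<InputT, A2, O2> fn2) {
      this.fn1 = fn1;
      this.fn2 = fn2;
    }
    public Pair<A1, A2> createAccumulator() {
      return new Pair<>(fn1.createAccumulator(), fn2.createAccumulator());
    }
    // Every input element is fed to both component functions.
    public Pair<A1, A2> addInput(Pair<A1, A2> acc, InputT input) {
      return new Pair<>(fn1.addInput(acc.first, input), fn2.addInput(acc.second, input));
    }
    // Each component merges only its own accumulators.
    public Pair<A1, A2> mergeAccumulators(List<Pair<A1, A2>> accs) {
      List<A1> firsts = new ArrayList<>();
      List<A2> seconds = new ArrayList<>();
      for (Pair<A1, A2> a : accs) {
        firsts.add(a.first);
        seconds.add(a.second);
      }
      return new Pair<>(fn1.mergeAccumulators(firsts), fn2.mergeAccumulators(seconds));
    }
    public Pair<O1, O2> extractOutput(Pair<A1, A2> acc) {
      return new Pair<>(fn1.extractOutput(acc.first), fn2.extractOutput(acc.second));
    }
  }

  /** Accumulates one simulated bundle into a fresh accumulator. */
  static <I, A, O> A accumulate(SimpleCombineFn<I, A, O> fn, List<I> bundle) {
    A acc = fn.createAccumulator();
    for (I v : bundle) acc = fn.addInput(acc, v);
    return acc;
  }

  /** Computes (sum, max) with one composed function over two merged bundles. */
  static Pair<Integer, Integer> sumAndMax(List<Integer> values) {
    ComposedFn<Integer, int[], Integer, int[], Integer> fn =
        new ComposedFn<>(new SumFn(), new MaxFn());
    int mid = values.size() / 2;
    List<Pair<int[], int[]>> partials = new ArrayList<>();
    partials.add(accumulate(fn, values.subList(0, mid)));
    partials.add(accumulate(fn, values.subList(mid, values.size())));
    return fn.extractOutput(fn.mergeAccumulators(partials));
  }

  public static void main(String[] args) {
    Pair<Integer, Integer> result = sumAndMax(Arrays.asList(3, 1, 4, 1, 5));
    System.out.println("sum=" + result.first + " max=" + result.second);  // sum=14 max=5
  }
}
```

Running `main` prints `sum=14 max=5`. The two simulated bundles are accumulated separately and merged before extraction, which is the property that lets a composite like this run in parallel.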
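To show the kind of progress information the new BoundedReader methods report, here is a minimal in-memory sketch. `SplitPointReaderSketch` is hypothetical and does not implement the SDK interface. It assumes that every record is a split point and that the record currently being read counts toward the remaining total rather than the consumed one, so consumed plus remaining always equals the record count.

```java
import java.util.Arrays;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical in-memory reader for illustration; it is NOT the Dataflow SDK's
// BoundedSource.BoundedReader. Assumptions: every record is a split point, and
// the record currently being processed counts as remaining, not consumed.
public class SplitPointReaderSketch {
  private final List<String> records;
  private int current = -1; // index of the current record; -1 before start()

  SplitPointReaderSketch(List<String> records) { this.records = records; }

  /** Moves to the first record; returns false if there is none. */
  boolean start() { return advance(); }

  /** Moves to the next record; returns false once the input is exhausted. */
  boolean advance() {
    if (current < records.size()) current++;
    return current < records.size();
  }

  String getCurrent() {
    if (current < 0 || current >= records.size()) throw new NoSuchElementException();
    return records.get(current);
  }

  /** Split points fully processed, i.e. those before the current record. */
  long getSplitPointsConsumed() {
    return Math.max(0, current);
  }

  /** Split points not yet fully processed, including the current record. */
  long getSplitPointsRemaining() {
    return records.size() - getSplitPointsConsumed();
  }

  public static void main(String[] args) {
    SplitPointReaderSketch reader = new SplitPointReaderSketch(Arrays.asList("a", "b", "c"));
    for (boolean more = reader.start(); more; more = reader.advance()) {
      System.out.println(reader.getCurrent() + ": consumed=" + reader.getSplitPointsConsumed()
          + " remaining=" + reader.getSplitPointsRemaining());
    }
  }
}
```

Iterating over three records reports consumed values of 0, 1, 2 and remaining values of 3, 2, 1. A real reader whose input does not map one record to one split point would need its own bookkeeping for these counts.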