Add failfast mode #2308

Open: wants to merge 3 commits into base: master
src/build.cc (15 changes: 14 additions & 1 deletion)

```diff
@@ -19,6 +19,12 @@
 #include <stdio.h>
 #include <stdlib.h>
 #include <functional>
+#ifdef _WIN32
+#include <windows.h>
+#else
+#include <signal.h>
+#include <unistd.h>
+#endif
 
 #if defined(__SVR4) && defined(__sun)
 #include <sys/termios.h>
@@ -470,7 +476,7 @@ vector<Edge*> RealCommandRunner::GetActiveEdges() {
 }
 
 void RealCommandRunner::Abort() {
-  subprocs_.Clear();
+  subprocs_.Abort();
 }
 
 bool RealCommandRunner::CanRunMore() const {
@@ -675,6 +681,13 @@ bool Builder::Build(string* err) {
       }
 
       if (!result.success()) {
+        if (config_.failfast_mode){
+          Cleanup();
+          status_->BuildFinished();
+          *err = "at least one subcommand failed and failfast mode activated";
+          return false;
+        }
+
         if (failures_allowed)
           failures_allowed--;
       }
```

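The control flow added to `Builder::Build()` can be modelled in isolation. The sketch below is hypothetical: the `Outcome` enum and `OnCommandFinished()` are illustrative names, not part of ninja, and it assumes (from ninja's surrounding code, not from this diff) that a spent `failures_allowed` budget stops new commands from starting while already-running ones are drained.

```cpp
#include <cassert>

// Hypothetical model of the per-command bookkeeping in the patched
// Builder::Build(); the names here are illustrative, not ninja's.
enum class Outcome {
  kContinue,        // keep scheduling new commands
  kStopScheduling,  // failure budget spent: start nothing new, drain running jobs
  kAbortNow,        // -K: clean up and return from Build() immediately
};

Outcome OnCommandFinished(bool success, bool failfast_mode,
                          int* failures_allowed) {
  assert(failures_allowed != nullptr);
  if (success)
    return Outcome::kContinue;
  // The failfast check runs before the -k budget is consulted, so with -K
  // the first failure ends the build regardless of the -k value.
  if (failfast_mode)
    return Outcome::kAbortNow;
  if (*failures_allowed)
    --*failures_allowed;
  return *failures_allowed ? Outcome::kContinue : Outcome::kStopScheduling;
}
```

In the patch, the `kAbortNow` case corresponds to calling `Cleanup()` and `status_->BuildFinished()`, then returning `false` with the "failfast mode activated" error.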
src/build.h (6 changes: 5 additions & 1 deletion)

```diff
@@ -156,7 +156,8 @@ struct CommandRunner {
 /// Options (e.g. verbosity, parallelism) passed to a build.
 struct BuildConfig {
   BuildConfig() : verbosity(NORMAL), dry_run(false), parallelism(1),
-                  failures_allowed(1), max_load_average(-0.0f) {}
+                  failures_allowed(1), max_load_average(-0.0f),
+                  failfast_mode(false) {}
 
   enum Verbosity {
     QUIET,  // No output -- used when testing.
@@ -171,6 +172,9 @@ struct BuildConfig {
   /// The maximum load average we must not exceed. A negative value
   /// means that we do not have any limit.
   double max_load_average;
+  /// In case of first subcommand failure, abort (send SIGINT) to all
+  /// other running subcommands.
+  bool failfast_mode;
   DepfileParserOptions depfile_parser_options;
 };
 
```

src/ninja.cc (7 changes: 6 additions & 1 deletion)

```diff
@@ -228,6 +228,7 @@ void Usage(const BuildConfig& config) {
 "\n"
 "  -j N     run N jobs in parallel (0 means infinity) [default=%d on this system]\n"
 "  -k N     keep going until N jobs fail (0 means infinity) [default=1]\n"
+"  -K       abort all jobs after first job fails; implies -k 1\n"
 "  -l N     do not start new jobs if the load average is greater than N\n"
 "  -n       dry run (don't run commands but act like they succeeded)\n"
 "\n"
@@ -1429,7 +1430,7 @@ int ReadFlags(int* argc, char*** argv,
 
   int opt;
   while (!options->tool &&
-         (opt = getopt_long(*argc, *argv, "d:f:j:k:l:nt:vw:C:h", kLongOptions,
+         (opt = getopt_long(*argc, *argv, "d:f:j:k:Kl:nt:vw:C:h", kLongOptions,
                             NULL)) != -1) {
     switch (opt) {
       case 'd':
@@ -1463,6 +1464,10 @@ int ReadFlags(int* argc, char*** argv,
         config->failures_allowed = value > 0 ? value : INT_MAX;
         break;
       }
+      case 'K': {
+        config->failfast_mode = true;
+        break;
+      }
       case 'l': {
         char* end;
         double value = strtod(optarg, &end);
```

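In the new option string `"d:f:j:k:Kl:nt:vw:C:h"`, `k:` takes an argument while the inserted `K` does not, because no colon follows it. The sketch below isolates just these two flags with POSIX `getopt()`; `FailureConfig` and `ParseFailureFlags()` are hypothetical stand-ins for the relevant `BuildConfig` fields and for `ReadFlags()`, and the `-k` handling mirrors the `value > 0 ? value : INT_MAX` line visible in the hunk.

```cpp
#include <cassert>
#include <climits>
#include <cstdlib>
#include <unistd.h>  // POSIX getopt(), optarg, optind, opterr

// Hypothetical, trimmed-down stand-in for the two BuildConfig fields involved.
struct FailureConfig {
  int failures_allowed = 1;
  bool failfast_mode = false;
};

// Parses only -k N and -K. In "k:K", the colon after 'k' means it takes an
// argument; 'K' takes none, matching "k:Kl:" in the patch.
// Returns false on an unknown option or a non-numeric -k value.
bool ParseFailureFlags(int argc, char** argv, FailureConfig* config) {
  assert(config != nullptr);
  optind = 1;  // start scanning at argv[1]
  opterr = 0;  // suppress getopt's own diagnostics
  int opt;
  while ((opt = getopt(argc, argv, "k:K")) != -1) {
    switch (opt) {
      case 'k': {
        char* end;
        long value = strtol(optarg, &end, 10);
        if (*end != 0)
          return false;
        // N <= 0 means "never stop", approximated by INT_MAX.
        config->failures_allowed = value > 0 ? static_cast<int>(value) : INT_MAX;
        break;
      }
      case 'K':
        config->failfast_mode = true;
        break;
      default:
        return false;
    }
  }
  return true;
}
```

For example, `ninja -K -k 0` would set `failfast_mode` and an effectively unlimited failure budget; with the patch's ordering in `Builder::Build()`, the failfast check still wins on the first failure.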
src/subprocess-posix.cc (5 changes: 5 additions & 0 deletions)

```diff
@@ -366,3 +366,8 @@ void SubprocessSet::Clear() {
     delete *i;
   running_.clear();
 }
+
+void SubprocessSet::Abort() {
+  SetInterruptedFlag(SIGINT);
+  Clear();
+}
```

Review comment (Contributor) on the line `Clear();`:

Thanks for your patch. Can you ensure that SIGINT is also sent to the console subprocess, if there is one? Otherwise, if the failure comes from a non-console command, the console subprocess will keep running and block Ninja.

Let me clarify:

Ninja launches subprocesses in their own process groups, with the exception of commands in the "console" pool, which are launched in the same process group as Ninja itself. This ensures that when Ctrl-C is pressed, both Ninja and the console subprocess receive SIGINT (I suspect this is also required for proper terminal handling, but I don't remember the details).

By default, the console subprocess exits when it receives SIGINT, while Ninja catches the SIGINT and reports it to the Builder when DoWork() returns. The Builder then ends up invoking Clear(), which relays the SIGINT to all non-console subprocesses to stop them as well (hence the use_console check in that function).

With your code, if the failure comes from a non-console subprocess, calling SetInterruptedFlag(SIGINT) tells the next DoWork() call to report an interrupt, but it does not send SIGINT to the console process, which keeps running. Because that process also shares Ninja's stdout/stderr descriptors, it keeps printing to the terminal, and Ninja will likely be stuck waiting for it in the Subprocess destructor (which ends up calling waitpid()).

It looks like something similar is true for Win32.

Also, because this is very subtle, I strongly recommend adding a robust unit test for this behavior to your PR. For example, one with a long-running console subprocess that sleeps for 10 seconds and a non-console one that fails immediately; the test should verify that Ninja exits immediately with -K. You should, of course, also test the case where the console process fails first.

Hope this helps
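The reviewer's point can be reproduced outside Ninja. The POSIX sketch below is hypothetical: `Child`, `SpawnSleeper()`, `AbortAll()` and `DiedFromSigint()` are illustrative names, and it uses `fork()`/`setpgid()` for brevity, which is not necessarily how Ninja spawns its commands. A non-console child gets its own process group and a console child stays in the caller's group; the abort then signals both, the first through its group and the second by pid, since signalling the caller's whole group would interrupt the caller too.

```cpp
#include <cassert>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <vector>

// Hypothetical record of a running command, loosely mirroring the pid and
// use_console state the review talks about; not ninja's Subprocess class.
struct Child {
  pid_t pid;
  bool use_console;  // true: the child shares the caller's process group
};

// Starts `sleep <seconds>`. A non-console child gets its own process group
// (pgid == pid); a console child stays in the caller's group.
Child SpawnSleeper(const char* seconds, bool use_console) {
  pid_t pid = fork();
  assert(pid >= 0);
  if (pid == 0) {
    signal(SIGINT, SIG_DFL);  // do not inherit an ignored SIGINT
    if (!use_console)
      setpgid(0, 0);
    execlp("sleep", "sleep", seconds, static_cast<char*>(nullptr));
    _exit(127);  // only reached if exec failed
  }
  if (!use_console)
    setpgid(pid, pid);  // also set here so the group exists before we return
  return Child{pid, use_console};
}

// Failfast abort as the reviewer requests: every child receives SIGINT.
// A non-console child is signalled through its whole group (negative pid).
// The console child is signalled by pid alone, because signalling its
// group would also interrupt the caller.
void AbortAll(const std::vector<Child>& children) {
  for (const Child& child : children)
    kill(child.use_console ? child.pid : -child.pid, SIGINT);
}

// Reaps `pid` and reports whether SIGINT terminated it.
bool DiedFromSigint(pid_t pid) {
  int status = 0;
  if (waitpid(pid, &status, 0) != pid)
    return false;
  return WIFSIGNALED(status) && WTERMSIG(status) == SIGINT;
}
```

With the patch as submitted, only the group-wide signal to non-console children happens (inside `Clear()`), so in the reviewer's scenario the console child is never interrupted and the final `waitpid()` blocks until it exits on its own.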
src/subprocess-win32.cc (4 changes: 4 additions & 0 deletions)

```diff
@@ -305,3 +305,7 @@ void SubprocessSet::Clear() {
     delete *i;
   running_.clear();
 }
+
+void SubprocessSet::Abort() {
+  Clear();
+}
```

src/subprocess.h (3 changes: 3 additions & 0 deletions)

```diff
@@ -87,6 +87,9 @@ struct SubprocessSet {
   bool DoWork();
   Subprocess* NextFinished();
   void Clear();
+  /// Forceful variant of Clear() that should interrupt children (even
+  // though there was no external signal/interruption).
+  void Abort();
 
   std::vector<Subprocess*> running_;
   std::queue<Subprocess*> finished_;
```
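The doc comment on `Abort()` hints at why the POSIX version records `SIGINT` before calling `Clear()`. Assuming, as the review describes, that `Clear()` relays the recorded interrupt signal to its children, a build that stops because of a command failure presumably has no recorded signal, and `kill()` with signal number 0 only checks that the target exists without delivering anything. The hypothetical `MiniSet` below (not Ninja's `SubprocessSet`; `SpawnSleep()` is likewise illustrative) models that split.

```cpp
#include <cassert>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <vector>

// Hypothetical helper: runs `sleep <seconds>` with default SIGINT handling.
pid_t SpawnSleep(const char* seconds) {
  pid_t pid = fork();
  assert(pid >= 0);
  if (pid == 0) {
    signal(SIGINT, SIG_DFL);
    execlp("sleep", "sleep", seconds, static_cast<char*>(nullptr));
    _exit(127);  // only reached if exec failed
  }
  return pid;
}

// Hypothetical miniature of the Clear()/Abort() split; not ninja's class.
struct MiniSet {
  std::vector<pid_t> running;
  int interrupted = 0;  // signal recorded by a handler; 0 if none was seen

  // Relays the recorded signal, then reaps every child. With
  // interrupted == 0, kill() merely probes for existence, so this call
  // would block until the children exit on their own.
  std::vector<int> Clear() {
    for (pid_t pid : running)
      kill(pid, interrupted);
    std::vector<int> statuses;
    for (pid_t pid : running) {
      int status = 0;
      waitpid(pid, &status, 0);
      statuses.push_back(status);
    }
    running.clear();
    return statuses;
  }

  // Records SIGINT first, so the relay in Clear() delivers a real signal.
  std::vector<int> Abort() {
    interrupted = SIGINT;
    return Clear();
  }
};
```

As the review points out, this only covers children that are actually signalled; a console-pool child needs its own SIGINT as well.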