Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove VISITED_STAT #5261

Draft
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

ChrisJefferson
Copy link
Contributor

This remove VISITED_STAT and SET_VISITED_STAT, and the 'visited' bit from Stat, and keeps track of these things in profiling.c, and keep track of covered lines in profiling.c

This remove ActivateProfileColour, which allowed colouring functions with covered lines, hitting the exact parts of lines executed, as that was easy to do when we tracked what was covered per Stat, but I don't do that now (only by line), as I believe no-one was ever using this functionality!

If this is fine, please don't merge it yet, as I'll have to release an updating to the profiling package, as to sync up with this change (in particular, remove tests which use ActivateProfileColour)

src/profile.c Outdated Show resolved Hide resolved
src/profile.c Outdated Show resolved Hide resolved
src/profile.c Outdated Show resolved Hide resolved
src/profile.c Outdated Show resolved Hide resolved
Obj linelist = ELM_PLIST(profileState.visitedStatements, nameid);

if (LEN_PLIST(linelist) < line || !ELM_PLIST(linelist, line)) {
AssPlist(linelist, line, True);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we can use a blist instead of a plist to reduce memory usage by a factor of 64 or so? I think the main missing ingredient would be a function which takes a blist and efficiently increases its size by adding zero blocks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh and perhaps another helper which takes a blist and an index and returns TRUE or FALSE (not True or False) ... this helper then could also just return 0 for anything beyond the bounds. That would simplify the code here a bit, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you say, I started trying to use blists then discovered there is no way to extend a blist while having it remain a blist.

It probably wouldn't be a bad idea to add such extensions to blists (maybe even accessible from the GAP level), but that was a rabbit-hole I didn't want to fall down, as it could be done later / more generally (should we just add a generic "extend list, and use this member to fill the new values, and special case that for blists?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't try to make something clever or generic. Thinking again about it, I would indeed just aim for a purely kernel internal API which does exactly what we need. These could even be private inside profile.c, no need to have them in blister.h:

int SAFE_TEST_BIT_BLIST(Obj list, UInt pos)
{
    if (pos > LEN_BLIST(list)
        return 0;
    return TEST_BIT_BLIST(list, pos);
}

void SAFE_SET_BIT_BLIST(Obj list, UInt pos)
{
    if (SIZE_OBJ(list) < SIZE_PLEN_BLIST(pos))
        ResizeBag(list, SIZE_PLEN_BLIST(pos));
    SET_BIT_BLIST(list, pos);
    //CLEAR_FILTS_LIST(list); // not really necessary for our application
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll have a quick look. I worry about not testing such functions (you didn't do SET_LEN_BLIST in SAFE_SET_BIT_BLIST, which I think is required), I will take a look at doing this with BLists, however.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh sure. I'm not even sure that code compiles, I just copy&pasted it quickly. Yeah, it should have something like if (pos > LEN_BLIST(list)) SET_LEN_BLIST(list, pos); added

src/profile.c Outdated Show resolved Hide resolved
src/profile.c Outdated Show resolved Hide resolved
src/profile.c Outdated Show resolved Hide resolved
@fingolfin fingolfin added topic: kernel release notes: not needed PRs introducing changes that are wholly irrelevant to the release notes labels Dec 13, 2022
@fingolfin fingolfin marked this pull request as draft December 13, 2022 09:42
@fingolfin
Copy link
Member

I've converted this to "draft" status so that it is clear that it should not be merged. You can turn mark it as "ready for review" as soon as profiling is updated (and despite the name, I'll be happy to review it regardless of the "draft" status ;-)

@fingolfin
Copy link
Member

So this now needs a rebase, but hopefully it won't be too hard.

Also you wanted to update profiling to match before merging this PR -- so I assume you'll do that next (if this is just about dropping the ActivateProfileColour code there should be no problem to release the profiling update first, and put it into the package distro, before merging this PR. Right?)

@ChrisJefferson
Copy link
Contributor Author

I believe this PR is now complete. I haven't switched to using blists as the memory usage isn't that big, and I would prefer some general and tested code for extending blists, rather than a poorly tested bit of code straight in the profiling.

This does make GAP as a whole a bit slower, but not massively, and that's probably unavoidable as we used to be able to just dig into the STAT, while now we have to go and look up through a couple of plists. After merging, and ensuring everything is working correctly, I might have a go at optimising.

There is, however one issue -- this pushes the time taken to test HPC-GAP very close to 1 hour, which is (a) too long, and (b) close to where the test will fail if it hits the one-hour limit and times out. I'm going to investigate if this slowness is related to this PR, or a more general issue.

@fingolfin
Copy link
Member

fingolfin commented Dec 23, 2022

Note that the 1 hour limit is self-imposed by us, we could increase it -- but I'd rather not. HPC-GAP has been exceptionally slow for quite some time now, much more so in the CI than on my laptop, and I don't know why

src/profile.c Outdated Show resolved Hide resolved
src/profile.c Outdated
** see this when we use ActivateProfileColour. However, without some
** serious additional overheads, I can't see how to provide this
** 3) Operating at just a line basis can sometimes be too course. However,
** without some serious additional overheads, I can't see how to provide this
** functionality in output (basically we would have to store
** line and character positions for the start and end of every expression).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I do think we could provide that without a problem nor overhead, by replacing the line number in the stat by a byte offset into the file. Converting this into line number + offset into the line can then be deferred to a post-processing step.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a bit more problem here ( I could describe it better).

We tend to set up statements we tend to (as you would expect) parse, then make a Stat, so if we just grabbed the position when making the Stat we would point to the end of the statement / loop, / whatever.

With just covering lines this (usually) isn't too bad, but if you want to capture expressions usefully you probably want to catch their start and end (as expressions are often nested), and then we'd need to catch the start of expressions.

@ChrisJefferson
Copy link
Contributor Author

You are right, I looked on my machine and profiling is a little slower, unfortunately for CI generating coverage is particularly hit, because once a line is covered it was very cheap to check if the stat was covered and exit. However, the slowdown isn't unreasonable.

On my machine, HPC-GAP is about 50% slower than master, and coverage is about 50% slower than no coverage.

I could make a PR where we do a much smaller set of HPC-GAP testing? We could consider stopping testing it, but I don't know if the code is being partially used for the Julia integration?

@fingolfin

This comment was marked as outdated.

@fingolfin
Copy link
Member

I have rebased this now

@fingolfin
Copy link
Member

Rebased again.

Copy link
Member

@fingolfin fingolfin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this looks fine to me, except for some HPC-GAP related locking concerns. Also, do you understand why this makes HPC-GAP so much slower? Or does it also make regular GAP with profiling a lot slower?

src/profile.c Outdated Show resolved Hide resolved
src/profile.c Outdated Show resolved Hide resolved
src/profile.c Outdated Show resolved Hide resolved
NEW_PLIST(T_PLIST, 0));
}

Obj linelist = ELM_PLIST(profileState.visitedStatements, fileid);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess executing all this for every single file is quite excessive. A possible optimization: we do something like for PtrBody and PtrLVars, by tracking a STATE(ProfileLineList) which essential contains linelist from above and is updated by SWITCH_TO_OLD_LVARS if profiling is active (well and also by the code which activate/deactivates profiling). The all of the above can be replaced by something like this:

    Obj linelist = STATE(ProfileLineList);
    if (!linelist)
        return visited;

Or one could go one step further and store ADDR_OBJ(linelist) (and then update that pointer in VarsAfterCollectBags, too). Then of course LEN_PLIST, ELM_PLIST etc. below would need to be substituted.

Copy link
Member

@fingolfin fingolfin Jan 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I have no idea if either of these would be worthwhile, but then again, this is executed for every single expression or statement, so we should measure and make it as fast as possible?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

markedVisited is only about 2% of the time taken in profiling (that's less than I expected, but it's what cachegrind reports). Almost all the time is spent in printing.

src/profile.c Outdated Show resolved Hide resolved
@fingolfin
Copy link
Member

Rebased once again

@@ -532,11 +530,36 @@ static inline void printOutput(int fileid, int line, BOOL exec, BOOL visited)
}
}

// Mark line as visited, and return true if the line has been previously
// visited (executed)
static BOOL markVisited(int fileid, UInt line)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is not thread safe, yet is not called from a critical section either. Can't that cause problems in HPC-GAP?

So perhaps it needs a lock? Or perhaps we can make profileState.visitedStatements thread local -- it suffices to merge this into a single plist at the time we write out the profiles, no? (Though I guess that would break if a thread was killed before we do that...)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(old reply!)

We only call this from one thread -- we lock profiling to a single thread (you can see this from the caller).

There could be issues to do with ownership of the datastructures here in HPC-GAP, if they are created in one thread and used in another. I'm not opposed to a fix for that, but it wouldn't shock me if various things don't work in HPC-GAP with profiling.

@ChrisJefferson ChrisJefferson force-pushed the remove_visited branch 2 times, most recently from 636c70f to c9fe133 Compare December 14, 2023 04:55
@ChrisJefferson
Copy link
Contributor Author

Almost all the time taken in profiling is from the fprintf printing out JSON -- the main way to speed this up would be to switch to a binary format, or do something where we just print less output. Both of these are reasonable projects, but I'm not planning on working on either of them soon.

@fingolfin
Copy link
Member

@ChrisJefferson what are your intentions with this PR? It is still marked as draft. What is needed to complete it then?

// `visited` starts out as 0 and is set to 1 if the statement or
// expression has ever been executed while profiling is turned on
unsigned int visited : 1;

// `line` records the line number in the source file in which the
// statement or expression started
unsigned int line : 31;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might as well put that bit to use again (and possibly allow generation of slightly nicer code?)

Suggested change
unsigned int line : 31;
unsigned int line : 32;

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release notes: not needed PRs introducing changes that are wholly irrelevant to the release notes topic: kernel
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants