-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cmt files as input and output #24
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find the feature proposal sensible, consistent with our recent support for .cmir-linear
in the backend. It should also be fairly simple to implement. I'm in favor.
(We also have flags such as -stop-after typing
, which in theory could be combined with -bin-annot
to get .cmt[i]
files, but having input/output targets is clearer and easier to specify precisely.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm also very much in favour, just with some suggested tweaks to the handling of -o
The feature is easy to implement and maintain, so I don't object. However, I don't think it is the best way to achieve your aims in the medium to long term. The main issue is that cmt files are huge and mostly useless. I think there are only two tools in the OCaml ecosystem that currently consume cmt files:
For dune's checking mode I think that the mode should not be trying to produce any output beyond the possible errors. So For sharing type-checking time, it would be much better to store the lambda code to disk. The translation from the typedtree to lambda is inexpensive and the result should be a lot smaller. At the moment there are some differences between the lambda code for bytecode and the lambda code for native, but I think these differences can and should be eliminated anyway. |
rfcs/cmt_input.md
Outdated
|
||
## Speed up compilation when type checking is a bottle neck | ||
|
||
This likely doesn't occur very often, but there are cases where type checking is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, type checking is consistently the majority of compilation time at Jane Street.
ocp-index should be added to that list. I don't know how alive libmonda is today, but I know it uses cmt files too. I think it would be somewhat of a shame if cmt files were to go away. It's true that they're very bulky, but they've been pretty good at allowing people to experiment with tooling. For example, I know someone is experimenting with adding a workspace wide "go to symbol" using cmt files. In the end, I could certainly manage if cmt files went away, but will be saddened that one of the few successful hooks for cool hacks is gone.
Doesn't that mean checking mode and lambda generation will not be able to reuse the type checking phase? If |
Just as a data point, at LexiFi we have several refactoring tools that we use often and that make use of All of this to say that for me it would be a mistake to make |
I certainly don't think we should stop supporting cmt files. They are occasionally useful for the kinds of things mentioned above. But I would like to get to a point where they are only built on request, rather than always being built during day-to-day development. Building them all the time has a noticeable cost and I don't think they are used enough to justify it.
Yeah. I'm assuming that people would either use checking mode or try to build things, not both. I would have thought that the loss of incrementality when changing modes was worth not having to write out the cmt files. |
I mean, the previous RFC, if merged, would also roughly divide the size of cmt files by two.
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
|
It would be good if this was possible, but I'm not sure how to make this work in practice. We need cmt files for installed libraries for many applications, and I'm not sure how to generate these "on demand" without just installing all the build rules and sources.
The workflow I'm thinking of where this could matter is when one is editing sources and running the test suite in the background in watch mode. E.g. |
Signed-off-by: Rudi Grinberg <[email protected]>
Signed-off-by: Rudi Grinberg <[email protected]>
Signed-off-by: Rudi Grinberg <[email protected]>
I was actually in process of implementing support for the complete So this for me is a nice movement but also makes me wonder, there was any discussion of unifying the multiple files that we have today? I know that ocaml/ocaml#608 proposes adding the flambda to type t = {
typedtree: typedtree option;
lambda: lambda option;
cmm: cmm option;
flambda: flambda option;
} Of course this could be compressed and storing only what was configured, so |
I'm not sure that the size is worth it even if divided in two. With the previous RFC the core compilation pipeline for each file will go:
This doesn't seem a good route to improving compilation times.
I suspect this isn't actually true. Once merlin stops using them I think we will only need cmt files for ml files without mli files, for odoc. I think we are currently installing these files everywhere and then basically never using them. However, even if it were true, the vast majority of builds are not installing builds. In a monorepo style approach, e.g. where you pull all dependencies into a single dune workspace to build them, you are never doing an installing build. Even if you still build cmt files whenever you are installing a library it will still be a big improvement.
For that workflow, I think building bytecode only is probably the best approach. I would have thought that building |
Mutation in place really complicates the implementation of build systems. At the very least, it would require to implement locking to write to this mega cmt file since it is now used for so many different things. Not to mention that this would probably make dune's cache useless as well.
That's a good compromise. Things that only need to be built for install builds are much less problematic.
The problem is that users sometimes prefer to run their tests in native mode. I don't want to speak for all users here, but in my experience Iarge test suites can get slowish. In any case, I prefer to run things closer to how they are in production when testing. |
On a more practical note, I had a brief look at implementing this and it isn't as easy as I thought. To compile the typedtree to lambda, we need to obtain the
|
Yeah I did the same yesterday, seems like storing the coercion is the best idea as it is a simple structure, overall I think it will not be a significant upgrade in size. |
This may be a really niche property, but because of this any PPX that is used through |
Like @lpw25, I dont't like .cmt files much. It's data hoarding: we don't quite know what information external tools need, so we'll dump the whole typed AST to disk, no matter how big and redundant it is, no matter how much disk space it consumes in OPAM package installations, no matter how much time this adds to compilation times. (And it adds more now that marshaling took a big performance hit in preparation for Multicore OCaml.) So, I'm skeptical that using .cmt files as an intermediate step in builds is a good idea. Expect noticeable slowdowns for essentially zero benefits. If you want typechecking + ability to run tests on the fly, compilation to bytecode does the job already and more efficiently. |
I remember reviewing a project on analysis and refactoring of source code that had problems scaling up to large code bases because their Java representation of the annotated AST consumed about 1 kilobyte of disk space for every line of source code. It sounded crazy to me at the time. But I'm not sure .cmt files are doing much better than that today. |
It was pointed out that type checking is a bottle neck at jane street. So I'm not sure about the slowdowns. I've listed other benefits in the RFC. I'm in full agreement with the concerns about data hoarding and bloat, but perhaps that discussion is not directly relevant to this RFC. The RFC does not suggest to add any bloat, it only proposes to use the already existing bloat to solve current problems. If there was a way to avoid cmt's and not lose any tooling support, I'm sure nobody would object. Unfortunately, that does not seem to be on the cards in the short term. So why not reuse what's already there to work around current pain points until whatever is going to replace cmt's is ready? |
To add to the tools/progams that use the .cmt files there is also OCamlEditor which uses .cmt(i) files for displaying an outline of OCaml files and displaying expression type on hover Which is not shown on the screenshot are the type hovers, but you can probably believe me on that. It runs on current OCaml (4.12) version, and will run on 4.13, but on the other side there are probably just 3 people in the world using it (and I'm not sure who the third is), so it should not be such a big concern. |
Rendered