Clean up imports. #106

xfbs · 2023-10-10T08:54:29Z

This commit "cleans up" the code base somewhat by making sure that types, functions and modules that are referred to multiple times in the code are properly imported, instead of referring to them by their absolute path.

To me, this makes code more readable. I like to keep visual noise to the minimum. There are some cases where it makes sense to use an absolute path multiple times, for example to make code less ambiguous. But for the most part, I try to avoid it.

Unqualified vs qualified types

In Rust, we have the choice between using types in a qualified way:

fn do_something(client: reqwest::Client, method: http::Method, response: axum::request::Response) -> eyre::Result<serde_json::Value> {
  // ..
}

and using them in an unqualified way, after importing:

use reqwest::Client;
use http::Method;
use axum::request::Response;
use eyre::Result;
use serde_json::Value;

fn do_something(client: Client, method: Method, response: Response) -> Result<Value> {
  // ..
}

Both of these options produce valid code. I have some personal preferences for this, which kind of relate to my development workflow. I tend to (have) to read a lot of code, and as such I try to optimize for code that I find easy to read.
I also tend to develop in the terminal, with a split pane, such that only half my screen is used for my editor of choice.

With these two constraints, there comes some preferences:

Shorter lines of code are more legible for me (< 80 or 100 lines, depending on terminal and screen)
Less repetition is better for me. If I work in a file that talks to an HTTP API, then I know what the Client is, I do not need it spelled out as reqwest::Client.

On the other hand, if I'm reading in a file that handles a HTTP API and a database connection, then it might make sense to disambiguate the postgres::Client from the reqwest::Client. Although typically I would solve that by renaming the import, for example:
```
use reqwest::Client as HttpClient;
use postgres::Client as PostgresClient;
```
In the example code above it might be clearer to import serde_json::Value as JsonValue.

In the example code above it might be clearer to import serde_json::Value as JsonValue if that is how it is used, because that way it more clearly communicates the intent (this is a JSON value) without being overly explicit (serde_json::Value), because when reading the code, I might not particularly care that it is a JSON value from the serde_json crate (as I can get that information easily from rust-analyzer), I just care about the meaning of this type, which is to hold arbitrary parsed JSON data.

As a result of this, my personal preference is that when in doubt, I will import things. I will always explicitly import things if they are used multiple times in the same file. Sometimes, I will import things and rename them to make it more clear.

When in doubt, prefer brevity over explicitness.

There is one thing to note here: A lot of people use VSCode with rust-analyzer. The default setup for this will show the full, qualified path for types. So, if you are used to using VSCode or other IDEs, you will not really notice a difference between writing Client versus reqwest::Client, because VSCode will show you the reqwest:: path anyways (in gray). This means that if you are used to VSCode, you may tend to write out the full, unqualified paths just because you are used to reading code that way. VSCode can be told to not show the full paths unless you are hovering over them, if you want to try that. I do feel that there is still a benefit from keeping code short, even if it does not make a difference to some.

Error handling

When using error handling libraries such as anyhow or eyre, I tend to import their Result type and use that in an unqualified way (as in, use Result<T> rather than eyre::Result<T>).

Why? In those crates, Result is defined as:

pub type Result<T, E = eyre::Report> = std::result::Result<T, E>;

This means that eyre::Result<T> is just an std::result::Result<T, E> with the E defaulting to eyre::Report. For that reason, importing eyre::Result and using it as Result is fine, since you can always override the error type. In other words, eyre::Result<A, B> and std::result::Result<A, B> are equivalent, the only different with the eyre version is that if you do not specify the error type, it falls back to being an eyre::Report.

use eyre::Result;
use std::io::Error as IoError;

// returns eyre::Result<String>
fn my_name() -> Result<String> {
  Ok("Name".into())
}

// returns an `std::result::Result<String, std::io::Error>`
fn do_io() -> Result<String, IoError> {
  std::fs::read_to_string("/etc/passwd")
}

Since projects typically do not mix and match several error handling libraries in the same crate, whenever you see Result<T> it is immediately obvious that this will be an eyre result, or whichever error handling library the current project uses. So writing out eyre::Result does not add any necessary context for understanding the code, unless perhaps there are multiple error libraries in use in the current file.

Logging

In general, glob imports are discouraged a little bit, it is always nice to not use them so that it is immediately obvious what comes from where. For that reason, there is a clippy lint that can be used to disallow using glob imports.

Logging or tracing is the one thing where I feel it can make sense to do a glob import:

use tracing::*;

fn do_something() {
    info!("Hello");
}

The reason is that:

You don't want to clutter the codebase with tracing::. Tracing or logging statements should be short and sweet. There is nothing shorter and sweeter than info!("XYZ").
You just want to be able to drop an info!(...) anywhere in the code and expect it to work, and not have to edit the import to add info to the list because you had only imported {debug, error} before.
This makes switching between the log crate and the tracing crate easy, as they both have similar macros.

Deriving Traits

In the PR, I have also removed implementations of PartialOrd and Ord and replaced them with derives. I believe that they were maybe added in error, because it was not know that they could be derived automatically.

My rule of thumb is:

If a trait can be derived automatically, it must be derived automatically. Less code is better.
If a trait can be derived automatically, but it has constraints that make it so it needs to be done manually, perhaps because a specific ordering is desired, then it needs to be clearly communicated with a comment explaining that this is deliberately implemented manually, and outlining the reason for it. It should also come with accompanying unit tests to ensure that this is not accidentally refactored.

Other notes

Note that this PR includes strong (personal) opinions, which you may or may not agree with. A part of the reason for creating this PR is to spark a conversation about coding styles and preferences, so feel free to chime in and tell me why I'm wrong! Also feel free to close this. Writing up these style preferences is a lot of fun, makes you think about what preferences you have and why you might have them.

I hope this all makes sense

tetigi

These all seem like sensible things to change (and thanks for the detailed write-up!)

I would add that I believe most of these things happen here not for lack of knowledge, but mostly due to a lack of enforcement through clippy / linting on this kind of stuff.

src/generator.rs

mara-schulke · 2023-10-10T10:05:16Z

The Why

So the rule of thumb (i; and this is not standardized) followed in regards to imports is to make things as easy as possible for someone new / else to read. I dont optimize for line length or conciseness, i try to add as much context with as little chars as sensibly possible.

So whenever absolute imports are a) short and b) add context and c) resolve ambiguity i personally prefer absolute imports like eyre::Result > Result. This also makes code review easier, as if someone by accident used std::io::Result as import, it is not obvious to the reviewer that this is a bug / not meant to be like this.

Also many crates like reqwest export very generically named structs like Client which a) is not clear on what this thing does. (It lacks context of whether its an http, grpc or whatever type of client) and b) it is very likely that these structs may collide sooner or later with other crates' exports. So the reasoning behind this was to keep people from preventing renamed imports, while retaining context.

I personally find reqwest::Client more readable than use reqwest::Client as HttpClient because i have one redirection less when looking up documentation or alike.

My Concerns

Last nit: Citing your example do_thing i think is a good example when i would gravitate towards explicitness. There are three crates operating in the same domain and it is unobvious that the Response type is incompatible with the Client type without external tooling.

Also a genuine question i have: How would you, when moving on with this approach, solve the problem where you have two structs, that do the same, but come from two crates, like reqwest::Response and axum::request::Response? You can't rename them sensibly (like done in the Client example

Last last nit: When using aliases, it is harder to keep them consistent in a large codebase, compared to absolute imports; So this means we would sometimes have Client, DbClient, DatabaseClient, PgClient and whatever people would else do

Consensus Note

Also personal opinions; Im happy to agree to the general consensus

src/registry/artifactory.rs

asmello · 2023-10-10T20:36:59Z

I tend to agree with Mara on the points raised above, and the general sentiment from Patrick. My own decision criteria wrt to imports is the following:

By default, add the unqualified name to scope with a use statement.
If there's a clash, add the minimum amount of namespace qualifiers to disambiguate.
As a general rule, don't alias and don't shadow the std definitions.

Aliases

In particular, the reason why I consider aliases bad is because they are a source of surprise, and minimising surprise is a good way to counter code entropy. What this actually means in practice is that when one sees reqwest::Client there's little room for misunderstanding what Client this stands for — by contrast, HttpClient could also be a ureq Client, or a crate-local Client. To find out, one must trace the import, which adds a bit of extra moving around, and then mentally translate every occurrence of the symbol to its alias, which adds some cognitive overhead. It's usually not enough overhead to matter, but it can add up in a project. Besides, I find that disambiguation with the namespace doesn't usually add that much length — names that may lead to conflicts are, as a matter of best practice, exported at the root module, so it's usually enough to use the crate name to differentiate.

Shadowing

Shadowing names from std is a very common source of confusion, because it's hard to detect and, once again, it generates surprise. One might try to use the shadowed symbol as they would the standard one, only to find inconsistencies with what they are used to: missing (or additional) associated functions and trait implementations, and sometimes unexpected runtime behaviour. The Error type provided by eyre doesn't actually implement the std::Error trait! 🤯

Note on line width

I think worrying about line width doesn't really make sense in Rust, not only because modern screens have plenty of space for 120+ characters even on split-screen setups, but also because rustfmt is very opinionated and will generally apply sensible line breaks for any code you can throw at it. Further, I'm of the school of thought that a long but descriptive variable name is much better than a short but ambiguous and/or obscure one. Of course, the law of locality applies, and the smaller the scope the more conciseness gains importance while descriptiveness loses it. Ultimately it's all about readability, which can mean long or short names depending on the context. But IMO it's never about keeping lines short.

xfbs · 2023-10-11T08:44:52Z

Awesome points from everyone! I feel like I agree with everything that's been said. I'll try to change this PR to be in line with the sentiment and apply the suggestions from Mara.

xfbs · 2023-10-12T09:24:58Z

So, two things:

First: I have applied all of the suggestions made in here. Maybe someone look them over to make sure I did not miss anything? It took me a bit to address some of the things here.

Second: coding style is something that is very fascinating. Everyone has their own workflow and axioms that allow them to be productive. I feel that I'm constantly evolving and picking up new ideas. I hope it is not too annoying to talk about it and I understand what you optimize for.

I wanted to address some statements and questions that I forgot to address previously:

Also many crates like reqwest export very generically named structs like Client which a) is not clear on what this thing does. (It lacks context of whether its an http, grpc or whatever type of client) and b) it is very likely that these structs may collide sooner or later with other crates' exports. So the reasoning behind this was to keep people from preventing renamed imports, while retaining context.

I personally find reqwest::Client more readable than use reqwest::Client as HttpClient because i have one redirection less when looking up documentation or alike.

I think that is a fair point. Also, in this case the ergonomic benefit is not major.

It probably depends on who you are optimizing for. For a casual reader of the code, HttpClient is what they are looking for (they only care what it does, not which specific crate is being used). If they do not know the reqwest crate, they have to look it up first. For people writing code (keep in mind that code is read more often than it is written), the details are important.

Also a genuine question i have: How would you, when moving on with this approach, solve the problem where you have two structs, that do the same, but come from two crates, like reqwest::Response and axum::request::Response? You can't rename them sensibly (like done in the Client example).

Interacting with both axum and reqwest directly in one file would be a code smell for me. My rule is that I want to have one module do one thing. So, if this module is an Axum API, I will not directly talk to other crates. Instead, I would have a separate client module that encapsulates the details of how the HTTP request is made, and call functions from that. In the code example, there was a mix of a blocking and non-blocking mutex. Again not a very common occurrence, really just an edge case.

Otherwise, this would be a case for where renaming would be the most sensible thing to do, for me at least.

use reqwest::Response as ReqwestResponse;
use axum::request::Response as AxumResponse;

But there are more considerations:

The ergonomics matter. In this case, reqwest::Response is not that much longer, so maybe it does not make sense to alias it.
The context matters. If this is inside an Axum handler, then I will definitely have an import for axum::request::Response and as such I will use Response to mean an axum response. This is because this whole file is an axum request handler, and as such anytime I see Response I know it is going to be an Axum response. So in that case, it would look like this:

use axum::{
  extractor::{State, Headers},
  request::{Response, IntoResponse},
  http::StatusCode,
  Json
};

// turn reqwest response into axum response
fn convert(response: reqwest::Response) -> Response {
  // ..
}

I tend to like using aliases when I can use them to convey meaning beyond the absolute path of the crate. For example, to convey the intent. Check out this code, where I am using both tokio::sync::Mutex and std::sync::Mutex. There is sometimes a speed advantage to using blocking mutexes in async code when there is very little contention. In this example, I can use aliases to give my types a more helpful name.

use tokio::sync::Mutex as AsyncMutex;
use std::sync::Mutex as SyncMutex;
use hashbrown::HashMap;

// with aliases: convery meaning, remove noise.
fn get_cache() -> AsyncMutex<HashMap<ImString, SyncMutex<CacheData>>> {
  // ..
}

// without aliases: more explicit, but adds unneccessary implementation details
// (we don't really care that the mutex is from tokio, we don't care that the hashmap is from
// hash brown, they are both just abstractions)
fn get_cache() -> tokio::sync::Mutex<hashbrown::HashMap<imstr::sync::ImString, std::sync::Mutex<CacheData>>> {
}

What's nice about this is that I can go ahead and swap out the type I use for the AsyncMutex for another crate without needing to mess with the code. I feel that there are some corner cases like this where aliases are actually more helpful than using the absolute path and lead to more understandable code.

Last last nit: When using aliases, it is harder to keep them consistent in a large codebase, compared to absolute imports; So this means we would sometimes have Client, DbClient, DatabaseClient, PgClient and whatever people would else do.

As I said, aliases are something that are sprinkled conservatively into the code base ideally. I have not had issues with them being inconsistent. Through code review you see other people's style and imitate it. But if they are not, it is also not too bad, as long as the meaning is conveyed. For example, when some of the code base uses DbClient and some uses DatabaseClient, with both of these aliases I understand the semantics (it's a database client!).

I would much rather read DatabaseClient than deadpool::managed::Pool<postgres_tokio::non_blocking::Database>, because the former tells me all I need to know to understand the meaning, whereas the latter is unambiguous, but tells me implementation details I do not care about (and could find out easily anyways if I needed to).

But I do agree that they should not be overused. It's actually rather rare that I need to use import aliases. Instead, something I like to do is create type aliases in the crate like this:

pub type Database = deadpool::managed::Pool<postgres_tokio::non_blocking::Database>;

And then just use crate::Database; which effectively standardizes the name. Even better would be to encapsulate this into some kind of wrapper type such that the choice of database pool and connection crate are abstracted away.

In particular, the reason why I consider aliases bad is because they are a source of surprise, and minimising surprise is a good way to counter code entropy. What this actually means in practice is that when one sees reqwest::Client there's little room for misunderstanding what Client this stands for — by contrast, HttpClient could also be a ureq Client, or a crate-local Client. To find out, one must trace the import, which adds a bit of extra moving around, and then mentally translate every occurrence of the symbol to its alias, which adds some cognitive overhead. It's usually not enough overhead to matter, but it can add up in a project.

For this I would say: ideally, when I read code I don't really care if a Client is a reqwest client, or a ureq client. I'm just trying to understand the "big picture" of what is happening. So the only thing I care about is what protocol this client targets. So for me, as an outside reading code in a massive codebase I don't understand, HttpClient is the most useful thing for me to see, everything else is implementation detail (or means I need to look up a crate to see what it does).

Most people have some kind of a setup using rust-analyzer that lets you hover over anything and it tells you what it resolves to. This means that you never really have to trace anything, and you can give entities names (aliases) that are useful to you (the programmer) or others (people reading the code).

Shadowing names from std is a very common source of confusion, because it's hard to detect and, once again, it generates surprise. One might try to use the shadowed symbol as they would the standard one, only to find inconsistencies with what they are used to: missing (or additional) associated functions and trait implementations, and sometimes unexpected runtime behaviour. The Error type provided by eyre doesn't actually implement the std::Error trait!

There is some truth to this, and the missing std::error::Error implementation of eyre::Error has bitten me before (I was trying to build something generic with an E: std::error::Error bound). That being said, there are plenty of good reasons to shadow things in the standard library:

tokio for example is structured in a way that mimicks the standard library. And this is done with intent! Generally, you should never mix tokio (non-blocking async) code with standard library (blocking) code. So having tokio shadow the standard library kind of helps to not accidentally mix the two.
For the Result types offered by eyre and anyhow, they are build in a way that you can safely import them anytime. The reason for this is that eyre::Result IS a std::result::Result, the only difference being that it has a default for the error type. So for example, if you write Result<A, B> then it does not matter at all whether you have imported eyre::Result or not — it's the same type! The only difference is when you write Result<A>. Without importing eyre::Result, this would be a compiler error. With importing it, this would be fine.

Errors due to shadowing are very rare, because the compiler catches nearly all of them. Some APIs are specifically designed so that it is safe to import them and shadow built-in stuff. For example, if you import tokio then you code needs .awaits everywhere, so this is never something that you can just forget. Everything has a learning curve, and maybe it is not something that you might be used to. But I would not be too critical of it, as long as it is reasonable.

Damn this thing turned into an essay.

asmello · 2023-10-12T19:45:47Z

I tend to like using aliases when I can use them to convey meaning beyond the absolute path of the crate. For example, to convey the intent. Check out this code, where I am using both tokio::sync::Mutex and std::sync::Mutex. There is sometimes a speed advantage to using blocking mutexes in async code when there is very little contention. In this example, I can use aliases to give my types a more helpful name.

I actually agree that aliasing adds value here, but I also think this is a good argument against shadowing std.

Generally, you should never mix tokio (non-blocking async) code with standard library (blocking) code. So having tokio shadow the standard library kind of helps to not accidentally mix the two.

I don't think it actually does, as evidenced by my mistake when reviewing your other PR, it's easy to forget which implementation we're dealing with, especially when we're used to using the std one. If we had a way to enforce not mixing implementations, that'd be a different story.

Errors due to shadowing are very rare, because the compiler catches nearly all of them.

I'm not sure I agree with that statement. If you accidentally use a tokio function when you want an std implementation, yes, the compiler will tell you something is wrong (although arguably the error is not very helpful). But the converse doesn't hold, and evidence of that is how we ended up using a std::fs in Buffers in a bunch of places where we meant to use tokio::fs instead (thanks for fixing that, btw). In this case, as luck would have it, the resulting code still works as intended, but this can be a source of hard to track bugs, as behavioural differences between compatible implementations may only appear in esoteric circumstances.

It probably depends on who you are optimizing for. For a casual reader of the code, HttpClient is what they are looking for (they only care what it does, not which specific crate is being used). If they do not know the reqwest crate, they have to look it up first. For people writing code (keep in mind that code is read more often than it is written), the details are important.

While I agree the namespace doesn't add semantics when used like this, that is confounding its purpose, which is to disambiguate. Aliasing for the sake of clarifying what a symbol represents is a valid, but orthogonal, use-case, which has more to do with compensating for bad API design outside of our control than something that should be regular practice. In fact it is perfectly plausible that if you adhere to the practice of renaming web clients to HttpClient, you might end up with a situation where you have both use reqwest::Client as HttpClient and use ureq::Client as HttpClient, in which case you have to either turn them into ReqwestHttpClient and UreqHttpClient or drop the aliases and revert to qualifying. BTW, relatedly:

Interacting with both axum and reqwest directly in one file would be a code smell for me. My rule is that I want to have one module do one thing.

I get that realistically you wouldn't mix and match third-party web clients like this, but it would be perfectly sane to have a local HttpClient that wraps an external one, in which case you end up with a similar situation.

But I do agree that they should not be overused. It's actually rather rare that I need to use import aliases. Instead, something I like to do is create type aliases in the crate like this:
pub type Database = deadpool::managed::Pool<postgres_tokio::non_blocking::Database>;
And then just use crate::Database; which effectively standardizes the name. Even better would be to encapsulate this into > some kind of wrapper type such that the choice of database pool and connection crate are abstracted away.

I'm fully in agreement with this approach, in fact I'll argue that we probably run into valid use-cases for type declarations like this more often than import aliases.

As a final note, I'm actually not particularly concerned about mixing styles when it comes to this. Code style consistency is important, but rustfmt does a good enough job of avoiding entropy accumulation. I suggest we just allow contributors to employ the style they like for new code, and ask people not to put energy into making things adhere to the style they prefer (we'd not merge style-changing PRs unless they add clear, uncontroversial, value).

If you feel strongly about Buffrs having a consistent style, I suggest you put up a proposal for project guidelines and we can do a vote. Once ratified I'd be happy to go with whatever style is most popular (though enforcing may be tricky).

xfbs · 2023-10-16T08:22:51Z

I have just rebased this on to of the current main branch.

src/lock.rs

- Derives `PartialOrd` and `Ord` instead of manually implementing it. - Imports modules and types that are used multiple times. - Cleans up `use` statements in some places.

xfbs · 2023-10-17T14:25:46Z

Does that mean all issues with this PR are resolved?

xfbs requested a review from asmello October 10, 2023 08:54

xfbs self-assigned this Oct 10, 2023

tetigi reviewed Oct 10, 2023

View reviewed changes

src/generator.rs Outdated Show resolved Hide resolved

mara-schulke added this to the Stabilization milestone Oct 10, 2023

mara-schulke added complexity::low Issues or ideas with a low implementation cost type::style Related to personal or community opinions type::refactoring Changing the inner-workings of buffrs type::idea Rough idea or proposal for buffrs labels Oct 10, 2023

mara-schulke reviewed Oct 10, 2023

View reviewed changes