Added foundation for DE_DE data #75
Hi there! Thank you for opening this PR! As another German (who didn't have enough time to stem this), I'm particularly happy this finally gets addressed. With last names, I'd allow for double names by rule instead of putting them into the data. The rule should special-case equal last names and just return that name, so we avoid "Müller-Müller". I would avoid "Musterfrau". Street names can be hyphenated names, last names, city names or a variety of words, plus one of "Straße", "Weg", "Gasse", "Allee", "Chaussee", selected with decreasing probability. City names can be built from a variety of suffixes like "stadt", "berg", "dorf", "kirchen", "burg", "feld", "au", "ing" or "heim" and optional prefixes "Groß ", "Klein ", "Ober", "Unter". That plus a list of name parts will give us a good selection.
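To make the two rules above concrete, here is a minimal sketch in plain Rust. The function names and the weights are assumptions made up for illustration; the comment only asks for "decreasing probability" and does not give numbers. The random roll is passed in as a parameter so the sketch does not depend on any particular RNG API.

```rust
/// Combine two last names into a double name.
/// Identical names collapse to a single name, so "Müller-Müller" never appears.
fn double_last_name(first: &str, second: &str) -> String {
    if first == second {
        first.to_owned()
    } else {
        format!("{first}-{second}")
    }
}

/// Street suffixes with hypothetical weights, ordered by decreasing probability.
const STREET_SUFFIXES: [(&str, u32); 5] = [
    ("Straße", 50),
    ("Weg", 25),
    ("Gasse", 12),
    ("Allee", 8),
    ("Chaussee", 5),
];

/// Map a random roll onto a suffix according to the weights above.
/// The roll is reduced modulo the total weight, so any u32 is accepted.
fn pick_street_suffix(roll: u32) -> &'static str {
    let total: u32 = STREET_SUFFIXES.iter().map(|(_, weight)| weight).sum();
    let mut remaining = roll % total;
    for (suffix, weight) in STREET_SUFFIXES {
        if remaining < weight {
            return suffix;
        }
        remaining -= weight;
    }
    unreachable!("roll is always reduced below the total weight")
}
```

With these assumed weights, half of all rolls land on "Straße" and only one in twenty on "Chaussee". In the library itself the roll would come from whatever RNG the generator is handed.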
Thanks for the feedback, and that is exactly why I opened the PR. The whole city name with prefixes, suffixes and some static strings in between raised the question of how to integrate it into the library. Would you enhance the fake-rs/fake/src/faker/impls/address.rs Line 35 in 3264649
impl per locale which could then use its custom TPL with {prefix}{base}{suffix}?
These are just ideas and I'm a bit lost how it could work best for the library :)
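As a rough illustration of the {prefix}{base}{suffix} idea, the sketch below fills named placeholders in a per-locale template string. The function `render_city` is hypothetical and is not part of fake-rs; it only shows the shape such a template step could take.

```rust
/// Fill the `{prefix}`, `{base}` and `{suffix}` placeholders of a template.
/// Placeholders that do not occur in the template are simply ignored.
fn render_city(template: &str, prefix: &str, base: &str, suffix: &str) -> String {
    template
        .replace("{prefix}", prefix)
        .replace("{base}", base)
        .replace("{suffix}", suffix)
}
```

Calling `render_city("{prefix}{base}{suffix}", "Groß ", "Anna", "heim")` yields "Groß Annaheim", and a locale that has no prefixes could use the template "{base}{suffix}" instead.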
Good question. A reusable templating facility might be useful, perhaps along with a few rules (e.g. no uppercase letters after lowercase letters).
I would prefer templates for the default implementation; where that does not fit a specific locale, it can use impl specialisation. I'm not sure whether that is in stable yet, though.
Added german CityPrefix and Suffixes
Small update time.

Data
I've added the 50 most common first names for female and male. Source:
Regarding the city names, I've integrated the suggestions from @llogiq. A sample company currently looks like this:

Issues
During testing I found that only using templates will not suffice for all of the German specifics, and probably not for other locales either. Some edge cases I encountered were:

Proper lowercasing in city names
Example: Due to the fake-rs/fake/src/faker/impls/address.rs Line 43 in 3264649
the function simply takes a LastName and plugs it into the template string, which could be prevented by lowercasing all templated strings except the first one.

Citystates
Example:
Should be:
In Germany there are a few Citystates, e.g. Berlin, Bremen, Hamburg. These share the same name
My understanding from the examples is that the library only provides parts of bigger structures like
My feeling is that some kind of customization logic might be needed in the future internally in the

Static content
Due to a lot of edge cases which might not be easily generated but provide a benefit for testing purposes it

Feedback
I would really like to get your feedback on this, since I have the feeling a lot of other locales will face similar problems
Therefore we might need a bit more logic in each respective
Or, the other way round, it would make sense to have
And since I'm only contributing, I would like to get your input on this and ask for a bit of help or examples of how I could implement it, whichever way you want it to be integrated in the future 😄
Regarding the uppercase problem: I would suggest adding a function that lowercases all characters except a) the first one and b) those following any whitespace. This would leave the capitals in e.g. "Gross Annaheim" intact but produce a lowercased "Unterdirkfurt".
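A minimal sketch of the suggested casing rule, using only the standard library. The name `fix_casing` is an assumption for illustration, not an existing fake-rs function.

```rust
/// Lowercase every character except the first one and any character that
/// directly follows whitespace; those keep whatever case they already have.
fn fix_casing(name: &str) -> String {
    let mut out = String::with_capacity(name.len());
    // The very first character is treated like one that follows whitespace.
    let mut keep_case = true;
    for ch in name.chars() {
        if keep_case {
            out.push(ch);
        } else {
            // `to_lowercase` returns an iterator because a lowercase mapping
            // can consist of more than one char.
            out.extend(ch.to_lowercase());
        }
        keep_case = ch.is_whitespace();
    }
    out
}
```

Under this rule "UnterDirkfurt" becomes "Unterdirkfurt" while "Gross Annaheim" is returned unchanged. Note that the rule as stated only treats whitespace as a word boundary, so a hyphenated name such as "Karl-Marx-Stadt" would lose its inner capitals and would need an extra case.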
Sad to see this is dead in the water. Would have been nice to fake some German data next to the other available languages. |
Yeah, unfortunately my work focus changed and I didn't have any time to maintain the PR or even keep it up to date.
Hey guys, I've found a bit of spare time and merged the new master into my old branch. But there is still the issue regarding the casing in
It feels kinda wrong to add the suggested function from @llogiq to the fake-rs/fake/src/faker/impls/address.rs Line 35 in 8c92743
Is there a way in which I could add a specific
This would open up the opportunity to provide locale-specific logic and solve issues like my uppercase concatenation one. Cheers and Happy Holidays 🎄
I added you can now implement custom logic for it like below:

```rust
impl Data for MyLocale {
    const ADDRESS_CITY_GEN_FN: Option<fn() -> String> = Some(my_city_fn);
}

fn my_city_fn() -> String {
    "custom logic".to_owned()
}
```

I can add gen fns for other fields if needed.
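For readers unfamiliar with the pattern, here is a self-contained sketch of how an optional generator function in an associated constant can override a default. `LocaleData`, `CITY_GEN_FN`, `DEFAULT_CITY`, `Plain` and `city` are stand-ins invented for this example; they are not the actual fake-rs definitions, and the library's real fallback logic may look different.

```rust
/// Hypothetical stand-in for a locale data trait.
trait LocaleData {
    /// Optional custom generator; `None` means "use the default behaviour".
    const CITY_GEN_FN: Option<fn() -> String> = None;
    /// Placeholder for whatever the default data-driven logic would produce.
    const DEFAULT_CITY: &'static str = "Springfield";
}

/// A locale that keeps the defaults.
struct Plain;
impl LocaleData for Plain {}

/// A locale that plugs in its own generator function.
struct MyLocale;
impl LocaleData for MyLocale {
    const CITY_GEN_FN: Option<fn() -> String> = Some(my_city_fn);
}

fn my_city_fn() -> String {
    "custom logic".to_owned()
}

/// Use the locale's custom generator if it has one, else the default.
fn city<L: LocaleData>() -> String {
    match L::CITY_GEN_FN {
        Some(generate) => generate(),
        None => L::DEFAULT_CITY.to_owned(),
    }
}
```

Because the choice is made through an associated constant, locales that do not set it pay nothing extra, and no specialisation feature is needed.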
Hey @cksac, I've added a first little
It tries to mimic the existing code structure from
One problem I faced was that I could not access the
So I could not change fn() -> String to fn<R: Rng + ?Sized>(rng: &mut R), which is unfortunate, and I haven't measured the performance impact of generating a new
Here is some generated sample data which looks good to me and should cover most edge cases (dashes, spaces and parentheses in city names)
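One possible direction for the signature problem, sketched under assumptions: a plain `fn` pointer type cannot carry a generic parameter such as `R: Rng + ?Sized`, but it can take a trait object. The `RandomSource` trait and the `Lcg` type below are hypothetical and exist only to keep the example self-contained; this is not how fake-rs is known to solve it.

```rust
/// Hypothetical, minimal random source; stands in for an RNG trait object.
trait RandomSource {
    fn next_u32(&mut self) -> u32;
}

/// Tiny linear congruential generator, for illustration only.
struct Lcg(u32);

impl RandomSource for Lcg {
    fn next_u32(&mut self) -> u32 {
        self.0 = self.0.wrapping_mul(1_664_525).wrapping_add(1_013_904_223);
        self.0
    }
}

/// A generator function pointer that receives the caller's random source
/// instead of creating a new one on every call.
type CityGenFn = fn(&mut dyn RandomSource) -> String;

const SUFFIXES: [&str; 3] = ["stadt", "dorf", "heim"];

fn city_with_rng(rng: &mut dyn RandomSource) -> String {
    // Use the upper bits, which are better distributed in an LCG.
    let idx = (rng.next_u32() >> 16) as usize % SUFFIXES.len();
    format!("Neu{}", SUFFIXES[idx])
}

/// The pointer still fits into an `Option` constant like the one shown above.
const CITY_GEN: Option<CityGenFn> = Some(city_with_rng);
```

Passing the source by `&mut dyn` keeps the constant's type non-generic while letting the output stay deterministic for a seeded source, at the cost of dynamic dispatch on each call.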
Yes, that needs to be addressed later.
… the manual tests which were only helper functions.
Perfect! I made a last small commit removing my manual test file and adding the |
Nice, thank you! |
Hi @cksac,
I've been using your library thanks to LukeMathWalker/zero-to-production.
Since I needed to generate a bit of German fake data, I took a closer look into the project.
I added a de_de.rs module to provide a few basic attributes like State, Company Type, Names and so on from the official government ID test card data: https://persosim.secunet.com/fileadmin/user_upload/PersoSim_Personalisierungsdaten_eID_UB.pdf
It's working with the check_determinism test as far as I can tell.
To verify the data that gets generated I added a de_manual_test.rs just to see some generic company struct on the terminal.
Run via cargo test de_manual_test -- --nocapture it produces some results like
So far so good. 😄
But that is the reason I opened the PR. "Musterfrau land" is not a valid city name in German.
Most city names would consist of two nouns or names and a suffix.
Germany and, more broadly, the "DACH" region (Germany, Austria, Switzerland) all have a pretty different layout for things like street names and addresses.
I would like to discuss how I could implement this in a way that suits the library and its users.
For example, a street name might be separated by dashes if it contains one or multiple names, e.g. Karl-Marx-Allee (https://en.wikipedia.org/wiki/Karl-Marx-Allee), or can contain an apostrophe, as in Laehr'scher Jagdweg, seen in https://de.wikipedia.org/wiki/Stra%C3%9Fenname
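The dash-separated form can be sketched as a simple join of the name parts and the street type. The helper `street_from_name` is hypothetical and covers only the hyphenated case; apostrophe forms like the second example would need their own rule.

```rust
/// Build a street name from one or more personal-name parts plus a street
/// type, joined by dashes as in "Karl-Marx-Allee".
fn street_from_name(name_parts: &[&str], street_type: &str) -> String {
    let mut parts: Vec<&str> = name_parts.to_vec();
    parts.push(street_type);
    parts.join("-")
}
```

For instance, `street_from_name(&["Karl", "Marx"], "Allee")` gives "Karl-Marx-Allee".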
There are already some impls with random branching for data generation: fake-rs/fake/src/faker/impls/address.rs Line 108 in 3264649
And I don't know if I should start adding impls special for the DE_DE locale or how it would fit the overall project structure.
Any feedback or guidance is appreciated and I would gladly implement more of the German locale.
Cheers xoryouyou