MIDS levels as their own terms, what do you think? #80
Replies: 3 comments 2 replies
-
Hi @ManonGros, regarding the LtC part of your inquiry: should there be interest in the community in collection-level MIDS summaries, it would make sense to me to create an LtC class dedicated to data quality. This class would build on ltc:MeasurementsOrFacts, though with dedicated schemes and vocabularies. Such an endeavour should be discussed with the MIDS and BDQ working groups and aligned with their developments. That's my intuitive 5c; happy for others to chime in. Best, Jutta
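A minimal sketch of what such a quality-dedicated, measurement-or-fact-style summary could look like. All field and term names below are invented for illustration; they are not actual Latimer Core or MIDS vocabulary terms:

```python
from dataclasses import dataclass

# Hypothetical sketch: a collection-level MIDS summary expressed as
# generic measurement-or-fact records, with the measurement types drawn
# from a (yet to be defined) dedicated MIDS/BDQ scheme.
@dataclass
class QualityMeasurement:
    measurement_type: str     # term from a dedicated quality vocabulary (assumed)
    measurement_value: float  # percentage of specimens meeting the level
    measurement_unit: str

collection_mids_summary = [
    QualityMeasurement("midsLevel1Completeness", 92.5, "percent"),
    QualityMeasurement("midsLevel2Completeness", 61.0, "percent"),
    QualityMeasurement("midsLevel3Completeness", 12.3, "percent"),
]

# A consumer can filter generically, without MIDS-specific schema fields:
level2 = [m for m in collection_mids_summary
          if m.measurement_type == "midsLevel2Completeness"]
```

The point of the sketch is that the class stays generic (any quality measure fits), while the controlled vocabulary carries the MIDS-specific meaning.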
-
@ben-norton How do you see both standards integrating in an implementation?
-
@ManonGros & @mswoodburn: RE flattening. As I understand it, one develops and aims for a normalized data model and its normalized implementation mainly for reasons of data quality and versatility. Latimer Core provides a framework that supports such normalization and the resulting data quality and flexibility of use. Thus, the decision of where to locate a solution on the continuum between normalization and flattening becomes one of purpose. What is the main business objective of the infrastructure? For biodiversity data infrastructures I can see three, not necessarily exhaustive:
As soon as we are not talking about expert infrastructures used by highly trained and experienced experts, all three purposes need to be covered simultaneously. Fortunately, there is no hard, fundamental conflict between the purposes. This is not a "wicked" situation; it's more a question of the least common denominator. To cover bullet number 3, the infrastructure needs to offer at least some of its data in high quality and with quality assurance measures, which is exactly what you are planning to do, Marie. In that case, flattening the model at the same time doesn't really seem to make sense. Focusing on GBIF and considering what I know about your infrastructure, I don't see the problem, which simply shows that I'm not an expert; should you and your colleagues be willing, I'm happy to learn. Here are the (naive, simple-minded?) reasons from somebody who likely still has a much too theoretical approach to this:

Intuitive interface for user acceptance and satisfaction: The "flatten but keep aligned" approach proposed by you, Matt, might seem like a simple solution, but it appears to create even more complexity elsewhere. Mirroring a property/field by adding a copy elsewhere requires a set of defined validation rules, constraints, synchronization processes, etc. to assure data integrity and (thus) quality. Quality assurance is thereby lifted from the data model level to an additional layer in which rules and constraints need to be explicitly defined (instead of being, in many ways, implicitly implemented at the level of the normalized data structure). What I am seeing is that this layer of validation rules is extensive, needs to be maintained, and requires consideration whenever even a small update is made, since large impacts might result; and the question is how to find the source of an error once one becomes apparent. Wouldn't one rather avoid expanding that layer even further?

Search: Search isn't monolithic.
Depending on the use case, there are different approaches, each optimized for different search questions and contexts. Thus, flattening a property might provide advantages in some search contexts but disadvantages in others. A set of complementary approaches seems promising:
For these kinds of searches, it seems important that the entered keywords are matched exactly as recorded in the indices, and that name variants, e.g. for location and agent, are easily searchable in one go. With names normalized this should be straightforward; flattened into different properties for birth name, latinized name, and "primary" name, it might be difficult to find all the fitting records (I seem to regularly have a hard time finding institutions by name in GRSciColl, both on the web frontend and in the registry). The same holds for MIDS level, even if it seems less of a problem: flattened out into level-specific properties, it might be quite difficult to develop the correct search filter.
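The name-variant point above can be illustrated with a small sketch. With variants kept together in one normalized, multi-valued field, a single lookup covers all of them; flattened into separate typed properties, every property needs its own clause and any new variant type means a schema change. All records and field names here are invented examples:

```python
# Normalized: all name variants of an agent live in one multi-valued field.
agents = [
    {"id": "a1", "names": ["Maria Sibylla Merian", "Merian, M. S."]},
    {"id": "a2", "names": ["Carl Linnaeus", "Carolus Linnaeus", "Carl von Linné"]},
]

def search_normalized(query, records):
    """One lookup finds a record under any of its name variants."""
    q = query.lower()
    return [r["id"] for r in records
            if any(q in name.lower() for name in r["names"])]

# Flattened: each variant type is its own property (hypothetical names),
# so the search layer must enumerate every property explicitly.
flat_agents = [
    {"id": "a2", "birthName": "Carl Linnaeus",
     "latinizedName": "Carolus Linnaeus", "primaryName": "Carl von Linné"},
]

def search_flattened(query, records):
    q = query.lower()
    fields = ("birthName", "latinizedName", "primaryName")  # must stay in sync with schema
    return [r["id"] for r in records
            if any(q in r.get(f, "").lower() for f in fields)]
```

Both functions return the same hits here, but only the flattened one has to be updated whenever a new variant property is added.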
In sum, even after way too much thought and text: why would you, Marie and Matt, consider flattening the MIDS properties? What parts of search do you consider difficult?
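To make the "additional layer of validation rules" concrete: a mirrored (denormalized) MIDS field must be continually checked against the value derived from the normalized records. A minimal sketch of one such consistency rule, with all field names and structures invented for illustration:

```python
# Sketch of one of the many rules a "flatten but keep aligned" approach
# needs: the flattened copy must match the value derivable from the
# normalized per-specimen records.

def derive_mids1_percent(specimen_records):
    """Derive MIDS level 1 completeness from normalized per-specimen data."""
    if not specimen_records:
        return 0.0
    met = sum(1 for s in specimen_records if s["mids_level"] >= 1)
    return 100.0 * met / len(specimen_records)

def validate_flattened_copy(collection_record, specimen_records, tolerance=0.01):
    """Consistency check: does the denormalized copy still match the source?"""
    derived = derive_mids1_percent(specimen_records)
    flattened = collection_record["midsLevel1Percent"]  # the mirrored field
    return abs(derived - flattened) <= tolerance

specimens = [{"mids_level": 2}, {"mids_level": 1},
             {"mids_level": 0}, {"mids_level": 1}]
record = {"midsLevel1Percent": 75.0}  # 3 of 4 specimens reach level 1
```

Every flattened property multiplies such rules, and each schema change requires revisiting them, which is exactly the maintenance burden described above.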
-
I would like to get a sense of how widespread MIDS level percentages are when characterising collections. Would it make sense to have them defined as LtC terms?
In the context of the GRSciColl roadmap (https://scientific-collections.gbif.org/road-map#2-support-structured-collection-descriptors), I want collections to be searchable by a number of fields. Right now in LtC, something like the percentage of specimens digitised at a particular MIDS level has to be formatted as measurementsOrFacts. This makes sense, but it isn't something that could easily be implemented for searching.
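To sketch why the generic representation complicates search: a filter over measurement-or-fact records has to match on the measurement type and then parse and compare an untyped value, rather than querying a dedicated, typed field. The structure below is illustrative only, not the actual GRSciColl or LtC schema:

```python
# Hypothetical collection records carrying MIDS percentages as generic
# measurement-or-fact entries (field names invented for illustration).
collections = [
    {"code": "COLL-A",
     "measurementsOrFacts": [
         {"type": "midsLevel2Percent", "value": "80"}]},
    {"code": "COLL-B",
     "measurementsOrFacts": [
         {"type": "midsLevel2Percent", "value": "35"},
         {"type": "specimenCount", "value": "120000"}]},
]

def collections_at_mids2_above(threshold, records):
    """Find collections whose MIDS level 2 percentage exceeds a threshold."""
    hits = []
    for rec in records:
        for mof in rec["measurementsOrFacts"]:
            # Values in a generic MoF structure are untyped strings, so the
            # search layer must parse and validate them itself.
            if (mof["type"] == "midsLevel2Percent"
                    and float(mof["value"]) > threshold):
                hits.append(rec["code"])
    return hits
```

A dedicated term would let the index store the percentage as a typed, directly filterable field instead of pushing this parsing into every search path.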
I would like a sense of how many institutions track MIDS levels (and would potentially be able to provide them), and how interesting it would be to find collections based on the digitisation level of some groups of scientific names (for a given taxon, for example).
I am tagging @mswoodburn since I think this is something that the NHM has. Maybe folks from the MIDS task group could give their input (https://github.com/tdwg/mids), @CatChapman?