Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix for 116 #236

Merged
merged 3 commits into from
Oct 24, 2024
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -150,7 +150,8 @@ class DataImportService {
if (file.getName().startsWith(EML_FILE)) {
//open the XML file that contains the EML details for the GBIF resource
def xml = new XmlSlurper().parseText(zipFile.getInputStream(file).getText("UTF-8"))
contacts = emlImportService.extractContactsFromEml(xml, dataResource)
def result = emlImportService.extractContactsFromEml(xml, dataResource)
contacts = result.contacts
}
}
}
Expand Down
80 changes: 60 additions & 20 deletions grails-app/services/au/org/ala/collectory/EmlImportService.groovy
Original file line number Diff line number Diff line change
Expand Up @@ -115,6 +115,7 @@ class EmlImportService {
def extractContactsFromEml(eml, dataResource){

def contacts = []
def primaryContacts = []

emlFields.each { name, accessor ->
def val = accessor(eml)
Expand All @@ -123,10 +124,10 @@ class EmlImportService {
}
}

//add a contacts...
//add contacts...
if (eml.dataset.creator){
eml.dataset.creator.each {
def contact = addContact(it)
def contact = addOrUpdateContact(it)
if (contact){
contacts << contact
}
Expand All @@ -137,36 +138,75 @@ class EmlImportService {
&& eml.dataset.metadataProvider.electronicMailAddress != eml.dataset.creator.electronicMailAddress){

eml.dataset.metadataProvider.each {
def contact = addContact(it)
def contact = addOrUpdateContact(it)
if (contact){
contacts << contact
}
}
}

contacts
// Add additional contacts
if (eml.dataset.contact){
eml.dataset.contact.each {
def contact = addOrUpdateContact(it)
if (contact){
contacts << contact
primaryContacts << contact
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change to only use eml.dataset.contacts as primaryContacts is OK?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what my data manager suggested.

}
}
}

if (eml.dataset.associatedParty){
eml.dataset.associatedParty.each {
def contact = addOrUpdateContact(it)
if (contact){
contacts << contact
}
}
}

[contacts: contacts, primaryContacts: primaryContacts]
}

private def addContact(emlElement){
def contact = Contact.findByEmail(emlElement.electronicMailAddress)
if (!contact){
private def addOrUpdateContact(emlElement) {
def contact = null
if (emlElement.electronicMailAddress && !emlElement.electronicMailAddress.isEmpty()) {
String email = emlElement.electronicMailAddress.text().trim()
contact = Contact.findByEmail(email)
} else if (emlElement.individualName.givenName && emlElement.individualName.surName) {
contact = Contact.findByFirstNameAndLastName(emlElement.individualName.givenName, emlElement.individualName.surName)
} else if (emlElement.individualName.surName) {
// surName is mandatory
contact = Contact.findByLastName(emlElement.individualName.surName)
}

// Create the contact if it doesn't exist and it's a individualName with email or surName
// to prevent empty contacts (e.g. with emlElement.organizationName only)
boolean hasEmail = emlElement?.electronicMailAddress?.text()?.trim()?.isEmpty() == false
boolean hasName = emlElement?.individualName?.surName?.text()?.trim()?.isEmpty() == false

if (!contact && (hasEmail || hasName)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inclusion of && (hasEmail || hasName) might cause null contact exceptions further along, if it is ever false.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not the case in our 500 contacts. Looking to the eml expecification:

image

The only case I can imagine is a contact with only organizationName and no email and no individualName that seems to me something quite useless for a "contact" (Are you suggesting that we should take into account the case of "contact: organizationName: CSIRO"?).

contact = new Contact()
contact.firstName = emlElement.individualName.givenName
contact.lastName = emlElement.individualName.surName
contact.email = emlElement.electronicMailAddress
contact.setUserLastModified(collectoryAuthService.username())
Contact.withTransaction {
if (contact.validate()) {
contact.save(flush: true, failOnError: true)
return contact
} else {
contact.errors.each {
log.error("Problem creating contact: " + it)
}
return null
}

// Update the contact details
contact.firstName = emlElement.individualName.givenName
contact.lastName = emlElement.individualName.surName
// some email has leading/trailing spaces causing the email constrain regexp to fail, lets trim
contact.email = emlElement.electronicMailAddress.text().trim()
contact.setUserLastModified(collectoryAuthService.username())
Contact.withTransaction {
if (contact.validate()) {
contact.save(flush: true, failOnError: true)
return contact
} else {
contact.errors.each {
log.error("Problem creating contact: " + it)
}
return null
}
}

contact
}
}
22 changes: 15 additions & 7 deletions grails-app/services/au/org/ala/collectory/IptService.groovy
Original file line number Diff line number Diff line change
Expand Up @@ -119,12 +119,16 @@ class IptService {
}
}

def emails = old.getContacts().collect { it.contact.email }

//sync contacts
update.contacts.each { contact ->
if (!emails.contains(contact.email)) {
old.addToContacts(contact, null, false, true, collectoryAuthService.username())
def existingContact = old.getContacts().find {
(it.contact.email && !it.contact.email.isEmpty() && it.contact.email == contact.email) ||
(it.contact.firstName == contact.firstName && it.contact.lastName == contact.lastName)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not familiar with the context so this might be misinformed.

The firstName/lastName comparison may be a problem. For example:

  1. A contact has a new email (same firstName and lastName). This no longer adds a new contact with the new email. I would always want to add the new email.
  2. A contact has same firstName and lastName as an old contact. The new contact is not added. While unlikely, I do not want prevent two contacts with the same name from existing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's problematic, but as email is not mandatory and we don't have a primary key, we should use the names.

For instance, this is the IPT:
image
Duplicated contacts but with different names or emails results in duplicate contacts. The IPT only fix these duplicates if you use exactly the same names and email.

image

I was thinking that probably a better solution for us to maintain contacts in sync, is to always remove all contact_for from a data resource and re-add with the new contacts instead of trying to compare and update. In this case we can have "orphans" old contacts without any dr, so maybe we can also check this and remove the orphans contacts (imagine a contact without email or a misspelled name that is corrected in a new version of a dr).

I was thinking of adding in a future PR and doing this PR something similar to what we have but improved and less drastic and easy to review.

What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this PR, when merging new information, it must not exclude more new contacts that it already does. I now understand it is already excluding those without an email or those that share an email (given that it is not the primary key).

At the very least, this name comparison line needs a change. I'll leave it to you if you want to address it in this PR or remove this line and address it a new PR.

I agree with your suggestion. It is preferable to:

  1. Add all new contacts. There will be a comparison of all contact fields to determine what is new.
  2. Remove old contacts (the ContactFor) that are no longer in the IPT source. This makes the assumption no contact information, for an IPT resource, will be manually altered in the collectory.
  3. Deleting of contacts that are orphaned (those without a ContactFor)

Having looked at the code a little longer I have another question. Previously the creator contact was added as a primary=true. The additional dataset.contacts and dataset.associatedParties are also added as primary=true. Should these be primary=false?

Copy link
Contributor Author

@vjrj vjrj Jun 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently we have a list of primary contacts and we check if it should be marked as primary or not. See below the part:
boolean isPrimaryContact = update.primaryContacts.contains(contact)
I think is better to continue with the rest of contactFor part in a new PR to simplify the review.

This PR already addresses the original issues described in #116. Currently the drs with contacts without email are all assigned (incorrectly) to the first contact without email. In our case:

image

}
if (!existingContact) {
// Add new contact
boolean isPrimaryContact = update.primaryContacts.contains(contact)
old.addToContacts(contact, null, false, isPrimaryContact, collectoryAuthService.username())
}
}
}
Expand All @@ -142,7 +146,8 @@ class IptService {
DataResource.withTransaction {
update.resource.save(flush: true, failOnError: true)
update.contacts.each { contact ->
update.resource.addToContacts(contact, null, false, true, collectoryAuthService.username())
boolean isPrimaryContact = update.primaryContacts.contains(contact)
update.resource.addToContacts(contact, null, false, isPrimaryContact, collectoryAuthService.username())
}
}
activityLogService.log username, admin, Action.CREATE, "Created new IPT data resource for provider " + provider.uid + " with uid " + update.resource.uid + " for dataset " + update.resource.websiteUrl
Expand Down Expand Up @@ -203,11 +208,14 @@ class IptService {
resource.isShareableWithGBIF = isShareableWithGBIF

def contacts = []
def primaryContacts = []
if (eml != null && eml != "") {
contacts = retrieveEml(resource, eml)
def result = retrieveEml(resource, eml)
contacts = result.contacts
primaryContacts = result.primaryContacts
}

[resource: resource, contacts: contacts]
[resource: resource, contacts: contacts, primaryContacts: primaryContacts]
}

/**
Expand Down