# Minutes of ODEX use case team meetings
## Minutes of the meeting on 26-10-2016 (Arnold, Anand, Wytze, and Anneke at DTL)
Agenda points in italics will be dealt with in the next meeting
#### Announcements
IOS Press
The deadline for the detailed use case description and the content expert have been communicated to Astrid.
SPOT visualization
Network visualization has been added and the infrastructure updated. Anand will be updated tomorrow. Further developments will not use the ODEX budget.
Data integration (status, documentation)
Delayed and repeatedly postponed; the NIZO and DSM use cases may run into problems. Communicate with Albert/Luiz and the respective partners. Data will be ingested over the weekend and will NOT wait for the complete VLPB data source set (no decision yet on OMA/Ensembl COMPARA).
The PigQTLdb data import document serves as a template (headings and descriptions added); a single standard is not possible for the different data source formats. No further detailed discussions. Docs provided by Aram will be carefully read and checked, and amendments should be communicated to Aram. Docs are checked by the use case/workflow owner. Anneke to send the docs for the DSM data sources (when received from Aram) as well as the HumanCyc doc to Anand.
Anneke to ask Aram for the data ingestion reports (unit testing).
Planning December
Wytze: no holidays
Anand: last 2 wks Dec/2 wks Jan
Arnold: last 2 wks Dec
Anneke: last wk of Dec
ORKA
Triple annotator (ODEX product). Availability and use will be discussed with interested partners (Bayer, VLPB) at the partner meeting.
#### Updates on use cases we work on:
Roche
Research question redefined: target genes - diseases. Output from the EKP target identification workflow for confirmed targets, plus a set of random genes - random diseases. It is not completely clear yet how all measures (e.g. functional association, druggability) are calculated. Anand/Arnold will discuss machine learning techniques with Wytze after the meeting (see the sketch below).
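As a starting point for that discussion, a minimal sketch of such a ranking, assuming the EKP output is a table with one row per gene-disease pair; the feature names, values and the choice of scikit-learn model are illustrative assumptions, not the agreed approach.

```python
# Minimal ranking sketch; feature names, values and the model choice are
# illustrative assumptions, not the agreed Roche approach.
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical EKP output: one row per gene-disease pair, label 1 for
# confirmed targets and 0 for random gene-disease pairs.
pairs = pd.DataFrame({
    "pair": ["geneA-disease1", "geneB-disease1", "geneC-disease2", "geneD-disease2"],
    "functional_association": [0.9, 0.7, 0.2, 0.1],
    "druggability": [0.8, 0.6, 0.3, 0.2],
    "label": [1, 1, 0, 0],
})

features = ["functional_association", "druggability"]
model = LogisticRegression().fit(pairs[features], pairs["label"])

# Rank all pairs by the predicted probability of being a true target.
pairs["score"] = model.predict_proba(pairs[features])[:, 1]
print(pairs.sort_values("score", ascending=False)[["pair", "score"]])
```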
NIZO
Wytze showed the first output results of the workflow for direct and indirect interactions.
Next, start discussing refinement steps (filtering etc.), also with Aram.
SPOT: no possibility to display provenance (PubMed ID, source and evidence).
Code: blocks with a wrapper -> Taverna, so that biologists can play around without having to work with the code.
B4F
Arnold: PigQTLdb (GFF) converted using OpenRefine. Next, Ensembl (all annotations: genes, functions, orthologs, disease associations) and proprietary GWAS data from different breeds (Virtuoso only).
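For reference, a minimal sketch of a GFF-to-RDF conversion in Python with rdflib (the actual PigQTLdb conversion was done with OpenRefine); the namespace, predicates and example feature line are placeholders, not the real ODEX schema.

```python
# Minimal GFF3-to-RDF sketch; the namespace and predicate URIs are placeholders,
# not the schema actually used for the ODEX data integration.
from rdflib import Graph, Literal, Namespace, RDF

EX = Namespace("http://example.org/pigqtl/")  # hypothetical namespace
g = Graph()

# One hypothetical GFF3 feature line (nine tab-separated columns).
gff_line = "1\tPigQTLdb\tQTL\t12345\t67890\t.\t+\t.\tID=QTL_42;Name=backfat_thickness"
seqid, source, ftype, start, end, score, strand, phase, attrs = gff_line.split("\t")
attributes = dict(field.split("=", 1) for field in attrs.split(";"))

feature = EX[attributes["ID"]]
g.add((feature, RDF.type, EX[ftype]))
g.add((feature, EX.chromosome, Literal(seqid)))
g.add((feature, EX.start, Literal(int(start))))
g.add((feature, EX.end, Literal(int(end))))
g.add((feature, EX.name, Literal(attributes["Name"])))

# The resulting N-Triples file can be bulk-loaded into a Virtuoso named graph.
g.serialize(destination="pigqtl.nt", format="nt")
```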
VLPB
Arnold: Virtuoso, 3 months to add SGN (GFF; mappings to UniProt missing), generated for the CandyGene project. Currently 3 graphs: SGN HEINZ, SGN wild tomato, and Ensembl HEINZ (no wild tomato available). These could be cross-mapped through the genetic markers (see the query sketch after this list). Next, Ensembl - UniProt - links out anywhere. Phenotype data will be the mutation database (TGRC, micro_tom mutants).
- Ranking (set enrichment) outside Virtuoso
- Visualization tool for network is available
- Interface only if there is time left
- To be documented
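A minimal sketch of the cross-mapping query over two named graphs in Virtuoso, using SPARQLWrapper; the endpoint URL, graph URIs and the linkedToMarker predicate are assumed placeholders, not the real ODEX vocabulary.

```python
# Minimal sketch of cross-mapping SGN and Ensembl genes through shared markers.
# Endpoint, graph URIs and predicates are placeholders.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://localhost:8890/sparql")  # assumed Virtuoso endpoint
sparql.setQuery("""
PREFIX ex: <http://example.org/vocab/>
SELECT ?sgnGene ?ensemblGene ?marker
WHERE {
  GRAPH <http://example.org/graphs/sgn-heinz>     { ?sgnGene     ex:linkedToMarker ?marker . }
  GRAPH <http://example.org/graphs/ensembl-heinz> { ?ensemblGene ex:linkedToMarker ?marker . }
}
LIMIT 10
""")
sparql.setReturnFormat(JSON)

# Print marker-mediated gene pairs across the two graphs.
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["marker"]["value"], row["sgnGene"]["value"], row["ensemblGene"]["value"])
```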
RijkZwaan (a former colleague of Arnold) is working on the Semantic Web and is interested in a training: how has this been built? How to add proprietary data?
Wytze: when is workflow output feasible? We are building a knowledge base instead of creating workflow output. Is it not too much?
Discussion: present it as a generic platform to other partners. Different subjects, different (technical) solutions. It is a learning process and we are educating the partners about linked data (integration). Euretos cuts corners on the semantics to deliver earlier.
Lessons learned: think big, act small (lots of nice-to-haves, but choices have to be made). This is recognized by VLPB and they are happy with plan B (Virtuoso and table mining instead of text-mining).
DSM
Research question: what role do these genes play in butanol-1 tolerance?
Anand used: Jupyter Notebook, Python (infrastructure and notebook), R (analysis)
- Downstream analysis SPOT, exploratory analysis with network graph
- R wrapper around EKP - possible to make a library out of it? Put it in CWL (to make it machine readable)
- Composite concept -> use intersections (see the sketch after this list)
- Data sources: STRINGdb -> put in all
- Refinements:
  - Wytze: don't search with free text
  - Look at the code together before consulting Aram; inform the partner after the meeting with Aram.
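A minimal sketch of the intersection idea for the composite concept; get_connected_concepts() is a hypothetical stand-in for the corresponding EKP call, and the gene sets are toy data.

```python
# Minimal sketch of "composite concept -> use intersections": approximate a
# composite concept by intersecting the neighbour sets of its parts.

def get_connected_concepts(concept_id):
    """Hypothetical stand-in for an EKP query returning concepts linked to concept_id."""
    toy_neighbours = {
        "resistance to chemicals": {"geneA", "geneB", "geneC"},
        "butanol-1": {"geneB", "geneC", "geneD"},
    }
    return toy_neighbours[concept_id]

# Genes linked to BOTH parts stand in for links to the composite concept.
composite_hits = (get_connected_concepts("resistance to chemicals")
                  & get_connected_concepts("butanol-1"))
print(sorted(composite_hits))  # ['geneB', 'geneC']
```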
Limagrain
Anand showed a flowchart for the patent-mining workflow
Prior art set = collection of TR abstracts
Search within patents (pattern matching)
Date is important
Text-mining: normalise the terms (stemming etc.)
Generate a document-term matrix in two weeks (see the sketch below)
Validation will be done by Limagrain
Possible extensions: EKP or journal articles for prior art database. Ontology mapping is nice to have.
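A minimal sketch of the normalisation and document-term matrix step, assuming the TR abstracts are available as plain strings; the Porter stemmer and scikit-learn CountVectorizer are illustrative choices, not the agreed pipeline.

```python
# Minimal document-term matrix sketch with stemming; the example abstracts are made up.
from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import CountVectorizer

abstracts = [
    "Drought tolerant maize lines were selected for improved yields.",
    "Selection of maize for drought tolerance and yield improvement.",
]

stemmer = PorterStemmer()
base_tokeniser = CountVectorizer().build_tokenizer()

def stemmed_tokens(text):
    # Lowercase, tokenise and stem so that related word forms share one column.
    return [stemmer.stem(token) for token in base_tokeniser(text.lower())]

vectoriser = CountVectorizer(tokenizer=stemmed_tokens)
doc_term_matrix = vectoriser.fit_transform(abstracts)

print(vectoriser.get_feature_names_out())  # stemmed vocabulary
print(doc_term_matrix.toarray())           # rows = documents, columns = terms
```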
Text-mining plan
Text-mining will be done for the Limagrain use case. If the results are added to the KP (nice to have), the search terms will be mapped to ontologies later. In principle, the text-mining and the analysis of the results will be done outside the ODEX KP, and for that reason we will use the keywords provided by Limagrain, not ontology terms, at the start.
Text-mining was communicated as outside the scope of the project. Text-mining for B4F, NIZO and DSM was deprioritised (without opposition from the partners).
Text-mining was discussed last week with VLPB. Text-mining will be replaced with table mining (more relevant). Text-mining terms collected by VLPB will be used to select relevant papers and identify traits in the tables? Gold standard?
Text-mining to be discussed with Bayer: table mining will be proposed (QTLs: Gramene, text-mining: STRING?)
Engineers are not curators.
Set up the text-mining pipeline as generally as possible (see the sketch below).
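To illustrate keeping the pipeline generic, a minimal sketch in which the paper-selection step takes the partner-supplied term list as a plain parameter, so the same code could serve VLPB, Bayer or Limagrain; the terms, abstracts and field names are made up.

```python
# Minimal sketch of a generic, term-list-driven paper selection step.

def select_relevant_papers(papers, terms):
    """Return papers whose text mentions at least one of the search terms."""
    lowered_terms = [t.lower() for t in terms]
    return [p for p in papers if any(t in p["text"].lower() for t in lowered_terms)]

papers = [
    {"id": "PMID:1", "text": "QTL analysis of fruit weight in tomato."},
    {"id": "PMID:2", "text": "A study of yeast fermentation kinetics."},
]
vlpb_terms = ["QTL", "fruit weight"]  # illustrative, not the real VLPB term list

print([p["id"] for p in select_relevant_papers(papers, vlpb_terms)])  # ['PMID:1']
```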
#### Other points related to workflows
Workflow refinement
How to deal with many different questions? Through a very flexible interface to the EKP/data set. Anand/Arnold will set up a meeting with Wytze and eScience to discuss this.
New concepts: final guideline
-> tbd next meeting
Updating scheme: proposal
Proposal to run the update every Monday morning. Run scripts every week before updates. Wytze will contact Aram.
Documenting/Reporting Github
-> tbd next meeting
Partner contact (content team, partner meeting, planning)
-> tbd next meeting
Data output/visualization
-> tbd next meeting
#### Other?
Extension: if there is no budget left, Wytze will stop working on the project from March 1. eScience engineers work all possible hours on ODEX.
Division of tasks, overflow of meetings: agreed upon two-weekly focussed meetings, alternating Skype and f2f. Use cases will be discussed in plenary.
#### Agenda
Agenda points in bold need thorough discussion.
Start 10.30 am
Announcements (15')
- IOS Press
- SPOT visualization
- Data integration (status, documentation)
- Planning December
- ORKA
Updates on use cases we work on:
1. Roche (15')
2. NIZO (15')
3. B4F (15')
4. VLPB (15')
5. DSM (25')
6. Limagrain (25')
7. Text-mining plan (VLPB? Bayer?) (25')
Other points related to workflows (40')
1. Workflow refinement
2. New concepts: final guideline
3. Updating scheme: proposal
4. Documenting/Reporting Github
5. Partner contact (content team, partner meeting, planning)
6. Data output/visualization
Other?
We aim to finish the meeting around 13.30
15.30 - 17.00 Telco with Bayer (unfortunately the only time slot possible)
## Minutes of the meeting on 05-10-2016 (Anand, Anneke and Wytze by Skype)
#### Agenda
[11:14:07 AM] Wytze Vlietstra: Butanol tolerance concept
[11:14:16 AM] Anneke Sijbers: how to communicate with DSM
[11:14:23 AM] Wytze Vlietstra: Text mining
[11:14:28 AM] Wytze Vlietstra: Euretos meeting next week
[11:14:32 AM] Wytze Vlietstra: NIZO use case
[11:14:37 AM] Wytze Vlietstra: IOS press use case
[11:14:44 AM] Anneke Sijbers: partner access to wiki?
[11:14:46 AM] Wytze Vlietstra: Roche use case?
[11:15:12 AM] Anneke Sijbers: Limagrain use case
[11:16:55 AM] Wytze Vlietstra: Data integration overall
#### Text-mining
Anand will do the patent/text-mining for Limagrain.
Wytze will cancel text-mining session at ErasmusMC.
#### Data integration overall
Regular Skype updates with Aram. Communication improved.
- Discussed new planning:
  - Sept: ChEBI, B4F
  - Oct: B4F, DSM, NIZO
  - Nov: VLPB, Bayer
  - Dec: text-mined data
- Status: of note, Euretos is updating the data integration street. Integration of our data sources will be done next week.
- Received first documentation on data sources to be integrated (needs to be checked with Arnold/Egon?)
- VLPB and Bayer use cases to be discussed with Aram this month
#### Euretos meeting next week
Most important point on the agenda is to discuss our way of working. Would be useful if Anand and/or Arnold could join this meeting.
Anneke has sent links to documentation to Egon. Tomorrow, Wytze, Anneke, Luiz and Egon will have a first Skype meeting to prepare our meeting with Euretos.
#### DSM use case
Anand generated a workflow in EKP (see wiki)
Short list of genes (16) → map to concepts
Fetch all concepts for resistance to chemicals and subcategories (9)
The issue here is the composite concept 'resistance to chemicals' and 'butanol-1' → indirect paths only
How to include indirect concepts? Aram was going to send a scheme for how to circumvent this issue in a workflow (via pubs?). Anand and Wytze: preferably merge the two concepts, but in that case all 'resistance to chemicals' - chemical combinations should be merged. The Euretos thesaurus is not consistent anyway, indexing is no argument, and it is more convenient for adding data later.
Anand proposed to collect info on GO etc. outside EKP and merge it with the output data for Priscilla (see the sketch below). But DSM is more interested in the possibilities of EKP than in the actual data.
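A minimal sketch of that merge step, assuming both the EKP output and the externally collected GO annotations are tables keyed on gene; the column names and values are made up.

```python
# Minimal sketch of merging externally fetched GO annotations with EKP output.
import pandas as pd

ekp_output = pd.DataFrame({
    "gene": ["geneA", "geneB"],
    "ekp_score": [0.91, 0.44],
})
go_annotations = pd.DataFrame({
    "gene": ["geneA", "geneB"],
    "go_term": ["GO:0009636 (response to toxic substance)",
                "GO:0006950 (response to stress)"],
})

# One table for the partner: EKP output enriched with externally collected GO terms.
merged = ekp_output.merge(go_annotations, on="gene", how="left")
print(merged)
```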
Output workflow
1. Graph visualisation triple viewer (Java)
2. SPOT (eScience Center tool)
Visualization in SPOT and integration of the graphical viewer can be done at the eScience Center for the DSM use case
Anand will generate a Docker image with code, output and visualisations for DSM
Anneke will set up a meeting for Anand to show and discuss these preliminary data (Wytze or Anneke joining?)
#### IOS Press use case
No reply from IOS Press to reminders
Wytze wrote code (like migraine case)
Parkinson's - genes/proteins - genes (all)
1. relation all species
2. relation only human
→ thousands of indirect relations
Suggestion from Anand: check out HMDB PD info (metabolite - gene - body fluid etc.)
Wytze will send triples to Anand for first analysis with SPOT
Look at IOS Press data together on Oct 12 or plan a meeting later if needed (Anneke).
#### Roche use case
The use case has changed a bit. A list of non-effective drugs is hard to find to answer the original research question. For the new approach, Aram will produce data using the target identification workflow for Wytze. Wytze will apply machine learning algorithms to improve the ranking of candidate targets.
#### Limagrain use case
Anand will read documentation
Introduce Anand to Nicolas Heslot
Literature-mining needed? PubMed and/or Elsevier (abstracts/full articles?).
Check out 290 abstracts from TR
- title terms not separated
- which abstracts to mine?
- Which date to use?
- Include distance of words?
Wytze will check availability of Elsevier API
#### Bayer Cropscience use case
Introduce Arnold and Anand to Jens Hollunder
Discuss QTL table mining tool, define text-mining and workflow design requirements
#### NIZO use case
Wytze has rewritten the scripts for the new API and produced the first output. At first sight:
- It found more direct relations than last time
- Table of indirect relations needs some adjustments
- Anneke will use SPOT to look at the output data
- Wytze will link names to the semantic type identifiers to allow for filtering (see the sketch after this list)
- Amount of data is not a problem at this stage
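A minimal sketch of that semantic-type filtering; the type identifiers, names and table columns are illustrative, not necessarily what EKP returns.

```python
# Minimal sketch: map semantic type identifiers to readable names, then filter.
import pandas as pd

semantic_type_names = {"T028": "Gene or Genome", "T109": "Organic Chemical"}  # illustrative mapping

relations = pd.DataFrame({
    "source": ["conceptA", "conceptB"],
    "target": ["conceptX", "conceptY"],
    "semantic_type": ["T028", "T109"],
})
relations["semantic_type_name"] = relations["semantic_type"].map(semantic_type_names)

# Keep only gene-related rows, as an example of the intended filtering.
print(relations[relations["semantic_type_name"] == "Gene or Genome"])
```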
#### ODEX GitHub
Partners can have access to the ODEX GitHub (it is open anyway).
Based on the meeting on 07-09-2016, we have agreed upon the following:
- Weekly meetings for the next few months, through Skype or face-to-face at 10:00 at the DTL offices.
- The use case comes first; do not depend only on the Euretos Knowledge Platform (EKP), particularly in the prototyping phase, which requires flexibility and agility
- create/update Wiki pages within the repo to document the progress on the use cases
- keep track of issues and original tickets/URLs from the Euretos Support System
- use primary data sources as much as possible and document e.g., download URL, format, version or release number and license (if provided)
- evaluate the data sets (and their distributions) to be integrated in terms of quality, consistency, completeness or conformance to community-agreed specifications or standards