Nd explore multi 2 #356
set those people to excluded. When exclusions occur when disease has not occurred, also set those people to excluded.
(1) Need to apply the death censor dates. (2) Need to harmonize field names.
rather than emitting null values.
date, set the censor date to enrollment date, rather than birthdate.
censor schema for the same purpose, and allow the missing_fields field to be NA if we actually have everything we need.
add ml4cvd files
* wip rehaul tensorizer * wip tensorizer rehaul * rehaul tensor writer, get metadata in tensorize * bug fixes * bug fixes and correctness regarding missingness * new voltage qc tmaps * default sampling frequency tmaps calculate based on length * multiprocess -> multiprocessing
Also * Remove obsolete instructions now that ml4cvd has the needed packages and permissions. * Use newer hd5 for ECGs. * Switch to local paths when on a ML4CVD VM.
* better generator stats, and stats multiprocessing bug fixed
* Plots histograms of continuous tensors in explore mode * Updates figure subplot size and file name * Fixes file extension
* wip multiple time windows * cross reference multiple time windows * help docs * which -> order, reduce output * exact/at least toggle, all/any window toggle, summary count formatting * group by join tensor and time tensor * global N per window, multilabel counts * description of counts * shorten line in output
* dynamic time series tmaps * time series persistence #305 and redo cardiac surgery tmaps * voltage _exact length tmaps, population_normalize -> normalization in TMap * validator for voltage * remove apollo xref, get newest surgery * fixes selection of mrn_col_name in _sample_csv_to_set * fix validator * warning -> debug * dsw infection * columns * outcome * prolonged vent column name * reformat voltage tmaps * explicit _pc tmps * type hint * delete redundant length and zero tmaps * use xref output csv to get newest surgery with preop ecg * adds train_simple_model (#317) * patient sex categorical tmaps * explicit voltage tmaps * sex tmap cats * dsw outcomes resolved * gender -> sex in plots * voltage stats * train/valid/test not useful in progress bar * report median, generator * consolidate simple shallow model * revert change * version TFA TFP #320 * fix abbreviations Co-authored-by: Erik Reinertsen <[email protected]>
* notebooks for mnist and hyperoptimization, survival analysis plots and documentation
* do not crash if sts data not found * log and do not call build if no sts tmaps * raise only when using sts tmaps
* define filter sizes per conv layer * multiline fstring * args in test * filter size per layer or block * standalone helper
* multiprocess -> multiprocessing * cover all paths * simplify * fix cardiac surgery tmaps bug * revert bug fix for separate pr * revert bug fix for separate pr * whitespace
* Fixed ecg_plot_rest to run from command line (--mode plot_resting_ecgs) with new tensors
* infer metrics * time range consistency with cross reference
* explore now allows multidimensional tmaps * tests for explore * Bug found in default cts tff
* user/group of ml4cvd output is no longer root * tf.sh options: run as root, set up jupyter. Closes #334 * root is no longer default * disable silent reporting for -j in tf.sh * many echo statements to check array vals * user added to all user's groups in docker -> bash * fix indentation * adds --env to printed tf.sh call
* #314 ecg voltage plots * #314 ecg voltage plots * Formatting #315 * rehaul plot mode to use calculated scales * readability Co-authored-by: StevieSong <[email protected]>
Currently, if a tmap were to return all ECGs from an hd5 that had 3 ECGs, there would be 3 rows in
ml4cvd/explorations.py
Outdated
def _channel_explore_error_header(tm: TensorMap, channel: str) -> str:
    return f'{tm.name} {channel}'
Doesn't seem like this function does anything different from _channel_explore_header. Also, I don't think this function is used.
ml4cvd/explorations.py
Outdated
if tm.shape[0] is not None:
    # If not a multi-tensor tensor, wrap in array to loop through
    tensors = np.array([tensors])
for i, tensor in enumerate(tensors):
This for loop iterates over the tensors in a time series. If a tmap is not a time series tensor, it wraps the tensor in another dimension to simulate a time series of 1 time sample.
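To make the wrapping concrete, here is a minimal standalone sketch of the idea, not the code in this PR. It assumes a leading `None` in the shape marks a variable-length (time series) tmap, as the snippet above suggests; `as_sample_array` is a hypothetical helper name and the array sizes are arbitrary.

```python
import numpy as np


def as_sample_array(tensors, shape):
    """Return an array whose first axis indexes samples.

    `shape` stands in for a TensorMap shape. A leading None is assumed
    to mean "variable number of samples" (assumption for illustration).
    """
    if shape[0] is not None:
        # Fixed-shape tensor: add a leading axis so it looks like
        # a time series containing exactly one sample.
        tensors = np.array([tensors])
    return tensors


# A single fixed-shape tensor becomes a series of length 1.
single = as_sample_array(np.zeros((10, 2)), shape=(10, 2))

# A time series of 3 tensors is passed through unchanged.
series = as_sample_array(np.zeros((3, 10, 2)), shape=(None, 10, 2))

for i, tensor in enumerate(series):
    # Each `tensor` here has the per-sample shape (10, 2).
    pass
```

With this, the same `for` loop handles both cases without a separate branch for non-time-series tmaps.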
How did these different samples get distinguished in the output CSVs?
If patient 123 has 2 total ECGs taken on 5/20 and 5/21, a tmap is given 123.hd5 and returns an array [5/20 data, 5/21 data]. In the output csv, the 2 ECGs are counted as separate samples; each ECG gets its own row in the output.
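A rough sketch of the one-row-per-ECG layout described above, using only the standard library. The column names (`sample_id`, `sample_index`, `value`) and the helper `rows_for_patient` are hypothetical, chosen for illustration; the actual explore output columns may differ.

```python
import csv
import io


def rows_for_patient(sample_id, samples):
    """Build one output row per sample returned by a tmap.

    The column names here are assumptions, not the real explore schema.
    """
    return [
        {"sample_id": sample_id, "sample_index": i, "value": value}
        for i, value in enumerate(samples)
    ]


# Patient 123 has two ECGs, so the tmap returns two samples.
rows = rows_for_patient("123", ["5/20 data", "5/21 data"])

# Write the rows to an in-memory CSV to show the resulting layout.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["sample_id", "sample_index", "value"])
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
```

In this sketch the rows for one patient are told apart only by the assumed `sample_index` column, which is one possible answer to the question of how samples are distinguished.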
New way to handle multidimensional TensorMaps that doesn't run into problems with validators or normalizers. Addresses #354. This PR is missing a test for variable shape TensorMaps. Can someone (@paolodi ?) explain what the expected behavior for explore is for time series or other variable shape tmaps?