09 April status
Sandip demoed an image segmentation model training setup; discussion of combining multiple outputs in an execution tree; admin panel UI; website.
Sandip demoed an AI system that looks through raw and segmented versions of images in order to train a new segmentation model.
-
Has input data for different subjects: raw images and their segmented counterparts. Wants to use this data to train a new model, and has a set of test data to predict the outcome on.
-
My goal is to train a model to learn and predict segmentation outcomes.
-
Two different plugins: pl-converter, pl-mc*
-
Needs to convert the mgz files to something the system will understand; uses a PNG converter to convert them to PNG (see the conversion sketch below).
- A Docker application does the conversion.
- Takes an inputs directory; the Docker plugin creates the output directory.
- Converts each of the files, slicing it into PNG files. Takes around 20 seconds for 10 subjects; if the amount of data grows, this time will grow significantly.
- In the output folder you can see all the subjects; going back to the unsegmented data, you see the same folder layout unsegmented. This is now ideal for training because this format is understandable by the training tool.
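A rough sketch of what the mgz-to-PNG slicing step might look like, assuming nibabel and Pillow; the function name, file names, and directory layout here are illustrative, not the actual pl-converter implementation:

```python
import os
import numpy as np
import nibabel as nib
from PIL import Image

def slice_mgz_to_png(mgz_path, out_dir):
    """Slice one mgz volume into per-slice grayscale PNGs."""
    os.makedirs(out_dir, exist_ok=True)
    vol = nib.load(mgz_path).get_fdata()        # e.g. a 256 x 256 x 256 volume
    vol = vol - vol.min()                        # rescale intensities to 0..255
    if vol.max() > 0:
        vol = vol / vol.max() * 255
    vol = vol.astype(np.uint8)
    for i in range(vol.shape[2]):                # one PNG per slice along one axis
        Image.fromarray(vol[:, :, i]).save(
            os.path.join(out_dir, f"slice-{i:03d}.png"))

# Example (hypothetical paths): slice_mgz_to_png("subject01/brain.mgz", "out/subject01")
```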
-
Now I need to run my image on the processed input data (in PNG format).
- Has a script saved to run the training.
- Runs the training; this runs on the GPU.
- With the CPU it took 1 h 30 m; with the GPU it will take about 2 min (see the GPU sketch below).
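A minimal sketch of how the same training step runs on GPU or CPU, assuming PyTorch; the model, data, and loss here are placeholders, not Sandip's actual training script:

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.Conv2d(1, 2, 3, padding=1).to(device)   # stand-in for the real network
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(4, 1, 256, 256, device=device)     # batch of PNG-sized slices
y = torch.randint(0, 2, (4, 256, 256), device=device)  # per-pixel labels

opt.zero_grad()
loss = loss_fn(model(x), y)                        # identical code path on CPU or GPU
loss.backward()
opt.step()
```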
-
Q: The input data are mgz files, and it's hard to see in the PNG files... do the PNG files have different shades of gray?
-
Q: What is the architecture of the trainer?
- U-shaped: the left side is downsampling and the right side is upsampling, so it's a U structure (a sketch of a U-shaped network is below).
- One mgz file is sliced into 256 PNG files; with 10 different subjects, that's 2,560 slices.
- Q: As you have more subjects, the input becomes bigger and bigger. Will there be a limit on the size of the inputs you can train on because the GPU runs out of memory?
- S: Has done 300 subjects; it doesn't run out of memory, but the training process is pretty slow.
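A minimal sketch of a U-shaped segmentation network (downsampling on the left, upsampling on the right with skip connections), assuming PyTorch; the depth and layer sizes are illustrative, not the actual trainer:

```python
import torch
import torch.nn as nn

def block(c_in, c_out):
    # two 3x3 convolutions, a typical U-Net stage
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(inplace=True))

class TinyUNet(nn.Module):
    def __init__(self, n_classes=2):
        super().__init__()
        self.down1, self.down2 = block(1, 16), block(16, 32)  # left side: downsampling
        self.pool = nn.MaxPool2d(2)
        self.mid = block(32, 64)
        self.up2 = nn.ConvTranspose2d(64, 32, 2, stride=2)    # right side: upsampling
        self.dec2 = block(64, 32)
        self.up1 = nn.ConvTranspose2d(32, 16, 2, stride=2)
        self.dec1 = block(32, 16)
        self.out = nn.Conv2d(16, n_classes, 1)                # per-pixel class scores

    def forward(self, x):                                     # x: (N, 1, 256, 256) slices
        d1 = self.down1(x)
        d2 = self.down2(self.pool(d1))
        m = self.mid(self.pool(d2))
        u2 = self.dec2(torch.cat([self.up2(m), d2], dim=1))   # skip connection
        u1 = self.dec1(torch.cat([self.up1(u2), d1], dim=1))  # skip connection
        return self.out(u1)
```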
-
Parul: If you can modify this to run on the MOC, you could present it at DevConf (if it happens). That would take it to the next level.
-
Rudolph: Are you using stoi for this?
- S: No, I tried it, but I was getting some yum issues when using CentOS. If I use an Ubuntu base container I can use it.
-
Rudolph: You can do this through the UI right now too; you don't have to run it manually.
-
S: How do I get this working on PowerPC and x86? What do I have to change?
- Rudolph: We have a PowerPC machine you can log into and run the container on.
- Bill: When it comes to getting back to the stoi stuff, come to me and I can show you how to run that. There's an Ansible tool that detects the package manager and can resolve package/yum issues.
- Parul: You don't need to change anything in your image to have it run on the MOC. Once you create an env and have it ready on the MOC, you can run any x86 image, irrespective of the resource being used. You can use your own repo. All you need to do is create an env of your own (or use the current radiology env, but you can't debug it because you have no access, so it's better to create your own). If you need any help, go through the recording of our session and ping me.
- Rudolph: What is the difference between the ground truth and the FreeSurfer data for this subject? (See screenshot.) 10 subjects, 50 epochs. The prediction here is not that good, but with more data and epochs we'll get a better one. We have the workflow now, which is a huge win, and we can get better predictions using it.
- Rudolph: Has some ideas about what to put at the end of the workflow. A quantitative error signal showing the difference between the two images (ground truth and prediction) would be great (see the sketch below). It will be optional because we don't have ground truth for each one.
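A minimal sketch of such a quantitative error signal, assuming the ground truth and the prediction are same-sized label PNGs; the Dice score and difference image here are illustrative, not an existing ChRIS plugin:

```python
import numpy as np
from PIL import Image

def dice_and_diff(ground_truth_png, prediction_png):
    gt = np.array(Image.open(ground_truth_png)) > 0       # binarize the labels
    pred = np.array(Image.open(prediction_png)) > 0
    overlap = np.logical_and(gt, pred).sum()
    dice = 2.0 * overlap / max(gt.sum() + pred.sum(), 1)  # 1.0 = perfect agreement
    diff = np.logical_xor(gt, pred)                        # pixels where the two disagree
    return dice, diff

# Example (hypothetical paths): dice, diff = dice_and_diff("gt/slice-128.png", "pred/slice-128.png")
```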
Mo status
Have been working on chrisproject.org.
Bill
Team reorg, not much progress.
Rudolph
- Happy with the chrisproject.org site; we have an actual .org page now, long overdue :) It's a central point to push people to.
- Continuing to work on the pacspull JavaScript standalone app; don't have it hooked up to pull data from Swift storage yet. Trying to handle all the vagaries of what a data set can look like across studies, series, etc. Could show a demo at some point.
- We've been thinking, in discussions, about plugins within a ChRIS workflow that can actually do things on the graph structure of an execution tree: plugins that can take a node and automatically create child nodes from it. For example, Sandip showed a folder with 10 subjects; there might be a need in a processing stream to take that input folder and create 10 separate child descendants, each child having one of the subjects in it (a sketch of this per-subject split is below). And then how do we meaningfully combine those multiple outputs again in the current ChRIS framework? We've been talking about these kinds of things. It goes back to a discussion we had a long time ago, when we thought about harvester plugins: if you have a workflow and, 5 nodes down, you suddenly want data that was output by a node a couple of nodes back... how do we do that meaningfully without breaking things? We have some ideas and have been doing some initial thinking along those lines. We typically have this discussion in the Slack channel.
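A minimal sketch of the "split one node into per-subject children" idea, assuming a plugin-style script that gets an input directory and writes one output directory per child; the directory layout and naming are illustrative, not an existing ChRIS API:

```python
import os
import shutil

def split_by_subject(inputdir, outputdir):
    """Copy each subject folder from inputdir into its own child output directory."""
    for subject in sorted(os.listdir(inputdir)):      # e.g. subject01 .. subject10
        src = os.path.join(inputdir, subject)
        if not os.path.isdir(src):
            continue
        child = os.path.join(outputdir, f"child-{subject}")
        shutil.copytree(src, child)                   # one child node's data per subject

# Example (hypothetical paths): split_by_subject("in/subjects", "out")
```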
Jorge
- Been working on the ChRIS admin UI as chris user. Demoed it.
- Can manage resources and plugins. Can change compute resources per plugin.
- Can register plugins with name and version, or with the ChRIS store URL.
- The type of plugin comes from the plugin representation: ds or fs. ds = data plugin, fs = file plugin (an illustrative representation is sketched below).
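An illustrative sketch only, showing where the name, version, and ds/fs type mentioned in the demo would live in a plugin representation; the field names are assumptions, not the exact ChRIS store schema:

```python
# Hypothetical plugin representation; only name, version, and type come from the demo notes.
example_plugin_representation = {
    "name": "pl-converter",   # plugin name used when registering it
    "version": "1.0.0",
    "type": "ds",             # "ds" = data plugin, "fs" = file plugin (per the demo)
}
```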
Rudolph: What is the current state of the ChRIS store?
Jorge: There hasn't been much work on the ChRIS store. The last thing I did was create all the issues I talked about, so it's very clear what work has to be done. Joe said he could work on that but is busy at the moment. We'll need a developer who can work on it; it's not as complicated as ChRIS. There is some backend work that has to be done too (the stars for the plugins), but mainly what we need is filed in the issues under the project. It could be good work for an intern.