
4 April status - GPU troubleshooting, more node parameter UI, and a website for ChRIS

Máirín Duffy
UX Designer @ Red Hat

Sandip, Parul, and Rudolph talked about errors Sandip was hitting with GPU code, Gideon demoed the latest iteration of the node parameter UI, and Rudolph discussed setting up a new website for ChRIS.


  • GPU work completed and tested
  • Parul would like to show how to run the workflow on the MOC, test against GPUs on MOC

GPU Running out of space issues

Multiple files are created inside the plugin while it runs; the open question was how to store them and how to refer to their folders.

  • Jorge: When you override the run method, you are passed an options object. That options object already has an input dir and an output dir. Use that output dir.

  • Sandip: Only the input and output dirs... I have some intermediate files in between. If I could mount them and store them and later on refer to them...

  • Parul: Are these files generated by the plugin or are they needed by the training plugin?

  • Sandip: They are generated in between; they are intermediate.

  • Rudolph: If they're generated in between, I don't understand the difficulty. The plugin has an input and output dir from the outside world. In the intermediate space it can store whatever it wants in its own local filespace. The story is that anything the plugin wants to preserve for something downstream, it has to write to the output dir. It can create as many files as it wants inside its own container space though, there's no limit there, and those don't have to go to the output dir.

  • Sandip: I was looking at the size of the images of my application and it was growing. I was trying to think of how to put a check on the size. The intermediate files take up a lot of space, and I'm concerned about them taking up too much.

  • Rudolph: If you keep saving them, say your Python plugin saves to /tmp, that's fine, not a problem. The only thing you want to keep is your final output model, so save that to the output dir. As soon as the plugin is finished, all that stuff in /tmp is gone. The only thing left is what's in the output dir.

  • Sandip: Okay, that helps. I can refer to stuff in the output dir: a folder for weights, a folder for mpy folders, another folder for png files.

  • Rudolph: Sounds perfect, anything you want to save, just put it in the output dir. If you don't need it just keep it in container space.

  • Jorge: does OpenShift have a limit on how much space a container can use? Large volumes of files, MOC makes sure it has enough space to write, right?

  • Parul: When we create the job, we give space to the container and share the volume, but I don't think space usage is a concern. What was your thought process Sandip?

  • Sandip: I saw that the container was taking a lot of space, was worried it would run out, and was looking to optimize it. Sometimes I would get a weird message saying the GPU ran out of space and could not process. When I cleaned up files and tried again, it ran.

  • Parul: When you run your workflow on the GPU, when there were too many intermediate files generated, it didn't work until you deleted some images?

  • Sandip: That is what I suspected. Don't know for sure.

  • Rudolph: The GPU running out of space shouldn't be related to container size, could be a RAM issue.

  • Parul: Or could be a code optimization issue. The files inside the container aren't residing in memory, so they should not affect the GPU. Your GPU is more like your RAM.

  • Rudolph: "Image" can be a confusing term. In Docker, image means all the files on the hard drive in the container; that bundled group is the image. But for a GPU, the image is the active memory: the textures / data / etc. you're putting in the GPU. There is no relationship between the size of a container image and a GPU image. The GPU holds just whatever you write to it, just like RAM.

  • Parul: Will you have a demo sometime soon?

  • Sandip: Sure, next meeting
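
Rudolph's advice about intermediate files can be sketched in Python as follows. This is a minimal illustration, not Sandip's actual plugin: the `train` function, the file names, and the `weights` folder layout are all hypothetical, and a real plugin would take its output dir from the options object Jorge mentioned rather than from a function argument.

```python
import tempfile
from pathlib import Path


def train(output_dir: Path) -> Path:
    """Hypothetical training step: scratch files stay in temp space,
    and only the final model is written to output_dir."""
    output_dir.mkdir(parents=True, exist_ok=True)

    # Scratch space inside the container; deleted when the "with" block exits.
    with tempfile.TemporaryDirectory() as scratch:
        scratch_dir = Path(scratch)
        for step in range(3):
            # Stand-ins for intermediate artifacts (checkpoints, previews, ...).
            (scratch_dir / f"intermediate_{step}.bin").write_bytes(b"\0" * 1024)
        n_intermediate = len(list(scratch_dir.iterdir()))

        # Only what downstream plugins need goes to the output dir.
        weights_dir = output_dir / "weights"
        weights_dir.mkdir(exist_ok=True)
        model_path = weights_dir / "model.txt"
        model_path.write_text(f"trained from {n_intermediate} intermediate files\n")

    return model_path
```

Using a temporary directory makes the cleanup explicit, but per the discussion above, anything left in the container's own filespace disappears once the plugin finishes anyway.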

Gideon Status

Gideon demoed his progress on the node parameter configuration UI.

https://raw.githubusercontent.com/FNNDSC/cube-design/master/CHRIS-UI/screenshots/node-config/add-new-node_04-02-2020_01.png

https://raw.githubusercontent.com/FNNDSC/cube-design/master/CHRIS-UI/screenshots/node-config/add-new-node_04-02-2020_02.png

https://raw.githubusercontent.com/FNNDSC/cube-design/master/CHRIS-UI/screenshots/node-config/add-new-node_04-02-2020_03.png

Issues

  • Rudolph confused by "guided configuration" language - we should rethink the verbiage there.
  • Once you've filled out the parameters and hit next, the back button from the review screen always goes to guided config, not the full text. If you last visited the full text version, is there a way to make it sticky so when you hit 'back' you go to the full text version?
  • The user is not exposed to providing the input and output dirs when configuring plugin parameters, so we don't have to worry about enabling that here.
  • Required parameters aren't input into the freeform editor as a suggestion right now, maybe a good idea to do this?

Design discussion for parameter inputs

We had an active discussion about the design of the parameter input screen. There are two modes. The "guided configuration" mode has dropdowns pre-filled with the parameters available for the plugin and help text for each parameter. The "full text" or "advanced" mode lets you type out the full parameters yourself as if you were on a command line, with full freedom to type whatever you like (including mistakes that will never succeed).

  • We had recently decided to have a common buffer holding the command that either 'side' (guided or full text) generated, but in the demo there were some inconsistencies in the representation of that buffer (e.g., you can type things in the full text field that can't be represented in the form-based guided mode). The inconsistency means users will not understand that there's a common buffer, and they can be confused about what will actually get sent to the plugin / MOC once they click the "next" button.
  • The "next" button is a pretty heavy / intimidating button, because of the lack of clarity as to what will get sent to the server.
  • Mo talked about a similarity between this UI and the kickstart configuration UI in RHEL (system-config-kickstart and the RHN Satellite kickstart config UI.) It has a similar problem in that some users prefer to handwrite or have a library of handwritten ks files they don't want to input into a mass of form fields, and also the format supports freeform scripts and the form has to enable that. Any mistakes mean you're deploying a system that gets hung. Mo went back to the designs of those tools and notes around their design, and noted:
    • The RHEL tool is just the form. It spits out the kickstart file in plaintext, and then you can edit it however you want afterwards in a text editor. It's one direction, so to "go back" you have to fill out the fields and generate a fresh ks file to then hand edit in an editor. You also have to input the ks file into the system yourself (e.g., on a PXE server, or by bootstrapping a new RHEL system with it).
    • The Satellite tool, which is web-based and closer to what the ChRIS plugin parameter UI does, has two modes:
      • One mode where you fill out the form fields and there's a freeform text area at the end where you can add things that aren't covered by the form fields. You don't get access to the output of the form fields to edit it.
      • Another mode where you literally just upload a plaintext ks file to the UI and it passes that to the backend to be executed... you don't have any option to work with the form but you can manually update the plaintext.
    • Mo observed: "in pretty much every case of software interfaces that are in any way analogous to what we're trying to do... it seems like they avoid trying to have multiple ways of editing the same buffer im guessing because of the complexity of the interaction"
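
The shared-buffer problem can be illustrated with a small Python sketch. The helper names (`form_to_text`, `text_to_form`) and the `--flag value` syntax are assumptions made for illustration only, not the ChRIS UI's actual code. Going from form values to text always works, but going from free text back to the form can fail.

```python
import shlex
from typing import Dict, Optional, Set


def form_to_text(values: Dict[str, str]) -> str:
    """Render guided-form values as a command-line string (always possible)."""
    parts = []
    for name, value in values.items():
        parts += [f"--{name}", value]
    return shlex.join(parts)


def text_to_form(text: str, known: Set[str]) -> Optional[Dict[str, str]]:
    """Try to load free text back into the form.

    Returns None when the text cannot be represented in the form,
    e.g. an unclosed quote, an unknown flag, or a flag without a value.
    """
    try:
        tokens = shlex.split(text)
    except ValueError:
        return None
    if len(tokens) % 2:
        return None
    values = {}
    for flag, value in zip(tokens[0::2], tokens[1::2]):
        if not flag.startswith("--") or flag[2:] not in known:
            return None
        values[flag[2:]] = value
    return values
```

Every `None` here is a state where the two modes would disagree about what is in the buffer, which is one way to read Mo's observation about why comparable tools keep the flow one-directional.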

Gideon and Mo decided on the following approach:

  1. Have one mode with form fields and a text area with auto-generated text. (The use case for a freeform text area won't necessarily apply here, right? All the parameters that are required by the plugin are present in the dropdown, and appending missing parameters will lead to an error.)
  2. An editor mode where you have the option to copy and paste text. (We can provide a plugin config document below as help, like the previous UI.) If you go back to the form, we can display a warning saying, 'your changes will be lost'.
  3. The review page can act as a final sanity check.
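
As a rough sketch of what the review page's sanity check might do, the Python below flags missing required parameters and unknown flags in the typed command. `ParamSpec`, `review`, and the flag names are hypothetical; the real check would be driven by the plugin's own parameter definitions.

```python
import shlex
from dataclasses import dataclass
from typing import List


@dataclass
class ParamSpec:
    """Hypothetical description of one plugin parameter."""
    flag: str
    required: bool = False


def review(command_text: str, specs: List[ParamSpec]) -> List[str]:
    """Return human-readable problems; an empty list means nothing was found.

    Simplification: any token starting with "--" is treated as a flag,
    so "--flag=value" syntax and values that begin with "--" are not handled.
    """
    try:
        tokens = shlex.split(command_text)
    except ValueError as err:
        return [f"cannot parse command: {err}"]

    given = {t for t in tokens if t.startswith("--")}
    known = {s.flag for s in specs}

    problems = []
    for spec in specs:
        if spec.required and spec.flag not in given:
            problems.append(f"missing required parameter {spec.flag}")
    for flag in sorted(given - known):
        problems.append(f"unknown parameter {flag}")
    return problems
```

A check like this cannot prove the command will succeed on the server; it only catches the mistakes that are visible from the parameter list before the user commits to "next".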

Rudolph Status

  • Continuing to work in JavaScript on the PACS query tool
  • Concurrently with Parul and some students, working on additional items with the MOC, including Power9 ChRIS plugins, which should be coming online soonish. This gives ChRIS the ability to do stuff on the MOC and allows Power9 plugins to work too
  • Sandip is looking at students working on testing GPU-type plugins on the MOC
  • Jorge and I are working on possible homepages for the project. Bought chrisproject.org, an HTTPS domain, hosted on one of our hosted sites at InMotion Hosting, and hooked up a WordPress install on it
  • https://chrisproject.org - http goes somewhere else
    • a WordPress site
    • does anyone want to jump in on the WordPress site? WordPress is one of the host's options
    • my daughter is trying to teach herself Wix site building and is keen to help with WordPress stuff; anyone with suggestions or ideas is welcome
    • Jorge - issue with the cert, the cert is not valid
    • want to set something up in the spirit of celeryproject.org