Week 5

This is it. This is the halfway point of the REU. I’m really proud of everything I was able to get done this week. We started looking at data and moved into the Bayesian structure learning phase. Next week, we hope to identify some interesting, common connections in the network structures. Dr. Natarajan is optimistic that we will be able to completely finish our project by August, which is the paper deadline for the BIBM conference. I would be absolutely thrilled if this research gets published, especially within a few months.


Goals

  1.  Figure out how to make a Bayesian Network from the CARDIA data
    • After a long process of finding a structure learning program, I settled on bnlearn in R. Using this, I was able to implement a hill climbing algorithm using both BDe and BIC scoring metrics to create a model of year 0 data. I need to iterate on this to deal with missing values and better define the layers of the models. I might also look into using the bnstruct library.
  2.  Create a professional website
    • The website is online to view here. It still needs a lot of work.
  3.  Get something written in each section of the paper
    • Since we redefined our research topic, we are having to rewrite pretty much everything we had written. It’s been hard to keep up with the paper deliverable deadlines since I have been working about 12 hours a day just keeping up with the research goals.

Weekend

This weekend was full of fun activities, especially because my parents were visiting.

On Saturday, we spent several hours admiring people’s work at the Arts Fair on the Square. Perhaps the most unique thing I saw was a stand where poets from the Bloomington Writers Guild were making Poetry on Demand. I just talked to them for a few minutes, then looked at other things for a little while. When I came back, Eric Rensberger had prepared a poem about my hometown. It was even typed up on an old-fashioned typewriter!

Week5_BatesvillePoetry.jpeg

For lunch, we went to the Taste of Bloomington food festival, which was happening just down the road. It wasn’t how I thought it would be, but the atmosphere and food were good.

In the evening my parents took me and several of my friends to watch standup at the Comedy Attic. It’s supposed to be one of the best comedy clubs in the country, so I’m glad I had the chance to go.

Sunday was Father’s Day. I gave my dad this card, which I made while practicing my paper circuit skills.

Week5_Card.jpeg

Dad also picked out the church we went to. He decided on The Salvation Army Church. I had no idea that The Salvation Army had its own denomination. Apparently not many people do, going by the first sentence of their website. I really liked how happy the church looked, as well as the focus on helping children and the elderly.

Week5_church.jpg

Before heading home, my parents and I went out for a Father’s Day lunch. I am thankful that they drove all the way up to Indiana just to see me for a weekend. It was a lot of fun!

Week5_FathersDay.jpeg

Monday

The SOIC (School of Informatics and Computing) camp was this morning! We taught campers how to make paper circuits, which was really fun. My table made some pretty sweet cards.

Week5_SOIC.jpeg

In the afternoon, Nandini met with Rob and me to explain how the data file she sent us was organized. I was expecting all of the data to be in a massive CSV, but we got the cleaned .txt version from an earlier project. Each data point had the same format:

feature(ID, value, year)

This format was good for the temporal modeling they were doing at the time, but isn’t helpful for our project. When we talked to Dr. Natarajan today, he said that we are going to be focusing on looking at intra-time slices rather than inter-time slices. This means that we need to have the data for each individual year, rather than combined data with year markers.

Basically, I am supposed to code up a file reader today that puts each person’s data into a tab-separated .txt document, one per year, that can then be converted to a .csv file and loaded into a structure learning package to be analyzed. While I work on this, Rob is continuing to look for a good structure learning package or software to use.
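A minimal Python sketch of that kind of file reader follows. It is not the actual code from the ProHealth GitHub: the sample lines, feature names, IDs, and function names are all made up for illustration, and it assumes every line matches the feature(ID, value, year) pattern exactly.

```python
import re
from collections import defaultdict

# Hypothetical sample lines in the feature(ID, value, year) format.
# These are invented for illustration and are not real CARDIA data.
SAMPLE = """\
bmi(p001, 24.1, 0)
sbp(p001, 118, 0)
bmi(p002, 27.5, 0)
bmi(p001, 25.0, 5)
"""

# feature name, then (ID, value, year) separated by commas
LINE_RE = re.compile(r"^\s*(\w+)\(\s*(\w+)\s*,\s*([^,]+?)\s*,\s*(\d+)\s*\)\s*$")


def split_by_year(lines):
    """Return {year: {person_id: {feature: value}}} from the raw lines."""
    years = defaultdict(lambda: defaultdict(dict))
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip blank or malformed lines
        feature, person, value, year = match.groups()
        years[int(year)][person][feature] = value
    return years


def to_tsv(people, missing="NA"):
    """Render one year's data as tab-separated text, one row per person."""
    features = sorted({f for row in people.values() for f in row})
    rows = ["\t".join(["id"] + features)]
    for person in sorted(people):
        cells = [people[person].get(f, missing) for f in features]
        rows.append("\t".join([person] + cells))
    return "\n".join(rows)


by_year = split_by_year(SAMPLE.splitlines())
print(to_tsv(by_year[0]))  # the year 0 slice only
```

Keeping one table per year is what makes the intra-time-slice analysis possible: each year's table can be handed to the structure learner on its own. Features a person is missing in a given year show up as NA here, which is only one possible way to mark them.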

Tuesday

This morning, everyone got a professional photo taken. I am really happy about this because I didn’t have a nice picture to put on my networking materials.

Week5_professionalpic.jpg

Most of my morning was spent improving the file reading code. I made a function to split the data up by year. I also made functions to get the mean, max, and mode (deleted) of each person’s cumulative temporal data for each feature. You can view the code on my section of the ProHealth GitHub. If you look at it, you can see that a lot of it is very repetitive. It was annoying to have to do that much copying and pasting. Looking back, I should have just made one function to do the appending part and passed each needed dictionary in as a parameter. Oh well.
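The "one function for the appending part" idea could look roughly like the Python sketch below. This is a hypothetical refactor, not the code in the repository: the record layout, the names append_value and summarize, and the sample numbers are all assumptions.

```python
from statistics import mean

# Hypothetical records as (feature, person_id, value, year).
# Invented for illustration; not real CARDIA data.
RECORDS = [
    ("bmi", "p001", 24.0, 0),
    ("bmi", "p001", 26.0, 5),
    ("sbp", "p001", 118.0, 0),
    ("bmi", "p002", 27.5, 0),
]


def append_value(table, person, feature, value):
    """Append one value to table[person][feature], creating entries as needed."""
    table.setdefault(person, {}).setdefault(feature, []).append(value)


def summarize(table, func):
    """Apply func (mean, max, ...) to every person's list of values per feature."""
    return {
        person: {feature: func(values) for feature, values in features.items()}
        for person, features in table.items()
    }


# One pass collects each person's values across all years.
history = {}
for feature, person, value, _year in RECORDS:
    append_value(history, person, feature, value)

means = summarize(history, mean)
maxes = summarize(history, max)
print(means["p001"]["bmi"], maxes["p001"]["bmi"])
```

Because the dictionary is a parameter, the same helper serves every table, and adding another statistic is one more call to summarize rather than another copied block.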

I also read another one of Dr. Natarajan’s papers: Early Prediction of Coronary Artery Calcification Levels Using Machine Learning. Rather than Bayesian Networks, Dr. Natarajan used statistical relational learning (SRL) methods for this analysis. The paper raised a few questions that I plan to ask Nandini or Dr. Natarajan later.

In the early afternoon, Dr. Natarajan invited Rob and me to watch Phillip Odom practice his dissertation defense. I had never seen a dissertation defense before, so I said yes without hesitation. Phillip’s presentation was titled “Effective Human-in-the-Loop Learning in Structured Noisy Domains”. Basically, this SRL boosting algorithm asks an expert for advice periodically and adds the advice gradient to the data gradient. This reduces both the quality of data and the amount of expert time needed to construct a good model. It was a very good presentation; Phillip was obviously well-rehearsed. He had practiced so much that he knew exactly how long the presentation took: 44 minutes. I can’t imagine giving a presentation that long, much less doing so much research that it takes 3/4 of an hour to summarize.

Wednesday

This morning was spent trying to rewrite the paper since everything we had written about behavioral modeling is void now. The paper is so far from being okay that I’m a little overwhelmed. Mostly I worked on a background section to explain what Bayesian Networks are and how algorithms can find the best network structure given data. Rob did the peer-review response spreadsheet, which you can view here.

In the early afternoon, Nandini, Rob, and I were able to meet with Dr. Natarajan. He told us that by tomorrow he wants each of us to use a different Bayesian structure learning package to learn a network for the year 0 data. That made me nervous because I’ve spent the last week doing a lot of coding while Rob was trying to figure out the Bayesian learning packages. Also, my Wednesdays after lunch are just very busy in general, mostly because of Wednesday Workshop.

During Wednesday Workshop, we talked about networking. The main thing I learned was that during conferences it is better to take the “hallway track” and network with people rather than attend a lot of presentations. Also, it is important to do background research on the people you are trying to connect with.

The presenter also showed us lmgtfy.com, which stands for Let Me Google That For You. It’s a very passive-aggressive website that demonstrates how to input a question into Google. I can see myself using it a lot.

I’m not going to detail each of my frustrations in trying to find and use a structure learning package because that would take a while, but I will summarize. I tried to use bayespy and openbayes in Spyder, but then Spyder began shutting down whenever I tried to save or open a file. Then I tried BDAGL in Eclipse (Java), which didn’t work because Eclipse is still sort of broken from a couple of weeks ago and I never uninstalled and reinstalled it. After that, I tried making the structure directly in Weka and Netica. In Netica I got pretty close, but the specific learning algorithms I needed to run were not there. Also, since I am using the lite version, I couldn’t save my 17-node network. Looking online for more options, everything good seemed to be in MATLAB or R.

Since I didn’t want to invest in MATLAB, I decided to try my luck with R. The bnlearn package looked like it had pretty much everything I needed, other than the ability to put nodes in layers. It actually wasn’t too hard to learn the basics from the main website and a lot of Googling. After a few hours I was able to display my first network. I still need to make all of the values discrete so I can run different scoring methods, but at least it is outputting something that looks like a network.

Week5_BN1.png

Thursday

Once I got into the lab, I figured out how to “cut” the data into different discrete values within R. However, I wasn’t really sure how to break up the values. I also learned about “blacklisting” arcs so that certain nodes could only be parents. As you will see below, I didn’t blacklist all of the necessary arcs because I am lazy. I will do it later. Before meeting with Dr. Natarajan, I printed out the following networks to show my progress. I used the BIC and BDe scoring metrics. I also tried to do MI, but I made a typo entering the name and when I ran it, my computer crashed. My computer is okay, but I lost my work.

Week5_BN2.png

Week5_BN3.png
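The blacklisting idea can be sketched in a few lines. The networks above were learned in R with bnlearn, whose structure learning functions accept a blacklist as a two-column table of "from" and "to" nodes; the Python below only illustrates how such a list could be generated. The node names and the choice of parent-only nodes are hypothetical.

```python
# Hypothetical node names; the real model has 17 nodes from the CARDIA data.
NODES = ["age", "sex", "bmi", "smoking", "sbp"]

# Assumed for illustration: nodes that should never have an incoming arc.
PARENT_ONLY = {"age", "sex"}


def build_blacklist(nodes, parent_only):
    """List every (from, to) arc that points into a parent-only node."""
    return [
        (src, dst)
        for dst in sorted(parent_only)
        for src in nodes
        if src != dst
    ]


blacklist = build_blacklist(NODES, PARENT_ONLY)
for src, dst in blacklist:
    print(f"{src}\t{dst}")  # one forbidden arc per row: from, to
```

Note that this also forbids arcs between the parent-only nodes themselves (for example sex to age), which may or may not be what a given model needs. Generating the list from a set of names is less error-prone than typing each forbidden arc by hand.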

Dr. Natarajan didn’t like that bnlearn doesn’t have layering, but he said that we could work with it for now. By Monday he wants me to have graphs for years 0, 5, 7, 15, and 20 using hill climbing with the BIC, AIC, and BDe scoring metrics. Rob is doing the same, only using different structure learning algorithms. On Monday, we are going to tape them all to the wall and do “knowledge discovery” by comparing networks and identifying interesting similarities and differences.

The rest of the day was spent figuring out the best way to make the data values discrete. I ended up cutting the data into quintiles based on frequency. I also tried to get Rob started with a few tips and tricks I learned in R last night.
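Cutting into quintiles by frequency means choosing four cut points so that each of the five bins holds roughly the same number of observations. The actual discretization was done in R; the sketch below shows the same idea in Python using only the standard library. The readings are made-up numbers, and the helper names are hypothetical.

```python
from bisect import bisect_left
from collections import Counter
from statistics import quantiles

# Hypothetical blood pressure readings; not real CARDIA data.
VALUES = [98, 104, 110, 112, 115, 118, 120, 121, 125, 130, 134, 140, 151, 160, 172]


def quintile_cuts(values):
    """Return the four cut points that split values into five equal-frequency bins."""
    return quantiles(values, n=5, method="inclusive")


def discretize(value, cuts):
    """Map a value to a bin label from 1 (lowest fifth) to 5 (highest fifth).

    A value exactly equal to a cut point goes into the lower bin,
    i.e. the intervals are closed on the right.
    """
    return bisect_left(cuts, value) + 1


cuts = quintile_cuts(VALUES)
labels = [discretize(v, cuts) for v in VALUES]
print(Counter(labels))  # how many readings landed in each bin
```

Equal-frequency bins keep every level of every variable populated, which matters for score-based structure learning on discrete data: a level with almost no observations contributes very little evidence. The trade-off is that the bin widths are uneven, so the cut points have no clinical meaning on their own.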

It was pretty late when I realized that I hadn’t done my peer review for the week. Though I was very sleepy, I tried my best to give detailed, constructive feedback on the paper I reviewed.

Friday

It’s my mom’s birthday! Happy birthday!

Week5_Mom.jpeg

Today had even more meetings than a typical Friday. The first one was at 10. The slide show theme was animal GIFs, so everyone spent at least half an hour on GIPHY for “research”. My slide was almost entirely GIFs of pandas falling off of things.

I also made the first draft of my website during the pre-meeting time. You can view it here, but it is not nearly what I would consider “presentable”. I want to redo my resume, make a CV, and figure out how to better integrate other WordPress sites with my own so that people don’t have to click on a link to view my ProHealth blog or listen to Hendrix Compline podcasts.

Right after the first meeting, we had a midterm assessment. I thought it focused too much on mentor relationships. Though I understand the need to highlight areas for growth in mentor-mentee relationships, I feel that I should bring any complaints I have straight to my mentor rather than submitting “anonymous” feedback that could potentially create more conflict.

There was time to eat lunch and show Rob a few more things in R before the Friday Tea. During the tea we did a survey that showed what socioeconomic class we fell into and highlighted the fact that everyone has different lived experiences. We also did a Lego activity where some people had missing Legos or missing instructions. It was a pretty neat way to discuss privilege and the importance of understanding people from different socioeconomic backgrounds.

Right after the tea, I drove everyone who wanted to go to the Cyberinfrastructure Building. We did a two-hour tour that included seeing Big Red II, which is IU’s supercomputer.

Week5_BigRedGroup.jpeg

I also sat in nearly every sort of chair in the building, which was an adventure in and of itself. They were all pretty, but very few were comfortable. My personal favorite was the “hotwheels” bench.

Week5_hotwheels.jpeg

My evening didn’t contain any more adventures; I’m tired from today and this whole week in general. Gabby and I did give the kitchen a good clean, though. It was very needed.


Unknown Animal of the Week: Blanket Octopus

This octopus’ name comes from the webbing connecting two pairs of legs. Perhaps the strangest thing about this species, however, is the extreme size difference between males and females. Female Blanket Octopi are around 2 m long, whereas the males are typically just over 2 cm long.

https://mrbarlow.files.wordpress.com/2011/05/blanket_octopus.jpg