Week 6

This week was very productive. Dr. Natarajan thinks that the research we are doing will be ready to submit to a conference by August. I also got a lot of advice about pursuing a graduate degree from both Dr. Siek and Dr. Natarajan.

Also, I realized that the pictures I put in here don’t show up on any computer but my own, so I am just going to put in links. Sorry.


Goals

  1.  Find stories in the CARDIA data.
    • After constructing several Bayesian Networks and CPTs, several interesting connections are coming through. Particularly, I am interested in how race and education level influence whether a person smokes.
  2.  Finish my career interviews for Odyssey credit.
    •  You can view the interview summaries for Dr. Siek and Dr. Radivojac  here. I enjoyed having the opportunity to speak with successful professors; it helped me get more direction for my own career.
  3. Work on the paper.
    • I made good progress this week in the background section for both the data and algorithms used in structure learning.

Weekend

Not much happened this weekend, mostly due to how busy this past week was.

I woke up really early on Saturday to set up an easy way to create Bayesian networks in R. You can view train_input_examples.txt on my GitHub. For the most part, you can copy and paste the commands from the document into R and just change the file name for each year. Since Rob had basically the same task, I sent him the code and told him to replace my hill climbing algorithm with the three algorithms he used. This made the work Rob and I had to do significantly easier. It would have been a pain to type in commands separately for all of the networks.

One problem I noticed was that the bnlearn package was omitting a person from the analysis if they had any missing values. In later years of the study, this substantially reduced the sample size. I modified my file reading program to replace an individual’s missing feature data with the mean of their feature data in observed years. This greatly reduced the number of individuals omitted when constructing the network.
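
The imputation idea can be sketched in Python roughly as follows. This is a minimal illustration, not the actual file reading program: the function name, the use of None to mark a missing value, and the example numbers are all assumptions made for the sketch.

```python
from statistics import mean


def impute_with_person_mean(values_by_year):
    """Fill one person's missing values for one feature.

    values_by_year holds that person's measurements in exam-year order,
    with None (an assumed marker) wherever the value was not recorded.
    Missing entries are replaced by the mean of the observed entries.
    If nothing was observed, the list is returned unchanged.
    """
    observed = [v for v in values_by_year if v is not None]
    if not observed:
        return list(values_by_year)
    fill = mean(observed)
    return [fill if v is None else v for v in values_by_year]


# Hypothetical measurements with the third exam year missing.
print(impute_with_person_mean([120.0, 124.0, None, 128.0]))
```

A person with no observed value for a feature still cannot be filled in this way, so such a person would still be dropped by any step that requires complete data.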

I then took a 5 hour nap. Mina barely dragged me outside to get pizza at Momma Bear’s.

On Sunday, Disaiah, Thomas and I went to the Free Methodist Church (picture), then out to lunch at Noodles & Company. Then I took another 5 hour nap. I woke up in time to get Indian food with my suite-mates and format the Bayesian Networks better before going back to sleep.

Monday

This morning I finished up the hill-climbing networks. There was one network for each of the three scoring metrics (AIC, BIC, BDe) for each year. I printed them all out and stapled the metrics together for each year.

The remainder of the morning was spent improving on the paper. I mostly worked on the data section of methods, describing the CARDIA dataset in detail. I also started typing up information about the three scoring metrics (AIC, BIC, BDe) I used in the hill-climbing algorithm.

When we met with Dr. Natarajan in the afternoon, he seemed impressed with the Bayesian Networks, especially for year 20. He told me to create a union and intersection of the AIC, BIC, and BDe networks for years 15 and 20. I spent a few hours doing that since it takes a good deal of concentration to compare and contrast the connections between each network. For the scoring metric intersection network, I erased a few arrows from the BIC model in Paint. I added arrows to the AIC model when creating the union network. Here are links to the union and intersection networks of the scoring metrics for year 15 as well as the union and intersection for year 20. In the evening I helped Rob construct his union and intersection networks correctly.
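
The networks above were combined by hand in Paint, but the same union and intersection can be expressed as set operations on edge lists. The sketch below is only an illustration under assumptions: each network is represented as a set of directed (parent, child) pairs, and the edges shown are made up for the example, not taken from the CARDIA results.

```python
def combine_networks(edge_sets):
    """Return (union, intersection) of several directed edge sets.

    Each element of edge_sets is an iterable of (parent, child) pairs.
    An edge only counts as shared if it has the same direction in
    every network.
    """
    edge_sets = [set(edges) for edges in edge_sets]
    if not edge_sets:
        return set(), set()
    union = set().union(*edge_sets)
    intersection = set.intersection(*edge_sets)
    return union, intersection


# Hypothetical edges for one year under each scoring metric.
aic = {("race", "education"), ("education", "smoker"), ("sex", "glucose")}
bic = {("race", "education"), ("education", "smoker")}
bde = {("race", "education"), ("education", "smoker"), ("bmi", "glucose")}

union, intersection = combine_networks([aic, bic, bde])
print(sorted(union))
print(sorted(intersection))
```

Because edges are compared as ordered pairs, an arrow that points in opposite directions in two networks would appear twice in the union and not at all in the intersection, so reversed edges would still need to be checked by eye.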

Dr. Natarajan also told us to start working on the paper, particularly the background section. Rob is supposed to write a page on the data while I write a page on the Bayesian basics behind our methods. I spent part of my evening better understanding the algorithms behind structure learning so I could explain them in my own words.

A few hours after Rob and I met with Dr. Natarajan, he told me he had time to meet with me individually. I had been wanting to ask him about the math classes I would need to take at Hendrix if I wanted to pursue something like this in graduate school. He was a lot of help; I think I know what classes I am going to take next year now. I’m basically replacing Organic Chemistry both semesters with math classes.

Tuesday

When Rob and I met with Dr. Natarajan today, he was very excited about my scoring metrics intersection graph. To better visualize the causes and effects of the network, Dr. Natarajan asked me to create hierarchical versions of the intersection and union networks for scoring metrics in year 20 as well as a hierarchical union network for BDe scores in all years. He also asked me to make and print out CPTs for the scoring metrics in year 20.

The hierarchically formatted model was pretty fun to make. I basically drew in all the edges from my circular network and then untangled the web of connections. It felt like getting a knot out of a thin-chained necklace. My only complaint is that I used Netica, which has the tendency to crash at inconvenient times. After a few crashes, I began taking screen shots every time I made a significant alteration. By clicking on the following links, you can see my final union and intersection networks of the scoring metrics for year 20 as well as the union of all years’ networks created using BDe.

bnlearn is great for parameter learning, so the CPT data was easily obtained. However, the formatting was not great. The text files for the CPTs also took too much paper to print out without feeling guilty. I started writing another Python program to convert the ugly CPT text file into a more legible version, but I wasn’t sure how necessary it really was; it’s hanging out in my files in case I ever need to finish it.

I also aired up my yoga ball today! I wanted to try it out to see if it would help my posture and make me less fidgety. It’s a little too short, but I still like it (picture).

Wednesday

I started out the morning with a very long and informative interview with Dr. Siek. This was for my Odyssey credit at Hendrix College; however, I had wanted to ask her a lot of those questions anyway since she is such a role model for me. We had the interview outside of the office and several of the other young women at the ProHealth REU told me how much they liked hearing more about Dr. Siek’s awesome life. Basically, Dr. Siek has managed to make systematic change at each academic institution she has attended using organization, networking, and passion. She also gave some valuable advice on what to look for in a graduate school and faculty position, as well as family planning tips for this career. The full summary of the interview, as well as Dr. Radivojac’s interview, can be viewed here.

Rob and I met with Dr. Natarajan in the morning as well. Again, he was very interested in the model of the intersection of all scoring metrics for all years. He drew in some of the expected connections, such as the cluster of blood pressure measurements, as well as some more interesting connections, such as the influence of race and education on whether an individual smokes.

When I showed Dr. Natarajan the CPTs for BDe year 20, he said he had trouble reading it because there were too many numbers after the decimal. He asked me to write up a quick Python program to truncate the values so that there are only 3 numbers after the decimal. As expected, this took very little time. Though the code is under 30 lines, I still posted it to GitHub as trunkate_values_cpt.py because I can’t resist a pun.
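
For readers curious what such a truncation step might look like, here is a minimal sketch. It is not the contents of trunkate_values_cpt.py; the function name and the regular-expression approach are assumptions chosen for illustration. It cuts digits off rather than rounding.

```python
import re


def truncate_decimals(text, places=3):
    """Cut every decimal number in text down to `places` digits after
    the point. Digits are dropped, not rounded, and numbers that are
    already short enough are left alone."""
    pattern = re.compile(r"(\d+\.\d{%d})\d+" % places)
    return pattern.sub(r"\1", text)


# Hypothetical line from a CPT text file.
print(truncate_decimals("0.123456 0.9 0.98765"))
```

Working on the raw text keeps the rest of each line untouched, so the layout of the CPT file stays exactly as it was apart from the shortened numbers.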

Right after lunch, we had Wednesday Workshop. Today we talked about lightning talks and posters. I’m a little intimidated by how good the poster from Dr. Natarajan’s group looked last year, but I’m sure whatever Rob and I come up with will be presentable. We were also told that we would be giving 1 minute lightning talks at the tea next Friday. There is so much to talk about that I don’t know how to fit it into just a minute.

Later in the afternoon, I gave Dr. Natarajan the trunkated CPTs he wanted. As I expected, they were still too messy for him to get anything from. He showed me how he wanted the table formatted and told me to apply the formatting to 4 CPTs for the BDe year 20 model. The CPTs he wanted me to create showed the quantitative effects of: CAC given sex and glucose levels; glucose levels given BMI and sex; smoker status given education and race; and education given race. The formatted CPTs can be viewed here.

Tonight was a lot of fun since it was the first ProHealth Girls’ Night! We really treated ourselves by going to Red Robin, getting pedicures (picture), and finishing off with cookies from Baked. It felt good to have a night where I was completely separated from any possibility of doing work.

Thursday

Rob and I didn’t have much to do this morning besides work on our paper. I refined my sections on the AIC and BIC scoring metrics. I also added a description and equation for the log-likelihood metric, since it is the basis of both AIC and BIC.
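
To make the relationship concrete, here is a small sketch of how AIC and BIC are built from the log-likelihood, using the common textbook definitions where lower scores are better. The function and parameter names are assumptions for illustration. As far as I understand, bnlearn reports these scores rescaled by -2 so that higher is better, so its numbers will not match these directly.

```python
import math


def aic(log_likelihood, num_params):
    """Textbook AIC: penalize the fit by twice the parameter count."""
    return 2 * num_params - 2 * log_likelihood


def bic(log_likelihood, num_params, num_samples):
    """Textbook BIC: the penalty per parameter grows with log(sample size)."""
    return num_params * math.log(num_samples) - 2 * log_likelihood


# Hypothetical model: log-likelihood -100 with 5 parameters on 100 samples.
print(aic(-100.0, 5))
print(bic(-100.0, 5, 100))
```

Both scores start from the same log-likelihood term and differ only in the penalty, which is why BIC tends to prefer sparser networks than AIC once the sample size is large.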

Dr. Natarajan had mentioned at an earlier meeting that we should be using the AAAI format rather than CHI, so Rob spent a while reformatting our paper this morning. Rob also filled out the Reviewer-Response table for this week, which can be viewed here.

Nandini came in in the early afternoon to explain the math behind the BDe scoring metric since I don’t understand the equation well enough to put it in my paper. However, she had only begun speaking to us when Dr. Natarajan came in to tell us he had time to meet.

I gave Dr. Natarajan the formatted CPTs for BDe year 20 and he showed Rob and me the stories the numbers told. It was really exciting to see the connections so clearly.

Dr. Natarajan then asked if we knew the special guest he was hosting this coming week. Apparently Dr. Kristian Kersting, a big name in the machine-learning research community, is flying in from Germany to spend a few days with Dr. Natarajan. As a way of introducing him to the lab, Dr. Natarajan wants all of his students, including us, to give a research presentation. It’s both exciting and daunting for me.

Needless to say, Rob and I spent the remainder of the afternoon working on creating the PowerPoint and associated dialogue.

Friday

Today wasn’t super productive. I spent most of the morning finishing up what Rob and I should say during our presentation on Monday. Rob worked on integrating the data into our slideshow in an interesting way.

There was a break halfway through the morning for the weekly ProHealth Meeting. It’s always interesting to hear how other groups’ projects are progressing.

After lunch, there wasn’t much for Rob and me to do until we could speak to Dr. Natarajan about our presentation. When we finally got to speak with him, he gave us a lot more direction for the slides. He told us not to show the CPTs, but to tell a story about the data instead. He also helped us with the conclusion slide. I think that we can still make what I wrote for us to say work with just a few modifications.

After we finished meeting with Dr. Natarajan, I headed home. It’s been a long week. The only time I left my dorm since was to get Thai food with Anne and Gabby.


Unknown Animal of the Week: Pyrops candelaria

These South-East Asian planthoppers use their long nose to pierce tree bark.
