Week 4

This week has absolutely flown by. I think it is because I’ve been coding so much; staring at a computer screen seems to make the time pass quicker. My work this week involved a lot of trial and error as I put theory to practice in creating a Bayesian Network.


Goals

  1. Create a Bayesian Network program that can perform inference through Variable Elimination and Likelihood Weighting
    • This has been my big project this week. I have uploaded my code, text files for each test network, and screenshots of those networks in Netica to GitHub. Seeing the code work on large networks at the end of the week was extremely satisfying because at the beginning of the week, I had no idea of how to go about creating this program.
  2. Outline the final paper Abstract, draft the Introduction, and Peer Review someone’s Related Work and Methods sections
    • Rob has done most of the paper work this week. Dr. Natarajan said that the timeline for writing our paper is different from other groups’ timelines due to the nature of Machine Learning research. The majority of our paper writing and editing will be done in the last week or two of the program.

Weekend

This weekend was a lot of fun. As usual, I went to the farmer’s market on Saturday morning. Anne, Gabby, and I decided to stop by The Village Deli for brunch afterwards. The giant pancake was fantastic.

Week4_brunch.jpg

I was so drained from the previous week that I took a 4 hour nap. Thankfully, I woke up in time to go to Dr. Siek’s house for dinner.

Dr. Siek’s party was a blast. All of the ProHealth students and most of the faculty and graduate student advisors were there. Some of us played on Dr. Siek’s awesome play-structure with her daughters, which was a lot of fun.  The homemade food was delicious, especially the butternut squash strata. I would be elated if she hosts another ProHealth party this summer.

Week4_BBQ1.jpg

On Sunday, I checked out Bloomington’s Christian Science Church alongside Thomas and Disaiah. It was a different sort of service than I am used to. I’m glad that I now know what attending a Christian Science Church is like.

Week4_church.JPG

I spent a good portion of the afternoon modifying the BayesNets code I had so that it could perform variable elimination. Most of my trouble was with keeping track of where nodes landed in the probability tables when multiplied. Too many .index(item) calls were being made.

Monday

Today was a good learning experience, if a tad frustrating. The code I wrote today is trash, but now I have a good idea of what I actually need to do to make a working program. The way I was keeping track of Boolean values in the truth tables was inefficient, so I’ve decided to make each row a tuple of booleans and map the tuple to its corresponding value. There is probably an even better way to do this, but at least I’m improving on what I had. Using this structure, multiplying tables and summing out variables should become easier to implement.
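To make the tuple idea concrete, here is a minimal sketch of what that structure could look like. This is not my actual code: the (variables, table) pair, the function names multiply and sum_out, and the two-node example with its numbers are all made up for illustration.

```python
from itertools import product

# Assumed layout: a factor is (variables, table), where `variables` is a tuple
# of node names and `table` maps a tuple of booleans (one per variable) to a number.

def multiply(f, g):
    """Multiply two factors, matching rows on the variables they share."""
    f_vars, f_table = f
    g_vars, g_table = g
    out_vars = f_vars + tuple(v for v in g_vars if v not in f_vars)
    out_table = {}
    for row in product((True, False), repeat=len(out_vars)):
        assignment = dict(zip(out_vars, row))
        f_row = tuple(assignment[v] for v in f_vars)
        g_row = tuple(assignment[v] for v in g_vars)
        out_table[row] = f_table[f_row] * g_table[g_row]
    return out_vars, out_table

def sum_out(var, f):
    """Remove `var` from a factor by adding up the rows that differ only in `var`."""
    f_vars, f_table = f
    i = f_vars.index(var)  # one lookup per call, not one per row
    out_vars = f_vars[:i] + f_vars[i + 1:]
    out_table = {}
    for row, value in f_table.items():
        key = row[:i] + row[i + 1:]
        out_table[key] = out_table.get(key, 0.0) + value
    return out_vars, out_table

# Hypothetical two-node network A -> B with made-up probabilities.
p_a = (("A",), {(True,): 0.3, (False,): 0.7})
p_b_given_a = (("A", "B"), {
    (True, True): 0.9, (True, False): 0.1,
    (False, True): 0.2, (False, False): 0.8,
})

joint = multiply(p_a, p_b_given_a)  # P(A, B)
p_b = sum_out("A", joint)           # P(B)
print(p_b)
```

Because every row is keyed by its own tuple of booleans, the position of a variable only has to be looked up once per operation instead of once per row.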

Tuesday

After a full day of coding, I have a working program! Given the query, evidence, and elimination order, the program can find the probability of the query using variable elimination, an exact inference technique. This worked well on the basic 5 node ALARM network. I kind of cut it close with this program since we are meeting with Dr. Natarajan tomorrow to discuss it.
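For anyone curious what "query, evidence, and elimination order" means in practice, below is a rough sketch of variable elimination on a tiny chain network. It is an illustration under my own assumptions, not the program on GitHub: the factor layout, the helper names, the three-node chain A -> B -> C, and its probabilities are all hypothetical.

```python
from itertools import product

# Assumed layout: a factor is (variables, table) with boolean-tuple keys.

def restrict(f, var, value):
    """Keep only rows where var == value, then drop var from the factor."""
    f_vars, table = f
    if var not in f_vars:
        return f
    i = f_vars.index(var)
    kept = {r[:i] + r[i + 1:]: p for r, p in table.items() if r[i] == value}
    return f_vars[:i] + f_vars[i + 1:], kept

def multiply(f, g):
    """Multiply two factors, matching rows on shared variables."""
    f_vars, f_table = f
    g_vars, g_table = g
    out_vars = f_vars + tuple(v for v in g_vars if v not in f_vars)
    out = {}
    for row in product((True, False), repeat=len(out_vars)):
        a = dict(zip(out_vars, row))
        out[row] = (f_table[tuple(a[v] for v in f_vars)]
                    * g_table[tuple(a[v] for v in g_vars)])
    return out_vars, out

def sum_out(var, f):
    """Marginalize var out of a factor."""
    f_vars, table = f
    i = f_vars.index(var)
    out = {}
    for row, p in table.items():
        key = row[:i] + row[i + 1:]
        out[key] = out.get(key, 0.0) + p
    return f_vars[:i] + f_vars[i + 1:], out

def variable_elimination(factors, query, evidence, order):
    """Return {True: p, False: 1 - p} for `query`, given evidence and an
    elimination order that covers every hidden (non-query, non-evidence) node."""
    for var, value in evidence.items():
        factors = [restrict(f, var, value) for f in factors]
    for var in order:
        touching = [f for f in factors if var in f[0]]
        if not touching:
            continue
        rest = [f for f in factors if var not in f[0]]
        combined = touching[0]
        for f in touching[1:]:
            combined = multiply(combined, f)
        factors = rest + [sum_out(var, combined)]
    result = factors[0]
    for f in factors[1:]:
        result = multiply(result, f)
    assert result[0] == (query,), "elimination order left hidden nodes behind"
    total = sum(result[1].values())
    return {row[0]: p / total for row, p in result[1].items()}

# Hypothetical chain A -> B -> C with made-up numbers.
factors = [
    (("A",), {(True,): 0.3, (False,): 0.7}),
    (("A", "B"), {(True, True): 0.9, (True, False): 0.1,
                  (False, True): 0.2, (False, False): 0.8}),
    (("B", "C"), {(True, True): 0.8, (True, False): 0.2,
                  (False, True): 0.1, (False, False): 0.9}),
]
posterior = variable_elimination(factors, "A", {"C": True}, ["B"])
print(posterior[True])  # P(A = true | C = true)
```

The order matters for speed rather than correctness: each elimination multiplies together every factor that mentions the variable, so a poor order can build very large intermediate tables.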

Wednesday

Rob and I talked to Dr. Natarajan today about the program. He told me to stress test the program with a network of 30-40 nodes. Also, I am supposed to extend the program with a likelihood weighting algorithm. Rob is supposed to find a good Bayesian Network module such as OpenBayes or GPy so that we can compare the results.

Right after my meeting with Dr. Natarajan, we went to the Wednesday Workshop. Dr. Siek talked about how to write an abstract and review a paper. We also looked at some professional websites to get ideas for the website we will be creating this summer.

Once that meeting ended, I started working on a large network. As a base, I used a complex network already constructed in Netica. I made all of the nodes hold true/false values and renamed them with letters of the alphabet. There were 37 nodes. I made a copy of the file and deleted some nodes so that I could have a 26 node network as well. Then, I wrote the data from each node into a text file that my program could read.

At first, I tried to just throw in nodes for the evidence and elimination order. My computer worked on one sample for an hour without spitting out the probability. Then I realized that I actually needed to input a logical elimination order for the program to work. On the 26 node network, I computed P(M = true | O = true, Z = true, E = true) with the elimination order V, W, Y, G, N, S, T, X. This quickly returned the value 0.10247018, which matched the value Netica gave for the same configuration.

Week4_BaysNets2.png

Figuring out elimination orders is tedious work. Though I was able to get values in the 37 node network, they didn’t match up with the Netica results. I think that is because I either had a number wrong in the txt file for the data or I made a mistake with the input values.

Thursday

I worked on the likelihood weighting code for an hour or so this morning. The website I used as a reference was handy for explaining how to construct the basic code. However, I didn’t feel like I had a very good grasp on how likelihood weighting actually works.

When Rob and I talked to Dr. Natarajan, I quickly realized what I had done wrong. Basically, my code was correct, but needed to automatically run hundreds/thousands/millions of times depending on the size of the network. He told me to get the code working well for the small network within the hour.
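Here is a small sketch of the idea as I now understand it: generate one weighted sample, then repeat that many times and average. It is not my actual code. The network dictionary layout, the function names, the three-node chain, and its probabilities are all assumptions made for the example.

```python
import random

# Hypothetical network in topological order: each node maps to
# (parents, cpt), where cpt gives P(node = True | parent values).
# Relies on dicts keeping insertion order (Python 3.7+).
network = {
    "A": ((), {(): 0.3}),
    "B": (("A",), {(True,): 0.9, (False,): 0.2}),
    "C": (("B",), {(True,): 0.8, (False,): 0.1}),
}

def weighted_sample(network, evidence, rng):
    """Draw one sample: evidence nodes are fixed and contribute to the
    weight, all other nodes are sampled from their CPT."""
    sample, weight = {}, 1.0
    for node, (parents, cpt) in network.items():
        p_true = cpt[tuple(sample[p] for p in parents)]
        if node in evidence:
            sample[node] = evidence[node]
            weight *= p_true if evidence[node] else 1.0 - p_true
        else:
            sample[node] = rng.random() < p_true
    return sample, weight

def likelihood_weighting(network, query, evidence, n_samples, seed=0):
    """Estimate P(query = True | evidence) from n_samples weighted samples."""
    rng = random.Random(seed)
    totals = {True: 0.0, False: 0.0}
    for _ in range(n_samples):
        sample, weight = weighted_sample(network, evidence, rng)
        totals[sample[query]] += weight
    return totals[True] / (totals[True] + totals[False])

estimate = likelihood_weighting(network, "A", {"C": True}, 100_000)
print(estimate)  # approximates P(A = true | C = true)
```

A single sample only produces one weighted vote for true or false, which is why the loop over many samples is what turns the procedure into an estimate.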

When Dr. Natarajan came back, he told us that we now had his permission to start looking at the CARDIA data, which Nandini should send us tonight. Rob and I need to look over the data over the weekend to figure out what we want to explore. We also need to read some more of Dr. Natarajan’s papers and figure out how to use Bayesian Network software.

Because I had a major headache, I left work earlier than usual to take a nap. After lying down for 20 minutes, I figured out how to fix one of the biggest inefficiencies in the likelihood weighting code. Once I changed those few lines, I was able to run thousands of samples on the 26 and 37 node networks and get an accurate result in seconds.

Week4_BayesNets3.png

During my free time today I worked on the weekly deliverables I had to do myself. Rob did the paper writing this week while I wrote the code, but I still had to Peer Review another group’s paper draft. It took me a while because I wanted to give thorough, constructive advice. Also, reading other people’s papers made me aware of the work our own paper needs. It will obviously get better once we pin down what we are doing, but a lot of work should still be done in refining the related works section especially.

Speaking of related works, I also earned an Advanced Level Certificate in identifying and avoiding plagiarism. It was interesting to read the exact identifiers for different types of plagiarism.

Friday

I feel like I was unproductive today, but I did a lot. It was just a lot of smaller things rather than one big accomplishment.

First thing in the morning, I decided to go big and run likelihood weighting on the 37 node network using 1,000,000 samples. It took several minutes for the code to output the result, but I was really close to the actual value of the query. You can view my final code here, along with a folder of txt files needed to run different networks and some screenshots of different Netica network configurations to test.

Week4_BayesNets4.png

At 10, we had a ProHealth update meeting, where I learned more about the progress other people made this week. I was happy because I was able to say I did more than read this week.

When I got back from the meeting, I saw that Nandini had sent me the CARDIA data file. However, it was too big for me to open in DropBox without paying the $9.99 per month. After trying to figure out how to increase my DropBox size for about an hour, I asked Nandini to just send it to my IU Box.

In the early afternoon, I had the opportunity to meet with Dr. Predrag Radivojac, a Computer Science professor who does a lot of work in Computational Biology and Bioinformatics. Despite his busy schedule, he was able to talk to me for an hour about his career. It was very informative; he gave me a lot of very honest advice. He told me that it is better to go deep in either Computer Science or Biology in college, then pick up what you need by auditing classes later on. The most important classes to take in college are math classes, according to Dr. Radivojac, because they can be applied to so many different fields. You can view the interview summary for both Dr. Siek and Dr. Radivojac here.

Once I got back, I talked to Nandini for a bit about how the accuracy of likelihood weighting increases with higher sample sizes. She said that likelihood weighting is almost always used over variable elimination because it gets so close to the correct answer and is much easier to compute on the user’s end. We also briefly talked about different scoring metrics such as MLE and MAP.

In the afternoon, I worked on making my code more readable by adding comments and renaming variables before posting the code to GitHub. I also went down to the prototyping lab to relearn how to make paper circuits for the SOIC camp Monday. Gabby helped me a lot.

Week4_papercircuits.JPG

I stayed in the lab until my parents came and picked me up around 6:30. They drove up from Arkansas for Father’s Day. I treated them to dinner at Noodles & Co. before we went to watch Dames at Sea. The singing/dancing/acting was phenomenal, which I should have expected given that this is one of the top music schools in the country.  I’m excited for the 2 other plays that are being performed at IU this summer.

Week4_DamesAtSea.jpg

After the musical ended, my dad drove us around campus for a while and admired how beautiful everything was, especially the stonework. Then we all got Baked together. They got our order wrong the first time, so now I have 9 cookies back at the dorm to eat this week.

Week4_bakedparents.JPG

I’m excited for the adventures this weekend holds!


Unknown Animal of the Week: Glaucus atlanticus

This hermaphroditic blue sea slug uses the surface tension of water to float around the open ocean. If you happen to see one, though, don’t pick it up; its sting can cause the same symptoms as the sting of a man o’ war.

http://ianimal.ru/wp-content/uploads/2010/10/glaucus02.jpg