ProHealth Summer REU – Week 6 – July 1, 2016


Date | Hours Worked | Total Hours | Wall-sits | Caffeine
Monday, 6/27/16 | 8:16am – 5:43pm | 9hr27min | 3min | Starbucks Americano: 150mg
Tuesday, 6/28/16 | 8:53am – 6:15pm | 9hr22min | 0min | 0 cups: 0mg
Wednesday, 6/29/16 | * | 10hr47min | 0min | 1 cup: 91mg
Thursday, 6/30/16 | 10:01am – 8:54pm | 10hr53min | 6min | 2 cups: 182mg
Friday, 7/1/16 | 9:18am – 2:34pm | 5hr16min | 5min | 3 cups: 273mg

* – 8:35am – 5:40pm, 7:48pm – 9:30pm
Totals: 45hr45min, 14min of wall-sits, 696mg caffeine

I took things a little slower this week to recover from my previous one.
My website is available at
There are 10 images in my 9 previous posts.  This post contains 9 images.


The script finished executing over the weekend, producing roughly 663,411 files (the number is based on entries in the LOG file; the exact total is unconfirmed at the moment).  This amounts to 5.4 GB of PubMed abstracts.  It took around 7 hours to download the data from Odin, and another 45 minutes to get it on the hard drive.

terminal with information about the data pulled

Overview of the PubMed data, showing that there are 663,411 lines in the LOG file. Further down, an overview of the abstracts (sorted by first letter).

I was a bit at a loss for what to do at this point, since the task list we established during week 2 basically ended here for me.  Ciabhan was labeling, Savannah was pulling blog data, and Devendra would be analyzing the abstracts to determine what was usable (or if there were too many).

pipeline / order of operations for the project.

The order of operations for our project. Ciabhan is working on labeling.

To keep busy I focused on some of the deliverables, specifically my website:

I used Twitter’s Bootstrap as a starting point for its excellent focus on webpages that transition effectively between desktop and mobile.  Then for the most part I focused on my homepage, planning for a consistent theme throughout the site.

Typically we have our weekly lab meetings on Monday, but Sriraam forewent it in favor of taking us out to Starbucks: three of his papers were accepted and he wanted to celebrate.  The weather and the walk were both nice.  I ordered a tall Americano.


I gave Devendra the hard drive with the PubMed data on it.  I spent some more time working on my homepage, focusing more on the background.  Yesterday I used a stock photo but wanted something more dynamic based on how the rest of the page was working (simply covering the background with a stretched image looked messy).

Low-poly designs always appealed to me, and I found a project by Samuel Marchal (Twitter, GitHub, codepen) for creating moving backgrounds with CSS and JavaScript.  After quite a bit of tweaking I got something I was extremely satisfied with.

cover page, website for Alexander Hayes IU


Later in the day ProHealth and SROC met in the University Club President's Room for the SUR/REU reception, where we had food and enjoyed a short presentation and discussion by SOIC interim dean Dr. Brad Wheeler.

ProHealth and SROC members at the REU Reception in the University Club President's Room.

Ayush (ProHealth Blog, Website) and I returned to the Informatics building around 4pm.  After I had worked for half an hour, my Chromebook crashed.  Something similar happened during Week 1, but this time it would take some time to fix.  To keep occupied I pushed my schedule ahead a little bit, and went downstairs to meet with Devendra.  He proposed that we use the data from openFDA to evaluate how likely the drug combinations from PubMed were.

chromeos

Devendra's Proposal

This approach was similar to the method for comparing openFDA’s data to WebMD’s drug interaction checker that I used for forming the graph pictured below.  Previously, WebMD was the control that openFDA was checked against; now openFDA is the control that PubMed will be compared against.  I need to write a script to do this automatically.  The final output will be a text file listing:
[Confidence as TP/FP/FN/TN] [Drug1] [Drug2] [Adverse Event from openFDA]
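
A minimal sketch of how such a comparison might be structured, not the actual script. It assumes the labels mean: TP, the pair appears in both PubMed and openFDA; FP, in PubMed only; FN, in openFDA only; TN, in neither. The function names, the input structures, and the "-" placeholder for a missing adverse event are all hypothetical, and fields are tab-separated so that multi-word drug names stay intact.

```python
from itertools import combinations


def normalize(pair):
    """Order a pair alphabetically so (a, b) and (b, a) compare equal."""
    a, b = pair
    return (a, b) if a <= b else (b, a)


def classify_pairs(drugs, pubmed_pairs, openfda_events):
    """Label every unordered drug pair against the openFDA control.

    drugs          -- iterable of drug names
    pubmed_pairs   -- iterable of (drug1, drug2) pairs found in abstracts
    openfda_events -- dict mapping (drug1, drug2) to an adverse event string
    All three inputs are hypothetical stand-ins for the real data.
    """
    pubmed = {normalize(p) for p in pubmed_pairs}
    control = {normalize(p): event for p, event in openfda_events.items()}

    rows = []
    # combinations() over a sorted list yields each pair once, already ordered.
    for pair in combinations(sorted(set(drugs)), 2):
        in_pubmed = pair in pubmed
        in_control = pair in control
        if in_pubmed and in_control:
            label = "TP"
        elif in_pubmed:
            label = "FP"
        elif in_control:
            label = "FN"
        else:
            label = "TN"
        # "-" marks pairs with no adverse event recorded in the control.
        rows.append((label, pair[0], pair[1], control.get(pair, "-")))
    return rows


def format_rows(rows):
    """Render rows as tab-separated lines, one pair per line."""
    return "\n".join("\t".join(row) for row in rows)
```

Storing both sources as sets and dictionaries keyed by a normalized pair keeps each lookup constant-time, which matters when millions of pairs have to be checked.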



I spent a couple hours trying to resurrect my Chromebook but became increasingly frustrated with my inability to do so.   Nothing was lost because everything was either backed up or on one of IU’s Linux workstations, but not having my normal resources would throw a wrench into my productivity.

Most of my day was spent whiteboarding the script Devendra proposed yesterday, while also trying to predict the primary challenges.

the next script

Our 2pm Wednesday session was led by Dr. Connelly and Majdah, discussing lightning talks and posters, respectively.  While drafting my poster I talked to Dr. Connelly about data visualization, and she sent an introductory email on my behalf to Dr. David Wild (SOIC Profile, Website), inquiring about software for displaying dangerous drug interactions as a network.

After the session I wrapped up my whiteboard draft and reviewed the algorithm with Devendra. Similar to the script for pulling PubMed abstracts, this would likely need to check every combination of drugs (around 23 million); however, it could also be cut in half (around 11.5 million) using a similar technique to
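
The halving follows from treating (A, B) and (B, A) as the same pair. The sketch below illustrates the counting, assuming the 4,881 drugs reported for the network; the function names are hypothetical and this is not necessarily the technique the original script used.

```python
from itertools import combinations
from math import comb


def ordered_pair_count(n):
    """Every (drug1, drug2) combination, including reversed and self pairs."""
    return n * n


def unordered_pair_count(n):
    """Pairs where order does not matter and a drug is not paired with itself."""
    return comb(n, 2)


def unordered_pairs(drugs):
    """Yield each unordered pair exactly once, in alphabetical order."""
    return combinations(sorted(set(drugs)), 2)


print(ordered_pair_count(4881))    # 23824161, the "around 23 million"
print(unordered_pair_count(4881))  # 11909640, roughly half
```

Because `combinations` is lazy, the pairs can be consumed one at a time without holding all of them in memory.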

I had GRE prep from 6:00pm-7:30pm, but followed up with Dr. Wild and his PhD student Jeremy Yang (Website) on visualization.  Jeremy directed me toward a tool called Cytoscape.  I was able to get some results, but I have a lot to learn before I can make something that looks decent.  I used the data from the Abstracts LOG to construct a network, giving me the first real glimpse into what the graph may look like (alternate view on imgur).

alexander hayes drug interaction network, generated using cytoscape and abstract data pulled from pubmed

Nodes (light grey) represent drugs (total: 4881); the grey edges represent reactions (~600,000). The solid grey line that bisects the image is either a rendering error or the result of a duplication bug that left artifacts in LOG.txt.
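
If duplicated entries are the cause, one way to rule them out would be to deduplicate the edges before loading them into Cytoscape. The sketch below assumes the drug pairs have already been parsed out of the LOG file (its format is not shown here, so parsing is left out) and writes them in Cytoscape's simple interaction format (SIF), where each line is `source <relationship> target`. The function names and the `interacts` relationship label are hypothetical.

```python
def dedupe_edges(pairs):
    """Drop self-loops and repeated pairs, ignoring order within a pair."""
    seen = set()
    edges = []
    for a, b in pairs:
        a, b = a.strip(), b.strip()
        if not a or not b or a == b:
            continue  # skip empty names and self-loops
        key = (a, b) if a < b else (b, a)
        if key not in seen:
            seen.add(key)
            edges.append(key)
    return edges


def to_sif(edges, relation="interacts"):
    """Render edges as tab-delimited SIF lines: source, relation, target."""
    return "\n".join(f"{a}\t{relation}\t{b}" for a, b in edges)
```

Tabs are used as the delimiter so that drug names containing spaces are not split into separate nodes.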



I started my day by updating my slide for our ProHealth Meeting tomorrow, then made some blog updates.

Following yesterday’s discussion on introductions and lightning talks, I watched quite a few prescription drug commercials for inspiration.  Along the way I discovered a lot of video blogs belonging to people taking these medications.  I mentioned this to Savannah as another potential source of consumer information, but both of us were skeptical about whether it could be scaled (for example: Google’s automatic captions could likely be downloaded with a script, but the speech-to-text accuracy is extremely unreliable).

My laptop was having a lot of issues so I worked on my blog at the computer on Aislinn’s desk while she was out of the office.

Ben, Sam, Olivia, and I had a fantastic 3@3 session: doing two 3-minute wall-sits with a 30-second break in between.

STARAI met a little before 4pm for reading group, tackling: “Exploiting Causal Independence in Markov Logic Networks: Combining Undirected and Directed Models.”  Professor had to leave a little earlier than normal, but we got through everything but the conclusion.

Annoyed with my Windows laptop, I worked a bit more on the artwork for Devon and Aislinn’s app before working from home for the rest of my day.  I finished updating and posting my blog, rendered more images of the network with Cytoscape, and changed a few things on my slide for the ProHealth meeting tomorrow.



I presented my progress at the 10am ProHealth meeting.  Then Professor Connelly, Majdah, Devon, Aislinn, and I met to further discuss the app artwork.  We set a goal to have the designs finished by next Friday.

I did a bit more whiteboard coding, then left early, heading home for the long weekend to see friends and family.