Week 5

For me, most of this week has been going back and forth with Fernando and Nandini on the forum data. For reference, the problem is using the plain text of forum data to automatically generate the qualitative codes that Fernando worked on previously, which we will then be able to use to analyze the relationships between different types of support seeking strategies and the responses they receive.

Some posts in the forum data carried different qualitative tags in different places, so I had to examine each post manually and bring the discrepancies to Fernando, to be sure about every code I was working with for the machine learning part. At the same time, I met with Nandini practically every day this week to discuss the formatting of the data and the information we need. While looking through all of the data, I pulled groups of common words for the initial pass of machine learning analysis. Most likely, this won’t yield the greatest results, and we’ll end up using natural language processing and showing how that form of analysis compares favorably to this method. The groups of common words, phrases, and characters, along with the number of the ~400 forum posts that contained at least one term from each group, are as follows (excuse the language; sexual organs were often explicitly mentioned within the data when posters were describing sexual experiences, which is relevant to the concerns of individuals living with HIV):

Moods (130 instances)
don’t know, dont know, dunno, I hope, lost, hard, struggle, stress, Worst, horrible, awful,
suicid, crap, sad, guilt, alone, anxiety, anxious, worry, scaring, fear, scare

Reaching Out (158 instances)
please, can you, advise/advice, should I, ?

Sex and Relationships (103 instances)
relationship, love, bf, gf, boyfriend, girlfriend, wife, husband, fiance, faithful, Sex,
vagina, penis, kiss, hookup, condom, protection

Family (52 instances)
Brother, sister, daughter, father, mother, family

Diagnosis (68 instances)
Diagnose, results

Jargon (119 instances)
cd4, viral, medication, meds, cytometry, doses, didanosine, emtricitabine,
lamivudine, stavudine, disoproxil, fumarate, zidovudine, nevirapine, rilpivirine, avir,
ovir, enfuvirtide, cobicistat, inhibitor
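For anyone curious how counts like the ones above can be produced, here is a minimal sketch. It is not the exact script I ran: the function names, the shortened term lists, and the three sample posts are all made up for illustration, and I am assuming case-insensitive substring matching (which is why stems like "suicid" or "avir" would catch longer words).

```python
# Hypothetical, shortened versions of the keyword groups listed above.
KEYWORD_GROUPS = {
    "Moods": ["don't know", "dont know", "dunno", "i hope", "lost",
              "stress", "sad", "fear", "scare"],
    "Reaching Out": ["please", "can you", "advise", "advice", "should i", "?"],
    "Family": ["brother", "sister", "daughter", "father", "mother", "family"],
    "Diagnosis": ["diagnose", "results"],
}


def normalize(text):
    """Lower-case the text and straighten curly apostrophes."""
    return text.lower().replace("\u2019", "'")


def matches_group(post, terms):
    """True if the post contains at least one term (substring match, an assumption)."""
    text = normalize(post)
    return any(normalize(term) in text for term in terms)


def count_posts_per_group(posts, groups=KEYWORD_GROUPS):
    """For each group, count the posts that contain at least one of its terms."""
    return {
        name: sum(matches_group(post, terms) for post in posts)
        for name, terms in groups.items()
    }


# Invented sample posts, not taken from the real forum data.
posts = [
    "I was just diagnosed and I dont know what to do. Please help?",
    "My sister has been great since I got my results.",
    "Started new meds today.",
]

print(count_posts_per_group(posts))
```

One caveat with substring matching is that short terms can fire inside unrelated words, so whole-word matching may be worth trying for terms like "sad" or "hard".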

This is not finalized. Julia said that she would give it a pass as well, and these categories really only apply to posts seeking support. I haven’t sifted through for themes in support provision yet (though I have some ideas). The first pass of analysis will only be looking at support seeking anyway. The idea is to use machine learning to test whether the presence of words from the categories above predicts the type of support a post is seeking and the strategy the poster used (and possibly the distinction between seeking and providing). Nandini has given me some direction for the algorithms to run. If I can figure it out and have the time, I’ll do that this weekend; otherwise I will meet with her Monday so that she can walk me through it. After we’re done with that, Nandini has suggested that I work with either Kaushik or Alex on the natural language processing element.
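Before any real algorithm gets involved, the setup could look roughly like the sketch below: each post becomes a set of yes/no features (one per keyword group), and those features are tallied against the qualitative codes. This is only my guess at a starting point, not what Nandini suggested. The group terms are trimmed down, the posts are invented, and the labels "informational" and "emotional" are placeholders rather than Fernando's actual coding scheme.

```python
from collections import Counter, defaultdict

# Hypothetical, trimmed keyword groups (see the full lists above).
GROUPS = {
    "Reaching Out": ["please", "should i", "?"],
    "Moods": ["scare", "alone", "worry"],
}


def featurize(post, groups=GROUPS):
    """Map a post to binary features: 1 if any term of the group appears."""
    text = post.lower()
    return {
        name: int(any(term in text for term in terms))
        for name, terms in groups.items()
    }


def label_counts_by_group(posts, labels, groups=GROUPS):
    """For each group, count how often each code appears among matching posts."""
    table = defaultdict(Counter)
    for post, label in zip(posts, labels):
        for name, present in featurize(post, groups).items():
            if present:
                table[name][label] += 1
    return {name: dict(counts) for name, counts in table.items()}


# Invented posts with placeholder codes, for illustration only.
posts = [
    "Should I start treatment now?",
    "I feel so alone and scared.",
    "Please, any advice on side effects?",
]
labels = ["informational", "emotional", "informational"]

print(featurize(posts[1]))
print(label_counts_by_group(posts, labels))
```

The same binary feature dictionaries could later be handed to whichever classifier we settle on; the tally is just a quick way to see whether a group leans toward one code at all.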

I helped out with the paper circuits summer camp on Monday, and happened to run into Alex there. He gave me some resources to check out in advance for NLP, but I haven’t gotten to them yet.

We finally have a timeline (more or less) for the ARC project. We did not manage to launch one of the groups this week, but next Friday we will definitely be launching a group of the people we’ve recruited through Facebook (we have enough recruits there now). Fernando has yet to hear back from Positive Link and Drexel, so we’re waiting on that at the moment. Haley MacLeod (author of the original ARC paper) made comments on our current plan. Her notes were incredibly insightful, and have given us a lot to think about between now and launch.

Finally, the list of papers we’re going to be writing just keeps growing. Here’s the list as it stands now:

CHI undergraduate research competition paper: Julia will be first author, I will be second. This is the paper that will be our final deliverable for the REU (focusing on our adaptation of the ARC method), but we will likely continue to iterate on it before our deadline in January.

Full CHI paper: Patrick and Fernando seem pretty confident that they can get our ARC paper published in CHI. Fernando is writing it at the moment. Fernando will be the primary author, Julia and I will be second and third, and Patrick will be last author.

Forum data methods paper: This is going to be a machine learning paper focused on the process that I went through to analyze the forum data and how I achieved my results. Patrick has mentioned some venues where we could publish in the spring. I will most likely be primary author, and this paper may have quite a few authors. Patrick and Fernando will be on it in some capacity, Julia has started working with me more closely, and I still need to talk to Nandini and anyone else who works on it in Dr. Natarajan’s lab about how they would like to be credited.

Forum data analysis paper: This will also be submitted sometime in the spring (again, Patrick talked about a few options for venues, and it’s on the back burner since their deadlines are all a ways off). It will cover the more qualitative side of the forum data. Fernando will write it and be primary author, but we will also be on it.

Here’s our updated timeline:

Week 1: Getting to know everyone, learning the ropes of the involved tech, and choosing mentors.

Week 2: Reading papers and getting familiar with the field. In addition, some basic planning for how we would conduct our research.

Week 3: More in-depth planning of our research.

Week 4: Finish up guides for the TreatYoSelf app and the structure for the ARC activities to make sure everyone is ok with them. Present them at the end of the week and make last-minute edits. Either launch the Facebook group here or at the beginning of week 5. Preparations for forum data (I had some extra time, so we’re a bit ahead of the game on this).

Week 5: Facebook group should be on by now (this is looking a little shakier now; we’re going to be doing more recruitment, and it’s hard to say when things are going to be fully set up). Since the activities will be spaced apart, there will be a decent amount of downtime when we’re not handling immediate issues or focusing on the data right in front of us. During this downtime we will likely start putting effort into the forum data (I may end up putting even more effort into the forum data depending on how long the wait for recruitment is).

Week 5: Praise backup plans. I ended up doing exactly what I said I would if we didn’t have participants yet this week: putting a lot of work into the forum data.

Week 6: Facebook group starts up on Friday. In the week leading up we will incorporate Haley’s edits, work on implementing machine learning now that the data is good, and continue iterating on our deliverable as always.

Weeks 7-8: Basically the same as week 6: conduct more activities (the order isn’t concrete now), continue reading and writing, and work with the forum data.

Week 9: This week we’ll likely be scrambling to finalize our deliverable, on top of everything else. Keep in mind that our final paper will really only cover some of our impressions of the methods and analysis, as the study will have been going on for ~3 of the planned 8 weeks at the time we finalize the writing. That’s not even taking into account the possibility of additional ARC groups that will start later, and I would like to use all the information we have available to us when we submit our paper to the competition in January.

Week 10: Final touches and presenting of deliverables.

Here is a (very rough) draft of what my website should look like: https://ciabhanconnelly.com/

Here is our reviewer response table as it stands: https://docs.google.com/spreadsheets/d/12uQFcqEj21MIsEPQTHZvxECEAbDIhpOQdVOCXtsD6DQ/edit?usp=sharing