Week 6

A lot got done this week, but we also now have quite a bit more that needs to get done. We got the first couple of activities for the ARC study completely set, so we’re ready to launch on Monday. It’s a little scary working on a study like this. Details about our later activities depend on the results of our early ones, so we have to keep making tweaks and finalizing things week by week as we go.

The forum data finally made it to the machine learning pass. Some of the tests showed promise, but certain codes that we were trying to pick out were really sparse. What this meant was that the algorithm could perform “well” (as in get a high % correct) by categorizing all examples as false. To rectify this, Dr. Natarajan suggested that I manually divide up the data so that the algorithm is trained on a set with close to equal numbers of positive and negative examples. We’re looking into hiring through Mechanical Turk to code more data so that we can get a higher number of true positives to train on for these sets.
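To make the sparsity problem concrete, here is a minimal sketch of it in Python. The data is made up (1,000 hypothetical posts, 30 of which carry the code), and `balanced_subset` is an illustrative helper that undersamples the majority class, which is only one way the "close to equal" split could be done and not necessarily how we will end up doing it.

```python
import random

def accuracy(predictions, labels):
    """Fraction of predictions that match the labels."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

def recall(predictions, labels):
    """Fraction of true positives that the predictions actually catch."""
    positives = [p for p, y in zip(predictions, labels) if y]
    return sum(positives) / len(positives)

def balanced_subset(examples, labels, seed=0):
    """Undersample the larger class so both classes have equal counts."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y]
    neg = [i for i, y in enumerate(labels) if not y]
    n = min(len(pos), len(neg))
    chosen = rng.sample(pos, n) + rng.sample(neg, n)
    rng.shuffle(chosen)
    return [examples[i] for i in chosen], [labels[i] for i in chosen]

# Hypothetical data: 1000 posts, only 30 tagged with the sparse code.
posts = [f"post {i}" for i in range(1000)]
labels = [True] * 30 + [False] * 970

# A "classifier" that says False for everything still looks accurate...
always_false = [False] * len(labels)
baseline_accuracy = accuracy(always_false, labels)
# ...but it never finds a single positive example.
baseline_recall = recall(always_false, labels)

# Training set with equal numbers of positives and negatives.
balanced_posts, balanced_labels = balanced_subset(posts, labels)
```

With these made-up numbers the always-false baseline is right 97% of the time while catching none of the positives, which is why % correct alone is misleading here. The balanced set ends up small (60 posts), which is part of why more coded positives would help.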

That itself presents further complications: if what we assign to MTurk doesn’t have inter-rater reliability with the data coded by Fernando, then it’s wasted money. There is literature showing that, in general, the average of a handful of novice qualitative codes (like we would get from MTurk) ends up approximating the codes of researchers fairly accurately, but we still have to test this for our case. Patrick pointed out that MTurk can be modular, so we can test it on a relatively small batch first, then repeat the same task with different data as we see success. In each block of data we would include items from the original set that Fernando coded, so we could check back and make sure the results are consistent.
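The check on each block could look something like the sketch below. It is only an illustration under a few assumptions: the votes are invented, each item is coded true/false by three hypothetical workers, the "average" is taken as a simple majority vote, and agreement with Fernando’s codes is measured with Cohen’s kappa. We haven’t settled on any of these choices yet.

```python
from collections import Counter

def majority_vote(votes):
    """True if more than half of the workers marked the item True."""
    return sum(votes) * 2 > len(votes)

def cohen_kappa(a, b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[k] * count_b[k] for k in set(a) | set(b)) / (n * n)
    if expected == 1:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)

# Hypothetical check items: codes Fernando already assigned.
expert_codes = [True, True, False, False, False, True, False, False]

# Hypothetical MTurk results: three workers per item.
worker_votes = [
    [True, True, False],
    [True, True, True],
    [False, False, True],
    [False, False, False],
    [True, True, False],   # workers disagree with the expert here
    [True, False, True],
    [False, False, False],
    [False, True, False],
]

aggregated = [majority_vote(v) for v in worker_votes]
kappa = cohen_kappa(expert_codes, aggregated)
print(f"kappa on check items: {kappa:.2f}")
```

If the kappa on the check items in a block stays acceptably high, the rest of that block can be trusted more and the next, larger block can go out. What counts as "acceptably high" is a threshold we would still need to pick.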

The big change this week is that we may have to completely redesign our paper. We’re coming up on the end of the program fast, and we’re only just now starting to run ARC. The difficulties we’ve had with recruitment would have made it nearly impossible to have a good paper by the end of the program under our original idea. Our current thinking is to write a paper describing how activities should be selected for the ARC method based on the target demographic and the ultimate goals of the researchers. We would draw on our experience, as well as the previous ARC papers, to write something that we could be proud of at the end of these 10 weeks and ultimately iterate on further and submit to the CHI undergraduate research competition.

In terms of future research, Patrick has expressed interest in hiring Julia and me for ~10 hours a week over the course of the semester. I think both of us are planning on taking him up on it (we have a lot of projects we want to see through), but there is the question of whether we want to do the research for money or credit.

Finally, I’ve done more work on the other deliverables this week as well. Here is the current state of my website. It still needs a lot of work, but it’s getting there. I plan on running it by Patrick next week. I’ve also made a Twitter account, though I haven’t done much with it yet.