Fall 2016 – Week 5: Shooting for 99% accuracy

Monday, September 19, 2016: 9:48am – 4:06pm (6.3 hours):

Three emails from Patrick were waiting for me this morning, most of which dealt with the results Kaushik and I produced last week.  Patrick noticed that certain sentences were obviously primary shares but were not picked up by the model:

0.4559: “Units offered by us 12,000,000 Units”
0.2911: “We are selling 20,000,000 Class A shares”
0.7237: “We are selling 4,600,000 common units”
0.0493: “Common shares offered by us22,500,000 shares”

In the case of the first three, the sentences scored low because there were too few training examples for the model to treat them as strong positives.  In the case of the last one: “us22,500,000” is treated as a single word that appears nowhere else, so it was rated extremely low compared to the others.

Kaushik and I discussed how to solve this issue, and eventually settled on generating examples to fit the errors in the previous model.  “us22,500,000” will probably be more difficult to solve; it may involve adding another predicate that detects when two words are scrunched together (updating makeTrainPredicates and makeTestPredicates).
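A predicate along those lines could start as a simple token check.  This is only a sketch: `is_scrunched` and `split_scrunched` are hypothetical names, and the real predicate would live alongside makeTrainPredicates/makeTestPredicates:

```python
import re

def is_scrunched(token):
    """Detect tokens like 'us22,500,000' where a word runs straight
    into a number (letters followed by digits/commas, ending in a digit)."""
    return re.fullmatch(r"[A-Za-z]+[\d,\.]*\d", token) is not None

def split_scrunched(token):
    """Split the alphabetic prefix from the numeric tail; returns the
    token unchanged (as a 1-tuple) if it isn't scrunched."""
    match = re.match(r"([A-Za-z]+)([\d,\.]*\d)", token)
    return match.groups() if match else (token,)

print(split_scrunched("us22,500,000"))  # ('us', '22,500,000')
```

Splitting the token this way would also let the existing word-level features fire on “us” and the share count separately, rather than treating the fused token as a one-off word.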

Furthermore, I suggested that we should establish how we judge our results from here on out: between the primary and secondary shares our model is ~96% accurate, even though precision and recall leave a little to be desired in the fringe cases mentioned above.

Kaushik and I are pretty sure that we can gain a few more percentage points, but the longer we tune, the more likely we are to overfit our model.  Generating data to fit our errors could have unintended consequences as well.

Professor ordered Thai food for lunch; we discussed some progress at the lab meeting, but mostly took a break and talked about non-lab-related topics.  Hackathon got started a few minutes behind schedule, during which Professor met with the representative from Crane that we met over the summer.

During hackathon I approached modes from a different perspective.  Mayukh and Shuo were working on a database backend for the code, and I thought this would be an interesting way to consider modes.  I scribbled some examples out on paper and designed a rough interface for how they could be set.  I noticed an interesting correspondence between primary and foreign keys in SQL and positive and negative modes in RDN-Boost.  I bounced some ideas off Kaushik and talked to Professor.  Professor was skeptical that I understood modes well enough to reason about them, but thought it was “brilliant” when I mentioned the relation to primary and foreign keys.  We talked for a while longer: the heuristic that positive modes (+) map to primary keys and negative modes (-) map to foreign keys breaks down in more complicated examples, and ground arguments (#) were only loosely considered to be neither in my approach.  A more complicated example might involve setting a mode to positive (or not specifying it) if we were not interested in searching over it.
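As a rough illustration of the heuristic (before it breaks down), the key-role-to-mode mapping could be sketched like this.  The table/predicate names and the `modes_from_schema` helper are invented for illustration, and the output follows the BoostSRL-style mode syntax:

```python
# Heuristic from the hackathon discussion: primary key -> '+' (input/bound),
# foreign key -> '-' (output/free), constant -> '#' (ground).
KEY_TO_MODE = {"primary": "+", "foreign": "-", "constant": "#"}

def modes_from_schema(predicate, columns):
    """columns: list of (type, key_role) pairs describing the relation.
    Returns a mode declaration string for the predicate."""
    args = ",".join(KEY_TO_MODE[role] + ctype for ctype, role in columns)
    return "mode: {}({}).".format(predicate, args)

print(modes_from_schema("advisedby",
                        [("person", "primary"), ("person", "foreign")]))
# mode: advisedby(+person,-person).
```

As Professor pointed out, this one-to-one mapping is too simple for the complicated cases, so a real interface would need the user to be able to override whatever the heuristic proposes.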

I solved some more queries on paper, but didn’t contribute code this week.

Tuesday, September 20, 2016:

The last two times Professor has called on me in class I haven’t known the answer.  My focus is usually on the lecture slides, but it seems like my fervor to take good notes is preventing me from actively participating in class discussion.  Both questions have been pretty straightforward: last time I was caught off guard by the limiting factor in the speed of searching decision trees (height); this time I couldn’t recall what the value of ‘h’ stood for when maximizing the likelihood function (h represents the hypothesis: a line that most accurately fits the training data).  I might change my note-taking.  Topics for today included the midterm in a few weeks (probably not relevant for me), logistic regression, and nearest neighbors.

Wednesday, September 21, 2016: 10:18am – 6:06pm (6.3 hours):

Kaushik and I were planning to give our update talk to the lab today, but Professor decided to push it back to next Monday so Phillip (who’s at a conference in Italy) can be around too.

Patrick sent us a followup email and we spent some time discussing how to proceed.  I wrote a Python script for generating positive examples similar to the ones missed by the previous model.  These 22 sentences were scored with only 4%-86% confidence, so we wanted to generate enough positive examples to bring the confidence above 90%, but not so many that the generated examples overshadowed the working positive examples.

Algorithm Description:

Take a list of tuples pairing each missed sentence with the confidence it received.  Generate 100 new sentences from phrases within the missed sentences and compare each (using cosine similarity) to the missed sentence.  If the cosine similarity to the original is greater than 98%, keep the sentence.  Cap the number of new sentences kept at the inverse of the previous confidence out of 100 (if a sentence had 48% confidence, keep up to 52 new sentences).
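A minimal sketch of the algorithm.  Two simplifications to be clear about: variants here are produced only by swapping the share count for a random one (the real script drew on phrases from the missed sentences), and the similarity is a crude word-level bag-of-words cosine, under which a one-word change in a seven-word sentence scores ~0.86, so the demo threshold is lower than the 98% cutoff, which presumably relied on a richer representation.  `generate_examples` is a hypothetical name:

```python
import math
import random
import re
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sentences as bags of words."""
    va, vb = Counter(a.split()), Counter(b.split())
    dot = sum(va[w] * vb[w] for w in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def generate_examples(missed, n_candidates=100, threshold=0.8):
    """missed: list of (sentence, confidence) tuples from the old model.

    For each missed sentence, draw candidate variants, keep those whose
    cosine similarity to the original clears the threshold, and cap the
    kept variants at (100 - confidence%) new sentences.
    """
    new_examples = []
    for sentence, confidence in missed:
        quota = round((1.0 - confidence) * 100)
        kept = []
        for _ in range(n_candidates):
            # Swap the first number in the sentence for a random share count.
            number = "{:,}".format(random.randrange(1_000_000, 50_000_000))
            variant = re.sub(r"[\d,]*\d", number, sentence, count=1)
            if variant != sentence and cosine(variant, sentence) > threshold:
                kept.append(variant)
            if len(kept) >= quota:
                break
        new_examples.extend(kept)
    return new_examples
```

For example, a sentence missed at 29% confidence would yield up to 71 new positives, so the weakest predictions get the most reinforcement.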

All of the values can be tweaked (we can easily generate 50% more examples); next time I’ll update the code to automatically generate files and append the updates to the posEx.txt file.

Kaushik and I talked for a while about the interface we would inevitably create and the features it would need to have.  A script that boots an interface (RunThis.sh) could prompt the user for inputs, then handle all of the commands in the background.  An expert could read all of the code and make changes if they wanted, but it should be intuitive enough that a novice could treat the system like a black box and still get their work done.

I wrote a similar system over the summer, and Kaushik already has all of the instructions in a README, so adapting the README into an interface should be fairly straightforward (assuming all of the packages and prerequisites are installed correctly).  Key features should include: a setup/debug/update function (ensure packages are installed, ensure files are in the correct place, ensure files have not been modified, and update files from GitHub), built-in scraping from a list of web pages, import files to be learned from, export the models for future use, export a .csv for evaluation or auditing, and provide user-friendly instructions for each step.  Ideally this should also include a way to save the system state so it can be returned to later (making it possible to switch between docs500 and docs800 for instance), possibly involving saving everything to a .zip file that can later be extracted.
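A skeleton of the menu loop behind a RunThis.sh-style launcher might look like the following.  The option names mirror the feature list above; the handlers are stubs (`dispatch` and the labels are invented for this sketch), and a real version would shell out to the scraping/learning commands:

```python
# Menu options mirroring the planned feature list; each maps a short
# command name to a user-friendly description of what it would do.
MENU = [
    ("setup", "Setup/debug/update: check packages, file locations, pull from GitHub"),
    ("scrape", "Scrape documents from a list of web pages"),
    ("learn", "Import files and learn a model"),
    ("export", "Export the model, or a .csv for evaluation/auditing"),
    ("save", "Save the system state (e.g. docs500 vs. docs800) to a .zip"),
    ("quit", "Exit"),
]

def dispatch(choice):
    """Map a menu selection to its (stub) action message."""
    for name, label in MENU:
        if choice == name:
            return "running: " + label
    return "unknown option: " + choice

def main():
    """Interactive loop: print the menu, read a choice, dispatch it."""
    while True:
        for name, label in MENU:
            print("{:8s} {}".format(name, label))
        choice = input("> ").strip().lower()
        print(dispatch(choice))
        if choice == "quit":
            break
```

Keeping the menu as a data structure rather than hard-coded prompts should make it easy to grow the feature list without touching the loop, which matters since the list will keep changing over the next few weeks.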

Thursday, September 22, 2016: 12:00pm – 1:00pm (1 hour):

“Speeding Up Inference in Markov Logic Networks by Preprocessing to Reduce the Size of the Resulting Grounded Network” – Jude Shavlik and Sriraam Natarajan.

Friday, September 23, 2016: (odd hours)

I spent my day at the Indiana ACTE Conference, hoping to talk with high school teachers about their classes, how they use technology, or some of our programs at Indiana University.  Natalie, Cody, Payton, and I had mixed success at the table we set up.  I did virtual reality demos with the HTC Vive, but to my surprise there wasn’t much interest in it among the attendees we spoke with.

There ended up being quite a bit of downtime while the conference attendees were in meetings; during this time I worked on a basic interface for the LVI project.  The feature list will grow over the next week or so; in the meantime the basic outline exists for running, checking files, viewing the license agreement (I’ll need to talk with Professor about the specific details of our contract/funding agreement), and exiting.