This week was a major breakthrough because I finally finished scraping data from the Google Play Store. The only part of data collection left is to finish collecting network data. Completing the scraping tool this week involved two notable failures. Most of my work was written in Node.js, but the final portion of the scraper was written in Python, so I developed code to get the two languages to communicate with each other. Essentially, my Node.js script would gather 90 percent of the data associated with an app and then launch my Python script to gather the remaining 10 percent, which was returned to Node.js to be placed in a file. The major complication was timing, meaning when the data would be returned. If I ran everything asynchronously, the two languages could run at the same time, but I would have to put a certain amount of sleep time in my Node.js script so that it didn’t get ahead of the Python script. Without enough sleep time I would be introducing a race condition, so the script would behave non-deterministically and the file could come out badly formatted. The secondary issue I realized was that the sleep time had to scale up with n (the number of apps), which caused a huge performance hit as the number of apps to scrape increased. I then implemented the communication synchronously, which eliminated the sleep issue, but this version had the same problem as its asynchronous counterpart: it was slow. Breaking it down, the issue wasn’t really the communication between the two programs but that the Python portion took up a good deal of time. So while I succeeded in running both programs as one, for this particular application it was less optimal than keeping them separate, unless I switched to a faster framework in Python than Selenium.
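To illustrate the timing problem, here is a minimal sketch of the synchronous idea. It is written in Python rather than Node.js, and it is not my actual scraper: the worker is a hypothetical inline stand-in for the slow Selenium step, and the field names `app_id` and `extra_field` are made up. The point is only that blocking until the child process exits removes the need to guess a sleep time.

```python
import json
import subprocess
import sys

# Hypothetical stand-in for the slow Python scraper: it reads one JSON
# record on stdin, adds a field, and prints the result on stdout.
WORKER_SOURCE = """
import json, sys
record = json.load(sys.stdin)
record["extra_field"] = "filled in by worker"
print(json.dumps(record))
"""


def enrich(record):
    """Run the worker and block until it exits, so no sleep is needed."""
    result = subprocess.run(
        [sys.executable, "-c", WORKER_SOURCE],
        input=json.dumps(record),  # send the partial record to the child
        capture_output=True,       # collect the child's stdout
        text=True,
        check=True,                # raise if the child fails
    )
    return json.loads(result.stdout)


if __name__ == "__main__":
    print(enrich({"app_id": "com.example.app"}))
```

Because the caller waits for each child to finish, the results always come back in order, but the total time is still dominated by how slow the worker is, which matches what I saw.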
The solution I settled on was to have Node.js create a file with all the data it acquired and then have the Python script go into that file and add the new data to it.
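The Python half of that handoff could look roughly like the sketch below. Everything specific in it is an assumption for illustration: I am assuming the file holds one JSON object per line, that each record has an `app_id` key, and `fetch_remaining_fields` is a hypothetical placeholder for the Selenium step rather than real scraping code.

```python
import json
import os
import tempfile


def fetch_remaining_fields(app_id):
    # Hypothetical placeholder for the Selenium-based scraping step.
    return {"extra_field": f"value for {app_id}"}


def enrich_file(path):
    """Read the records Node.js wrote, add the missing fields, write them back."""
    with open(path, encoding="utf-8") as f:
        records = [json.loads(line) for line in f if line.strip()]

    for record in records:
        record.update(fetch_remaining_fields(record["app_id"]))

    # Write to a temporary file in the same directory, then swap it in,
    # so a crash partway through never leaves a half-written data file.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")
    os.replace(tmp_path, path)
    return len(records)
```

Since the Python script only starts after Node.js has finished writing the file, there is no timing to coordinate between the two programs.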
With this recent development, I’m fairly confident that I’ll complete the project soon and be able to focus exclusively on the paper. For the upcoming weeks, the last set of tasks will be to finish collecting network data and determine the best classifier for this project’s problem.
Work aside, it was fun celebrating Nigel’s birthday at Asuka’s this week. This was the first time I’ve been to a restaurant where the chef cooked the food in front of the customers, so I was pretty excited. Included below are a few pictures of the chef in action.
Peer Review Table link: