This post is part of our Bookshelf series organized by the Data Science R&D department at Civis Analytics. In this series, Civis data scientists share links to interesting software tools, blog posts, scientific articles, and other things that they have read about recently, along with a little commentary about why these things are worth checking out. Are you reading anything interesting? We’d love to hear from you on Twitter.
Stack Overflow did an interesting piece on the astronomical growth in traffic to Python questions over the last year. In the U.S., it is projected to be the most-asked-about programming language by 2018. And the second-fastest-growing language, you ask? R. Clearly, interest in data science and machine learning is rapidly increasing, and it turns out that this is especially true among high-income countries.
Stochastic Gradient Descent (SGD) dramatically speeds up the training of large models, making many recent deep learning architectures feasible. Unfortunately, the algorithm can also get "stuck" near saddle points, regions where the gradient doesn't provide a clear direction in which to travel downhill, so progress on minimizing the cost function is sloooow. Jin et al. propose a new algorithm, Perturbed Gradient Descent (PGD), which alleviates this problem and speeds up training in a variety of real-world optimization problems. The blog post provides a good introduction to the main ideas, and you can check out the paper here for more detail.
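The core idea is simple to sketch: run ordinary gradient descent, and when the gradient is nearly zero (as it is at a saddle point), add a small random nudge so the iterate can slide off the saddle. Here is a toy Python illustration of that idea only; it is not the authors' implementation, and the function and parameter names (`step`, `radius`, `g_thresh`, etc.) are our own.

```python
import numpy as np

def perturbed_gradient_descent(grad, x0, step=0.1, radius=0.01,
                               g_thresh=1e-3, n_iters=500, seed=0):
    """Gradient descent that adds a small random perturbation whenever the
    gradient is tiny, to help escape saddle points. A rough sketch of the
    idea in Jin et al., not their algorithm or hyperparameters."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iters):
        g = grad(x)
        if np.linalg.norm(g) <= g_thresh:
            # Near a critical point: nudge x with small uniform noise
            # so we can slide off a saddle instead of stalling.
            x = x + rng.uniform(-radius, radius, size=x.shape)
        else:
            x = x - step * g
    return x

# Example: f(x, y) = x^2 - y^2 + y^4 has a saddle at the origin and
# minima at y = ±1/sqrt(2). Plain gradient descent started exactly at
# the origin would never move; the perturbed version escapes.
saddle_grad = lambda v: np.array([2 * v[0], -2 * v[1] + 4 * v[1] ** 3])
x_final = perturbed_gradient_descent(saddle_grad, [0.0, 0.0])
```

Started at the saddle itself, the gradient is exactly zero, so the very first iteration perturbs the point; from there the descent steps carry it down to one of the two minima.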
One of the first podcasts on data science recorded its last(?) episode this week. Hosts Chris and Vidya are leaving, and the show is not planning to release anything new in the immediate future. Over its 2.5-year run, the show explored topics as varied as analyzing bias in machine learning models and improving the quality of survey responses. You can check out archived episodes here.