Wednesday, March 20, 2013

Distributed Training of Logistic Models

Recently, some of our work on training large-scale logistic models was accepted into ICML. In short, we have a training procedure for regularized Multinomial Logistic Regression (RMLR) with a very large number of multinomial outcomes. With a large number of outcomes and high-dimensional features, even storing all the parameters on a single machine may not be feasible. We therefore devise a parallel training procedure for RMLR by replacing the objective with a more 'parallelizable' surrogate function. It turns out that optimizing the new 'parallelizable' objective does not change the optimal solution! Here are the paper and the large-scale Hadoop-based code!
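To make the idea of a 'parallelizable' surrogate concrete, here is a minimal numerical sketch. It does not reproduce the bound from the paper; instead it checks a classic class-decoupled upper bound on the log-sum-exp term (due to Bouchard) that illustrates the same principle: the coupled normalizer is replaced by a sum of per-class terms, so each class's parameters could be updated independently on a different machine. The variational parameter `alpha` and the test vector `x` are illustrative choices, not values from the paper.

```python
import numpy as np

def log_sum_exp(x):
    """Numerically stable log(sum_k exp(x_k)) -- couples all classes."""
    m = x.max()
    return m + np.log(np.exp(x - m).sum())

def decoupled_upper_bound(x, alpha):
    """Upper bound: log sum_k exp(x_k) <= alpha + sum_k log(1 + exp(x_k - alpha)).

    Each summand involves exactly one class score x_k, so the terms can be
    computed (and, in training, optimized) in parallel across classes.
    Valid for any real alpha.
    """
    return alpha + np.log1p(np.exp(x - alpha)).sum()

# Illustrative class scores (hypothetical values).
x = np.array([1.0, -0.5, 2.0, 0.3])
lse = log_sum_exp(x)

# The bound holds for every choice of the variational parameter alpha.
for alpha in [0.0, 1.0, 2.0]:
    assert decoupled_upper_bound(x, alpha) >= lse
```

The point of such a bound in training is that the per-class terms sum independently, so a Hadoop job can partition the classes across workers; the variational parameters are then tightened between rounds.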

Paper      Code