Data-driven foreign-language learning: by the numbers

This is the story of a foreign-language data mashup, and of how thinking about study time as an asset with returns can make your language learning more efficient---in theory.

I am not a linguist, a computational linguist, or a language teacher, but I travel internationally a fair amount and have had reason to half-study a few languages.  In the course of that, I've compared many different approaches and methods as an interested learner.  I've found that with simple audio courses (like FSI or Pimsleur) and 6 months of self-study, it's possible for a native English speaker to get to a B1 or B2 conversational level in a European language---which isn't much: you can then order coffee and comment on newspaper headlines with some ease.  It takes diligence, but it is doable.  Beyond roughly that level, however, you begin to plateau.  By then you've learned the grammar, you've mastered the common words, and you are confident that you can get around.  Up to that point, every new word and every rudimentary grammatical structure had comparatively large "returns": each one came up all the time and noticeably improved your ability to understand.

The problem is...

