We started TechReach at Caltech two years ago to create a space for “CS + Social Good”: using tech to solve societal problems and creating opportunities for computer science students to help the world around them. Our mission donned a comfortable, virtuous halo. TechReach student teams, guided by alumni mentors, have carried out projects ranging from building a database for Miracle Messages, an organization that aims to end homelessness, to analyzing data for Arlington Garden, a local community garden.
But as we worked on these projects, we were confronted by questions that challenged our lofty ideals.
What does it mean to build…
How do learners browse the site after different searches? What topics are they looking for that we currently don’t provide? By examining search queries and their patterns, we can gauge learners’ interests and improve the site experience.
As I sifted through queries to understand how learners discover our content, I quickly realized how difficult the task was: Coursera receives millions of searches every day, far too many to analyze directly.
This led me down a rabbit hole of learning about word embeddings and building a pipeline to turn search queries into vectors that represent their relationships to one another…
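To make the idea of "queries as vectors" concrete, here is a minimal sketch of one simple approach: represent each query as the average of its word vectors, then compare queries by cosine similarity. This is an illustrative assumption, not necessarily the pipeline described here. The three-dimensional word vectors and the helper names (`embed_query`, `cosine_similarity`, `most_similar`) are hypothetical; a real pipeline would load pretrained embeddings with hundreds of dimensions.

```python
import math

# Hypothetical toy word vectors for illustration only; a real pipeline
# would load pretrained embeddings (e.g. word2vec or GloVe) instead.
WORD_VECTORS = {
    "machine": [0.9, 0.1, 0.0],
    "learning": [0.8, 0.2, 0.1],
    "deep": [0.7, 0.3, 0.0],
    "python": [0.2, 0.9, 0.1],
    "programming": [0.1, 0.8, 0.2],
    "cooking": [0.0, 0.1, 0.9],
}


def embed_query(query):
    """Average the vectors of known words; return None if no word is known."""
    vectors = [WORD_VECTORS[w] for w in query.lower().split() if w in WORD_VECTORS]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query, candidates):
    """Rank candidate queries by similarity to `query`, most similar first."""
    q = embed_query(query)
    scored = []
    for candidate in candidates:
        v = embed_query(candidate)
        # Skip queries whose words are all missing from the vocabulary.
        if q is not None and v is not None:
            scored.append((cosine_similarity(q, v), candidate))
    return sorted(scored, reverse=True)


for score, candidate in most_similar(
    "machine learning", ["deep learning", "python programming", "cooking"]
):
    print(f"{score:.3f}  {candidate}")
```

Averaging is cheap and needs no training, but it ignores word order, so "learning machine" and "machine learning" get the same vector; sentence-level encoders are a common alternative when that matters.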