Open Sourcing Our Labeled Data
January 27, 2015
At InboxVudu, we’ve been working on helping you manage email overload: we detect requests that you might have missed and show them to you in our digest email.
Training machine learning algorithms to detect requests is not easy. Luckily, we were able to leverage our extensive natural language processing and machine learning experience to build accurate sentence-level request detection.
Defining the Problem
But what constitutes a request? “Please send me the spreadsheet” is clearly a request, but how about “Why is the sky blue”?
We decided to start off by restricting the scope of the sentences we wanted to detect to primarily:
- ones where there is an explicit request by the sender for the recipient to perform an action
- ones where the sender is inviting the recipient to participate in an activity together
These are perhaps the most actionable types of sentences and most relevant for integration with other applications such as calendaring and todos.
The first step of most machine learning solutions is the acquisition of labeled data for training and evaluation. Labeling email sentences for requests turned out to be trickier than other labeling efforts we have been involved in: there were many cases where the presence of a request was unclear. Warning: try this at home only if you have a lot of time to spend. After expending great effort, we finally had thousands of labeled sentences.
Open Sourcing Our Labeled Data
Recently, we uploaded a significant part of our labeled request sentence data to GitHub as open source. Why? Having seen how hard it was to acquire our labeled data, we believe that our open sourced data will be a valuable resource to help jumpstart work by other companies and researchers in the field of email understanding and automation. Email is a big problem and we would love for more people to get involved.
In a subsequent post, we will discuss our approach to training our machine learning system to detect requests at the sentence level, as well as what we learned along the way. But for now, we do want to jump ahead and say that we are happy to have achieved good levels of precision and recall, and that this trained system is now powering InboxVudu’s request detections.
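For readers who want to try the labeled data themselves, here is a minimal sketch of how precision and recall can be measured for a sentence-level request detector. Everything in it is an assumption made for illustration: the `LabeledSentence` record, the handful of example sentences and their labels, and the keyword-based `toy_detector` are hypothetical stand-ins, not the format of our released data and not the system that powers InboxVudu.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Tuple


@dataclass(frozen=True)
class LabeledSentence:
    """Hypothetical record: one email sentence and its human label."""
    text: str
    is_request: bool


def toy_detector(text: str) -> bool:
    """Illustrative keyword baseline; NOT the trained InboxVudu model."""
    lowered = text.lower()
    cues = ("please", "could you", "can you", "would you like to", "let's")
    return any(cue in lowered for cue in cues)


def precision_recall(
    examples: Iterable[LabeledSentence],
    predict: Callable[[str], bool],
) -> Tuple[float, float]:
    """Compare predictions with labels and return (precision, recall)."""
    tp = fp = fn = 0
    for example in examples:
        predicted = predict(example.text)
        if predicted and example.is_request:
            tp += 1  # detected a real request
        elif predicted and not example.is_request:
            fp += 1  # flagged a sentence that is not a request
        elif not predicted and example.is_request:
            fn += 1  # missed a real request
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up examples. The labels are assumptions that follow the two
# categories above: explicit requests for action, and invitations.
EXAMPLES = [
    LabeledSentence("Please send me the spreadsheet", True),
    LabeledSentence("Would you like to join us for lunch on Friday?", True),
    LabeledSentence("Send the report over by noon.", True),
    LabeledSentence("Why is the sky blue", False),
    LabeledSentence("She asked, can you believe it?", False),
]

if __name__ == "__main__":
    precision, recall = precision_recall(EXAMPLES, toy_detector)
    print(f"precision={precision:.2f} recall={recall:.2f}")
```

On these five sentences the keyword baseline misses the imperative "Send the report over by noon." and wrongly flags the quoted "can you", which hints at why simple cue matching falls short and why labeled data matters.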
April 30, 2014
Some of the more eagle-eyed amongst you might have noticed a slight change to our book pages in the past few weeks. We’ve adapted our cutting-edge Natural Language Processing technology that filters out tweets about books from amidst the noise of Twitter so that it detects what the tweeter is saying about the book as well!
At a glance, you can now see which tweets on a book page are from people who want to read, want to buy, have read or are recommending the book. It’s a great way of understanding what sort of a buzz the book is generating (and makes our pages much more colorful into the bargain!)
We want to make it super-easy for you to jump aboard the discussion on Twitter so we’ve added ‘Me Too’ buttons as well. If you agree with the tweeter you can click the Me Too button to tell your friends on Twitter instantly.
Want to put it to the test? Just search for a book on the site and look at all the tweets about it. Or you can see it in action on the book page for John Steinbeck’s “Grapes of Wrath” — one of our favorite classics.
BookVibe Launches Book Pages
July 17, 2013
Today Parakweet is happy to announce the release of book pages on BookVibe, our social book discovery service.
- BookVibe has released pages for over 500,000 book titles highlighting the social discussions happening around each book, including sentiment and buzz.
- From over 400 million tweets being posted on Twitter every day, BookVibe accurately identifies over 100,000 discussions on books. We've analyzed nearly 50 million micro-reviews and book discussions, allowing us to measure the social buzz and sentiment of the books people are talking about.
- These entity pages are generated completely automatically, including the aggregation of the social discussions and sentiment about the book.
- Consumers, publishers and authors can now tap into micro-reviews on books that readers are generating organically in the social universe.
- Parakweet’s natural language processing platform outperforms leading academic research in accurately identifying a huge number of book titles and the meaning of surrounding text. Detecting a wide range of titles well is no easy task, as many titles have their own quirks.
You can preview the new pages by visiting: http://www.bookvibe.com
Parakweet Launches BookVibe Recommendations
May 17, 2013
Parakweet Inc. today announces the launch of BookVibe, a social graph-powered, personalized book recommendation engine.
Never miss a book tip from your Twitter friends again! BookVibe shows you the books that your friends on Twitter are discussing, generating a real-time and personalized book stream just for you. As examples, check out the book streams of noted venture capitalist and startup book author Brad Feld: http://www.bookvibe.com/people/bfeld and leading author Neil Gaiman: http://www.bookvibe.com/people/neilhimself
You can also use BookVibe to peek into the book streams of friends and competitors. Simply visit http://www.bookvibe.com and enter a Twitter handle to see their live book stream.
How it works:
The Parakweet platform is able to extract meaning from unstructured social chatter.
Using a proprietary Natural Language Processing based platform, Parakweet is able to extract conversations where customers are discussing products such as books, along with associated metadata such as intent, behaviors (“read”, “recommend”), and sentiment, with unprecedented accuracy.
How big is the haystack?
Twitter users send more than 400 million tweets a day. In the case of books, Parakweet identifies approximately 100,000 tweets a day that are actually about books, with 96% precision. Keyword-based search techniques would identify approximately 10 million tweets per day that could mention book titles, of which around 99% are false positives, or “noise.” By maximizing the signal-to-noise ratio, Parakweet enables complex and nuanced operations to be performed based on the accurately identified entities.
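To make the haystack numbers concrete, the short sketch below works through the arithmetic using only the approximate figures quoted above. The constant and function names are chosen here for illustration, and the results are back-of-the-envelope estimates rather than measured values.

```python
# Approximate daily figures quoted in the text above.
TWEETS_PER_DAY = 400_000_000
KEYWORD_MATCHES_PER_DAY = 10_000_000
KEYWORD_PRECISION_PCT = 1      # about 99% of keyword matches are noise
PLATFORM_MATCHES_PER_DAY = 100_000
PLATFORM_PRECISION_PCT = 96


def split_matches(matches: int, precision_pct: int) -> tuple:
    """Split matched tweets into (signal, noise) using integer percentages."""
    signal = matches * precision_pct // 100
    return signal, matches - signal


def signal_to_noise(matches: int, precision_pct: int) -> float:
    """Ratio of genuine book tweets to false positives."""
    signal, noise = split_matches(matches, precision_pct)
    return signal / noise


if __name__ == "__main__":
    kw_signal, kw_noise = split_matches(
        KEYWORD_MATCHES_PER_DAY, KEYWORD_PRECISION_PCT)
    pf_signal, pf_noise = split_matches(
        PLATFORM_MATCHES_PER_DAY, PLATFORM_PRECISION_PCT)

    print(f"keyword search: {kw_signal:,} signal vs {kw_noise:,} noise")
    print(f"platform:       {pf_signal:,} signal vs {pf_noise:,} noise")
    print("share of all tweets identified as book tweets: "
          f"{PLATFORM_MATCHES_PER_DAY / TWEETS_PER_DAY:.3%}")
```

Under these figures, keyword search yields roughly 100,000 genuine book tweets buried in 9.9 million false positives (about 1 to 99), while the 96% precision figure corresponds to roughly 96,000 genuine tweets against 4,000 false positives (24 to 1).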
How do we make money?
The BookVibe service is free to consumers. Parakweet uses the same technology platform to provide paid analytics services to media companies, enabling them to tap into these micro-reviews and other social signals that consumers are generating organically in the social universe. All Parakweet products are available through APIs which enable integration into other platforms as well as the combination of Social Media Metadata with customers’ proprietary internal data.
Try it today: http://www.bookvibe.com