Bionic Software, a letter to Benjamin.
When I wrote this post I was on my way back from Berlin in a Dash 8-300, one of those cigar-shaped planes with two propellers that make you feel like you are in a 1960s movie. I started reading Make: magazine and read Tim O’Reilly’s article “Games with a Purpose” for the second time. It is a great article that points to a Google Tech Talk by Professor Luis von Ahn.
“In his talk, von Ahn pointed out that in 2003, about 9 billion human-hours were spent playing solitaire. By contrast, only 7 million human-hours were spent building the Empire State Building, and only 20 million human-hours on the Panama Canal. … Von Ahn waggishly pointed out that harnessing humans to play games, especially games that solve computer problems that AI cannot yet solve, would have been a far more plausible pretext for the AI of the Matrix to keep humans around. In fact, he’s committed to just that goal, saying: ‘We’re going to consider all humanity as an extremely advanced and large-scale distributed processing unit that can solve large-scale problems that computers cannot yet solve.’”
Most of our current projects need recommendation and personalization algorithms. We use a range of techniques, and some of them try to be really clever. For example, one project is based on WordNet, which I would describe as a database of semantic relations. This database is extremely useful when analyzing texts and trying to work out what they are really about. I think the problem with WordNet is that it is too rigid, and the bionic aspect is not harnessed at all. Benjamin, you know I wrote this for you, so maybe you can make some clever comments here ;-)
WordNet is data harvested by experts in linguistics and captured in a form that makes sense for Natural Language Processing (NLP). By itself it is not capable of determining semantic relationships in text. Currently, in my Haystack project, it is used to assist statistical data mining of text by attempting to provide additional context around the words.
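A toy sketch of that idea, assuming a simple bag-of-words model: a WordNet-like table adds related terms to each document, so two texts with no words in common can still share some context. `RELATIONS`, `RELATED_WEIGHT`, and the helper functions are hypothetical; a real setup would read synonyms and hypernyms from WordNet itself (for example through NLTK), and this is not how Haystack is implemented.

```python
import re
from collections import Counter

# Hypothetical stand-in for WordNet: each word maps to a few related terms
# (synonyms or broader terms). Real WordNet data is far larger and richer.
RELATIONS = {
    "car": ["automobile", "vehicle"],
    "truck": ["vehicle"],
    "engine": ["motor", "machine"],
}

# Assumption: a related term counts half as much as a word that actually occurs.
RELATED_WEIGHT = 0.5


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def expand_terms(text: str) -> dict[str, float]:
    """Weighted term vector: observed words plus down-weighted related terms."""
    counts = Counter(tokenize(text))
    weights = {word: float(n) for word, n in counts.items()}
    for word, n in counts.items():
        for related in RELATIONS.get(word, []):
            weights[related] = weights.get(related, 0.0) + RELATED_WEIGHT * n
    return weights


def shared_context(a: str, b: str) -> set[str]:
    """Terms two texts have in common once related terms are added."""
    return set(expand_terms(a)) & set(expand_terms(b))


if __name__ == "__main__":
    doc1 = "Car engine trouble"
    doc2 = "Truck parked outside"
    print(set(tokenize(doc1)) & set(tokenize(doc2)))  # set(): no words in common
    print(shared_context(doc1, doc2))                 # {'vehicle'}
```

The extra context only helps statistics that compare or cluster documents; it does not tell you what a text means, which matches the point that WordNet alone cannot determine semantic relationships.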
The issue with involving humans in the classification of text is that, well, there is a lot of it. Classification and mining techniques are generally used to help people discover items of interest in texts they haven’t read. Making more people read a text just so others can discover it might not be the best way forward.
There are a number of other solutions in place today. For years authors have written abstracts at the head of articles to give readers a glimpse of the subject matter, but even an abstract can be too much information to digest. Today people mostly rely on community tagging to build localized taxonomies. Tagging works well but suffers from ambiguity and a loss of context.
If you’re seeking to harvest the efforts of the citizens of the Internet, your best bet might just be a more sophisticated tagging interface that:
helps people find tags that are already in use and popular for a given piece of content (to prevent drift and duplication)
lets people relate tags to one another to help convey context
indicates the binding strength and popularity of tags relative to the content
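A minimal sketch of the data model behind such an interface, assuming an in-memory store. The class `TagStore`, the `normalize` rule, and the definition of binding strength as "the share of an item's taggers who used the tag" are hypothetical choices, not an established API. Relationships between tags are approximated here by co-occurrence counts rather than explicit user-declared links.

```python
from collections import Counter, defaultdict
from itertools import combinations


def normalize(tag: str) -> str:
    """Map variants like "Web-Design" and " web_design " to one key ("web design")."""
    return " ".join(tag.lower().replace("-", " ").replace("_", " ").split())


class TagStore:
    """Hypothetical in-memory record of which users applied which tags to which items."""

    def __init__(self) -> None:
        # item -> normalized tag -> users who applied that tag to that item
        self.item_tags = defaultdict(lambda: defaultdict(set))
        # item -> every user who tagged the item at all
        self.item_taggers = defaultdict(set)
        # tag -> number of (user, item) pairs it was applied to, across all items
        self.popularity = Counter()
        # (tag_a, tag_b), alphabetically ordered -> times used together in one add() call
        self.cooccurrence = Counter()

    def add(self, user: str, item: str, tags: list[str]) -> None:
        """Record one user's tags for one item."""
        clean = sorted({n for n in map(normalize, tags) if n})
        self.item_taggers[item].add(user)
        for tag in clean:
            if user not in self.item_tags[item][tag]:
                self.item_tags[item][tag].add(user)
                self.popularity[tag] += 1
        for pair in combinations(clean, 2):
            self.cooccurrence[pair] += 1

    def binding_strength(self, item: str, tag: str) -> float:
        """Share of the item's taggers who chose this tag, from 0.0 to 1.0."""
        taggers = self.item_taggers.get(item)
        if not taggers:
            return 0.0
        users = self.item_tags[item].get(normalize(tag), set())
        return len(users) / len(taggers)

    def suggest(self, item: str, k: int = 3) -> list[str]:
        """Tags already used on the item, strongest first; new items get globally popular tags."""
        tags = self.item_tags.get(item, {})
        # Sort by number of users on this item, then global popularity, then name.
        ranked = sorted(tags, key=lambda t: (-len(tags[t]), -self.popularity[t], t))
        if not ranked:
            ranked = [t for t, _ in self.popularity.most_common(k)]
        return ranked[:k]

    def related(self, tag: str, k: int = 3) -> list[str]:
        """Tags most often applied together with `tag`, as a rough hint of its context."""
        tag = normalize(tag)
        scores = Counter()
        for (a, b), n in self.cooccurrence.items():
            if a == tag:
                scores[b] += n
            elif b == tag:
                scores[a] += n
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        return [t for t, _ in ranked[:k]]


if __name__ == "__main__":
    store = TagStore()
    store.add("ann", "post-1", ["Python", "nlp"])
    store.add("bob", "post-1", ["python", "WordNet"])
    store.add("cid", "post-1", ["python"])
    store.add("ann", "post-2", ["apple", "fruit"])
    store.add("bob", "post-3", ["Apple", "Mac"])
    print(store.suggest("post-1"))                  # ['python', 'nlp', 'wordnet']
    print(store.binding_strength("post-1", "nlp"))  # 1 of 3 taggers, about 0.33
    print(store.related("apple"))                   # ['fruit', 'mac']
```

Co-occurrence is the cheapest way to relate tags, and in the example `related("apple")` surfaces both `fruit` and `mac`: it exposes the ambiguity of a tag rather than resolving it, which is where explicit relationships entered by people could add the missing context.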
None of that is particularly hard, and it neatly avoids depending on ever-improving NLP and AI. It may limit where you can go in the future a little, but I think it pairs well with your goals.