• Couchdb-Lucene Search

    Updated: 2010-07-31 07:12:04
    The Beyond Search goslings noticed a post from R Newson about couchdb-lucene search. A bug fix was posted. Couchdb-lucene enables full text searching of couchdb documents. The GitHub detail page is at http://github.com/rnewson/couchdb-lucene/#readme. Couchdb-lucene uses Apache Tika to index attachments. File types supported include Microsoft formats, Java class files, and jar archives, XML and about [...]

  • Nested data structures keep coming up, especially for log files

    Updated: 2010-07-31 04:42:06
    Nested data structures have come up several times now, almost always in the context of log files. Google has published details of a project called Dremel. Per Tasso Argyros, one of Dremel’s key concepts is nested data structures. Those arrays that the XLDB/SciDB folks keep talking about are meant to be nested data structures. Scientific data is of [...]
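    To make "nested data structure" concrete for the log-file case, here is a minimal sketch. The record and its field names are hypothetical, and the flattening shown is only an illustration of nested and repeated fields; it is not Dremel's actual column encoding.

```python
from typing import Any, Iterator, Tuple

# Hypothetical web-server log record; field names are illustrative only.
record = {
    "ts": "2010-07-31T04:42:06",
    "request": {"method": "GET", "path": "/search"},
    "params": [
        {"key": "q", "value": "dremel"},
        {"key": "page", "value": "2"},
    ],
}


def flatten(value: Any, prefix: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield (dotted_path, leaf_value) pairs for a nested dict/list structure."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from flatten(child, f"{prefix}.{key}" if prefix else key)
    elif isinstance(value, list):
        for child in value:
            # Repeated fields share one path, so a path may occur many times.
            yield from flatten(child, prefix)
    else:
        yield prefix, value


rows = list(flatten(record))
for path, leaf in rows:
    print(path, "=", leaf)
```

    A flat relational row has no natural place for the repeated "params" entries, which is one reason nested structures keep coming up for logs.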

  • Recorded Future Temporal Predictive Analytics Engine Media Analytics News Analysis

    Updated: 2010-07-30 16:56:47
    Introducing the world's first Temporal Analytics Engine: a new predictive analysis tool that allows you to visualize the future, past or present. Welcome to Recorded Future. "What we anticipate seldom occurs; what we least expected generally happens." Latest in our blog: July 28, The Big Future of Social Gaming; July 28, Search Suggestions; July 26, New Open Source Intelligence Blog; July 22, Recorded Future News

  • NIST Speech Group Website

    Updated: 2010-07-30 16:56:03
    Multimodal Information Group. Topic Detection and Tracking Evaluation. Topic Detection and Tracking research was pursued under the DARPA Translingual Information Detection, Extraction, and Summarization (TIDES) program. Topic Detection and Tracking is an integral part of the TIDES program. The goal of the TIDES program is to enable English-speaking users to access, correlate, and interpret multilingual sources of real-time information and to share the essence of this information with collaborators. As a TIDES evaluation community, TDT provides a forum to discuss applications and techniques for detecting and tracking events that occur in real-time and

  • James Allan

    Updated: 2010-07-30 16:55:49
    James Allan, Department of Computer Science, 140 Governors Drive, University of Massachusetts, Amherst, MA 01003-9264. Room 350. 1 413 545-3240; 1 413 545-1789 (fax). allan at cs.umass.edu. I am a Professor in the Computer Science Department at UMass Amherst, and co-Director of the Center for Intelligent Information Retrieval (CIIR). My current work focuses on these areas: interactive information retrieval and organization, including browsing and other human-computer interactions; topic detection and tracking (TDT); automatic information organization; evaluation of information retrieval systems. A list of my papers is available here. This list is generated by the CIIR publications database. In Fall 2008, I am teaching cs187, an undergraduate course on data structures in Java. In Spring 2008, I led a

  • The New Ask Is the Same Old Ask

    Updated: 2010-07-30 07:44:42
    I have written so much about Ask.com, formerly AskJeeves.com, that I am not going to go over the long and quite interesting history. I want to talk about DirectHit, the enterprise play, the fling with the Rutgers’ wizards, and the death of the smirking butler. Won’t do it. No. I want to direct your attention to “Can [...]

  • Add Comintelli to the Bandwagon

    Updated: 2010-07-30 07:33:16
    Toss another name into the club of search and enterprise programs utilizing open source technology. Comintelli, the Stockholm-based developer of enterprise knowledge management software, recently improved its product by using Lucene. Red Orbit reported in its story, “New Enterprise Search Solution Based on Apache Solr Released by Comintelli” that Comintelli would be basing its Knowledge [...]

  • Jeff's Search Engine Caffè Quick Links of the Day KDD Cup Task Oriented Search ScalaNLP SIGIR

    Updated: 2010-07-29 16:51:40
    Jeff's Search Engine Caffè: Information Retrieval research and search engine development discussion. Thursday, July 29. Quick Links of the Day: KDD Cup, Task Oriented Search, ScalaNLP, SIGIR. Any of these stories could be a full blog post. But, for now I'll just have to give you a few quick pointers. SIGIR 2010 Industry day videos: complete videos of all the talks, via Noisy Channel. ScalaNLP: a new NLP package in Scala from the Berkeley and Stanford NLP teams. Scala is a hip new language for NLP that runs inside the JVM. See also the factorie project from UMass's IESL lab. KDD Cup Challenge Results: this year's competition asked participants to predict student performance on mathematical problems from logs of student interaction with Intelligent Tutoring Systems. TabCandy

  • Eurospider 26.07.2010 ACM SIGIR Industry Track

    Updated: 2010-07-29 16:51:40
    SIGIR 2010 Industry Track. The SIGIR 2010 Industry Track, organized by David Harper (Google, Switzerland) and Peter Schäuble (Eurospider, Switzerland), was a success. In the morning session four keynote talks were presented by influential technical leaders (Baidu, Google, Bing, Yandex). During the afternoon session, seven presentations showed interesting, novel, and innovative ideas from the search industry. Future Search: From Information Retrieval to Information Enabled Commerce. Speaker: William Chang, Baidu. Abstract: The China Economic Miracle has produced thirty years of sustained 10% GDP growth, allowing China to overtake Japan. Recently, concerned with social issues, debt safety, high commodity prices and weak exports, China has sought to tame that part of GDP derived

  • SIGIR 2010 Day 3 Industry Track Afternoon Sessions

    Updated: 2010-07-29 16:51:39
    The Noisy Channel. SIGIR 2010: Day 3 Industry Track Afternoon Sessions. July 27th, 2010. While the SIGIR 2010 Industry Track keynotes had the highest-profile speakers, the rest of the day assembled an impressive line-up: The new frontiers of Web search: going beyond the 10 blue links (Ricardo Baeza-Yates, Andrei Broder, Yoelle Maarek, and Prabhakar Raghavan, Yahoo Labs); Cross-Language Information Retrieval in the Legal Domain (Samir Abdou and Thomas Arni, Eurospider); Building and Configuring a Real-Time Indexing System (Garret Swart, Ravi Palakodety, Mohammad Faisal, Wesley Lin, Oracle); Lessons and Challenges from Product Search (Daniel E. Rose, A9.com, Amazon); Being Social: Research in

  • ScalaNLP

    Updated: 2010-07-29 16:51:38
    Welcome (April 18, 2010). ScalaNLP is a collection of libraries for Natural Language Processing, Machine Learning, and Statistics. We have a number of subprojects, each with a different focus. Scalala is a high performance numeric linear algebra library for Scala, with rich Matlab-like operators on vectors and matrices; a library of numerical routines; and support for plotting functions and data. Scalala can be used interactively, or as a library. ScalaNLP-Data consists of support classes for data and text processing and computation pipelining. ScalaNLP-Learn includes commonly used learning and optimization algorithms, such as L-BFGS and a logistic classifier. It also contains statistical distributions and sampling routines.

  • IESL Main Home Page browse

    Updated: 2010-07-29 16:51:37
    Information Extraction and Synthesis Laboratory. Information: a collection of facts, relations or events from which conclusions may be drawn; knowledge that has been gathered or received. Extraction: obtaining materials in concentrated, usable form from a diluted, unusable source. Synthesis: the combining of separate elements or substances to form a coherent whole; reasoning from the general to the particular; logical deduction. Laboratory: an organization performing scientific experimentation or research. Andrew McCallum, Director. IESL aims to dramatically increase our ability to mine actionable knowledge from unstructured text. We are especially interested in information extraction

  • factorie Project Hosting on Google Code

    Updated: 2010-07-29 16:51:37
    factorie: Probabilistic programming with imperatively-defined factor graphs. Activity: Medium. Code license: Eclipse Public License 1.0. Content license: Creative Commons 3.0 BY. Labels: machinelearning, graphicalmodels, factorgraphs, probabilisticprogramming, scala, nlp, informationextraction, informationintegration. Featured downloads: factorie-0.8.1-src.tar.gz. Featured wiki pages: Examples, Installation, Overview. Groups: factorie-discuss. Owners: andrew.k.mccallum. Committers: sebastian.riedel, sameeersingh, tim.f.vieira, kedar.bellare, karlschultz, thebiasedestimator, isabel0913, adamchandra. FACTORIE is a toolkit for deployable probabilistic modeling,

  • Quote to Note: Google about Android and Content

    Updated: 2010-07-29 11:54:47
    Here’s a quote to note. Today is July 29, 2010, and I don’t want this puppy to slip away. The story “Eric Schmidt on Google’s Next Tricks” is about Google’s dependence on advertising revenue. There is nothing wrong with billions of dollars. The problem is that Apple’s revenues are more diversified and Apple is moving [...]

  • Google Books Israel Edition

    Updated: 2010-07-29 07:23:41
    Nobody ever said the next frontier of literature would be smooth, but it is a realm that will be conquered nonetheless. Google is learning all about the highs and lows of digital books these days. A recent Globes article, “Google Books Reaches Israel,” [NOTE: Link may be dead when you read this Beyond [...]

  • Online Paywalls: British Users Click Elsewhere

    Updated: 2010-07-29 07:22:07
    Internet users in England are the biggest online penny pinchers. Net Imperative recently reported these findings in an article, “British Least Likely to Pay for Online Content According to a New Survey.” The survey, performed by global accounting firm KPMG, discovered nearly 81 percent of Brits polled would prefer not to pay for online content, [...]

  • SAP Picks Black Duck

    Updated: 2010-07-29 07:12:33
    We received an email with information about SAP’s open source activities. The good news is that Black Duck Software, a provider of products and services for accelerating software development through the managed use of open source software, issued this statement: SAP has selected to implement the Black Duck™ Suite. The comprehensive suite provides a platform for [...]

  • Facebook Runs Wide Open

    Updated: 2010-07-28 07:33:58
    Open source technology, once relegated to the furthest reaches of computer geekdom, is helping over 500 million people a day share status info, view photos and even poke a friend; they just don’t know it. Facebook, the King Kong of social media, has embraced open source tools, especially Lucene products, on several different levels so [...]

  • KDD 2010 16th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

    Updated: 2010-07-27 14:22:43
    Welcome to KDD-2010. The latest conference program schedule is available if you want to start planning your activities. The conference features plenty of lunches, coffee breaks, and evening activities. Introduction: The annual ACM SIGKDD conference is the premier international forum for data mining researchers and practitioners from academia, industry, and government to share their ideas, research results and experiences. KDD-2010 will feature keynote presentations, oral paper presentations, poster sessions, workshops, tutorials, panels

  • David Jensen

    Updated: 2010-07-27 14:22:43
    David Jensen. I am an Associate Professor of Computer Science and Director of the Knowledge Discovery Laboratory at the University of Massachusetts Amherst. My research focuses on the statistical aspects and architecture of systems for knowledge discovery in databases and the assessment of those systems for government and business applications. Research: guides to the major results of my work. Writing: papers and talks. Laboratory: my research group, students, and colleagues. Teaching: recent and upcoming courses. Experience: education, work, and other information. Contact: electronic and physical addresses. Recent news: Profiled in SAFE; Tutorial at AFRL; Invited talk at IAEA workshop; Invited talk at DHS workshop; SIGKDD panel on privacy; Invited talk at Yahoo Labs; Invited talk at LLNL; Associate

  • ACM SIGKDD-2010 Conference

    Updated: 2010-07-27 14:22:43
    Computational Social Science. Speaker: David Jensen. Track: Plenary Talk. 9:00-10:00AM on Tuesday, July 27 in Regency EF CTR. David Jensen, Department of Computer Science, University of Massachusetts Amherst. Abstract: Research and applications in knowledge discovery and data mining increasingly address some of the most fundamental questions of social science: What determines the structure and behavior of social networks? What influences consumer and voter preferences? How does participation in social systems affect behaviors such as fraud, technology adoption, or resource allocation? Often for the first time, these questions are being examined by analyzing massive data sets that record the behavior and interactions

  • program:workshops ACM SIGIR 2010

    Updated: 2010-07-27 14:22:42

  • Matthew Lease

    Updated: 2010-07-27 14:22:42
    Matthew Lease (Matt Lease), Assistant Professor, School of Information and Department of Computer Science, University of Texas at Austin. School of Information, 1616 Guadalupe Ste 5.202, Austin, TX 78701-1213. Voice: 512 471-9350. Fax: 512 471-3971. Office: UTA 5.450. Campus box: D8600. myinitials ischool.utexas.edu. Publications; Curriculum Vitae; LinkedIn; The iSchool Movement. Other UT affiliations: Computational Linguistics and Division of Statistics and Scientific Computation. Faculty advisor, UT Austin ASIS&T student chapter. TEACHING. SPRING 2010: INF385T/CS395T: Topics in Information Retrieval and Web Search; INF383D: Mathematical Foundations of Information Studies. FALL 2010: INF385T/CS395T: Topics in Information Retrieval and Web Search; INF384C: Organizing Information. RESEARCH. Areas:

  • No title

    Updated: 2010-07-27 14:22:42
    SIGIR 2010 Workshop on Crowdsourcing for Search Evaluation. Overview: While automated Information Retrieval (IR) technologies have enabled people to quickly and easily find desired information, development of these technologies has historically depended on slow, tedious, and expensive data annotation. For example, the Cranfield paradigm for evaluating IR systems depends on human judges manually assessing documents for topical relevance. Although recent advances in stochastic evaluation algorithms have greatly reduced the number of such assessments needed for reliable evaluation, assessment nonetheless remains an expensive and slow process. Crowdsourcing represents a promising new avenue for reducing effort, time, and cost

  • Vitor Carvalho Home Page Vitor Rocha de Carvalho frequently confused with Victor Carvalho

    Updated: 2010-07-27 14:22:42
    Vitor R. Carvalho, Senior Scientist at Microsoft Bing. I'm a senior scientist at Bing.com. Before that, I was a scientist at Microsoft Live Labs. Before that, I defended my PhD thesis at Carnegie Mellon, working under the ingenious William W. Cohen. In a previous life, I was a telecom engineer working at Ericsson R&D on cell phone networks, CDMA and all that jazz. I'm interested in applied research interfacing Machine Learning and Information Retrieval. In particular, I have worked on learning ranking functions, text mining, information extraction, classification models (stacked, online, collective, etc.), user recommendation models for email (speech acts, email leaks, recipient recommendation, etc.) and a handful of other interesting topics. More recently, I've

  • Emine Yilmaz Microsoft Research

    Updated: 2010-07-27 14:22:42
    Emine Yilmaz. I am a post doc researcher in the Information Retrieval and Analysis Group at Microsoft Research Cambridge. My current work involves evaluation of retrieval systems, the effect of evaluation metrics on learning to rank problems and modelling user behaviour. My main interests are information retrieval and applications of information theory, statistics and machine learning. Before joining Microsoft Research, I was a PhD student and a research assistant at Northeastern University. My work there

  • Home page of Gabriella Kazai

    Updated: 2010-07-27 14:22:14
    I am a PhD student and research assistant in the Department of Computer Science at Queen Mary University of London, where I am a member of the Information Retrieval group. My PhD is on the Evaluation of Structured Document Retrieval. My supervisor is Mounia Lalmas. I am interested in most things to do with information retrieval, but in particular, I enjoy working on the development of metrics for the evaluation of content-oriented XML retrieval, building models for structured document retrieval and applying these to XML and MPEG-7 data and the Web, investigating content analysis and knowledge representation techniques for information retrieval, interface design for IR systems, personalisation, recommendation, user modelling and

  • Stefano Mizzaro Home Page

    Updated: 2010-07-27 14:22:13
    Stefano Mizzaro, Department of Mathematics and Computer Science, Faculty of Science, University of Udine, Via delle Scienze, 206, 33100 Udine, ITALY. ph: 39 0432 558456; fax: 39 0432 558499; email: mizzaro at dimi dot uniud dot it. Hi. Welcome to my home page. I was born in Udine, 21st January, 1966. I live in Udine, I'm male, married. I got my degree in 1992 and my PhD in 1997. I am Associate Professor at Department of Maths and Computer Science of the University of Udine. Research: My research activities are mainly in the fields of Web information retrieval, artificial intelligence, scholarly publishing, mobile devices. And other ones, of course. See the following links for more details. Research interests. Detailed but quite outdated curriculum vitae in Italian, in

  • Jeff's Search Engine Caffè SIGIR 2010 Industry Day Lessons and Challenges from Product Search

    Updated: 2010-07-22 22:16:32
    Thursday, July 22. SIGIR 2010 Industry Day: Lessons and Challenges from Product Search. Daniel Rose, A9. Different Domains, Different Solutions: traditional IR, enterprise search, Web search, product search. How are the issues different? Let's go back to user goals. The Goals of Web Search: the "Understanding user goals in web search" paper. Why do people search on Amazon? When they want to buy something? Even ignoring the non-buying issues. The Goals of Product Search: depends on where you are in the buying funnel. Top: awareness, then Desire, then Interest, finally Action (St. Elmo Lewis, 1898). Provide the right tools at the right stage in the

  • Jeff's Search Engine Caffè Microsoft Releases Learning to Rank Datasets

    Updated: 2010-07-22 14:13:03
    Thursday, July 22. Microsoft Releases Learning to Rank Datasets. Microsoft Research announced that it is releasing a new MS LTR dataset. "We release two large scale datasets for research on learning to rank: MSLR-WEB30k with more than 30,000 queries and a random sampling of it MSLR-WEB10K with 10,000 queries." 136 features have been extracted for each query-url pair. The dataset is a retired dataset. What makes this quite interesting is that the features have been released. You can see the feature list. See also the Y! LTR datasets. Posted by jeff.dalton at 6:59 AM.

  • Microsoft Learning to Rank Datasets Microsoft Research

    Updated: 2010-07-22 14:13:03
    Microsoft Learning to Rank Datasets. We release two large scale datasets for research on learning to rank: MSLR-WEB30k with more than 30,000 queries and a random sampling of it MSLR-WEB10K with 10,000 queries. Dataset Descriptions: The datasets are machine learning data, in which queries and urls are represented by IDs. The datasets consist of feature vectors extracted from query-url pairs along with relevance judgment labels: (1) The relevance judgments are obtained from a retired
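    As a rough illustration of "feature vectors extracted from query-url pairs along with relevance judgment labels", here is a minimal parsing sketch. It assumes an SVMlight/LETOR-style text line of the form `<label> qid:<id> <index>:<value> ...`; the excerpt above does not state the on-disk format, so treat that layout, the function name and the sample line as assumptions.

```python
from typing import Dict, Tuple


def parse_ltr_line(line: str) -> Tuple[int, str, Dict[int, float]]:
    """Parse one '<label> qid:<id> <idx>:<val> ...' line (assumed format)."""
    tokens = line.split()
    label = int(tokens[0])  # relevance judgment label
    if not tokens[1].startswith("qid:"):
        raise ValueError("expected 'qid:<id>' as the second token")
    query_id = tokens[1][len("qid:"):]
    features: Dict[int, float] = {}
    for token in tokens[2:]:
        index, value = token.split(":", 1)
        features[int(index)] = float(value)
    return label, query_id, features


# Made-up example line, not taken from the real dataset.
label, query_id, features = parse_ltr_line("2 qid:10 1:3 2:0 3:1.5")
print(label, query_id, features)
```

    Grouping the parsed rows by query ID is what lets a learning-to-rank method compare urls retrieved for the same query.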

  • Feature List Microsoft Research

    Updated: 2010-07-22 14:13:01
    Microsoft Learning to Rank Datasets: Feature List. Each query-url pair is represented by a 136-dimensional vector.

    Feature List of Microsoft Learning to Rank Datasets

    feature id  feature description                stream          comments
    1           covered query term number          body
    2           covered query term number          anchor
    3           covered query term number          title
    4           covered query term number          url
    5           covered query term number          whole document
    6           covered query term ratio           body
    7           covered query term ratio           anchor
    8           covered query term ratio           title
    9           covered query term ratio           url
    10          covered query term ratio           whole document
    11          stream length                      body
    12          stream length                      anchor
    13          stream length                      title
    14          stream length                      url
    15          stream length                      whole document
    16          IDF (Inverse document frequency)   body
    17          IDF (Inverse document frequency)   anchor
    18          IDF (Inverse document frequency)   title
    19          IDF (Inverse document frequency)   url
    20          IDF (Inverse document frequency)   whole document
    21          sum of
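    The first ten features are named but not defined in this excerpt. The sketch below shows one plausible reading, assuming "covered query term number" means the count of distinct query terms that occur in a stream (body, anchor, title, url or whole document) and "covered query term ratio" means that count divided by the number of distinct query terms. The whitespace tokenizer and function name are hypothetical.

```python
from typing import Tuple


def covered_query_terms(query: str, stream_text: str) -> Tuple[int, float]:
    """Return (covered term count, covered term ratio) for one stream.

    Assumed definitions: count = distinct query terms found in the stream;
    ratio = count / number of distinct query terms.
    """
    query_terms = set(query.lower().split())
    stream_terms = set(stream_text.lower().split())
    covered = len(query_terms & stream_terms)
    ratio = covered / len(query_terms) if query_terms else 0.0
    return covered, ratio


# Example: two of the three query terms occur in a (made-up) title stream.
count, ratio = covered_query_terms("cheap flights paris", "Flights to Paris")
print(count, ratio)
```

    Computing the same pair once per stream would give ten values, matching the layout of feature ids 1 to 10.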

  • Jeff's Search Engine Caffè SIGIR 2010 Industry Day Machine Learning in Search Quality at Yandex

    Updated: 2010-07-22 14:12:59
    Thursday, July 22. SIGIR 2010 Industry Day: Machine Learning in Search Quality at Yandex. Ilya Segalovich, Yandex. Russian Search Market: Yandex has 60+% market share. It's all about small attention to details about the search. A Yandex overview: started in 1997; no. 7 search engine in the world by number of queries; 150 million queries per day. Variety of Markets: 15 countries with cyrillic alphabet; 77 regions in Russia; different culture, standard of living, average income (for example: Moscow, Magadan); large semi-autonomous ethnic groups (tatar, chech, bashkir); neighbouring bilingual markets. Geo-specific queries: relevant result sets vary significantly

  • Jeff's Search Engine Caffè SIGIR 2010 Industry Day Query Understanding at Bing

    Updated: 2010-07-22 14:12:57
    Thursday, July 22. SIGIR 2010 Industry Day: Query Understanding at Bing. Jan Pedersen. Standard IR assumptions: queries are well-formed expressions of intent; best effort response to the query as given. Reality: queries contain errors; 10% of queries are misspelled; incorrect use of terms; large vocabulary gap. Users will reformulate if results do not meet information need. Reality: if you don't understand what's wrong you can't reformulate. You miss good content and go down dead ends. Take the query, understand what is being set and modify the query to get better results. Problem Definitions: Best effort retrieval: find the most relevant results for the user query

  • Jeff's Search Engine Caffè SIGIR 2010 Industry Day Search Flavours at Google

    Updated: 2010-07-22 14:12:53
    Thursday, July 22. SIGIR 2010 Industry Day: Search Flavours at Google. Search Flavours: Recent Updates and Trends. Yossi Matias, Director of Israel R&D Center, Google. Solution for the search problem: imitate a person. Wish list: knows everything; language agnostic; always up to date; context sensitive; understands me; good sense of timing; good sense of scope; smart about interaction; suggests answers to questions I didn't ask or didn't ask accurately. In short, things we expect from people when we interact with experts or friends. This is subtle. Demo of things: auto suggest of weather, an intelligent guess at what the user will ask; flight information for ua 101; weather in the suggestion. This is

  • Jeff's Search Engine Caffè SIGIR 2010 Best Paper Award Winners

    Updated: 2010-07-22 14:12:51
    Thursday, July 22. SIGIR 2010 Best Paper Award Winners. The best paper awards were awarded last night at the banquet. Best Paper Award: "Assessing the Scenic Route: Measuring the Value of Search Trails in Web Logs", R. White, J. Huang. In this paper, we present a log-based study estimating the user value of trail following. We compare the relevance, topic coverage, topic diversity, novelty, and utility of full trails over that provided by sub-trails, trail origins (landing pages), and trail destinations (pages where trails end). Our findings demonstrate significant value to users in following trails, especially for certain query types. The findings have implications for the design of

  • Jeff's Search Engine Caffè SIGIR Industry Day Baidu on Future Search

    Updated: 2010-07-22 14:12:44
    Thursday, July 22. SIGIR Industry Day: Baidu on Future Search. Future Search: From Information Retrieval to Information Enabled Commerce, William Chang, Baidu. Two commerce revolutions: 1995, the first web search engines, ebay, amazon, etc.; China miracle. Early history of IEC. Early shippers: created corporations, but more important, there is a futures market. Commerce: coming together to trade, trading goods and information. Local: Yellow pages created in 1886; local classified ads in papers. Mail order: Sears catalogue in 1888 for farming supplies, enabled by efficient postal service. Credit cards: consumer production and data mining. Development of advertising science: print, radio, tv

  • Jeff's Search Engine Caffè SIGIR 2010 Keynote Donna Harman on Cranfield Paradigm

    Updated: 2010-07-21 14:10:27
    Wednesday, July 21. SIGIR 2010 Keynote: Donna Harman on Cranfield Paradigm. Is the Cranfield Paradigm Outdated? by Donna Harman, NIST. Cranfield 1 (1958-1960): missed most of this due to a late bus. Cranfield 2 (1962-1966): Goal: learn what makes a good descriptor. New user model: researcher wanting all documents relevant to their question. Documents: 1400 recent papers in aeronautical engineering. Questions gathered from authors of the papers, asking for the basic problem the paper addressed and also supplemental questions that could have been put to an information service. Full relevance assessments at 5 levels: complete answer to a question; high degree of relevance, necessary for the work

  • Jeff's Search Engine Caffè Amit Singhal on the Evolution of Search Searching without Searches

    Updated: 2010-07-20 14:29:06
    Tuesday, July 20. Amit Singhal on the Evolution of Search: Searching without Searches. Engadget has an article covering a presentation (no details provided) given by Amit Singhal on the evolution of search. Most of the interview outlines the evolution of search towards multimedia, real-time search, etc. Most of it has been well covered in the past. One interesting note is that Amit outlines his vision for one possible future direction of search: Your phone knows about your shopping needs because they're in your to-do list and it knows about your meetings because they're in your schedule. All it needs is your location, which, of course, it has, and some local area information, and

  • Search in SharePoint 2010: Microsoft options

    Updated: 2010-07-20 07:08:03
    Enterprise Search: The business and technology of corporate search. July 20, 2010. Search in SharePoint 2010: Microsoft options. SharePoint 2010 seems to be gaining traction both among its existing customer base and with companies looking for new web content management (WCM) systems. Great search is critical to a successful WCM deployment, and just about every serious search technology has a way to connect to SharePoint. Microsoft, not to be outdone by its competitors, has five unique search engines for SharePoint 2010: SharePoint Foundation 2010 Search; Search Server Express; Search Server 2010; SharePoint Server 2010; FAST Search Server 2010 for SharePoint. The foundation search is basic search within SharePoint only,

  • Jeff's Search Engine Caffè Headed to SIGIR 2010

    Updated: 2010-07-19 14:21:42
    Monday, July 19. Headed to SIGIR 2010. I'm leaving for Geneva today to attend SIGIR. I look forward to seeing you there. I will be live-blogging the keynote talks (subject to WiFi availability) and providing other coverage. I will also be tweeting. Today is tutorial day. The main talks start tomorrow. To get started, here are the best paper nominees from the website: A comparison of general vs personalized affective models for the prediction of topical relevance, I. Arapakis, K. Athanasakos, J. Jose; Assessing the Scenic Route: Measuring the Value of Search Trails in Web Logs, R. White, J. Huang; Caching Search Engine Results over Incremental Indices, F. Junqueira, R. Blanco,

  • Jeff's Search Engine Caffè July 18, 2010

    Updated: 2010-07-19 14:21:33
    Monday, July 19. Headed to SIGIR 2010. I'm leaving for Geneva today to attend SIGIR. I look forward to seeing you there. I will be live-blogging the keynote talks (subject to WiFi availability) and providing other coverage. I will also be tweeting. Today is tutorial day. The main talks start tomorrow. To get started, here are the best paper nominees from the website: A comparison of general vs personalized affective models for the prediction of topical relevance, I. Arapakis, K. Athanasakos, J. Jose; Assessing the Scenic Route: Measuring the Value of Search Trails in Web Logs, R. White, J. Huang; Caching Search Engine Results over Incremental Indices, F. Junqueira, R. Blanco,

  • Jeff's Search Engine Caffè The Impact of TREC and its Future Directions

    Updated: 2010-07-16 21:51:46
    Friday, July 16. The Impact of TREC and its Future Directions. This week NIST released a report on the Economic Impact Assessment of TREC. In section 6 they report the results of a survey where stakeholders were asked to rate the importance of the different TREC tracks. The most important tracks were the Adhoc Track, with 77% rating it very important, and the Web Track, with 74%. Other highly rated tracks were the TrecVid and Q&A tracks. In contrast, the Spam Track (mainly email) and Speech Track were ranked at the bottom. Here are a few highlights from the conclusion. In monetary terms: As described in Section 6, 16 million of discounted investments made by NIST and others in TREC have

  • Warning: Suspicious Conferences

    Updated: 2010-07-16 20:04:04
    I’ve been getting a lot of spam conference announcements. Unsubscribe requests are ignored. I went to their websites and checked their program committees. Some of the committee members I contacted said that their names were used without permission. The conferences share the following contact address: 10 Anson Road #14-04, International Plaza Singapore – 079903 DID [...]
