• Bloom filters and BigTable

    Updated: 2010-05-31 09:22:00
    It is said that Google's BigTable uses Bloom filters to reduce the disk lookups for non-existent rows or columns.
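
    To make the idea concrete, here is a minimal sketch of a Bloom filter guarding a disk lookup. It is an illustration only, not BigTable's actual implementation: the class `BloomFilter`, the helper `read_row`, the salted SHA-256 hashing scheme, and the default sizes are all assumptions chosen for readability.

```python
import hashlib


class BloomFilter:
    """A small Bloom filter: no false negatives, some false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        # Sizes are illustrative assumptions, not tuned values.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting a single SHA-256 hash.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )


def read_row(row_key, bloom, disk):
    """Hypothetical read path: skip the disk when the filter rules the key out."""
    if not bloom.might_contain(row_key):
        return None  # definitely not stored, so no disk access is needed
    return disk.get(row_key)  # may still be None on a false positive
```

    In this sketch `disk` is any mapping standing in for on-disk storage. A key that was added is always reported as possibly present, so the filter can only save lookups, never lose data.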

  • Property Mappings or Why Microsoft Enterprise Search Is a Consultants’ Treasure Chest

    Updated: 2010-05-31 08:04:46
    First, navigate to “Creating Enterprise Search Metadata Property Mappings with PowerShell.” Notice that you may have difficulty reading the story because the Microsoft ad’s close button auto positions itself so you can’t get rid of the ad. Pretty annoying on some netbooks, including my Toshiba NB305. Second, the author of the article is annoyed, but he [...]

  • Cyber Command Revealed

    Updated: 2010-05-31 07:23:49
    Short honk: The goose has no information about this “cyber command.” The goose did read “CYBERCOM – US Military’s Online Watchdog Quiet Startup.” You may want to read the article too. For me the most interesting comment in that write up was: The current status of CYBERCOM is as a preliminary upgrade/streamline of existing Strategic Command [...]

  • Google and Finance

    Updated: 2010-05-31 07:03:32
    Short honk: Google’s move into financial trading is official. “Google’s Latest Launch: Its Own Trading Floor” makes clear that it will manage its own money. According to the write up in Bloomberg Businessweek: Google’s trading room opened in January. The plan is to keep the war chest growing safely and ready to be deployed should the [...]

  • WalMarting: Apple and Google

    Updated: 2010-05-31 06:12:37
    ChannelWeb’s “Report: Next Apple TV Will Have No Screen, Cost $99” might be off base. Doesn’t matter. Say Apple to anyone and you hear the words “cool” and maybe “expensive”. A $99 Apple TV is a shocker because buying a case and a dongle for an Apple iPad costs that much. My thought is that [...]

  • SurfRay: Catching the Crest of the SharePoint Wave

    Updated: 2010-05-31 06:01:47
    Editor’s Note: I participated in an email exchange with SurfRay’s management and technical team. I have been tracking the company’s technology for many years. First, I provided some competitive background to the team largely responsible for the Mondosoft product five or six years ago. Then, the Speed of Mind database acceleration and search technology became [...]

  • Animated Zoom at Seattle International Film Festival

    Updated: 2010-05-30 15:12:15
    I love the Zoom books by Istvan Banyai, and I really like the SIFF trailer which animates the Zoom concept over classic cinema moments:

  • Royal at Autonomy

    Updated: 2010-05-30 08:03:41
    “His Royal Highness the Duke of York Visits Autonomy Headquarters” may not do much for the three North Americans reading this blog. A visit from a royal is a very big deal in Cambridge, England. For me the key passage in the write up was: Autonomy Corporation plc, a global leader in infrastructure software for the [...]

  • Endeca and Alfresco

    Updated: 2010-05-30 07:02:57
    Short honk: I wanted to document an item I saw in “Alfresco’s 3.3 ECM Upgrade Delivers CMIS Support, Integration with Lotus Notes, Outlook, Google Docs, Drupal”. Here’s the passage I noted: Alfresco customers include Boise Cascade, Merck, Air Force, Endeca, Cisco and H&R Block. Alfresco is a company I associated with open source software. Endeca is a [...]

  • WebM Fancy Dancing

    Updated: 2010-05-30 06:01:52
    Short honk: If you are tracking Google’s video encoder WebM, you will want to read “Google Asks for Delay in WebM License Consideration.” One tip. Perch your open source legal eagle on your shoulder to get some color for the Google request. You will want to tuck the WebM FAQ link in your bookmarks folder. [...]

  • Google Wi-Fi Data Maybe Both Harm and Foul

    Updated: 2010-05-29 08:03:35
    “Google Keeping WiFi Data from German, Hong Kong Governments” reminded me of British Petroleum’s oil leak. The darned thing has made a big mess and BP may be tarred and feathered. Google is no BP, but it has managed to create a data mess that sprawls from Germany to Hong Kong and maybe other places [...]

  • Top 1,000 Sites: Interesting and Odd

    Updated: 2010-05-29 07:02:16
    You can get Google’s version of the Top 1,000 Web sites via the Double Click Ad Planner. There are some anomalies. I could not spot Google.com nor YouTube.com. Microsoft’s sites were not rolled up but presented as individual sites; for example, Live.com at #3, MSN.com at #5, Microsoft.com at #6, and Bing.com at #14. Same [...]

  • A Visual Guide to the International Conference on Weblogs and Social Media (ICWSM)

    Updated: 2010-05-27 11:12:54
    I’m continually impressed by the breadth of content presented at ICWSM. This conference is truly achieving a unique position for social media and social network researchers: bringing together social science and computer science. I thought it would be interesting to...

  • Terrier/Blogs08

    Updated: 2010-05-26 15:15:32
    Blogs08 is a collection of blog posts and feeds used by the TREC Blog track from 2009 onward. The larger brother of Blogs06, it has many more posts. It can be obtained from the University of Glasgow. Two tasks have been defined on the Blogs08 collection, namely blog distillation and top news story identification. The topics and qrels can be obtained from: 2009. Terrier can index the permalinks (blog posts) only of the Blogs06 collection with very few changes:

  • The Full Wiki Marries Mapping and Wikipedia

    Updated: 2010-05-26 13:37:54
    Due to an increased amount of traffic from Factbites, I visited the site today and found their new product: The Full Wiki. The Full Wiki extracts locatable concepts from Wikipedia articles and provides an accompanying map annotated with these locations.

  • Google Predict

    Updated: 2010-05-20 16:25:57
    Slashdot points out Google Predict. I’m not privy to the details, but this has the potential to be extremely useful, as in many applications simply having an easy mechanism to apply existing learning algorithms can be extremely helpful. This differs goalwise from MLcomp—instead of public comparisons for research purposes, it’s about private utilization [...]

  • Yahoo Research

    Updated: 2010-05-20 13:16:42
    Research areas: Search; Web Mining; Machine Learning; Econ Social Sys; Computational Adv; Web Info Management. Projects: A New Serving of Sponsored Search; A Question of Relevancy; Advertising Works; Building Better Online Communities; Friend Sense; Graph Partitioning; Hitting the Jackpot with a Greedy Bidding Strategy; Ideological Search; Match Game; Mindset: Intent-driven Search; Motif; One Fast Wabbit; Pig; PNUTS (Platform for Nimble Universal Table Storage); Predictalot; Predicting the Future with Basketball Bets; Query the Obscure; Quest; Remixer; Royal Jelly; Similarity Caching; Sparta; Squeeze Every

  • FrontPage

    Updated: 2010-05-17 14:30:53
    typo

  • Eugene Agichtein Mathematics and Computer Science Department Emory University

    Updated: 2010-05-11 02:10:28
    Eugene Agichtein, Assistant Professor, Intelligent Information Access Lab (IRLab), Math & CS Dept, Emory University. CV. Publications from Google Scholar and DBLP. Contact information: Web: http://www.mathcs.emory.edu/~eugene; E-mail: eugene at mathcs dot emory dot edu; Telephone: 404 727-7962; Fax: 404 727-5611; Office: E500 (5th floor), Emerson Hall; Mailing Address: Eugene Agichtein, Math & CS Department, Emory University, 400 Dowman Drive, Suite W401, Atlanta, Georgia 30322, USA. News: Slides for my WWW 2010 tutorial on Inferring Searcher Intent posted here. Next version of intent tutorial to be presented at AAAI 2010 on July 11th. Research: I lead the Emory Intelligent Information Access Lab (IRLab). We work on information retrieval and text and data

  • Inferring Searcher Intent Tutorial

    Updated: 2010-05-11 02:10:28
    Inferring Web Searcher Intent Tutorial. Eugene Agichtein, Emory University, USA. Web: http://www.mathcs.emory.edu/~eugene. E-mail: eugene@mathcs.emory.edu. Overview: This tutorial focuses on a crucial area of web search, namely inferring the intent of the searcher: developing computational models of searcher behavior, interests, and actions that could indicate what the searcher is attempting to accomplish. This problem has seen significant progress over recent years, mainly due to the availability of search behavior data. Billions of users search the web, clicking on the results, submitting and refining queries, and otherwise interacting with the search engines. Mining the vast amount of information generated by these interactions for inferring the search intent has resulted in

  • Jeff's Search Engine Caffè Rejected SIGIR 2010 Paper Consider Not Relevant

    Updated: 2010-05-11 02:10:14
    Jeff's Search Engine Caffè: Information Retrieval research and search engine development discussion. Thursday, March 25. Rejected SIGIR 2010 Paper? Consider Not Relevant. Yesterday the SIGIR paper notifications went out. I didn't submit a paper, but I know many other people who did. The results were mixed. I believe some quality papers were rejected. Did your paper get rejected unfairly? Poor reviewing? You should check out Not Relevant, a new online journal for unfairly rejected SIGIR papers. You can also join the discussion on Twitter. Posted by jeff.dalton at 1:56 PM. 4 comments. Anonymous said: I submitted my paper there. You wouldn't believe the review I got. Anonymous said: Like any other conf., getting a paper accepted should obey the mixture of quality

  • Jeff's Search Engine Caffè Spring 2010 Courses To Watch

    Updated: 2010-05-11 02:10:11
    Tuesday, March 23. Spring 2010 Courses To Watch. Here are some of the courses I've run across that I found interesting. I'll start with the ones I'm currently taking here at UMass: CS645 Advanced Databases; STAT 608 Bayesian Statistics by Michael Lavine (see Chapters 5, 6, and 7 of his book). There's also a very interesting DB seminar, Large Scale Data Analysis by Yanlei, which takes a DB perspective on MapReduce and other large-scale data analysis problems. Now here are a few from elsewhere: Jimmy Lin is teaching his course, Data-Intensive Information Processing Applications with MapReduce. Eugene Agichtein's course on IR and Web Search at Emory. Added 3/25: Michael Jordan at

  • INF385T CS395T Topics in Information Retrieval and Web Search Spring 2010

    Updated: 2010-05-07 12:55:27
    INF385T CS395T: Topics in Information Retrieval and Web Search, Spring 2010. Instructor: Matt Lease. Day and Time: Fridays 1-4pm. Location: UTA 1.208 in the iSchool. Final Presentations. Syllabus. Schedule: Assignments, Readings, and Slides from in-class presentations. Restricted Course Information (UT EID Required): Presenter assignments; Course blog for posting critiques. Project info: Datasets and Resources. Forms: Presentation Feedback form to give presenters; Presentation Score form to turn in; First day information form. Guidelines for assigning authorship of published research (National Academy of Sciences): each author should substantially contribute to one or more aspects such as research design, research execution, tool

  • Matthew Lease

    Updated: 2010-05-07 12:55:27
    Matthew (Matt) Lease, Assistant Professor, School of Information and Department of Computer Science, University of Texas at Austin. School of Information, 1616 Guadalupe Ste 5.202, Austin, TX 78701-1213. Voice: 512 471-9350. Fax: 512 471-3971. Office: UTA 5.450. Campus box: D8600. myinitials ischool.utexas.edu. Other UT affiliations: Computational Linguistics and Division of Statistics and Scientific Computation. Faculty advisor, UT Austin ASIS&T student chapter. TEACHING. SPRING 2010: INF385T CS395T: Topics in Information Retrieval and Web Search; INF383D: Mathematical Foundations of Information Studies. FALL 2010: INF385T CS395T: Topics in Information Retrieval and Web Search; INF384C: Organizing Information. RESEARCH: The First UT

  • Home

    Updated: 2010-05-07 12:55:25
    Kristen LeFevre, Assistant Professor, Computer Science and Engineering, 2260 Hayward Ave., Ann Arbor, MI 48109. Office: 4705 CSE Building. Phone: 734 763-3229. Research: My research interests are in the areas of database systems, applications, and mining. Within this space, I am particularly interested in problems related to data privacy and security. I am affiliated with the Michigan Software Systems Lab Database Group and OpenData IGERT Program. I am currently working with two Ph.D. students: Daniel Fabbri and Lujun Tony Fang. Teaching: EECS 584: Advanced Database Systems (F08, F09); EECS 484: Database Management Systems (W08, W09, W10). Service Activities: Since 2008, I have had the pleasure of serving as faculty advisor to the U Michigan gEECs. In 2009-10, I am the faculty

  • Jeff's Search Engine Caffè Search is Dead Long Live Search Panel at WWW 2010

    Updated: 2010-05-07 12:55:15
    Wednesday, April 28. Search is Dead, Long Live Search: Panel at WWW 2010. Continuing the WWW 2010 coverage, this afternoon there was a panel, Search is Dead, Long Live Search. You can see a poor quality video stream of the panel. The following are the notes from my labmate, Michael. You can also see the discussion on Twitter, #searchisdead. Search is dead, Long Live Search: Search for 10 blue links is already dead. A failure case is if a user sees just the 10 blue links. There are much more diverse data sources and presentations than links to web pages. Intense competition to get the tail queries right. You miss everyone if you miss the tail. It doesn’t take much to get into the tail: 1 or 2 more

  • Personalized news recommendation based on click behavior

    Updated: 2010-05-07 12:55:12
    Personalized news recommendation based on click behavior. Source: International Conference on Intelligent User Interfaces, Proceeding of the 14th international conference on Intelligent user interfaces, Hong Kong, China. Session: Smart reading. Pages: 31-40. Year of Publication: 2010. ISBN: 978-1-60558-515-4. Authors: Jiahui Liu (Google, Mountain View, CA, USA); Peter Dolan (Google, Mountain View, CA, USA); Elin Rønby Pedersen (Google, Mountain View, CA, USA). Sponsors: SIGCHI, ACM Special Interest Group on Computer-Human Interaction

  • Jeff's Search Engine Caffè Intro to Search at Facebook User-centric relevance

    Updated: 2010-05-07 12:55:05
    Tuesday, April 27. Intro to Search at Facebook: User-centric relevance. I'm not sure how I missed this, but last month the Facebook Engineering blog posted an intro to search at Facebook. A key difference is that at Facebook, a query is more than keywords. The query consists of a user's social context, semi-structured profile, and keywords. Currently, the typical query is a search for a person or group. Computing the social context (FoaF graph) and using it during query processing is computationally challenging. They hint at this system for a future post. The personal context is critical to their ranking and makes search hard: Since our most important ranking features depend on who the searcher is,

  • Jeff's Search Engine Caffè WWW 2010 this week Semsearch and other events

    Updated: 2010-05-07 12:54:45
    Monday, April 26. WWW 2010 this week: Semsearch and other events. WWW 2010 is happening this week in NC. You can follow it on Twitter, #www2010. I'm not attending, but some of my labmates are there. I'll also try to keep up with information I find online, so help me out by posting your information. Update: I recommend reading the coverage from Krisztian Balog and Christian Grant, who both attended the workshop. Today, the SemSearch 2010 workshop is being held. You can follow it on Twitter, #semsearch2010. Barney Pell gave the keynote talk, Why users need semantic search. I hope to have more details on that soon. From his abstract: Our research reveals three key problems of

  • Jeff's Search Engine Caffè April 11, 2010

    Updated: 2010-05-07 12:54:44
    Saturday, April 17. NESCAI 2010 Keynote: Knowledge Representation and Reasoning for Web Search. I'm at NESCAI here at UMass this weekend. I'm liveblogging the keynote talk by Ron Brachman. Some opportunities for Knowledge Representation and Reasoning in Web Search and Advertising, by Ron Brachman. Background on Knowledge Representation: Dartmouth 1956 Summer research project on AI: “it may be speculated that a large part of human thought consists of manipulating words according to rules of reasoning and rules of conjecture. From this point of view forming a generalization consists of admitting a new word and some rules whereby sentences containing it imply and are implied by others.”

  • Jeff's Search Engine Caffè NESCAI 2010 Keynote Knowledge Representation and Reasoning for Web Search

    Updated: 2010-05-07 12:54:43
    Saturday, April 17. NESCAI 2010 Keynote: Knowledge Representation and Reasoning for Web Search. I'm at NESCAI here at UMass this weekend. I'm liveblogging the keynote talk by Ron Brachman. Some opportunities for Knowledge Representation and Reasoning in Web Search and Advertising, by Ron Brachman. Background on Knowledge Representation: Dartmouth 1956 Summer research project on AI: “it may be speculated that a large part of human thought consists of manipulating words according to rules of reasoning and rules of conjecture. From this point of view forming a generalization consists of admitting a new word and some rules whereby sentences containing it imply and are implied by others.”

  • Jeff's Search Engine Caffè ECIR 2010 Best Paper Award and other coverage

    Updated: 2010-05-07 12:54:42
    Friday, April 2. ECIR 2010 Best Paper Award and other coverage. It's been a busy week, so I'm behind on catching up with ECIR 2010. However, I wanted to share the best paper award winners: Promoting Ranking Diversity for Biomedical Information Retrieval Using Wikipedia (joint award), Jimmy Huang (York University, Canada), Xiaoshi Yin (York University, Canada). I'm not exactly sure who the other joint winner is, but inferring from the incomplete tweets, one candidate is: Evaluation of an adaptive search suggestion system, Sascha Kriewel (University of Duisburg-Essen, Germany), Norbert Fuhr (University of Duisburg-Essen, Germany). Best Poster Award: Filtering document with subspaces, Benjamin

  • Jeff's Search Engine Caffè March 21, 2010

    Updated: 2010-05-07 12:54:40
    Thursday, March 25. Rejected SIGIR 2010 Paper? Consider Not Relevant. Yesterday the SIGIR paper notifications went out. I didn't submit a paper, but I know many other people who did. The results were mixed. I believe some quality papers were rejected. Did your paper get rejected unfairly? Poor reviewing? You should check out Not Relevant, a new online journal for unfairly rejected SIGIR papers. You can also join the discussion on Twitter. Posted by jeff.dalton at 1:56 PM. Wednesday, March 24. AMPLab: Exploring Big Data With Algorithms, Machines, and People. Today, there was a talk by Michael Franklin on the future of big data covering the databases vs.

  • Jeff's Search Engine Caffè Continuing SIGIR Uproar

    Updated: 2010-05-07 12:54:40
    Monday, March 29. Continuing SIGIR Uproar. In a follow-up to the initial post last week about the new online journal Not Relevant, Ian Soboroff has posted more information about its future. Here is a summary of its purpose: The goal is to publish solid advances in the state of the art, game-changing work, research that pushes the boundaries of IR. We are not looking for incremental improvements. William Webber wrote a great post about the rapidly rising field of complaining about aspects of SIGIR. Highly entertaining and worth reading. Jeremy Pickens posted the reviews of his rejected paper and a discussion of them. If you want to join the discussion, Ian started a Google Group,

  • A suffix tree implementation with Unicode support

    Updated: 2010-05-03 23:00:00
    Research on Search: My study of machine learning, data mining, computational linguistics and information retrieval, towards the grand goal of developing the perfect search engine that understands exactly what you mean and gives you back exactly what you want (Larry Page). Tuesday, May 04, 2010. A suffix tree implementation with Unicode support. It seems that there is currently no suffix tree implementation with Unicode support publicly available online. So I adapted Thomas Mailund's suffix tree implementation in C with a Python binding and put it here. The changes that I made to the code were mainly to make it support Unicode text and be compatible with the new version of Python. It also includes an example program, all_comsubstr.py, that illustrates the extraction of

  • Longest Common Substring

    Updated: 2010-05-03 21:53:00
    Monday, May 03, 2010. Longest Common Substring. Given two strings, S of length m and T of length n, their longest common substrings can be found in O(m + n) time using a generalised suffix tree or in O(mn) time through dynamic programming (e.g., the Python code here). Posted by Dell Zhang at 10:53 PM.
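
    The dynamic programming route mentioned in this entry can be sketched as follows. This is not the code the post links to; the function name `longest_common_substrings` and the two-row table layout are assumptions made for illustration. Each cell holds the length of the longest common suffix of the two prefixes ending at that position, so filling the table takes time proportional to the product of the two string lengths.

```python
def longest_common_substrings(s, t):
    """Return the set of longest common substrings of s and t."""
    best_len = 0
    best = set()
    # prev[j] is the longest common suffix length of s[:i-1] and t[:j].
    prev = [0] * (len(t) + 1)
    for i in range(1, len(s) + 1):
        curr = [0] * (len(t) + 1)
        for j in range(1, len(t) + 1):
            if s[i - 1] == t[j - 1]:
                # Extend the common suffix that ended one character earlier.
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len = curr[j]
                    best = {s[i - curr[j]:i]}
                elif curr[j] == best_len:
                    best.add(s[i - curr[j]:i])
        prev = curr
    return best
```

    Keeping only two rows of the table holds memory at O(n) while the running time stays quadratic. The function returns a set because several distinct substrings can tie for the maximum length, and an empty set when the strings share no characters.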
