E-business reading material: Recommender Systems for Social Bookmarking

Recommender Systems for Social Bookmarking DISSERTATION submitted to obtain the degree of doctor at Tilburg University, on the authority of the rector magnificus, prof. dr. Ph. Eijlander, to be defended in public before a committee appointed by the doctorate board, in the auditorium of the University on Tuesday 8 December 2009 at 14:15, by Antonius Marinus Bogers, born on 21 September 1979 in Roosendaal en Nispen

Promotor: Prof. dr. A.P.J. van den Bosch
Assessment committee: Prof. dr. H.J. van den Herik, Prof. dr. M. de Rijke, Prof. dr. L. Boves, Dr. B. Larsen, Dr. J.J. Paijmans
The research reported in this thesis has been funded by SenterNovem / the Dutch Ministry of Economic Affairs as part of the IOP-MMI À Propos project.
SIKS Dissertation Series No. 2009-42. The research reported in this thesis has been carried out under the auspices of SIKS, the Dutch Research School for Information and Knowledge Systems.
TiCC Dissertation Series No. 10
ISBN 978-90-8559-582-3
Copyright © 2009, A.M. Bogers
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronically, mechanically, photocopying, recording or otherwise, without prior permission of the author.

“The Web, they say, is leaving the era of search and entering one of discovery. What’s the difference? Search is what you do when you’re looking for something. Discovery is when something wonderful that you didn’t know existed, or didn’t know how to ask for, finds you.” Jeffrey M. O’Brien

PREFACE
First and foremost I would like to thank my supervisor and promotor Antal van den Bosch, who guided me in my first steps as a researcher, both for my Master’s thesis and my Ph.D. research. Antal always gave me free rein in investigating many different research problems, while at the same time managing to steer me in the right direction when the time called for it. Antal was always able to make time for me or any of the other Ph.D. students, and to read and comment on paper or presentation drafts.
In addition to turning me into a better researcher, Antal was also instrumental in improving my Guitar Hero skills. Our thesis meetings during your sabbatical doubled as a kind of Rock ’n’ Roll Fantasy Camp, where we could both unwind from discussing yet another batch of experiments I had run or was planning to run. Rock on!
Antal also shares my passion for ice hockey. This resulted in us attending Tilburg Trappers games in Stappegoor as well as our regular discussions of the latest hockey news. Thanks for inviting me to come see the NHL All Star games in Breda. Hopefully we will meet again in spirit come May 2010 when the Canucks beat the Penguins in the Stanley Cup finals!
The research presented in this thesis was performed in the context of the À Propos project. I would like to acknowledge SenterNovem and the Dutch Ministry of Economic Affairs for funding this project as part of the IOP-MMI program. The À Propos project was started by Lou Boves, Antal, and Frank Hofstede. I would like to thank Lou and Frank in particular. Frank was always able to look at my research problems from a different and more practical angle, and as a result our discussions were always very stimulating. I would also like to thank Mari Carmen Puerta-Melguizo, Anita Deshpande, and Els den Os, as well as the other members and attendees of the project meetings, for the pleasant cooperation and helpful comments and suggestions.
I wish to thank the members of my committee for taking time out of their busy schedules to read my dissertation and attend my defense: Jaap van den Herik, Maarten de Rijke, Lou Boves, Birger Larsen, and Hans Paijmans. Special thanks go to Jaap for his willingness to go through my thesis with a fine-toothed comb. The readability of the final text has benefited greatly from his meticulous attention to detail and quality. Any errors remaining in the thesis are my own. I would also like to thank Birger for his comments, which helped to dot the i’s and cross the t’s of the final product. Finally, I would like to thank Hans Paijmans, who contributed considerably to my knowledge of IR.

My Ph.D. years would not have been as enjoyable and successful without my colleagues at Tilburg University, especially those at the ILK group. It is not everywhere that the bond between colleagues is as strong as it was in ILK, and I will not soon forget the coffee breaks with the Sulawesi Boys, the BBQs and Guitar Hero parties, lunch runs, after-work drinks, and the friendly and supportive atmosphere on the 3rd floor of Dante. I do not have enough room to thank everyone personally here, but you know who you are. In your own way, you all contributed to this thesis.
Over the course of my Ph.D. I have spent many Fridays at the Science Park in Amsterdam, working with members of the ILPS group headed by Maarten de Rijke. I would like to thank Erik Tjong Kim Sang for setting this up and Maarten for allowing me to become a guest researcher in his group. Much of what I know about doing IR research I learned from these visits, from small things like visualizing research results and LaTeX layout to IR research methodology and a focus on empirical, task-driven research. I hope that some of what I have learned shows in the thesis. I would like to thank all of the ILPS members, but especially Krisztian, Katja, and Maarten for collaborating with me on expert search, which has proven to be a very fruitful collaboration so far.
I have also had the pleasure of working at the Royal School of Library and Information Science in Copenhagen. I am most grateful to Birger Larsen and Peter Ingwersen for helping to arrange my visit and guiding me around. Thanks are also due to Mette, Haakon, Charles, Jette, and the other members of the IIIA group for welcoming me and making me feel at home. I look forward to working with you all soon.
Thanks are due to Sunil Patel for designing part of the stylesheet of this thesis and to Jonathan Feinberg of http://www.wordle.net/ for the word cloud on the front of this thesis.
I owe Maarten Clements a debt of gratitude for helping me to implement his random walk algorithm more efficiently. And of course thanks to BibSonomy, CiteULike, and Delicious for making the research described in this thesis possible.
Finally, I would like to thank the three most important groups of people in my life. My friends, for always supporting me and taking my mind off my work. Thanks for all the dinners, late-night movies, pool games, talks, vacations, and trips we have had so far! Thanks to my parents for always supporting me and believing in me; without you I would not have been where I am today. Kirstine, thanks for putting up with me while I was distracted by my work, and thanks for patiently reading and commenting on my Ph.D. thesis. And a thousand thanks for bringing so much joy, laughter, and love into my life. This one is for Timmy and Oinky!

CONTENTS
Preface
1 Introduction
  1.1 Social Bookmarking
  1.2 Scope of the Thesis
  1.3 Problem Statement and Research Questions
  1.4 Research Methodology
  1.5 Organization of the Thesis
  1.6 Origins of the Material
2 Related Work
  2.1 Recommender Systems
    2.1.1 Collaborative Filtering
    2.1.2 Content-based Filtering
    2.1.3 Knowledge-based Recommendation
    2.1.4 Recommending Bookmarks & References
    2.1.5 Recommendation in Context
  2.2 Social Tagging
    2.2.1 Indexing vs. Tagging
    2.2.2 Broad vs. Narrow Folksonomies
    2.2.3 The Social Graph
  2.3 Social Bookmarking
    2.3.1 Domains
    2.3.2 Interacting with Social Bookmarking Websites
    2.3.3 Research tasks
I Recommending Bookmarks
3 Building Blocks for the Experiments
  3.1 Recommender Tasks
  3.2 Data Sets
    3.2.1 CiteULike
    3.2.2 BibSonomy

    3.2.3 Delicious
  3.3 Data Representation
  3.4 Experimental Setup
    3.4.1 Filtering
    3.4.2 Evaluation
    3.4.3 Discussion
4 Folksonomic Recommendation
  4.1 Preliminaries
  4.2 Popularity-based Recommendation
  4.3 Collaborative Filtering
    4.3.1 Algorithm
    4.3.2 Results
    4.3.3 Discussion
  4.4 Tag-based Collaborative Filtering
    4.4.1 Tag Overlap Similarity
    4.4.2 Tagging Intensity Similarity
    4.4.3 Similarity Fusion
    4.4.4 Results
    4.4.5 Discussion
  4.5 Related work
  4.6 Comparison to Related Work
    4.6.1 Tag-aware Fusion of Collaborative Filtering Algorithms
    4.6.2 A Random Walk on the Social Graph
    4.6.3 Results
    4.6.4 Discussion
  4.7 Chapter Conclusions and Answer to RQ 1
5 Exploiting Metadata for Recommendation
  5.1 Contextual Metadata in Social Bookmarking
  5.2 Exploiting Metadata for Item Recommendation
    5.2.1 Content-based Filtering
    5.2.2 Hybrid Filtering
    5.2.3 Similarity Matching
    5.2.4 Selecting Metadata Fields for Recommendation Runs
  5.3 Results
    5.3.1 Content-based Filtering
    5.3.2 Hybrid Filtering
    5.3.3 Comparison to Folksonomic Recommendation
  5.4 Related Work
    5.4.1 Content-based Filtering
    5.4.2 Hybrid Filtering
  5.5 Discussion
  5.6 Chapter Conclusions and Answer to RQ 2
6 Combining Recommendations
  6.1 Related Work
    6.1.1 Fusing Recommendations

    6.1.2 Data Fusion in Machine Learning and IR
    6.1.3 Why Does Fusion Work?
  6.2 Fusing Recommendations
  6.3 Selecting Runs for Fusion
  6.4 Results
    6.4.1 Fusion Analysis
    6.4.2 Comparing All Fusion Methods
  6.5 Discussion & Conclusions
  6.6 Chapter Conclusions and Answer to RQ 3
II Growing Pains: Real-world Issues in Social Bookmarking
7 Spam
  7.1 Related Work
  7.2 Methodology
    7.2.1 Data Collection
    7.2.2 Data Representation
    7.2.3 Evaluation
  7.3 Spam Detection for Social Bookmarking
    7.3.1 Language Models for Spam Detection
    7.3.2 Spam Classification
    7.3.3 Results
    7.3.4 Discussion and Conclusions
  7.4 The Influence of Spam on Recommendation
    7.4.1 Related Work
    7.4.2 Experimental Setup
    7.4.3 Results and Analysis
  7.5 Chapter Conclusions and Answer to RQ 4
8 Duplicates
  8.1 Duplicates in CiteULike
  8.2 Related Work
  8.3 Duplicate Detection
    8.3.1 Creating a Training Set
    8.3.2 Constructing a Duplicate Item Classifier
    8.3.3 Results and Analysis
  8.4 The Influence of Duplicates on Recommendation
    8.4.1 Experimental Setup
    8.4.2 Results and Analysis
  8.5 Chapter Conclusions and Answer to RQ 5
III Conclusion
9 Discussion and Conclusions
  9.1 Answers to Research Questions
  9.2 Recommendations for Recommendation

  9.3 Summary of Contributions
  9.4 Future Directions
References
Appendices
A Collecting the CiteULike Data Set
  A.1 Extending the Public Data Dump
  A.2 Spam Annotation
B Glossary of Recommendation Runs
C Optimal Fusion Weights
D Duplicate Annotation in CiteULike
List of Figures
List of Tables
List of Abbreviations
Summary
Samenvatting
Curriculum Vitae
Publications
SIKS Dissertation Series
TiCC Dissertation Series