Incorporating Structured World Knowledge into Unstructured Documents via Heterogeneous Information Networks

Yangqiu Song
The Hong Kong University of Science and Technology

Collaborators: Chenguang Wang, Ming Zhang, Yizhou Sun, Jiawei Han, Dan Roth
Slides credit: Chenguang Wang

Outline
• Text Analytics: Motivation
  – Two Challenges
    • Representation
    • Labels
• Text Categorization via HIN
  – HIN construction from texts
  – From HIN similarity to clustering and classification
  – World knowledge indirect supervision
• Conclusions and future work

Text Categorization: Two Challenges
• Impacts many applications!
  ✓ Social network analysis, health care, machine reading …
• Traditional approach: Label data → Train a classifier → Make prediction
• Two challenges:
  ✓ Representation
  ✓ Labels
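The traditional approach on this slide (label data, train a classifier, make a prediction) can be sketched in a few lines. This is a minimal illustration only: it assumes a multinomial Naive Bayes classifier with add-one smoothing, which the slides do not name, and the two tiny training documents are hypothetical token lists built from the example keywords used later in the talk.

```python
import math
from collections import Counter, defaultdict

def train(labeled_docs):
    """Step 2 (train a classifier): collect per-label word counts."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for tokens, label in labeled_docs:
        label_counts[label] += 1
        word_counts[label].update(tokens)
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab

def predict(model, tokens):
    """Step 3 (make prediction): pick the label with the highest log-probability."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)
        # Add-one smoothing over the training vocabulary.
        denom = sum(word_counts[label].values()) + len(vocab)
        for token in tokens:
            if token in vocab:  # words never seen in training are skipped
                score += math.log((word_counts[label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Step 1 (label data): hypothetical hand-labeled examples.
labeled = [
    (["flappy", "bird", "ios", "android", "game"], "Mobile Games"),
    (["winter", "olympics", "sochi", "sports", "champions"], "Sports"),
]
model = train(labeled)
prediction = predict(model, ["olympics", "sochi", "beaches"])
```

Note that "beaches" never appears in the training data, so the classifier ignores it. That is a small instance of the representation and label problems the slide lists: the model only knows what the labeled words tell it.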

Representation: Bag-of-words
• Document 1: On Feb. 8, Dong Nguyen announced that he would be removing his hit game Flappy Bird from both the iOS and Android app stores, saying that the success of the game is something he never wanted. Some fans of the game took it personally, replying that they would either kill Nguyen or kill themselves if he followed through with his decision. Frank Lantz, the director of the New York University Game Center, said that Nguyen's meltdown resembles how some actors or musicians behave. "People like that can go a little bonkers after being exposed to this kind of interest and attention," he told ABC News. "Especially when there's a healthy dose of Internet trolls."
  – Bag of words: Flappy Bird, iOS, Android, apps, stores, game, musicians
  – Label: Mobile Games
• Document 2: 7 February 2014 is going to be a great day in the history of Russia with the upcoming XXII Winter Olympics 2014 in Sochi. As the climate in Russia is subtropical, hence you would love to watch ice capped mountains from the beautiful beaches of Sochi. 2014 Winter Olympics would be an ultimate event for you to share your joys, emotions and the winning moments of your favourite sports champions. If you are really an obsessive fan of Winter Olympics games then you should definitely book your ticket to confirm your presence in winter Olympics 2014 which are going to be held in the provincial town, Sochi. Sochi Organizing committee (SOOC) would be responsible for the organization of this great international multi sport event from 7 to 23 February 2014.
  – Bag of words: Russia, Winter Olympics, Sochi, mountains, beaches, sports, champions
  – Label: Sports
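A bag-of-words representation like the one on this slide can be sketched as follows. The tokenizer, the stopword list, and the cosine comparison are assumptions made for illustration, not the preprocessing used in the talk; the two input strings are shortened excerpts of the passages above.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list, chosen only for this illustration.
STOPWORDS = {"a", "and", "be", "both", "from", "he", "his", "in", "is",
             "of", "that", "the", "to", "with", "would"}

def bag_of_words(text):
    """Lowercase the text, keep alphabetic tokens, drop stopwords, count the rest."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

doc_games = ("Dong Nguyen announced that he would be removing his hit game "
             "Flappy Bird from both the iOS and Android app stores")
doc_sports = ("7 February 2014 is going to be a great day in the history of Russia "
              "with the upcoming XXII Winter Olympics 2014 in Sochi")

bow_games = bag_of_words(doc_games)
bow_sports = bag_of_words(doc_sports)
similarity = cosine(bow_games, bow_sports)
```

The two excerpts share no content words, so their cosine similarity is zero. Word order and any relation between the words are discarded; only the counts remain.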

Context: Topic Models and Word Embeddings
• Topic Modeling (Blei et al., 2003)
[Figure: topics, documents, and topic proportions and assignments, illustrated on the article "Seeking Life's Bare (Genetic) Necessities". Figure source: Blei, D. M. (2012). Probabilistic topic models. Communications of the ACM, 55(4), 77-84.]

Context: Topic Models and Word Embeddings
• Word embedding
  – Word2vec (Mikolov et al., 13)
  – GloVe (Pennington et al., 14)
  – Matrix factorization (Deerwester et al., 90; Levy et al., 15)
  – …
[Figure: word2vec model (projection layer, softmax classifier) and embedding analogies (male-female, verb tense, country-capital). Source: https://www.tensorflow.org/versions/r0.7/tutorials/word2vec/index.html]

What's Missing?
• The semantics of entities and their relations
  – Obama: On Feb. 10, 2007, Obama announced his candidacy for President of the United States in front of the Old State Capitol located in …
  – Bush: Bush portrayed himself as a compassionate conservative, implying he was more suitable than other Republicans to go to lead the United States
• What can context cover?
• What cannot?
  – Higher order relations
"New York" vs. "New York Times"
"George Washington" vs. "Washington"
Document -[Contains]- Basketball -[Affiliation In]- NBA -[Affiliation In]- Basketball -[Contains]- Document
Document -[Contains]- Basketball -[Affiliation In]- Olympics -[Affiliation In]- Basketball -[Contains]- Document
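The higher order relation on this slide links two documents that share no words through a typed path: Document, Contains, entity, Affiliation In, organization, and back. A minimal sketch of counting such path instances is given below. The toy network, the entity names (`player_a`, `player_b`, `player_c`), and the function names are hypothetical, and plain path counting is used here only as one simple way to turn a typed path into a document-to-document score.

```python
from collections import defaultdict

# Hypothetical toy network: each document contains entity mentions,
# and each entity is affiliated with one or more organizations.
contains = {
    "doc1": ["player_a"],
    "doc2": ["player_b"],
    "doc3": ["player_c"],
}
affiliation_in = {
    "player_a": ["NBA"],
    "player_b": ["NBA", "Olympics"],
    "player_c": ["Olympics"],
}

def doc_to_org_counts(doc):
    """Count Document -[Contains]- entity -[Affiliation In]- organization paths."""
    counts = defaultdict(int)
    for entity in contains[doc]:
        for org in affiliation_in.get(entity, []):
            counts[org] += 1
    return counts

def meta_path_count(doc_x, doc_y):
    """Count Document-entity-organization-entity-Document path instances."""
    cx, cy = doc_to_org_counts(doc_x), doc_to_org_counts(doc_y)
    # Two half-paths that meet at the same organization form one full path.
    return sum(cx[org] * cy[org] for org in cx if org in cy)

nba_link = meta_path_count("doc1", "doc2")       # connected through NBA
olympics_link = meta_path_count("doc2", "doc3")  # connected through Olympics
no_link = meta_path_count("doc1", "doc3")        # no shared organization
```

In this toy network `doc1` and `doc2` mention different entities, so a bag-of-words comparison sees nothing in common, yet the typed path through NBA connects them.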


Acquire Labeled Data
• Expert annotation
  – Only big companies can hire a lot of experts
  – Costly
• Crowdsourcing
  – Simple tasks
  – Low quality
  – Still costly
• Semi-supervised / transfer learning
  – Fast changing domains
  – Many diverse domains
  – Domain dependent