Compiler Principles course teaching resources: Chapter 3, Regular Expressions in Text Matching

Key Points of This Section
• Regular expressions are commonly applied to text matching:
  – finding strings
  – replacing strings
  – recognizing the input as a sequence of tokens
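The three uses listed above can be sketched in Python with the standard `re` module. This is a minimal illustration, not part of the original slides: the sample strings, the token names (`ID`, `NUM`, `OP`, `WS`) and the `tokenize` helper are assumptions chosen for the example.

```python
import re

text = "x1 = 42 + y"

# 1. Finding: locate the first standalone number (the "1" in "x1" is skipped
#    because \b requires a word boundary on both sides).
m = re.search(r"\b\d+\b", text)
found = (m.group(0), m.start())

# 2. Replacing: collapse every run of whitespace into a single space.
replaced = re.sub(r"\s+", " ", "a   b \t c")

# 3. Tokenizing: one named group per token class (hypothetical token set).
TOKEN = re.compile(
    r"(?P<ID>[A-Za-z_]\w*)|(?P<NUM>\d+)|(?P<OP>[=+\-*/])|(?P<WS>\s+)"
)

def tokenize(source):
    """Split source into (kind, lexeme) pairs, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(source):
        m = TOKEN.match(source, pos)  # must match exactly at pos
        if m is None:
            raise ValueError(f"unexpected character at {pos}: {source[pos]!r}")
        if m.lastgroup != "WS":
            tokens.append((m.lastgroup, m.group(0)))
        pos = m.end()
    return tokens

print(found)           # ('42', 5)
print(replaced)        # a b c
print(tokenize(text))  # [('ID', 'x1'), ('OP', '='), ('NUM', '42'), ('OP', '+'), ('ID', 'y')]
```

The tokenizer matches at the current position only and then advances past the match, which is the same discipline a scanner generated from regular expressions follows.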

Applications of Regular Expressions
• Use #1: Text-processing the web
  – The Web is full of data, but it's in text form for humans to read
• Screenscraping
  – extracting the data you want from screen output
  – these days, the output format is HTML
• Examples:
  – extract the tour schedule of your favorite bands from Ticketmaster
  – web sites as web services: convert an address to geo coordinates
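A minimal screenscraping sketch in Python follows. The HTML fragment, its class names and the dates are invented for illustration; a real page would have different markup, and regex-based scraping breaks easily when that markup changes.

```python
import re

# Hypothetical page fragment (not taken from any real site).
html = """
<ul>
  <li class="show"><span class="date">2024-05-01</span> <span class="city">Berlin</span></li>
  <li class="show"><span class="date">2024-05-03</span> <span class="city">Paris</span></li>
</ul>
"""

# Capture the date and the city from each pair of adjacent <span> elements.
SHOW = re.compile(
    r'<span class="date">(\d{4}-\d{2}-\d{2})</span>\s*'
    r'<span class="city">([^<]+)</span>'
)

schedule = SHOW.findall(html)  # list of (date, city) tuples
print(schedule)  # [('2024-05-01', 'Berlin'), ('2024-05-03', 'Paris')]
```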

Applications of Regular Expressions
• Use #2: Text processing in general
  – a spectrum of uses, from small to big
• Small fixes:
  – replacing "ugly quotes" with “smart quotes”
  – converting files between operating systems
• Bigger tasks:
  – spell checking formatted documents (HTML): must extract the text
  – pretty printing code: find comments, etc.; add format directives
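The two small fixes can be sketched with `re.sub`. The function names are hypothetical, and the quote conversion assumes balanced, non-nested straight quotes; "converting files between operating systems" is interpreted here as line-ending normalization.

```python
import re

def smarten_quotes(text):
    """Turn each straight-quoted span "..." into a curly-quoted span."""
    # \1 re-inserts the text captured between the two straight quotes.
    return re.sub(r'"([^"]*)"', "\u201c\\1\u201d", text)

def to_unix_newlines(text):
    """Convert Windows (CRLF) and old Mac (CR) line endings to LF."""
    return re.sub(r"\r\n?", "\n", text)

print(smarten_quotes('say "hi" and "bye"'))  # say “hi” and “bye”
print(repr(to_unix_newlines("a\r\nb\rc\n")))  # 'a\nb\nc\n'
```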

Applications of Regular Expressions
• Use #3: Program processing
  – especially on the web
• On the Web, procedure calls = HTTP requests
  – “procedure arguments” are passed as strings
  – argument extraction can be done with regular expressions
• Other uses:
  – extract components of an email address
  – obfuscation: want to obfuscate all JS functions except those called from HTML embedded scripts; so scan the web page for names of functions called from HTML, to avoid obfuscating them
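Argument extraction and email splitting might look like the sketch below. The helper names, the example URL and the simplified patterns are assumptions; the patterns ignore URL encoding and the full email address grammar, so production code would normally rely on `urllib.parse` and a proper address parser.

```python
import re

# One "name=value" pair, introduced by "?" or "&" in the query string.
ARG = re.compile(r"[?&]([^=&#]+)=([^&#]*)")

def extract_args(url):
    """Return the query arguments of url as a dict (no URL decoding)."""
    return dict(ARG.findall(url))

# Simplified shape: local part, "@", domain, with no whitespace or extra "@".
EMAIL = re.compile(r"^([^@\s]+)@([^@\s]+)$")

def split_email(address):
    """Return (local_part, domain), or None if the shape does not match."""
    m = EMAIL.match(address)
    return m.groups() if m else None

print(extract_args("http://example.com/search?q=regex&lang=en"))
# {'q': 'regex', 'lang': 'en'}
print(split_email("alice@example.com"))  # ('alice', 'example.com')
print(split_email("not an email"))       # None
```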

Regular Expression Tutorial
• Focus on two languages:
  – JavaScript
  – Python
• A key rule common to both: given a string and a regex, find the first position in the string where a match is possible (except for the match() function in Python, which must match at the beginning of the string).
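The leftmost-match rule can be checked directly in Python. In this small sketch the pattern lists its alternatives in an order that differs from their positions in the string, to show that position in the string decides the result, not order in the pattern.

```python
import re

s = "string"

# 'in' occurs at index 3, 'tri' at index 1, 'ring' at index 2.
# search() reports the alternative that can start earliest in the string.
m = re.search(r"in|tri|ring", s)
print(m.group(0), m.span())  # tri (1, 4)

# match() only tries index 0, so 'tri' is not found there...
print(re.match(r"tri", s))   # None

# ...while a pattern that fits at index 0 succeeds.
print(re.match(r"str", s).group(0))  # str
```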

String search: from simple to regexp (JavaScript)
• Basic search methods for String objects:
  – "string".indexOf("rin") → 2
  – "string".indexOf(new RegExp("r*n")) → -1 (indexOf converts the RegExp to the string "/r*n/", which does not occur)
  – "string".search(new RegExp("r*n")) → 4
  – "string".search(new RegExp("r.*n")) → 2
  – "string".search(/r.*n/) → 2 (equivalent to the previous line)
  – "string".match(/tri|str/) → ["str"]
  – "string".match(/ri|st/g) → ["st", "ri"]
  – "string".match(/tri|str/g) → ["str"]
• See js.htm

String search: from simple to regexp (JavaScript)
• indexOf
  – Syntax: object.indexOf(searchValue[, fromIndex])
  – When called on a String object, this method returns the index of the first occurrence of the specified searchValue argument, starting the search at the specified fromIndex argument.
• search
  – Syntax: object.search(regexp)
  – This method searches for a match between a regular expression and the specified string.
• RegExp
  – Syntax: new RegExp("pattern"[, "flags"]) or myReg = /pattern/flags

String search: from simple to regexp (JavaScript)
• match
  – Syntax: object.match(regexp)
  – This method matches a specified regular expression against a string.
  – If one or more matches are made, an array is returned. With the 'g' (global) flag the array contains all of the matches, and each entry is a copy of a matched substring; without it, the array holds only the first match, followed by any captured groups. If no match is made, null is returned.
  – To perform a global match you must include the 'g' (global) flag in the regular expression, and to perform a case-insensitive match you must include the 'i' (ignore case) flag.
• Characters already consumed by a match are not reused in later matches.

Same for Python
• Basic search methods for strings (the r prefix marks a raw string literal):
  – re.match(r"tri|rin", "string") → no match
  – re.search(r"tri|rin", "string").group(0) → 'tri'
  – re.compile(r"tri|str").findall("string") → ['str']
  – re.compile(r"ri|st").findall("string") → ['st', 'ri']
  – re.search(r"(tr)|(in)", "string").groups() → ('tr', None)
• Note: match() expects the match to start at index 0
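To see why `findall` with `tri|str` returns a single result, it helps to print where each match starts and ends. The sketch below uses `re.finditer` for the spans and, as an addition not found in the slides, a lookahead pattern that reveals the overlapping candidate that ordinary scanning skips.

```python
import re

s = "string"

# Each match consumes its characters; the next search resumes after it.
spans = [(m.group(0), m.span()) for m in re.finditer(r"ri|st", s)]
print(spans)  # [('st', (0, 2)), ('ri', (2, 4))]

# 'str' consumes indices 0-2, so the overlapping 'tri' (indices 1-3) is never tried.
consumed = re.findall(r"tri|str", s)
print(consumed)  # ['str']

# A zero-width lookahead consumes nothing, so both candidates become visible.
candidates = [m.group(1) for m in re.finditer(r"(?=(tri|str))", s)]
print(candidates)  # ['str', 'tri']

# groups() has one slot per parenthesized group; unused groups are None.
groups = re.search(r"(tr)|(in)", s).groups()
print(groups)  # ('tr', None)
```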

Python Regular Expressions
• Supports ".", "*", "+", "?", "|", "[ ]", "\"
• "^": matches the start of the string
• "$": matches the end of the string
• {m}: exactly m repetitions
• {m,n}: from m to n repetitions
• *?, +?, ??, {m,n}?: same meaning as the first symbol, but with greedy (maximal) matching changed to minimal matching
  – Example: when the regular expression <.*> is matched against "<H1>title</H1>", the greedy match covers the entire string, while the minimal form <.*?> matches only "<H1>"
• (...): matches whatever regular expression is inside the parentheses; commonly used for grouping
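The operators above can be tried out with a few short calls. This sketch follows the standard greedy versus non-greedy example from the Python `re` documentation; the other sample strings are chosen only for illustration.

```python
import re

html = "<H1>title</H1>"

# Greedy: .* runs to the end, then backs up to the last '>'.
greedy = re.match(r"<.*>", html).group(0)
# Minimal: .*? stops at the first '>' that lets the match succeed.
minimal = re.match(r"<.*?>", html).group(0)
print(greedy)   # <H1>title</H1>
print(minimal)  # <H1>

# {m,n} takes as many repetitions as allowed; {m,n}? takes as few.
print(re.findall(r"a{2,3}", "a aa aaa aaaa"))   # ['aa', 'aaa', 'aaa']
print(re.findall(r"a{2,3}?", "a aa aaa aaaa"))  # ['aa', 'aa', 'aa', 'aa']

# Anchors tie the match to the start or end of the string.
print(bool(re.search(r"^str", "string")))  # True
print(bool(re.search(r"ing$", "string")))  # True
print(bool(re.search(r"^ing", "string")))  # False

# Parentheses group sub-patterns and capture what they matched.
print(re.match(r"(\d{4})-(\d{2})", "2024-05").groups())  # ('2024', '05')
```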