上海交通大学:《编译原理》教学资源_教学资料_第二周讲义_lex

Lex Overview Usage Paradigm of Lex Lex is a tool for creating lexical analyzers. Lex source Lex program lex.yy.c Lexical analyzers fokenize input streams. Tokens are the terminals of a language. lex.yy.c C compiler a.out Regular expressions define tokens. input a.out tokens To Use Lex and Yacc Together Lex Internals Mechanism Lex source Yacc source (Lexical Rules)(Grammar Rules) Converts regular expressions into DFAs. DFAs are implemented as table driven state machines. lex.yy.c v.tab.c call Input yylex() yyparse() ◆Parsed Input return token lex.yy.c:What it produces Running Lex dofine YYTYPE unsignod char struct yyvork YYTYPE verify,advance;yycrank.{ To run lex on a source file,use the 88 0.0, 0,0, 6888 0,0, command: g事 lex source.I yycrankct-1. 0。 yycrank+-3, 8p3: This produces the file lex.yy.c which is the yycrank+0, veet1. yyvetop+5. C source for the lexical analyzer. unaigned char yymateh{ ·To compile this,use: 898182880t ,01,01,01 8:8: cc -o prog-0 lex.yy.c-lI 4++ 1
1 Lex Overview • Lex is a tool for creating lexical analyzers. • Lexical analyzers tokenize input streams. • Tokens are the terminals of a language. • Regular expressions define tokens . Usage Paradigm of Lex To Use Lex and Yacc Together Lex Yacc yylex() yyparse() Lex source (Lexical Rules) Yacc source (Grammar Rules) Input Parsed Input lex.yy.c y.tab.c call return token Lex Internals Mechanism • Converts regular expressions into DFAs. • DFAs are implemented as table driven state machines. lex.yy.c : What it produces Running Lex • To run lex on a source file, use the command: lex source.l • This produces the file lex.yy.c which is the C source for the lexical analyzer. • To compile this, use: cc -o prog -O lex.yy.c -ll

Versions and Reference Books General Format of Lex Source 第《 AT&T lex,GNU flex,and Win32 version lex yacc,2/e by John R.Levine,Tony Cr纱gop> ≤口n难2≤尼gXP》 Mason Doug Brown,O'Reilly 4年4 公常 Mastering Regular Expressions,by Jeffrey regp>Cact1on》 ≤王得艺xP7 <aGe1⊙12 E.F.Friedl,O'Reilly 年。4 艺落 User Subroutines (c code) Input specification file is in 3 parts Regular Policy of None- -Declarations:Definitions translated Source -Rules:Token Descriptions and actions Remember that Lex is turning the rules -Auxiliary Procedures:User-Written code into a program.Any source not intercepted Three parts are separated by % by Lex is copied into the generated Tips:The first part defines patterns, the third part defines actions,the program. second part puts together to express -Any line which is not part of a Lex rule or "If we see some pattern,then we do action which begins with a blank or tab some action”. -Anything included between lines containing only %and % -Anything after the second %delimiter Position of Copied Source Regular Policy of Translated Source source input prior to the first % Various variables or tables whose name -external to any function in the generated code prefixed by yy after the first %and prior to the second -yyleng.yysvec.yywork %% Various functions whose name prefixed by -appropriate place for declarations in the y function generated by Lex which contains the -yyless(,yymore(),yywarp(),yylex().. actions Various definition whose name are capital ·after the second%% -BEGIN,INITIAL... -after the Lex generated output 2
2 Versions and Reference Books • AT&T lex, GNU flex, and Win32 version • lex & yacc ,2/e by John R.Levine, Tony Mason & Doug Brown, O’Reilly • Mastering Regular Expressions, by Jeffrey E.F. Friedl, O’Reilly General Format of Lex Source • Input specification file is in 3 parts – Declarations: Definitions – Rules: Token Descriptions and actions – Auxiliary Procedures: User-Written code • Three parts are separated by %% • Tips: The first part defines patterns, the third part defines actions, the second part puts together to express “If we see some pattern, then we do some action”. Regular Policy of Nonetranslated Source • Remember that Lex is turning the rules into a program. Any source not intercepted by Lex is copied into the generated program. – Any line which is not part of a Lex rule or action which begins with a blank or tab – Anything included between lines containing only %{ and %} – Anything after the second %% delimiter Position of Copied Source • source input prior to the first %% – external to any function in the generated code • after the first %% and prior to the second %% – appropriate place for declarations in the function generated by Lex which contains the actions • after the second %% – after the Lex generated output Regular Policy of Translated Source • Various variables or tables whose name prefixed by yy – yyleng, yysvec[], yywork[] • Various functions whose name prefixed by yy – yyless(), yymore(), yywarp(), yylex()… • Various definition whose name are capital – BEGIN, INITIAL…

Default Rules and Actions Default Input and Output The first and second part must exist,but If you don't write your own main()to deal may be empty,the third part and the with the input and the output of yylex(),the second %are optional. default input of default main()is stdin and If the third part dose not contain a main(),- the default output of default main()is ll will link a default main()which calls stdout. yylex()then exits. Unmatched patterns will perform a default -stdin usually is to be keyboard input stdout usually is to be screen output action,which consists of copying the input -cs20:%./a.out inputfile>outputfile to the output Some Simple Lex Source A General Lex Source Example Examples A minimum lex program: %% It only copies the input to the output unchanged. Example lex source file A trivial program to deletes three spacing This first section contains necessary characters: *c declarations and includes %% *to use throughout the lex specifications. [\tin]: +/ Another trivial example: #include %% [t]+S: bin digit【o1] It deletes from the input all blanks or tabs at the ends of lines. (bin digit){ 各名 /*match all strings of 0's and 1's/ /, /Print out message with matching Now this is where you want your main *text program */ printf("BINARY:&s\n",yytext) int main(int argc,char "argv[])( / ([ab]*aa[ab]*bb[ab]*)I([ab]*bb[ab]*aa [ab]*){ call yylex to use the generated lexer /match all strings over */ (a,b)containing aa and bb yylex(); */ /* printf("AABB\n■): make sure everything was printed */ \n /ignore newlines fflush(yyout); exit(0); 3
3 Default Rules and Actions • The first and second part must exist, but may be empty, the third part and the second %% are optional. • If the third part dose not contain a main(), - ll will link a default main() which calls yylex() then exits. • Unmatched patterns will perform a default action, which consists of copying the input to the output Default Input and Output • If you don’t write your own main() to deal with the input and the output of yylex(), the default input of default main() is stdin and the default output of default main() is stdout. – stdin usually is to be keyboard input stdout usually is to be screen output – cs20: %./a.out outputfile Some Simple Lex Source Examples • A minimum lex program: %% It only copies the input to the output unchanged. • A trivial program to deletes three spacing characters: %% [ \t\n]; • Another trivial example: %% [ \t]+$; It deletes from the input all blanks or tabs at the ends of lines. A General Lex Source Example %{ /* * Example lex source file * This first section contains necessary * C declarations and includes * to use throughout the lex specifications. */ #include %} bin_digit [01] %% {bin_digit}* { /* match all strings of 0's and 1's */ /* Print out message with matching * text */ printf("BINARY: %s\n", yytext); } ([ab]*aa[ab]*bb[ab]*)|([ab]*bb[ab]*aa[ab]*) { /* match all strings over * (a,b) containing aa and bb */ printf("AABB\n"); } \n ; /* ignore newlines */ %% /* * Now this is where you want your main program */ int main(int argc, char *argv[]) { /* * call yylex to use the generated lexer */ yylex(); /* * make sure everything was printed */ fflush(yyout); exit(0); }

Token Definitions Elementary Operations(cont.) Extended Regular Expression -NOTE:.matches any character except the Elementary Operations newline -single characters -*-Kleene Closure ·except\.s^[】-?·+I()1{}%<> -+--Positive Closure -concatenation (put characters together) -alternation (abc) ·Examples: ·ab]=alb -0-9]+"[0-9+ ·a-k==ablc.jk note:without the quotes it could be any ·【a-z0-9j=any letter or digit character [a]==any character but a -t]+--is whitespace ·(except CR). There is a blank space character before the \t Special Characters: Special Characters (cont.) -matches any single character -A --means at the beginning of the line (except newline) (unless it is inside of a[) -"and \-quote the part as text -S means at the end of the line,same as An -It -tab -]--means anything except -In -newline I"[I"]"1"is a double quoted string -lb --backspace -(n,m)-m through n occurrences -" -double quote ·a(1,3}is a or aa or aaa -M -1 -{definition)-translation from definition ~ --this means the preceding was -/--matches only if followed by right part of optional ·ab?==aab 0/1 means the 0 of 01 but not 02 or 03 or... ·(ab)?=able -()-grouping Definitions The definitions can also contain variables and other declarations used by the Code ·NAME REG EXPR generated by Lex. -digs [0-9]+ These usually go at the start of this section, marked by %at the beginning and %at the end -integer fdigs) or the line which begins with a blank or tab. -plainreal (digs)"."Idigs) -Includes usually go here. -expreal (digs)"."(digs)[Ee]+-]?(digs) -It is usually convenient to maintain a line counter -real (plainrealKexpreal) so that error messages can be keyed to the lines in which the errors are found. NAME must be a valid C identifier ·% {NAME)is replaced by prior REG_EXPR int linecount=1; ·%) 4
4 Token Definitions ( Extended Regular Expression ) • Elementary Operations – single characters • except “ \ . $ ^ [ ] - ? * + | ( ) / { } % – concatenation (put characters together) – alternation (a|b|c) • [ab] == a|b • [a-k] == a|b|c|...|i|j|k • [a-z0-9] == any letter or digit • [^a] == any character but a • Elementary Operations (cont.) – NOTE: . matches any character except the newline – * -- Kleene Closure – + -- Positive Closure • Examples: – [0-9]+"."[0-9]+ • note: without the quotes it could be any character – [ \t]+ -- is whitespace • (except CR). • There is a blank space character before the \t • Special Characters: – . -- matches any single character (except newline) – “ and \ -- quote the part as text – \t -- tab – \n -- newline – \b -- backspace – \" -- double quote – \\ -- \ – ? -- this means the preceding was optional • ab? == a|ab • (ab)? == ab|ε • Special Characters (cont.) – ^ -- means at the beginning of the line (unless it is inside of a [ ]) – $ means at the end of the line, same as /\n – [^ ] -- means anything except • \"[^\"]*\" is a double quoted string – {n,m} – m through n occurrences • a{1,3} is a or aa or aaa – {definition} – translation from definition – / -- matches only if followed by right part of / • 0/1 means the 0 of 01 but not 02 or 03 or … – ( ) -- grouping Definitions • NAME REG_EXPR – digs [0-9]+ – integer {digs} – plainreal {digs}"."{digs} – expreal {digs}"."{digs}[Ee][+-]?{digs} – real {plainreal}|{expreal} • NAME must be a valid C identifier • {NAME} is replaced by prior REG_EXPR • The definitions can also contain variables and other declarations used by the Code generated by Lex. – These usually go at the start of this section, marked by %{ at the beginning and %} at the end or the line which begins with a blank or tab . – Includes usually go here. – It is usually convenient to maintain a line counter so that error messages can be keyed to the lines in which the errors are found. • %{ • int linecount = 1; • %}

Transition Rules Tokens and Actions ERE {program statement ·Example: program -(real) return FLOAT: statement -begin retum BEGIN; A null statement:will ignore the input -(newline)linecount++; Four special options: I ECHO:REJECT:BEGIN: -(integer){ printf("I found an integerin"): The unmatched token is using a default action ·return INTEGER: that ECHO from the input to the output ·} indicates that the action for this rule is from the action for the next rule Ambiguous Source Rules Multiple States lex allows the user to explicitly declare If 2 rules match the same pattern,Lex will multiple states in Definitions section use the first rule. %s COMMENT Lex always chooses the longest matching Default states is INITIAL or 0 substring for its tokens. Transition rules can be classified into To overide the choice,use action REJECT different states,which will be match ex: she (s++;REJECT;} depend on states he (h++;REJECT;} BEGIN is used to change state In; . ECHO: "/" {BEGIN COMMENT:) . {:} / (BEGIN INITIAL:) Lex Special Variables Lex library function calls identifiers used by Lex and Yacc begin with yylex( yy default main()contains a return yylex(: -yytext--a string containing the lexeme ·yywarp0 yyleng--the length of the lexeme -called by lexical analyzer if end of the input file -yyin-the input stream pointer -default yywarp()always retum 1 ·Example: ·yyless(n) -(integer){ printf("I found an integerin"): -n characters in yytext are retained sscanf(yytext"%d".&yylval): ·yymore0 return INTEGER: the next input expression recognized is to be tacked on to the end of this input -C++Comments-∥.. 5
5 Transition Rules • ERE { program statement program statement } • A null statement ; will ignore the input • Four special options: | ECHO; REJECT; BEGIN; • The unmatched token is using a default action that ECHO from the input to the output • | indicates that the action for this rule is from the action for the next rule Tokens and Actions • Example: – {real} return FLOAT; – begin return BEGIN; – {newline} linecount++; – {integer} { • printf("I found an integer\n"); • return INTEGER; • } Ambiguous Source Rules • If 2 rules match the same pattern, Lex will use the first rule. • Lex always chooses the longest matching substring for its tokens. • To overide the choice, use action REJECT ex: she {s++; REJECT;} he {h++; REJECT;} . | \n ; Multiple States • lex allows the user to explicitly declare multiple states ( in Definitions section ) %s COMMENT • Default states is INITIAL or 0 • Transition rules can be classified into different states, which will be match depend on states • BEGIN is used to change state Lex Special Variables • identifiers used by Lex and Yacc begin with yy – yytext -- a string containing the lexeme – yyleng -- the length of the lexeme – yyin – the input stream pointer • Example: – {integer} { • printf("I found an integer\n"); • sscanf(yytext,"%d", &yylval); • return INTEGER; • } – C++ Comments -- // ..... • //.* ; Lex library function calls • yylex() – default main() contains a return yylex(); • yywarp() – called by lexical analyzer if end of the input file – default yywarp() always return 1 • yyless(n) – n characters in yytext are retained • yymore() – the next input expression recognized is to be tacked on to the end of this input

User Written Code More Example 1 int lengs[100]: %% The actions associated with any given [a-z]+lengs[yyleng]++: token are normally specified using .1 statements in C.But occasionally the In; actions are complicated enough that it is %% better to describe them with a function call. yywrap() and define the function elsewhere. int i; Definitions of this sort go in the last section printf("Length No.wordsin"); of the Lex input. for(0=0:i0) printf("%5d%10dn",i,lengs[i]);return(1); More Example 2 Using yacc with lex int charCount=0,vordCount-0,lineCount-0; yacc will call yylex(to get the token from the input so that each lex rule should end word tin)+ with: (word) {wordcount++;charcount yyleng; return(token); n {charCount;lineCount;} where the appropriate token value is {charCount++; returned. main(){ An easy way is placing the line: yylex(): printf("Characters:%d Words:%d Lines:Xdin", #include "lex.yy.c" charCount,wordCount,lineCount);) in the last section of yacc input. 6
6 User Written Code • The actions associated with any given token are normally specified using statements in C. But occasionally the actions are complicated enough that it is better to describe them with a function call, and define the function elsewhere. • Definitions of this sort go in the last section of the Lex input. More Example 1 int lengs[100]; %% [a-z]+ lengs[yyleng]++; . | \n ; %% yywrap() { int i; printf("Length No. words\n"); for(i=0; i 0) printf("%5d%10d\n",i,lengs[i]); return(1); } More Example 2 Using yacc with lex • yacc will call yylex() to get the token from the input so that each lex rule should end with: return(token); where the appropriate token value is returned. • An easy way is placing the line: #include “lex.yy.c” in the last section of yacc input
按次数下载不扣除下载券;
注册用户24小时内重复下载只扣除一次;
顺序:VIP每日次数-->可用次数-->下载券;
- 上海交通大学:《编译原理》教学资源_教学资料_第二周讲义_Syntax Analyzer.pdf
- 上海交通大学:《编译原理》教学资源_教学资料_第二周讲义_Lexical Analyzer.pdf
- 上海交通大学:《编译原理》教学资源_教学资料_第九周讲义_Machine-Independent Optimizations Ⅳ.pdf
- 上海交通大学:《编译原理》教学资源_教学资料_第九周讲义_CS308 Compiler Theor.pdf
- 上海交通大学:《编译原理》教学资源_教学资料_第九周讲义_CS308 Compiler Theor.pdf
- 上海交通大学:《编译原理》教学资源_教学资料_第三周讲义_Bottom-Up Parsing.pdf
- 上海交通大学:《编译原理》教学资源_教学资料_第三周讲义_Top-Down Parsing.pdf
- 上海交通大学:《编译原理》教学资源_教学资料_第七周讲义_Code Generation Ⅱ.pdf
- 上海交通大学:《编译原理》教学资源_教学资料_第一周讲义_A Simple Syntax-Directed Translator.pdf
- 上海交通大学:《编译原理》教学资源_教学资料_第一周讲义_Introduction to Compilin.pdf
- 上海交通大学:《编译原理》教学资源_教学资料_第一周讲义_Parameter Passing Mechanisms.pdf
- 上海交通大学:《编译原理》教学资源_教学资料_第一周讲义_CS308 Compiler Theory.pdf
- 上海交通大学:《程序设计思想与方法》课程教学资源(PPT课件讲稿)CT15 并发与多线程.ppt
- 上海交通大学:《程序设计思想与方法》课程教学资源(PPT课件讲稿)CT14 Python GUI工具包:Tkinter.ppt
- 上海交通大学:《程序设计思想与方法》课程教学资源(PPT课件讲稿)CT13 算法设计分析.ppt
- 上海交通大学:《程序设计思想与方法》课程教学资源(PPT课件讲稿)CT12 面向对象设计.ppt
- 上海交通大学:《程序设计思想与方法》课程教学资源(PPT课件讲稿)CT11 数据集合体.ppt
- 上海交通大学:《程序设计思想与方法》课程教学资源(PPT课件讲稿)CT10 类的定义.ppt
- 上海交通大学:《程序设计思想与方法》课程教学资源(PPT课件讲稿)CT09 模拟设计.ppt
- 上海交通大学:《程序设计思想与方法》课程教学资源(PPT课件讲稿)CT08 控制结构——循环语句.ppt
- 上海交通大学:《编译原理》教学资源_教学资料_第五周讲义_Type Checking.pdf
- 上海交通大学:《编译原理》教学资源_教学资料_第八周讲义_Machine-Independent Optimizations Ⅰ.pdf
- 上海交通大学:《编译原理》教学资源_教学资料_第八周讲义_Machine-Independent Optimizations Ⅱ.pdf
- 上海交通大学:《编译原理》教学资源_教学资料_第八周讲义_Machine-Independent Optimizations Ⅲ.pdf
- 上海交通大学:《编译原理》教学资源_教学资料_第六周讲义_Intermediate Code Generation.pdf
- 上海交通大学:《编译原理》教学资源_教学资料_第六周讲义_Heap Management.pdf
- 上海交通大学:《编译原理》教学资源_教学资料_第六周讲义_Run-Time Environments.pdf
- 上海交通大学:《编译原理》教学资源_教学资料_第四周讲义_Syntax-Directed Translation.pdf
- 上海交通大学:《计算机辅助设计》教学资源_Product Visualization.doc
- 上海交通大学:《科学计算》课程教学资源(英文讲义)Lecture Note 1:Introduction, calculus review and computer representation of numbers.pdf
- 上海交通大学:《科学计算》课程教学资源(英文讲义)Lecture Note 2:Solution of nonlinear equations.pdf
- 上海交通大学:《科学计算》课程教学资源(英文讲义)Lecture Note 3:Polynomial Interpolation.pdf
- 上海交通大学:《科学计算》课程教学资源(英文讲义)Lecture Note 4:Numerical differentiation and integration.pdf
- 上海交通大学:《程序设计思想与方法》课程教学资源(上机课)第一次上机_第一次上机内容_10.18.doc
- 上海交通大学:《程序设计思想与方法》课程教学资源(上机课)第一次上机_第一次上机内容_10.18.pdf
- 上海交通大学:《程序设计思想与方法》课程教学资源(上机课)第三次上机_python第三次上机.doc
- 上海交通大学:《程序设计思想与方法》课程教学资源(上机课)第三次上机_Python第三次上机解析-update.doc
- 上海交通大学:《程序设计思想与方法》课程教学资源(上机课)第五次上机_第五次上机.doc
- 上海交通大学:《程序设计思想与方法》课程教学资源(上机课)第四次上机_Python第四次上机题目.doc
- 上海交通大学:《程序设计思想与方法》课程教学资源_作业_第一次作业内容要求.doc