Lexical Analyzer in C and C++

Here you will get programs that implement a simple lexical analyzer in C and C++.

The compiler is responsible for converting high-level language into machine language. There are several phases involved in this and lexical analysis is the first phase.

A lexical analyzer reads the characters from the source code and converts them into tokens.


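A lexeme is the actual sequence of characters in the source code, and a token is the category that the lexeme belongs to. Common types of tokens are:

  • Keywords
  • Identifiers
  • Operators
  • Constants
  • Separators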

Take the below example.

c = a + b;

After lexical analysis, the statement is broken into the tokens listed in the table below.

Token    Type
c        identifier
=        operator
a        identifier
+        operator
b        identifier
;        separator

Below is the implementation of a very simple lexical analyzer. It reads source code from a file named program.txt, which must be in the directory the program is run from, and prints the tokens it finds.

Program for Lexical Analyzer in C

#include<stdio.h>
#include<stdlib.h>
#include<string.h>
#include<ctype.h>

int isKeyword(char buffer[]){
	char keywords[32][10] = {"auto","break","case","char","const","continue","default",
							"do","double","else","enum","extern","float","for","goto",
							"if","int","long","register","return","short","signed",
							"sizeof","static","struct","switch","typedef","union",
							"unsigned","void","volatile","while"};
	int i, flag = 0;
	
	for(i = 0; i < 32; ++i){
		if(strcmp(keywords[i], buffer) == 0){
			flag = 1;
			break;
		}
	}
	
	return flag;
}

int main(){
	char buffer[32], operators[] = "+-*/%=";
	FILE *fp;
	int ch;		/* int, not char, so that EOF can be told apart from real characters */
	int i, j = 0;
	
	fp = fopen("program.txt","r");
	
	if(fp == NULL){
		printf("error while opening the file\n");
		exit(EXIT_FAILURE);
	}
	
	do{
		ch = fgetc(fp);
		
		if(isalnum(ch)){
			/* collect the word, leaving room for the terminating '\0' */
			if(j < (int)sizeof(buffer) - 1)
				buffer[j++] = (char)ch;
		}
		else{
			/* any other character (or EOF) ends the current word */
			if(j != 0){
				buffer[j] = '\0';
				j = 0;
				
				if(isKeyword(buffer) == 1)
					printf("%s is keyword\n", buffer);
				else
					printf("%s is identifier\n", buffer);
			}
			
			for(i = 0; i < 6; ++i){
				if(ch == operators[i])
					printf("%c is operator\n", ch);
			}
		}
	}while(ch != EOF);
	
	fclose(fp);
	
	return 0;
}

Program for Lexical Analyzer in C++

#include<iostream>
#include<fstream>
#include<stdlib.h>
#include<string.h>
#include<ctype.h>

using namespace std;

int isKeyword(char buffer[]){
	char keywords[32][10] = {"auto","break","case","char","const","continue","default",
							"do","double","else","enum","extern","float","for","goto",
							"if","int","long","register","return","short","signed",
							"sizeof","static","struct","switch","typedef","union",
							"unsigned","void","volatile","while"};
	int i, flag = 0;
	
	for(i = 0; i < 32; ++i){
		if(strcmp(keywords[i], buffer) == 0){
			flag = 1;
			break;
		}
	}
	
	return flag;
}

int main(){
	char ch, buffer[32], operators[] = "+-*/%=";
	ifstream fin("program.txt");
	int i,j=0;
	bool more = true;
	
	if(!fin.is_open()){
		cout<<"error while opening the file\n";
		exit(EXIT_FAILURE);
	}
	
	while(more){
		// get(ch) fails at end of file; treat that as one final delimiter
		if(!fin.get(ch)){
			more = false;
			ch = ' ';
		}
		
		if(isalnum((unsigned char)ch)){
			// collect the word, leaving room for the terminating '\0'
			if(j < (int)sizeof(buffer) - 1)
				buffer[j++] = ch;
		}
		else{
			// any other character ends the current word
			if(j != 0){
				buffer[j] = '\0';
				j = 0;
				
				if(isKeyword(buffer) == 1)
					cout<<buffer<<" is keyword\n";
				else
					cout<<buffer<<" is identifier\n";
			}
			
			for(i = 0; i < 6; ++i){
				if(ch == operators[i])
					cout<<ch<<" is operator\n";
			}
		}
	}
	
	fin.close();
	
	return 0;
}

Output

[Image: output of the lexical analyzer program]

The source code present in the file is shown in the above image.

The below video explains how a lexical analyzer works.

Comment below if you have any queries regarding the above program for lexical analyzer in C and C++.

63 thoughts on “Lexical Analyzer in C and C++”

      1. In Turbo C++, when I run the C++ version of this program, it gives me an error like this:
        ‘is_open’ is not a member of ifstream

      2. Awesome Code Maker

        At the top of the code, in the part of including libraries, has an error, it is written: #include and, in fact, it is: #include

      3. before u run the program,u should create a file which saved in the name program.txt.
        then u will get the o/p

        1. I created a file with same name and code. Can you please tell where to save this file in order for this code to execute properly.

    1. lexical Analyzer is mainly used for identifying each and every elements of a program
      A file is created in order to check whether the given lexeme is an identifier,keyword or constant.
      a function is used to check all the 32 keywords.

      1. before u run the program,u should create a file which saved in the name program.txt.
        then u will get the o/p

        1. I created a file and named it the program.txt yet I receive the same error ” Error while opening the file”

  1. The code is running perfectly with no errors, but the output screen does not hold. It vanishes immediately. What should I do please tell me. I’ll be very grateful to you.

    1. Use this code:
      system("pause");
      at the very end of the code, just above the closing curly bracket of int main(). It needs #include <stdlib.h> and works on Windows only.

      int main() {
      // your own code
      system("pause");

      }


  3. if the input is an identifier ‘abc3’, it shows abc is an identifier and 3 is a digit.. please put a program accordingly to identify digits in identifiers and delimiters

  4. in c program write 3 line code to hold the output
    #include //after #include
    getch(); // last line inside int main()
    clrscr(); // next line just after getch();

  5. Thank you, it was very useful.
    But next time, please add an algorithm so that non-programmers can understand the code better
    and work with it more easily.
    Happy day!

    1. using System;
      using System.Collections.Generic;
      using System.Linq;
      using System.Text;
      using System.Text.RegularExpressions;
      namespace ConsoleApplication6
      {class Program
      {static void Main(string[] args)
      {string opr = "";
      string num = "";
      string type = "";
      string iden = "";
      String pattern = @"([-+*>/=<])";
      string path1 = @"E:\input.txt";
      string exp = System.IO.File.ReadAllText(path1);
      Regex delimiterChars = new Regex(@"[_{;(,.}:)]");
      Regex comments1 = new Regex(@"//.*?\n");//single line
      Regex comments2 = new Regex(@"/\*(.|\n)*?\*/");//multiple line
      string afterremoval = comments1.Replace(exp, "");
      afterremoval = comments2.Replace(afterremoval, "");
      string removal = delimiterChars.Replace(afterremoval, " ");
      Regex headerfile = new Regex(@"using.*?\n");
      Match header = headerfile.Match(removal);
      Console.WriteLine("headerfile:{0}", header);
      string headerremoval = headerfile.Replace(removal, "");
      string[] elements =headerremoval.Split(' ');
      foreach (var a in elements)
      {if (a.Contains("int") || a.Contains("char") || a.Contains("string") ||
      a.Contains("double") || a.Contains("bool") || a.Contains("void") || a.Contains("return") || a.Contains("Main") || a.Contains("static"))
      {type = a;
      Console.WriteLine("Keyword:{0}", type);}
      else
      {String[] element = Regex.Split(a, pattern);
      foreach (var c in element)
      {if (c.Contains("+") || c.Contains("-") || c.Contains("*") || c.Contains("/") || c.Contains("<") || c.Contains(">") || c.Contains("="))
      {opr = c;
      Console.WriteLine("Operator:{0}", opr);}
      // contains at least one digit
      else if (Regex.IsMatch(c, "[0-9]"))
      {num = c;
      Console.WriteLine("Numeric value:{0}", num);
      }
      // contains at least one ASCII letter
      else if (Regex.IsMatch(c, "[A-Za-z]"))
      {iden = c;
      Console.WriteLine("Identifier:{0}", iden);}}}}
      Console.ReadLine();}}}

      1. Hello Sadia, I’m student of UMT, Lhr. I’ve an assignment for the compiler construction using regex. Your code snippet is useful for me and I have a few questions. Will you please be so kind to help me?

  6. Hello! I’ve got a problem in CodeBlocks. When I run code that navigates through the stack, the compiler window flashes and does not return anything. I read in some discussions that the problem might be with the compiler, not the code. I want to mention that I don’t get any errors, only a few warnings. Can anyone give some advice?
    Thanks.

  7. The code works perfectly.

    I would appreciate help in getting the code to perform a Syntax Analysis on the output of this lexical analysis. I’ve got a project to use YACC tool for it.

    Would sincerely appreciate any help.

  8. in turbo c there is an error on while loop. the error is “while statement is missing”. plzzzz tell me the solution

  9. it shows all the input as an identifier rather than keywords and operators, and i want to show digits and identifiers separately. what i need to do please help

  10. This program showing all the inputs as an identifier rather than keywords and operators, and i want to show digits and identifiers separately. what i need to do please help.

  11. else if((ch == ' ' || ch == '\n') && (j != 0)){
    buffer[j] = '\0';
    j = 0;

    I don’t understand this part of the code. Can anyone help me understand it?

  12. Thank you for your great explanation, I have a question :
    in the example below, how can I recognize the for loop that start from line number 5 and end in line number 8
    and the same happen to while loop, I only need to recognize each loop after that the output of my tool is a new text file with the same code but before each loop a comment that show the type of the loop and other detail.

    #include
    using namespace std;
    int main()
    {
    for (int a=0;a<=10;a++)
    {
    cout<<"hello"<<endl;
    }
    int a = 10;
    while( a < 20 ) {
    cout << "value of a: " << a << endl;
    a++;
    }
    return 0;
    }

  13. return_type main()
    {
    switch (varible)
    begin
    case num :
    var= id/num
    printf(var);
    break;
    end
    return 0;
    }
    create code for this in c language lexical code

  14. after compilation of the c++ program it says, there is an error while opening the file. Can anybody help me with this

  15. Assume you have the following C++ source program to be analyzed by C++ compiler
    lexical analyzer. Identify and list down all possible tokens and their lexemes.
    #include
    int Avg(int list[], int n)
    ( int s=0;
    for(int i=0;i<n;i++)
    s+=i;
    return float(s)/n;
    )
    int main()
    { int x[]={7,6,3,1};
    cout<<"the average of the numbers is "<<Avg(x,4);
    return 0;
    }
    Use the following table to write your answers.
    Token
    Matching
