Lexical Analyzer in C and C++

Here you will find programs that implement a lexical analyzer in C and C++.

A compiler is responsible for converting a high-level language into machine language. Several phases are involved in this, and lexical analysis is the first.

The lexical analyzer reads characters from the source code and converts them into tokens.

Tokens fall into categories such as:

  • Keywords
  • Identifiers
  • Operators
  • Constants

Take the example below.

c = a + b;

After lexical analysis, a table of tokens is generated, as given below.

Token   Type
c       identifier
=       operator
a       identifier
+       operator
b       identifier
;       separator

Below is an implementation of a very simple lexical analyzer that reads source code from a file and then generates tokens.

Program for Lexical Analyzer in C

Program for Lexical Analyzer in C++

Output

The source code in the file is shown in the image above.

Comment below if you have any queries regarding the above programs for a lexical analyzer in C and C++.

 

50 thoughts on “Lexical Analyzer in C and C++”

      1. Areeb Ahmed

        In Turbo C++, when I run the C++ version of this program, it gives me an error like this:
        ‘is_open’ is not a member of ifstream

      2. karthick

        When I run this program, it won't read the contents of my file. My file name is 5a.c. What can I do?

      3. Awesome Code Maker

        At the top of the code, where the libraries are included, there is an error: it is written as #include when, in fact, it should be #include

      4. Ajay Simon

        Create a text file named program.txt containing a sample C/C++ program, then compile.

      5. Ancy Michael

        Before you run the program, you should create a file saved under the name program.txt.
        Then you will get the output.

    1. Rajesh

      A lexical analyzer is mainly used to identify each element of a program.
      A file is created in order to check whether a given lexeme is an identifier, keyword, or constant.
      A function is used to check against all 32 keywords.

    2. Yash

      I also have trouble understanding that, so if you understand it, please explain it to me.

  1. Midhat Aamir

    The code is running perfectly with no errors, but the output screen does not hold. It vanishes immediately. What should I do please tell me. I’ll be very grateful to you.

    1. eagle-i

      Open Command Prompt, go to program location and then run the executable.

    2. Mustapha

      Use this code:
      system("pause");
      at the very end of the code, just above the closing curly bracket of int main().

      int main() {
      // your own code
      system("pause");
      }

    3. Rinnie

      include —–
      getch() at the end of the program in c language

  2. Singapore Karthi

    The code is running perfectly with no errors, but the output screen does not hold. It vanishes immediately. What should I do please tell me. I’ll be very grateful to you.
    Compiler is responsible for converting high level language in machine language. There are several phases involved in this and lexical analysis is the first phase.std;

    int isKeyword(char buffer[]){
    char keywords[32][10] = {"auto","break","case","char","const","continue","default",
    "do","double","else","enum","extern","float","for","goto",
    "if","int","long","register","return","short","signed",
    "sizeof","static","struct","switch","typedef","union",
    "unsigned","void","vola

    hola hola hola hooooooooooooo!!!!!!!!!!!!!!!!!!!!!!!!!

  3. anvita sharma

    If the input is the identifier ‘abc3’, it reports abc as an identifier and 3 as a digit. Please update the program so that it handles digits inside identifiers, as well as delimiters.

  4. Shubh

    In the C program, write three lines of code to hold the output:
    #include //after #include
    getch(); // last line inside int main()
    clrscr(); // next line just after getch();

  5. roshini

    Thank you, it was very useful.
    Next time, please add the algorithm as well, so that non-programmers can understand the code more easily.
    Happy day!

    1. sadia mushtaq

      using System;
      using System.Collections.Generic;
      using System.Linq;
      using System.Text;
      using System.Text.RegularExpressions;

      namespace ConsoleApplication6
      {
          class Program
          {
              static void Main(string[] args)
              {
                  string opr = "";
                  string num = "";
                  string type = "";
                  string iden = "";
                  string pattern = @"([-+*>/=<])";
                  string path1 = @"E:\input.txt";
                  string exp = System.IO.File.ReadAllText(path1);
                  Regex delimiterChars = new Regex(@"[_{;(,.}:)]");
                  Regex comments1 = new Regex(@"//.*?\n");        // single-line comments
                  Regex comments2 = new Regex(@"/\*(.|\n)*?\*/"); // multi-line comments

                  // Strip comments, then replace delimiters with spaces.
                  string afterremoval = comments1.Replace(exp, "");
                  afterremoval = comments2.Replace(afterremoval, "");
                  string removal = delimiterChars.Replace(afterremoval, " ");

                  // Report the first "using ..." line, then remove all of them.
                  Regex headerfile = new Regex(@"using.*?\n");
                  Match header = headerfile.Match(removal);
                  Console.WriteLine("headerfile:{0}", header);
                  string headerremoval = headerfile.Replace(removal, "");

                  string[] elements = headerremoval.Split(' ');
                  foreach (var a in elements)
                  {
                      if (a.Contains("int") || a.Contains("char") || a.Contains("string") ||
                          a.Contains("double") || a.Contains("bool") || a.Contains("void") ||
                          a.Contains("return") || a.Contains("Main") || a.Contains("static"))
                      {
                          type = a;
                          Console.WriteLine("Keyword:{0}", type);
                      }
                      else
                      {
                          // Split on operators; the capturing group keeps them in the result.
                          string[] element = Regex.Split(a, pattern);
                          foreach (var c in element)
                          {
                              if (c.Contains("+") || c.Contains("-") || c.Contains("*") || c.Contains("/") ||
                                  c.Contains("<") || c.Contains(">") || c.Contains("="))
                              {
                                  opr = c;
                                  Console.WriteLine("Operator:{0}", opr);
                              }
                              else if (c.Any(ch => ch >= '0' && ch <= '9'))
                              {
                                  num = c;
                                  Console.WriteLine("Numeric value:{0}", num);
                              }
                              else if (c.Any(ch => (ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z')))
                              {
                                  iden = c;
                                  Console.WriteLine("Identifier:{0}", iden);
                              }
                          }
                      }
                  }
                  Console.ReadLine();
              }
          }
      }

      1. Rehan

        Hello Sadia, I’m a student at UMT, Lhr. I have an assignment on compiler construction using regex. Your code snippet is useful to me, and I have a few questions. Would you be so kind as to help me?

  6. Emilian

    Hello! I have a problem in CodeBlocks: when I run code that navigates through the stack, the compiler flashes and does not return anything. I read in some discussions that the problem might be with the compiler, not the code. I want to mention that I don’t have any errors, only a few warnings. Can anyone give some advice?
    Thanks.

    1. Ankit

      Use getch(); or system("PAUSE"); at the end of the code, just above return 0;

  7. Mandvi Singh

    It reports operators, but gives no result for identifiers or keywords.

  8. Disu

    The code works perfectly.

    I would appreciate help in getting the code to perform a Syntax Analysis on the output of this lexical analysis. I’ve got a project to use YACC tool for it.

    Would sincerely appreciate any help.

  9. sidra

    In Turbo C there is an error on the while loop. The error is “while statement is missing”. Please tell me the solution.

  10. Edo

    The program runs, but with the message “error while opening the file”. Can you fix that?

  11. John Abraham

    This one works perfectly! I have checked it in my lab. Great job, guys!

  12. Iqra

    It shows all the input as identifiers rather than keywords and operators, and I want to show digits and identifiers separately. What do I need to do? Please help.

  13. Iqra

    This program shows all the inputs as identifiers rather than keywords and operators, and I want to show digits and identifiers separately. What do I need to do? Please help.

  14. kaustubh purohit

    If the character is a letter or a digit, it returns 1;
    otherwise it returns 0.

  15. Chaitanya

    It shows “error while opening file”. I’m using CodeBlocks on macOS.
    Please help.

  16. kazi

    else if((ch == ' ' || ch == '\n') && (j != 0)){
    buffer[j] = '\0';
    j = 0;

    I don’t understand this part of the code. Can anyone help me understand it?

  17. Hisham

    Thank you for your great explanation. I have a question:
    in the example below, how can I recognize the for loop that starts at line 5 and ends at line 8?
    The same applies to the while loop. I only need to recognize each loop; the output of my tool is then a new text file with the same code, but with a comment before each loop that shows the type of the loop and other details.

    #include
    using namespace std;
    int main()
    {
    for (int a=0;a<=10;a++)
    {
    cout<<"hello"<<endl;
    }
    int a = 10;
    while( a < 20 ) {
    cout << "value of a: " << a << endl;
    a++;
    }
    return 0;
    }

