Lexical Analyzer in C and C++

Here you will get a program to implement a lexical analyzer in C and C++.

A compiler is responsible for converting a high-level language into machine language. Several phases are involved in this, and lexical analysis is the first one.

A lexical analyzer reads the characters of the source code and converts them into tokens.

The common types of tokens are:

  • Keywords
  • Identifiers
  • Operators
  • Constants
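As a rough sketch of how one lexeme could be sorted into these categories, consider the function below. The names TokenType, classify and is_keyword are made up for illustration, and the keyword list is only a small sample rather than the full set of C keywords.

```c
#include <ctype.h>
#include <string.h>

/* Token categories from the list above; the names are illustrative. */
typedef enum {
    TOK_KEYWORD, TOK_IDENTIFIER, TOK_OPERATOR, TOK_CONSTANT, TOK_UNKNOWN
} TokenType;

/* A small sample of C keywords; a real lexer would list all of them. */
static const char *const KEYWORDS[] = {
    "auto", "break", "case", "char", "const", "continue",
    "default", "int", "return", "while"
};

static int is_keyword(const char *s) {
    size_t n = sizeof KEYWORDS / sizeof KEYWORDS[0];
    for (size_t i = 0; i < n; ++i)
        if (strcmp(s, KEYWORDS[i]) == 0) return 1;
    return 0;
}

/* Classifies one complete lexeme (no surrounding whitespace). */
TokenType classify(const char *lexeme) {
    if (lexeme[0] == '\0') return TOK_UNKNOWN;
    if (is_keyword(lexeme)) return TOK_KEYWORD;

    unsigned char first = (unsigned char)lexeme[0];
    if (isalpha(first) || first == '_') {
        /* Identifier: letter or underscore, then letters, digits, underscores. */
        for (const char *p = lexeme + 1; *p; ++p)
            if (!isalnum((unsigned char)*p) && *p != '_') return TOK_UNKNOWN;
        return TOK_IDENTIFIER;
    }
    if (isdigit(first)) {
        /* Constant: only unsigned integer constants are handled here. */
        for (const char *p = lexeme + 1; *p; ++p)
            if (!isdigit((unsigned char)*p)) return TOK_UNKNOWN;
        return TOK_CONSTANT;
    }
    /* Operator: a single character from a small sample set. */
    if (lexeme[1] == '\0' && strchr("+-*/%=<>", first) != NULL)
        return TOK_OPERATOR;
    return TOK_UNKNOWN;
}
```

With this sketch, classify("int") gives TOK_KEYWORD, classify("abc3") gives TOK_IDENTIFIER and classify("3abc") gives TOK_UNKNOWN, because an identifier may contain digits but may not start with one.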

Take the example below.

c = a + b;

After lexical analysis, a table of tokens is generated as given below.

Token    Type
c        identifier
=        operator
a        identifier
+        operator
b        identifier
;        separator
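The table above can be produced by a small scanning loop. The following is only a sketch: the Token struct, the tokenize function and the MAX_LEXEME limit are hypothetical names, keywords are not separated from identifiers, and only single-character operators and separators are recognized.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_LEXEME 32   /* assumed upper bound on lexeme length */

typedef struct {
    char text[MAX_LEXEME];  /* the lexeme itself */
    const char *type;       /* "identifier", "constant", "operator", ... */
} Token;

/* Splits src into tokens, stores at most max of them in out,
   and returns how many were stored. */
size_t tokenize(const char *src, Token *out, size_t max) {
    size_t count = 0;
    const char *p = src;

    while (*p != '\0' && count < max) {
        unsigned char ch = (unsigned char)*p;
        if (isspace(ch)) { ++p; continue; }   /* skip whitespace */

        Token *t = &out[count];
        size_t len = 0;

        if (isalpha(ch) || ch == '_') {
            /* Digits are allowed after the first character, so "abc3"
               stays one identifier instead of "abc" and "3". */
            while (isalnum((unsigned char)*p) || *p == '_') {
                if (len < MAX_LEXEME - 1) t->text[len++] = *p;
                ++p;
            }
            t->type = "identifier";
        } else if (isdigit(ch)) {
            while (isdigit((unsigned char)*p)) {
                if (len < MAX_LEXEME - 1) t->text[len++] = *p;
                ++p;
            }
            t->type = "constant";
        } else {
            /* Any other character is a one-character token. */
            t->text[len++] = *p++;
            if (strchr("+-*/%=<>", ch) != NULL)      t->type = "operator";
            else if (strchr(";,(){}[]", ch) != NULL) t->type = "separator";
            else                                     t->type = "unknown";
        }
        t->text[len] = '\0';
        ++count;
    }
    return count;
}
```

Calling tokenize("c = a + b;", tokens, 8) on an array of 8 Token values stores six tokens, in the same order and with the same types as the table.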

Below I have given an implementation of a very simple lexical analyzer, which reads source code from a file and then generates tokens.

Program for Lexical Analyzer in C
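A minimal sketch of such a program is given here. It is an illustration written under assumptions, not the original listing: the function name lex_file, the output wording and the short keyword list are all made up, and a word that starts with a digit is simply reported as a constant without further checks.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if word is in a small sample keyword list (not the full C set). */
static int is_keyword(const char *word) {
    static const char *const keywords[] = {
        "char", "else", "float", "for", "if", "int", "return", "void", "while"
    };
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; ++i)
        if (strcmp(word, keywords[i]) == 0) return 1;
    return 0;
}

/* Reads the file at path character by character and writes one line per
   token to out. Returns the number of tokens, or -1 if the file cannot
   be opened. */
int lex_file(const char *path, FILE *out) {
    FILE *in = fopen(path, "r");
    if (in == NULL) return -1;

    char word[64];
    size_t len = 0;
    int count = 0;
    int ch;

    do {
        ch = fgetc(in);
        if (ch != EOF && (isalnum(ch) || ch == '_')) {
            /* Still inside a word: collect the character. */
            if (len < sizeof word - 1) word[len++] = (char)ch;
            continue;
        }
        if (len > 0) {
            /* A word has just ended: decide what it is. */
            word[len] = '\0';
            const char *kind = isdigit((unsigned char)word[0]) ? "constant"
                             : is_keyword(word)                ? "keyword"
                                                               : "identifier";
            fprintf(out, "%s is %s\n", word, kind);
            len = 0;
            ++count;
        }
        if (ch > 0 && strchr("+-*/%=", ch) != NULL) {
            fprintf(out, "%c is operator\n", ch);
            ++count;
        }
    } while (ch != EOF);

    fclose(in);
    return count;
}
```

A main function would call something like lex_file("program.txt", stdout) and print an error message when the result is -1. The file name is the caller's choice; the file must exist in the directory from which the program is run.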

Program for Lexical Analyzer in C++


The source code present in the file is shown in the above image.

Comment below if you have any queries regarding the above program for a lexical analyzer in C and C++.


34 thoughts on “Lexical Analyzer in C and C++”

      1. Areeb Ahmed

        In Turbo C++, when I run this program in C++, it gives me an error like this:
        ‘is_open’ is not a member of ifstream

      2. karthick

        While running this program, it won’t read the contents of my file. My file name is 5a.c. What can I do?

      3. Awesome Code Maker

        At the top of the code, where the libraries are included, there is an error. It is written: #include and, in fact, it should be: #include

      4. Ajay Simon

        Create a text file named program.txt containing a sample C/C++ program, then compile.

    1. Rajesh

      A lexical analyzer is mainly used for identifying each and every element of a program.
      A file is created in order to check whether the given lexeme is an identifier, keyword or constant.
      A function is used to check all the 32 keywords.

  1. Midhat Aamir

    The code is running perfectly with no errors, but the output screen does not hold. It vanishes immediately. What should I do? Please tell me. I’ll be very grateful to you.

    1. eagle-i

      Open Command Prompt, go to the program’s location and then run the executable.

    2. Mustapha

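      Use this code at the very end of the program, just above the closing curly bracket of int main().

      #include <stdlib.h>   // needed for system()
      int main() {
          // your own code
          system("pause");  // Windows only: waits for a key press
      }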


    3. Rinnie

      include —–
      Call getch() at the end of the program in C.

  2. Singapore Karthi

    int isKeyword(char buffer[]){
    char keywords[32][10] = {"auto","break","case","char","const","continue","default",

  3. anvita sharma

    If the input is an identifier such as ‘abc3’, it shows that abc is an identifier and 3 is a digit. Please post a program that correctly handles digits inside identifiers, and delimiters.

  4. Shubh

    In the C program, write 3 lines of code to hold the output:
    #include //after #include
    getch(); // last line inside int main()
    clrscr(); // next line just after getch();

  5. roshini

    Thank you, it was very useful.
    But next time, please add an algorithm so that non-programmers can understand the code better
    and work with it more easily.
    Happy day!

    1. sadia mushtaq

      using System;
      using System.Collections.Generic;
      using System.Linq;
      using System.Text;
      using System.Text.RegularExpressions;

      namespace ConsoleApplication6
      {
          class Program
          {
              static void Main(string[] args)
              {
                  string opr = "";
                  string num = "";
                  string type = "";
                  string iden = "";
                  string pattern = @"([-+*>/=<])";
                  string path1 = @"E:\input.txt";
                  string exp = System.IO.File.ReadAllText(path1);
                  Regex delimiterChars = new Regex(@"[_{;(,.}:)]");
                  Regex comments1 = new Regex(@"//.*?\n");        // single-line comments
                  Regex comments2 = new Regex(@"/\*(.|\n)*?\*/"); // multi-line comments
                  string afterremoval = comments1.Replace(exp, "");
                  afterremoval = comments2.Replace(afterremoval, "");
                  string removal = delimiterChars.Replace(afterremoval, " ");
                  Regex headerfile = new Regex(@"using.*?\n");
                  Match header = headerfile.Match(removal);
                  Console.WriteLine("headerfile:{0}", header);
                  string headerremoval = headerfile.Replace(removal, "");
                  string[] elements = headerremoval.Split(' ');
                  foreach (var a in elements)
                  {
                      // Substring match, so a word such as "print" also counts as a keyword.
                      if (a.Contains("int") || a.Contains("char") || a.Contains("string") ||
                          a.Contains("double") || a.Contains("bool") || a.Contains("void") ||
                          a.Contains("return") || a.Contains("Main") || a.Contains("static"))
                      {
                          type = a;
                          Console.WriteLine("Keyword:{0}", type);
                      }
                      // Split every element around the operators and classify each piece.
                      string[] element = Regex.Split(a, pattern);
                      foreach (var c in element)
                      {
                          if (c.Contains("+") || c.Contains("-") || c.Contains("*") || c.Contains("/") ||
                              c.Contains("<") || c.Contains(">") || c.Contains("="))
                          {
                              opr = c;
                              Console.WriteLine("Operator:{0}", opr);
                          }
                          else if (Regex.IsMatch(c, "[0-9]"))    // contains any digit
                          {
                              num = c;
                              Console.WriteLine("Numeric value:{0}", num);
                          }
                          else if (Regex.IsMatch(c, "[A-Za-z]")) // contains any letter
                          {
                              iden = c;
                              Console.WriteLine("Identifier:{0}", iden);
                          }
                      }
                  }
              }
          }
      }

  6. Emilian

    Hello! I’ve got a problem in CodeBlocks: when I run some code in order to navigate through the stack, the compiler flashes and does not return anything. I read in some discussions that I might have a problem with the compiler, not the code. I want to mention that I don’t have any errors, only a few warnings. Can anyone give some advice?

    1. Ankit

      Use getch(); or system("PAUSE"); at the end of the code, just above return 0;

  7. Mandvi Singh

    It gives operators, but no result for identifiers or keywords.

  8. Disu

    The code works perfectly.

    I would appreciate help in getting the code to perform a syntax analysis on the output of this lexical analysis. I’ve got a project that uses the YACC tool for it.

    Would sincerely appreciate any help.

  9. sidra

    In Turbo C there is an error on the while loop. The error is “while statement is missing”. Please tell me the solution.

  10. Edo

    The program is running, but with the message “error while opening the file”. Can you fix that?

