An intuitive Token Parser that includes grammar definition, tokenization, parsing, syntax error and debugging. Implementation based on Lexical Analysis for Dart.

Overview

Token Parser

An intuitive Token Parser that includes syntax/grammar definition, tokenization and parsing.

Implementation based on Lexical Analysis.
Read more about it on Wikipedia, or with a Basic Diagram.

Features

  • Syntax/grammar definition
  • Tokenization
  • Parsing
  • Referencing, and self-reference
  • Lexical Syntax Error
  • Debugging

Getting Started

Add the package to your project:

dart pub add token_parser

And import the package:

import 'package:token_parser/token_parser.dart';

Usage

This package is based on a syntax/grammar definition, which is a list of lexemes that define the grammar. Here is a brief example:

final whitespace = ' ' | '\t';
final lineBreak = '\n' | '\r';
final space = (whitespace | lineBreak).multiple;

final letter = '[a-zA-Z]'.regex;
final digit = '[0-9]'.regex;

final number = digit.multiple & ('.' & digit.multiple).optional;
final identifier = letter & (letter | digit).multiple.optional;

final grammar = Grammar(
  main: identifier & space & '=' & space & number,
  rules: {
    'whitespace': whitespace,
    'lineBreak': lineBreak,
    'space': space,

    'letter': letter,
    'digit': digit,

    'number': number,
    'identifier': identifier,
  }
);

void main() {
  final result = grammar.parse('myNumber = 12.3');

  print('Identifier: ${ result.get(lexeme: identifier).first.value }');
  print('Number: ${ result.get(lexeme: number).first.value }');
  // [Output]
  // Identifier: myNumber
  // Number: 12.3
}

Lexeme

A lexeme is a grammar definition that will be used to tokenize an input. It's a pattern that must be matched, essentially a grammar rule.

The syntax/grammar definition is done by defining what each token must have, using Lexical Analysis.

This composition of lexemes is what will define the grammar. Lexemes can contain other lexemes to form a more complex lexical grammar.

final abc = 'a' | 'b' | 'c';
final def = 'd' | 'e' | 'f';

final expression = abc & def;

The & operator combines lexemes with an "and" operation, and the | operator combines them with an "or" operation. Here, expression matches any one of the characters in abc followed by any one of the characters in def.

Lexemes may be extended to have slightly different properties.

final abc = ('a' | 'b' | 'c').multiple;

final expression = abc & 'd'.optional;

For convenience, a lexeme can be defined using a regular expression.

Lexeme modification methods available:

  • .not
  • .multiple / .multipleOrNone
  • .full
  • .optional
  • .regex
  • .character
  • .spaced
  • .repeat(int min, [int max])
  • .until(Pattern pattern)

final digit = '[0-9]'.regex;
final number = digit.multiple & ('.' & digit.multiple).optional;

final letter = '[a-zA-Z]'.regex;
final word = letter.multiple;
final phrase = word & (' ' & word).multiple.optional;

Operators

Patterns, such as String, RegExp and Lexeme can be combined or modified using operators.

Some operators can only be used to combine patterns, and others can only be used to modify patterns. Modifying operators must be placed before the target pattern.

The available operators are:

Operator   Description   Action
& / +      And           Combine
|          Or            Combine
-          Not           Modify
~          Spaced        Modify

Negative Lexemes

The negation of lexemes might work differently than expected. Negation will not consume the input, but rather ensure that the pattern ahead does not match the target lexeme.

This means negating a lexeme does not mean the same as "any character that is not". To consume any character that doesn't match the lexeme, use a .not.character combination.

Additionally, notice the difference between the use of the negation operator with other modifiers:

final wrongLexeme = -'a'.multiple.optional;
final lexeme = (-'a').multiple.optional;

Although the intent is to negate only the character "a", Dart evaluates member access before the unary - operator. In wrongLexeme, the multiple and optional modifiers are therefore added first, and the negation is applied to the resulting optional lexeme.

To negate a single character first and then apply multiple and optional, you may use .not.character.multiple.optional.
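
The difference between a negation that only checks what lies ahead and one that consumes a character can be illustrated by analogy with Dart's built-in RegExp. This sketch does not use this package: the negative lookahead stands in for a plain negation, and the negated character class stands in for a .not.character combination. The mapping is an assumption made for illustration only.

```dart
// Negative lookahead: asserts that the next character is not "a",
// but consumes nothing.
final lookahead = RegExp(r'(?!a)');

// Negated character class: consumes one character that is not "a".
final consuming = RegExp(r'[^a]');

void main() {
  final checked = lookahead.matchAsPrefix('bcd')!;
  final consumed = consuming.matchAsPrefix('bcd')!;

  print(checked.end); // 0, the position did not advance
  print(consumed.end); // 1, one character was consumed

  // Both fail when the input starts with "a".
  print(lookahead.matchAsPrefix('abc')); // null
  print(consuming.matchAsPrefix('abc')); // null
}
```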

Reference, and self-reference

Reference lexemes are placeholders. When asked to tokenize an input, a reference looks up the lexeme registered under its name in the grammar it is bound to.

Lexemes can be referenced using the functions reference(String name) and self(), or ref(String name) for short.

final abc = 'a' | 'b' | 'c' | reference('def');
final def = ('d' | 'e' | 'f') & self().optional;

For a reference to have an effect, it must be bound to the grammar, and the referenced lexeme must be present in the same grammar. If the referenced lexeme is not present, an error is thrown when tokenizing.

Grammar

A grammar is a list of lexemes that will be used to parse an input, essentially a list of rules that define the language.

A grammar has an entry point, called the main lexeme. This lexeme is used to parse the input and will be the only one returned.

A grammar can be defined in two ways. Using the constructor:

final grammar = Grammar(
  main: phrase | number,
  rules: {
    'digit': digit,
    'number': number,

    'letter': letter,
    'word': word,
    'phrase': phrase,

    'abc': 'a' | 'b' | 'c',
    'def': 'd' | 'e' | 'f',
  },
);

Or using the .add(String name, Pattern pattern) method:

final grammar = Grammar();

grammar.addMain(phrase | number);

grammar.add('digit', digit);
grammar.add( ... );

Lexemes can tokenize an input by themselves, but it's often more consistent to group the lexemes in a grammar.

Doing so allows the use of references and a main lexeme. Adding a lexeme to a grammar binds it to that grammar under a name and resolves any self-references.

Parsing an input

The grammar is used for parsing any input, which will tokenize it, taking into account all the lexemes previously added.

Parse an input using the .parse(String input, { Lexeme? main }) method.

final grammar = Grammar(...);

grammar.parse('123');
grammar.parse('123.456');

grammar.parse('word');
grammar.parse('two words');

You can override the main lexeme used for parsing the input, by passing it as a parameter.

When parsing an input, it will return a resulting token, which can be used to get the value and position of the lexemes that matched. It can also be used to get the children tokens.
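
The main lexeme override can be sketched as follows. This is a minimal sketch that assumes the signature quoted above, where main is a named parameter. The grammar reuses the digit, number, letter and word lexemes defined earlier.

```dart
import 'package:token_parser/token_parser.dart';

final digit = '[0-9]'.regex;
final number = digit.multiple & ('.' & digit.multiple).optional;

final letter = '[a-zA-Z]'.regex;
final word = letter.multiple;

final grammar = Grammar(
  main: word,
  rules: {
    'digit': digit,
    'number': number,

    'letter': letter,
    'word': word,
  },
);

void main() {
  // Uses the main lexeme of the grammar (word).
  final wordToken = grammar.parse('hello');

  // Overrides the main lexeme for this call only.
  final numberToken = grammar.parse('123.456', main: number);

  print('Word: ${ wordToken.value }');
  print('Number: ${ numberToken.value }');
}
```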

Token

A token is the result of matching a lexeme to an input. It contains the value of the lexeme that matched and the position of the token.

The process of generating this token is called tokenization.

final grammar = Grammar(...);
final token = grammar.parse('123');

print('''
  Value: ${ token.value }
  Lexeme: ${ token.lexeme.name }

  Start: ${ token.start }
  End: ${ token.end }
  Length: ${ token.length }
''');

Lexical Syntax Error

When tokenizing, if the input doesn't match any lexeme, a LexicalSyntaxError is thrown.

This error displays the position of the error, and the lexemes that were expected to match the input. Additionally, it will display the list of the lexemes that were traversed, as the path to the error.

This error will skip any lexeme that is not named.
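
A caller will usually want to catch this error rather than let it propagate. The following is a minimal sketch, assuming LexicalSyntaxError is exported by the package import. The isValid helper is hypothetical and not part of the package.

```dart
import 'package:token_parser/token_parser.dart';

final space = (' ' | '\t').multiple;
final letter = '[a-zA-Z]'.regex;
final digit = '[0-9]'.regex;

final number = digit.multiple & ('.' & digit.multiple).optional;
final identifier = letter & (letter | digit).multiple.optional;

final grammar = Grammar(
  main: identifier & space & '=' & space & number,
  rules: {
    'space': space,
    'letter': letter,
    'digit': digit,

    'number': number,
    'identifier': identifier,
  },
);

// Hypothetical helper: returns true when the input parses,
// and prints the error and returns false otherwise.
bool isValid(String input) {
  try {
    grammar.parse(input);
    return true;
  } on LexicalSyntaxError catch (error) {
    print(error);
    return false;
  }
}

void main() {
  print(isValid('myNumber = 12.3'));

  // The input starts with "=", where an identifier is required.
  print(isValid('= 12.3'));
}
```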

Analysing the Token Tree

You may use this token to analyze the resulting tree. Using the .get({ Lexeme? lexeme, String? name }) method will get all the tokens that match the lexeme or name.

The reach of the search can be limited with the bool shallow parameter. It defaults to false when a lexeme or name is given, and to true when no search parameters are given.

final result = grammar.parse('two words');

final tokens = result.get();
final words = result.get(lexeme: word);
final letters = result.get(name: 'letter');

print('Words: ${ words.map((token) => token.value) }');
print('Letters: ${ letters.map((token) => token.value) }');

You may also use .children and .allChildren for a more direct approach. Note that children are not guaranteed to be tokens; they may also be basic matching values, such as those of type Match.

Debugging

It's important to know how the grammar is tokenizing the input, and what lexemes are being used. For this reason, a debug mode and syntax errors are available.

Debug Mode

Enable the debug mode by instantiating a DebugGrammar instead of a Grammar.

final grammar = DebugGrammar(...);

Additionally, you can specify debugging parameters:

  • bool showAll: Include lexemes with no name, defaults to false
  • bool showPath: Show the path to the lexeme, defaults to false
  • Duration delay: Delay between each step, defaults to Duration.zero
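
These parameters might be passed as follows. This is only a sketch: it assumes the debugging options are named parameters of the DebugGrammar constructor, alongside main and rules, which is not confirmed above.

```dart
import 'package:token_parser/token_parser.dart';

final digit = '[0-9]'.regex;
final number = digit.multiple & ('.' & digit.multiple).optional;

// Assumption: showAll and showPath are named constructor parameters.
final grammar = DebugGrammar(
  main: number,
  rules: {
    'digit': digit,
    'number': number,
  },
  showAll: true,
  showPath: true,
);

void main() {
  // Each tokenization step is reported while parsing.
  final token = grammar.parse('12.3');
  print('Value: ${ token.value }');
}
```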

The informative output is as follows.

│
│  (#3)
├► Tokenizing named syntaxRule
│    at index 0, character "/"
│    on path: (main) → syntax → syntaxRule
│

Lexical Syntax Errors

Syntax errors are thrown when the input doesn't match a required lexeme. The error will display the character, index, lexeme and path.

LexicalSyntaxError: Unexpected character "/"
	at index 0
	with lexeme "syntax"
	on path:
		→ syntax
		↑ (main)

Example

Tokenization (/example/main.dart)

import 'package:token_parser/token_parser.dart';

final whitespace = ' ' | '\t';
final lineBreak = '\n' | '\r';
final space = (whitespace | lineBreak).multiple;

final letter = '[a-zA-Z]'.regex;
final digit = '[0-9]'.regex;

final identifier = letter & (letter | digit).multiple.optional;

final number = digit.multiple & ('.' & digit.multiple).optional;
final string = '"' & '[^"]*'.regex & '"'
              | "'" & "[^']*".regex & "'";

final variableDeclaration =
  'var' & space & identifier & space.optional & '=' & space.optional & (number | string) & space.optional & (';' | space);

final grammar = Grammar(
  main: (variableDeclaration | space).multiple,
  rules: {
    'whitespace': whitespace,
    'lineBreak': lineBreak,
    'space': space,

    'letter': letter,
    'digit': digit,

    'identifier': identifier,

    'number': number,
    'string': string,

    'variableDeclaration': variableDeclaration,
  },
);

void main() {
  final result = grammar.parse('''
    var hello = "world";
    var foo = 123;
    var bar = 123.456;
  ''');

  final numbers = result.get(lexeme: number).map((token) => token.value);
  final identifiers = result.get(lexeme: identifier).map((token) => '"${ token.value }"');

  print('Numbers: $numbers');
  print('Identifiers: $identifiers');
}

Referencing (/example/reference.dart)

import 'package:token_parser/token_parser.dart';

final expression = 'a' & Lexeme.reference('characterB').optional;
final characterB = 'b'.lexeme();

final recursive = 'a' & Lexeme.self().optional;

final grammar = Grammar(
  main: expression,
  rules: {
    'expression': expression,
    'characterB': characterB,
    
    'recursive': recursive,
  }
);

void main() {
  print(grammar.parse('ab').get(lexeme: characterB));
  print(grammar.parse('aaa', main: recursive).get(lexeme: recursive));
}