
Writing Your Own Programming Language



This playground explains how to create a tiny programming language called Mu.

You can download the playground here or check the source code live here.

Or follow the tutorial below.


You don't need a CS degree to write a programming language; you just need to understand three basic steps.

The Language: Mu(μ)

Mu is a minimal language consisting of a single prefix operator, "s", which applies a binary operation to its two operands, and one-digit numbers.

Examples:

(s 2 4) or (s (s 4 5) 4) or (s (s 4 5) (s 3 2))...
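Treating "s" as addition, which is exactly what the interpreter implements later, the three examples evaluate like this. The inner expressions are reduced first:

```latex
\begin{aligned}
(\texttt{s}\ 2\ 4) &= 2 + 4 = 6 \\
(\texttt{s}\ (\texttt{s}\ 4\ 5)\ 4) &= (4 + 5) + 4 = 13 \\
(\texttt{s}\ (\texttt{s}\ 4\ 5)\ (\texttt{s}\ 3\ 2)) &= (4 + 5) + (3 + 2) = 14
\end{aligned}
```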

The Steps:

  • Lexer
  • Parser
  • Interpreter

Step 1: Lexer


"In computer science, lexical analysis is the process of converting a sequence of characters into a sequence of tokens (strings with an identified "meaning"). A program that performs lexical analysis may be called a lexer, tokenizer,[1] or scanner (though "scanner" is also used to refer to the first stage of a lexer). Such a lexer is generally combined with a parser, which together analyze the syntax of programming languages..."-Wikipedia

The idea is to transform an array of characters into an array of tokens (strings with an identified "meaning").

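Example: the input (s (s 4 5) 4) becomes the token sequence parensOpen, op("s"), parensOpen, op("s"), number(4), number(5), parensClose, number(4), parensClose.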

Because Mu is so small (a one-character operator, parentheses and single digits), you can simply iterate over the input and map each character to a token.

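```swift
// The four kinds of token Mu understands
enum Token {
    case parensOpen
    case op(String)
    case number(Int)
    case parensClose
}

struct Lexer {
    static func tokenize(_ input: String) -> [Token] {
        // A String is a collection of Characters in Swift 4+ (the old `.characters` view is gone),
        // and compactMap drops every character that maps to nil, such as spaces
        return input.compactMap {
            switch $0 {
            case "(": return Token.parensOpen
            case ")": return Token.parensClose
            case "s": return Token.op($0.description)
            default:
                if "0"..."9" ~= $0 {
                    return Token.number(Int($0.description)!)
                }
            }
            return nil
        }
    }
}

let input = "(s (s 4 5) 4)"
let tokens = Lexer.tokenize(input)
```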
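The lexer above silently discards any character it does not recognize. That is convenient for spaces, but it also hides typos such as "x". The following is a minimal sketch of a stricter variant that rejects unknown characters early. The names strictTokenize and LexingError are hypothetical and not part of the playground. It also assumes Swift 5, where Character.wholeNumberValue is available:

```swift
// Token kinds as in the playground, plus Equatable so token lists can be compared
enum Token: Equatable {
    case parensOpen
    case op(String)
    case number(Int)
    case parensClose
}

// Hypothetical error type for this sketch
enum LexingError: Error {
    case unexpectedCharacter(Character)
}

// Hypothetical stricter lexer: whitespace separates tokens, anything unknown is an error
func strictTokenize(_ input: String) throws -> [Token] {
    var tokens: [Token] = []
    for char in input {
        switch char {
        case "(": tokens.append(.parensOpen)
        case ")": tokens.append(.parensClose)
        case "s": tokens.append(.op("s"))
        case _ where char.isWhitespace:
            continue
        default:
            // Only the ASCII digits 0-9 are Mu numbers; wholeNumberValue gives their value
            guard char.isASCII, let digit = char.wholeNumberValue else {
                throw LexingError.unexpectedCharacter(char)
            }
            tokens.append(.number(digit))
        }
    }
    return tokens
}

do {
    let tokens = try strictTokenize("(s 2 4)")
    print(tokens.count) // 5
    _ = try strictTokenize("(s x 4)") // throws LexingError.unexpectedCharacter("x")
} catch {
    print("Lexing failed: \(error)")
}
```

Whether to skip or reject unknown characters is a design choice. Failing in the lexer gives a clearer error than letting the parser trip over a token sequence with a gap in it.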

Step 2: Parser

"Parsing or syntactic analysis is the process of analysing a string of symbols, either in natural language or in computer languages, conforming to the rules of a formal grammar..."-Wikipedia

Grammar:

expression: parensOpen operator primaryExpression primaryExpression parensClose

primaryExpression: expression | number

parensOpen: "("

parensClose: ")"

operator: "s"

number: [0-9]

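Mu's grammar is context-free. Each rule rewrites a single nonterminal (such as expression or primaryExpression) regardless of the symbols around it, and together the rules describe every valid Mu program. The parser is top-down (recursive descent): it starts at the root of the tree, expression, and works its way down to the leaves, the numbers.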
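To illustrate how the rules combine, the following derivation produces (s 2 4) from the grammar above by expanding one nonterminal at a time. The single ⇒* step substitutes the three rules that only produce terminals (parensOpen, operator, parensClose) at once:

```latex
\begin{aligned}
\mathit{expression}
&\Rightarrow \mathit{parensOpen}\ \mathit{operator}\ \mathit{primaryExpression}\ \mathit{primaryExpression}\ \mathit{parensClose} \\
&\Rightarrow^{*} \texttt{(}\ \texttt{s}\ \mathit{primaryExpression}\ \mathit{primaryExpression}\ \texttt{)} \\
&\Rightarrow \texttt{(}\ \texttt{s}\ \mathit{number}\ \mathit{primaryExpression}\ \texttt{)} \\
&\Rightarrow \texttt{(}\ \texttt{s}\ \texttt{2}\ \mathit{primaryExpression}\ \texttt{)} \\
&\Rightarrow \texttt{(}\ \texttt{s}\ \texttt{2}\ \mathit{number}\ \texttt{)} \\
&\Rightarrow \texttt{(}\ \texttt{s}\ \texttt{2}\ \texttt{4}\ \texttt{)}
\end{aligned}
```

A recursive-descent parser walks the same path: every ⇒ that expands primaryExpression corresponds to one call of the matching parse function.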

Tip: the code should be a direct representation of the grammar, with one function per rule.

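```
// Pseudocode: each grammar rule becomes one function

// expression: parensOpen operator primaryExpression primaryExpression parensClose
func parseExpression() -> ExpressionNode {
   ...
   firstPrimaryExpression = parsePrimaryExpression()
   secondPrimaryExpression = parsePrimaryExpression()
   ...
}

// primaryExpression: expression | number
func parsePrimaryExpression() -> PrimaryExpressionNode {
   return parseExpression() || parseNumber()
}
```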
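Here is the one-function-per-rule idea in isolation. This is a minimal sketch of a recognizer: it only answers whether a token list matches the grammar and does not build a tree. The Recognizer type and the isParensOpen-style predicate helpers are hypothetical, and Token mirrors the playground's enum. Unlike the playground's parse(), accepts() also checks that no tokens are left over after the outer expression:

```swift
// Mirrors the playground's token kinds
enum Token {
    case parensOpen, parensClose, op(String), number(Int)
}

// Hypothetical helpers: does a token match a given terminal of the grammar?
func isParensOpen(_ t: Token) -> Bool { if case .parensOpen = t { return true }; return false }
func isParensClose(_ t: Token) -> Bool { if case .parensClose = t { return true }; return false }
func isOperator(_ t: Token) -> Bool { if case .op = t { return true }; return false }
func isNumber(_ t: Token) -> Bool { if case .number = t { return true }; return false }

// Hypothetical recognizer: one method per grammar rule, answers yes/no without building a tree
struct Recognizer {
    let tokens: [Token]
    var index = 0

    init(_ tokens: [Token]) { self.tokens = tokens }

    // Consumes the current token only if it matches the expected terminal
    mutating func next(_ matches: (Token) -> Bool) -> Bool {
        guard index < tokens.count, matches(tokens[index]) else { return false }
        index += 1
        return true
    }

    // expression: parensOpen operator primaryExpression primaryExpression parensClose
    mutating func expression() -> Bool {
        guard next(isParensOpen), next(isOperator) else { return false }
        return primaryExpression() && primaryExpression() && next(isParensClose)
    }

    // primaryExpression: expression | number
    mutating func primaryExpression() -> Bool {
        guard index < tokens.count else { return false }
        if case .parensOpen = tokens[index] { return expression() }
        return next(isNumber)
    }

    // The whole input must be exactly one expression, with nothing left over
    mutating func accepts() -> Bool {
        return expression() && index == tokens.count
    }
}

var recognizer = Recognizer([.parensOpen, .op("s"), .number(2), .number(4), .parensClose])
print(recognizer.accepts()) // true
```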


```swift
// Errors the parser can raise (this declaration was missing from the snippet)
enum ParsingError: Error {
    case unexpectedToken
    case unexpectedEndOfInput
}

// primaryExpression: expression | number
indirect enum PrimaryExpressionNode {
    case number(Int)
    case expression(ExpressionNode)
}

// expression: parensOpen operator primaryExpression primaryExpression parensClose
struct ExpressionNode {
    var op: String
    var firstExpression: PrimaryExpressionNode
    var secondExpression: PrimaryExpressionNode
}

struct Parser {
    var index = 0
    let tokens: [Token]

    init(tokens: [Token]) {
        self.tokens = tokens
    }

    // Consume the current token; throws instead of crashing when the input runs out
    mutating func popToken() throws -> Token {
        let token = try peekToken()
        index += 1
        return token
    }

    // Look at the current token without consuming it
    func peekToken() throws -> Token {
        guard index < tokens.count else { throw ParsingError.unexpectedEndOfInput }
        return tokens[index]
    }

    mutating func parse() throws -> ExpressionNode {
        return try parseExpression()
    }

    mutating func parseExpression() throws -> ExpressionNode {
        guard case .parensOpen = try popToken() else { throw ParsingError.unexpectedToken }
        guard case let Token.op(_operator) = try popToken() else { throw ParsingError.unexpectedToken }
        let firstExpression = try parsePrimaryExpression()
        let secondExpression = try parsePrimaryExpression()
        guard case .parensClose = try popToken() else { throw ParsingError.unexpectedToken }
        return ExpressionNode(op: _operator, firstExpression: firstExpression, secondExpression: secondExpression)
    }

    mutating func parsePrimaryExpression() throws -> PrimaryExpressionNode {
        switch try peekToken() {
        case .number:
            return try parseNumber()
        case .parensOpen:
            let expressionNode = try parseExpression()
            return PrimaryExpressionNode.expression(expressionNode)
        default:
            throw ParsingError.unexpectedToken
        }
    }

    mutating func parseNumber() throws -> PrimaryExpressionNode {
        guard case let Token.number(n) = try popToken() else { throw ParsingError.unexpectedToken }
        return PrimaryExpressionNode.number(n)
    }
}

// MARK: Utils
extension ExpressionNode: CustomStringConvertible {
    var description: String {
        return "\(op) -> [\(firstExpression), \(secondExpression)]"
    }
}

extension PrimaryExpressionNode: CustomStringConvertible {
    var description: String {
        switch self {
        case .number(let n): return n.description
        case .expression(let exp): return exp.description
        }
    }
}

let input = "(s 2 (s 3 5))"
let tokens = Lexer.tokenize(input)
var parser = Parser(tokens: tokens)
let ast = try! parser.parse()
print(ast) // s -> [2, s -> [3, 5]]
```

Step 3: Interpreter

"In computer science, an interpreter is a computer program that directly executes, i.e. performs, instructions written in a programming or scripting language, without previously compiling them into a machine language program."-Wikipedia

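Mu's interpreter walks through its AST (abstract syntax tree) and computes a value by applying each node's operator to the values of its child nodes.

Example: in (s (s 5 2) 4), the inner (s 5 2) is evaluated first to 7, and the outer expression then yields 7 + 4 = 11.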

```swift
enum InterpreterError: Error {
    case unknownOperator
}

struct Interpreter {
    // Evaluate both children first, then apply the operator ("s" adds them)
    static func eval(_ expression: ExpressionNode) throws -> Int {
        let firstEval = try eval(expression.firstExpression)
        let secEval = try eval(expression.secondExpression)

        if expression.op == "s" {
            return firstEval + secEval
        }

        throw InterpreterError.unknownOperator
    }

    // A primary expression is either a literal number or a nested expression
    static func eval(_ prim: PrimaryExpressionNode) throws -> Int {
        switch prim {
        case .expression(let exp):
            return try eval(exp)
        case .number(let n):
            return n
        }
    }
}

let input = "(s (s 5 2) 4)"
let tokens = Lexer.tokenize(input)
var parser = Parser(tokens: tokens)
let ast = try! parser.parse()
let result = try! Interpreter.eval(ast) // 11
```


Conclusion

  • Given an input:
    let input = "(s (s 4 5) 4)"
  • Extract an array of tokens (lexing):
    let tokens = Lexer.tokenize(input)
  • Parse the tokens into a tree (parsing):
    var parser = Parser(tokens: tokens)
    let ast = try! parser.parse()
  • Walk the tree and compute its value (interpreting):
    let result = try! Interpreter.eval(ast)
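Putting the three steps together, here is a minimal self-contained sketch of the whole pipeline in a single file. It assumes Swift 5.1 or later. The names Expr, MuError, Parser, tokenize, eval and evaluate are hypothetical. The AST is also simplified to a single indirect enum instead of the playground's ExpressionNode/PrimaryExpressionNode pair. Unlike the playground, it rejects leftover tokens after the outer expression:

```swift
// Hypothetical end-to-end sketch of Mu: lexing, parsing and interpreting in one place
enum Token {
    case parensOpen, parensClose, op(String), number(Int)
}

enum MuError: Error {
    case unexpectedToken, unexpectedEndOfInput, trailingInput
    case unknownOperator(String)
}

// Simplified AST: a number, or an operator applied to two sub-expressions
indirect enum Expr {
    case number(Int)
    case apply(String, Expr, Expr)
}

// 1. Lexing: characters -> tokens (unknown characters are skipped, as in the playground)
func tokenize(_ input: String) -> [Token] {
    return input.compactMap { char -> Token? in
        switch char {
        case "(": return .parensOpen
        case ")": return .parensClose
        case "s": return .op("s")
        default:
            guard char.isASCII, let digit = char.wholeNumberValue else { return nil }
            return .number(digit)
        }
    }
}

// 2. Parsing: tokens -> tree, one function per grammar rule
struct Parser {
    let tokens: [Token]
    var index = 0

    mutating func pop() throws -> Token {
        guard index < tokens.count else { throw MuError.unexpectedEndOfInput }
        defer { index += 1 }
        return tokens[index]
    }

    // expression: parensOpen operator primaryExpression primaryExpression parensClose
    mutating func parseExpression() throws -> Expr {
        guard case .parensOpen = try pop() else { throw MuError.unexpectedToken }
        guard case let .op(name) = try pop() else { throw MuError.unexpectedToken }
        let lhs = try parsePrimary()
        let rhs = try parsePrimary()
        guard case .parensClose = try pop() else { throw MuError.unexpectedToken }
        return .apply(name, lhs, rhs)
    }

    // primaryExpression: expression | number
    mutating func parsePrimary() throws -> Expr {
        guard index < tokens.count else { throw MuError.unexpectedEndOfInput }
        if case let .number(n) = tokens[index] {
            index += 1
            return .number(n)
        }
        return try parseExpression()
    }
}

// 3. Interpreting: walk the tree, evaluating children before their parent
func eval(_ expr: Expr) throws -> Int {
    switch expr {
    case .number(let n):
        return n
    case let .apply(op, lhs, rhs):
        guard op == "s" else { throw MuError.unknownOperator(op) }
        return try eval(lhs) + eval(rhs)
    }
}

func evaluate(_ source: String) throws -> Int {
    var parser = Parser(tokens: tokenize(source))
    let ast = try parser.parseExpression()
    // Reject leftovers such as "(s 1 2) 3"
    guard parser.index == parser.tokens.count else { throw MuError.trailingInput }
    return try eval(ast)
}

print(try! evaluate("(s (s 4 5) 4)")) // prints 13
```

Collapsing the two node types into one enum is only a simplification for brevity. The playground's two-type design mirrors the grammar's expression and primaryExpression rules more literally.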
