Question
How do I go about writing a parser (recursive descent?) in C#? For now I just want a simple parser that parses arithmetic expressions (and reads variables?), though later I intend to write an XML and HTML parser (for learning purposes). I am doing this because parsers are useful in a wide range of areas: web development, programming language interpreters, in-house tools, game engines, map and tile editors, etc. So what is the basic theory of writing parsers, and how do I implement one in C#? Is C# the right language for parsers? (I once wrote a simple arithmetic parser in C++ and it was efficient. Will JIT compilation prove equally good?) Any helpful resources and articles would be welcome, and best of all, code examples (or links to code examples).
Note: Out of curiosity, has anyone answering this question ever implemented a parser in C#?
Recommended Answer
I have implemented several parsers in C# - hand-written and tool generated.
A very good introductory tutorial on parsing in general is Jack Crenshaw's Let's Build a Compiler. It demonstrates how to build a recursive descent parser, and any competent developer can easily translate the concepts from his language (Turbo Pascal) to C#. This will teach you how a recursive descent parser works, but writing a full programming language parser by hand is completely impractical.
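To make the recursive descent idea concrete, here is a minimal C# sketch of the arithmetic case the question asks about, including variables. It is not taken from the tutorial. Each grammar rule becomes one method, and operator precedence follows from which method calls which. `ExprParser` is a hypothetical name. The sketch assumes that variable values are supplied up front in a dictionary and that only `+ - * /`, unary minus and parentheses are needed.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Recursive descent evaluator: one method per grammar rule.
//   Expr    := Term (('+' | '-') Term)*
//   Term    := Unary (('*' | '/') Unary)*
//   Unary   := '-' Unary | Primary
//   Primary := Number | Identifier | '(' Expr ')'
// Expr calls Term, so '*' binds tighter than '+'; the while loops make the
// binary operators left-associative.
public sealed class ExprParser
{
    private readonly string _src;
    private readonly IReadOnlyDictionary<string, double> _vars; // variable values supplied by the caller
    private int _pos; // index of the next unread character

    public ExprParser(string src, IReadOnlyDictionary<string, double> vars)
    {
        _src = src;
        _vars = vars;
    }

    public double Parse()
    {
        double value = Expr();
        SkipWhitespace();
        if (_pos != _src.Length)
            throw new FormatException($"Unexpected '{_src[_pos]}' at position {_pos}");
        return value;
    }

    private double Expr()
    {
        double left = Term();
        while (true)
        {
            if (Match('+')) left += Term();
            else if (Match('-')) left -= Term();
            else return left;
        }
    }

    private double Term()
    {
        double left = Unary();
        while (true)
        {
            if (Match('*')) left *= Unary();
            else if (Match('/')) left /= Unary();
            else return left;
        }
    }

    private double Unary() => Match('-') ? -Unary() : Primary();

    private double Primary()
    {
        if (Match('('))
        {
            double inner = Expr();
            if (!Match(')')) throw new FormatException($"Expected ')' at position {_pos}");
            return inner;
        }
        int start = _pos; // Match('(') has already skipped any whitespace
        if (_pos < _src.Length && IsAsciiDigit(_src[_pos]))
        {
            while (_pos < _src.Length && (IsAsciiDigit(_src[_pos]) || _src[_pos] == '.')) _pos++;
            // Invariant culture, so '.' is always the decimal separator.
            return double.Parse(_src.Substring(start, _pos - start), NumberStyles.Float, CultureInfo.InvariantCulture);
        }
        if (_pos < _src.Length && (char.IsLetter(_src[_pos]) || _src[_pos] == '_'))
        {
            while (_pos < _src.Length && (char.IsLetterOrDigit(_src[_pos]) || _src[_pos] == '_')) _pos++;
            string name = _src.Substring(start, _pos - start);
            return _vars.TryGetValue(name, out double value)
                ? value
                : throw new KeyNotFoundException($"Unknown variable '{name}'");
        }
        throw new FormatException($"Unexpected input at position {_pos}");
    }

    // Skips whitespace, then consumes c if it is the next character.
    private bool Match(char c)
    {
        SkipWhitespace();
        if (_pos < _src.Length && _src[_pos] == c) { _pos++; return true; }
        return false;
    }

    private void SkipWhitespace()
    {
        while (_pos < _src.Length && char.IsWhiteSpace(_src[_pos])) _pos++;
    }

    private static bool IsAsciiDigit(char c) => c >= '0' && c <= '9';
}
```

For example, `new ExprParser("-x * (y + 1)", new Dictionary<string, double> { ["x"] = 2, ["y"] = 3 }).Parse()` returns -8. Each level of nesting adds stack frames, so extremely deep input can exhaust the call stack.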
If you are determined to write a classical recursive descent parser, you should look into tools that generate the code for you (TinyPG, Coco/R, Irony). Keep in mind that there are now other ways to write parsers that usually perform better and have easier definitions, e.g. top-down operator precedence (TDOP, or Pratt) parsing and monadic parsing.
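Here is a minimal C# sketch of TDOP (Pratt) parsing for comparison. A single `ParseExpr` loop driven by a table of binding powers replaces the one-method-per-precedence-level structure. `PrattParser`, its toy tokenizer and the specific binding-power numbers are illustrative assumptions, not any library's API. The parenthesized prefix output was chosen only to make precedence and associativity visible.

```csharp
using System;
using System.Collections.Generic;

// Pratt / top-down operator precedence parser. One ParseExpr loop consults a
// binding-power table. Output is a fully parenthesized prefix string.
public sealed class PrattParser
{
    private readonly List<string> _tokens = new List<string>();
    private int _pos;

    public PrattParser(string src)
    {
        // Toy tokenizer: runs of letters, digits and '.' form one token;
        // any other non-whitespace character is a one-character token.
        int i = 0;
        while (i < src.Length)
        {
            if (char.IsWhiteSpace(src[i])) { i++; continue; }
            int start = i;
            if (IsWordChar(src[i]))
            {
                while (i < src.Length && IsWordChar(src[i])) i++;
            }
            else
            {
                i++;
            }
            _tokens.Add(src.Substring(start, i - start));
        }
    }

    public string Parse()
    {
        string result = ParseExpr(0);
        if (_pos != _tokens.Count) throw new FormatException($"Unexpected token '{_tokens[_pos]}'");
        return result;
    }

    // Infix binding powers: left < right gives left associativity,
    // left > right gives right associativity ('^').
    private static bool TryInfix(string op, out int left, out int right)
    {
        switch (op)
        {
            case "+": case "-": left = 1; right = 2; return true;
            case "*": case "/": left = 3; right = 4; return true;
            case "^": left = 6; right = 5; return true;
            default: left = right = 0; return false;
        }
    }

    private const int PrefixMinusPower = 5; // tighter than '*', looser than '^'

    private string ParseExpr(int minPower)
    {
        // Prefix position ("nud"): operand, unary minus, or parenthesized group.
        string tok = Next();
        string lhs;
        if (tok == "(")
        {
            lhs = ParseExpr(0);
            if (Next() != ")") throw new FormatException("Expected ')'");
        }
        else if (tok == "-")
        {
            lhs = $"(- {ParseExpr(PrefixMinusPower)})";
        }
        else if (IsWordChar(tok[0]))
        {
            lhs = tok;
        }
        else
        {
            throw new FormatException($"Unexpected token '{tok}'");
        }

        // Infix position ("led"): absorb operators that bind tighter than minPower.
        while (_pos < _tokens.Count && TryInfix(_tokens[_pos], out int left, out int right) && left > minPower)
        {
            string op = Next();
            string rhs = ParseExpr(right);
            lhs = $"({op} {lhs} {rhs})";
        }
        return lhs;
    }

    private string Next() =>
        _pos < _tokens.Count ? _tokens[_pos++] : throw new FormatException("Unexpected end of input");

    private static bool IsWordChar(char c) => char.IsLetterOrDigit(c) || c == '.';
}
```

For example, `new PrattParser("2 ^ 3 ^ 2").Parse()` returns `(^ 2 (^ 3 2))`, which shows right associativity. Adding a new infix operator here means adding one `case` rather than another method.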
On whether C# is up to the task: C# has some of the best text libraries out there. Many parsers written today in other languages carry an obscene amount of code just to deal with Unicode and similar concerns. I won't comment too much on JIT-compiled code, because that topic can get quite religious, but you should be just fine. IronJS is a good example of a parser/runtime on the CLR (even though it's written in F#), and its performance is just shy of Google V8's.
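Here is a hypothetical C# lexer sketch that illustrates the kind of built-in text support described above. `char.IsLetter` and `char.IsLetterOrDigit` classify characters by Unicode category, so identifiers such as `größe` or `π` need no hand-written tables. `CultureInfo.InvariantCulture` keeps number parsing independent of the user's locale. `Token`, `TokenKind` and `Lexer` are names invented for this example. The sketch classifies individual UTF-16 code units, so it assumes identifiers use only characters from the Basic Multilingual Plane.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public enum TokenKind { Number, Identifier, Symbol }

public readonly struct Token
{
    public Token(TokenKind kind, string text) { Kind = kind; Text = text; }
    public TokenKind Kind { get; }
    public string Text { get; }
    public override string ToString() => $"{Kind}:{Text}";
}

public static class Lexer
{
    // Lazily yields tokens via a C# iterator. char.IsLetter / IsLetterOrDigit use
    // Unicode categories, so non-ASCII letters are valid identifier characters.
    public static IEnumerable<Token> Tokenize(string src)
    {
        int i = 0;
        while (i < src.Length)
        {
            char c = src[i];
            if (char.IsWhiteSpace(c)) { i++; continue; }
            int start = i;
            if (c >= '0' && c <= '9')
            {
                while (i < src.Length && ((src[i] >= '0' && src[i] <= '9') || src[i] == '.')) i++;
                yield return new Token(TokenKind.Number, src.Substring(start, i - start));
            }
            else if (char.IsLetter(c) || c == '_')
            {
                while (i < src.Length && (char.IsLetterOrDigit(src[i]) || src[i] == '_')) i++;
                yield return new Token(TokenKind.Identifier, src.Substring(start, i - start));
            }
            else
            {
                i++;
                yield return new Token(TokenKind.Symbol, c.ToString());
            }
        }
    }

    // Converts a Number token using the invariant culture. Under a culture such as
    // de-DE, where '.' is the group separator, parsing "2.5" without a culture
    // argument can misread the value.
    public static double ToDouble(Token t) =>
        double.Parse(t.Text, NumberStyles.Float, CultureInfo.InvariantCulture);
}
```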
Side note: markup parsers are completely different beasts compared to language parsers. In the majority of cases they are written by hand, and they are very simple at the scanner/parser level. They are not usually recursive descent. In the case of XML especially, it is better not to write a recursive descent parser, both to avoid stack overflows and because a "flat" parser can be used in SAX/push mode.
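Here is a C# sketch of the "flat" approach. A single loop pushes events to callbacks (SAX-style) and tracks open elements on an explicit `Stack<string>`, so nesting depth costs heap memory rather than call-stack frames. `FlatMarkupScanner` and its events are hypothetical names. It handles only a tiny XML subset and assumes there are no comments, CDATA sections, entity references, or `>` characters inside attribute values.

```csharp
using System;
using System.Collections.Generic;
#nullable enable

// Non-recursive, push-style scanner for start tags, end tags,
// self-closing tags and text. Attributes are skipped, not parsed.
public sealed class FlatMarkupScanner
{
    public event Action<string>? StartElement;
    public event Action<string>? EndElement;
    public event Action<string>? Text;

    public void Scan(string src)
    {
        // Explicit stack of open element names instead of recursion.
        var open = new Stack<string>();
        int i = 0;
        while (i < src.Length)
        {
            int lt = src.IndexOf('<', i);
            if (lt < 0) lt = src.Length;
            if (lt > i) Text?.Invoke(src.Substring(i, lt - i)); // text before the next tag
            if (lt == src.Length) break;

            int gt = src.IndexOf('>', lt);
            if (gt < 0) throw new FormatException($"Unterminated tag at {lt}");
            string inner = src.Substring(lt + 1, gt - lt - 1).Trim();
            i = gt + 1;

            if (inner.StartsWith("/", StringComparison.Ordinal))
            {
                // End tag: must match the most recently opened element.
                string name = inner.Substring(1).Trim();
                if (open.Count == 0 || open.Pop() != name)
                    throw new FormatException($"Mismatched </{name}> at {lt}");
                EndElement?.Invoke(name);
            }
            else
            {
                bool selfClosing = inner.EndsWith("/", StringComparison.Ordinal);
                if (selfClosing) inner = inner.Substring(0, inner.Length - 1);
                // The element name runs up to the first whitespace; attributes are ignored.
                int end = 0;
                while (end < inner.Length && !char.IsWhiteSpace(inner[end])) end++;
                string name = inner.Substring(0, end);
                if (name.Length == 0) throw new FormatException($"Empty tag name at {lt}");
                StartElement?.Invoke(name);
                if (selfClosing) EndElement?.Invoke(name);
                else open.Push(name);
            }
        }
        if (open.Count > 0) throw new FormatException($"Unclosed <{open.Peek()}>");
    }
}
```

A caller subscribes to the events (for example `scanner.StartElement += name => Console.WriteLine(name);`) and then calls `Scan`. Because the loop never recurses, nesting depth is bounded by available memory rather than by the call stack.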