Unfortunately I'm not familiar with Scala's combinator parser. If you could point me to a readable formal grammar and a sample input of what you're trying to parse, I'll try to help you with the FParsec parser. If you already have an F# AST, it would be good if you could post that too.

- Stephan

By on 10/31/2009 12:57 PM ()

This should parse something like this DDL grammar (from SQL).

CREATE TABLE attributes (id INTEGER PRIMARY KEY, name TEXT, min NUMERIC, max NUMERIC, category INTEGER);CREATE TABLE database_version (version_number INTEGER);

I am starting to get more of the hang of it.

Here is the current mapping of Scala's to F#'s operators...

~ ==  >>.

| == <|>
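
A minimal sketch of how these sequencing operators differ, assuming a recent FParsec release loaded into an F# script (where `open FParsec` is enough). The `kw` helper and the value names are made up for illustration. Note that Scala's `~` keeps both results, so `.>>.` is arguably its closest match, while `>>.` and `.>>` behave more like Scala's `~>` and `<~`.

```fsharp
#r "nuget: FParsec"   // assumption: run as a script with a recent .NET SDK
open FParsec

// Illustrative helper: parse a fixed keyword, then skip trailing whitespace.
let kw s : Parser<string, unit> = pstring s .>> spaces

let keepRight = kw "CREATE" >>. kw "TABLE"    // result: "TABLE"
let keepLeft  = kw "CREATE" .>> kw "TABLE"    // result: "CREATE"
let keepBoth  = kw "CREATE" .>>. kw "TABLE"   // result: ("CREATE", "TABLE")
let either    = kw "ALTER" <|> kw "CREATE"    // tries ALTER first, then CREATE

printfn "%A" (run keepBoth "CREATE TABLE")
```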

Full source for the parser is [link:github.com]

Sample text file that should parse

[link:github.com]

The next thing I need to figure out is how to create objects while parsing the code.

For instance, the Scala code below matches the inputs and lets me generate an object of type Instruction. I am not quite sure how to get my matched strings from FParsec.

def instr: Parser[Instruction] = create ~ table ~ item ~ ColumnInfo ~ ";" ^^
          {case create ~ table ~ itm ~ col ~ semi  => new Instruction(create, table, CapitalizeFirstLetter(itm), col) }

The code above is basically saying for my grammar create ~ table .... do a match on the values and create a new instruction with these values. Make sense?

By on 10/31/2009 1:58 PM ()

You can create objects by using the piping primitives "|>>" and "pipe2", "pipe3", ... Please have a look at the sample projects and the reference docs to see how these work.
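
As a small sketch of the two primitives, assuming a recent FParsec release in an F# script: `|>>` transforms the result of a single parser, and `pipe3` feeds the results of three parsers into one function. The `Instruction` record and the parser names here are hypothetical and only loosely follow the Scala snippet above.

```fsharp
#r "nuget: FParsec"   // assumption: run as a script with a recent .NET SDK
open FParsec

// Hypothetical result type, loosely modelled on the Scala Instruction.
type Instruction = { Verb: string; Kind: string; Item: string }

let ws : Parser<unit, unit> = spaces
let word : Parser<string, unit> = many1Satisfy isAsciiLetter .>> ws

// |>> maps the result of one parser through a function.
let upperWord = word |>> (fun s -> s.ToUpperInvariant())

// pipe3 runs three parsers in sequence and passes their results to a function.
let instr =
    pipe3 upperWord upperWord word
          (fun verb kind item -> { Verb = verb; Kind = kind; Item = item })

printfn "%A" (run instr "create table attributes")
```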

Maybe the following (non-literal) translation of your parser to FParsec will help you a bit.

(Edit: Slightly changed the definitions of ws, name and charsBeforeEol)

open System.Collections.Generic


type DataType = | BINARY     = 1
                | BIGINT     = 2
                | BIT        = 3
                | CHAR       = 4
                | DATE       = 5
                | DATETIME   = 6
                | DATETIME2  = 7
                | DECIMAL    = 8           
                | FLOAT      = 9
                | IMAGE      = 10
                | INT        = 11
                | INTEGER    = 12
                | MONEY      = 13         
                | NCHAR      = 14
                | NTEXT      = 15
                | NUMERIC    = 16
                | NVARCHAR   = 17
                | REAL       = 18
                | SMALLINT   = 19
                | SMALLMONEY = 20
                | TEXT       = 21
                | TIME       = 22
                | TINYINT    = 23                                                         
                | VARBINARY  = 24
                | VARCHAR    = 25       


let dataTypeDict = 
    let dict = new System.Collections.Generic.Dictionary<string, DataType>(32)    
    for name in DataType.GetNames(typeof<DataType>) do
        dict.Add(name, unbox (System.Enum.Parse(typeof<DataType>, name)))
    dict

                        
type Column(name: string, dataType: DataType, isPrimaryKey: bool) =
    member t.Name = name
    member t.Type = dataType
    member t.IsPrimaryKey = isPrimaryKey    
    override t.ToString() = sprintf "Column(name = %s, dataType = %s, isPrimaryKey = %b)" 
                                    name (dataType.ToString()) isPrimaryKey


type IsAdd = bool    


type Action = CreateTable of string * Column list
            | CreateIndex of string
            | AlterTable of string * IsAdd * string 


module Parser =     
    open FParsec.Primitives    
    open FParsec.CharParsers
    open FParsec.Error


    type Parser<'a> = Parser<'a,unit>


    let ws = skipManySatisfy (fun c -> c = ' ' || c = '\t') // whitespace
    let ch c = pchar c .>> ws
    let str s = pstring s .>> ws
                // stringCIReturn s s .>> ws // case insensitive alternative


    // a helper for parsing any constant in a dict
    let dictParser label (dict: Dictionary<string,'a>) (p: Parser<string>) =
        let error = messageError ("invalid " + label + " value")
        fun state ->
            let reply = p state
            if reply.Status = Ok then
                let mutable v = Unchecked.defaultof<'a>
                if dict.TryGetValue(reply.Result, &v) then Reply(v, reply.State)
                else Reply(Error, error, state)
            else Reply(reply.Status, reply.Error, reply.State)    

    
    let dataType = 
        let dataTypeStr = many1Satisfy isAsciiLetter .>> ws
                          // |>> (fun s -> s.ToUpperInvariant()) // case insensitive alternative
        dictParser "data type" dataTypeDict dataTypeStr


    let isNameFirstChar = fun c -> isAsciiLetter c || c = '_'
    let isNameChar = fun c -> isAsciiLetter c || isDigit c || c = '_'
    let name = many1Satisfy2L isNameFirstChar isNameChar "identifier" .>> ws


    let alter = str "ALTER"
    let create = str "CREATE"
    let index = str "INDEX"
    let table = str "TABLE"

        
    let eol = ch ';' >>. (skipNewline <|> eof)
    let charsBeforeEol =  manySatisfy (fun c -> c <> ';' && c <> '\n')

    
    let parameter = pipe3 name dataType (opt (str "PRIMARY" >>. str "KEY"))
                          (fun name dataType primaryKeyOpt ->
                               Column(name, dataType, primaryKeyOpt.IsSome))


    let columnInfo = between (ch '(') (ch ')') (sepBy parameter (ch ','))

    
    let createTable = table >>. pipe2 name columnInfo 
                                      (fun item paras -> CreateTable(item, paras))


    let createIndex = index >>. charsBeforeEol |>> CreateIndex

    
    let createAction = create >>. (createTable <|> createIndex)

    
    let alterAction = 
        let addOrDrop = str "ADD" <|> str "DROP"        
        alter >>. table >>. pipe3 name addOrDrop charsBeforeEol
                                  (fun item addOrDrop rest -> AlterTable(item, addOrDrop = "ADD", rest))


    let action = createAction <|> alterAction


    let file = many (spaces >>. action .>> eol) .>> eof


    let parseDDLFile filePath =    
        runParserOnFile file () filePath System.Text.Encoding.Default


let testString = 
    @"CREATE TABLE attributes (id INTEGER PRIMARY KEY, name TEXT, min NUMERIC, max NUMERIC, category INTEGER);
    CREATE TABLE database_version (version_number INTEGER);
    CREATE TABLE data (id INTEGER PRIMARY KEY, letter TEXT );
    CREATE TABLE events (start_day INTEGER, duration INTEGER );
    CREATE INDEX idx_index ON attributes (don't care);"


printf "%A" (FParsec.CharParsers.run Parser.file testString)
By on 11/1/2009 4:15 AM ()

Thanks a lot Stephan. This has helped me out a lot.
I now understand the pipe operators as well. However, I know I am doing something wrong but not sure exactly what...
I have another grammar I am trying to parse. The grammar is

let testString = "PROJECT FILE\nVERSION:    5\n"

I want to verify the first line is PROJECT FILE but ignore it. I then want to parse VERSION: (but ignore it) and grab the value (which in this case is 5).

Everything seems to parse fine and I get a result of Success: () printed out. I also see that my Version object is created, but I am not sure how to access it. In the DDL example above everything prints out. In this case only the success is printed, without any data associated with it. Is this because the types for my parser are not correct, i.e. it is returning unit? I appreciate the help and apologize for the newbie questions.

type sparser = Parser<string, unit>
type uparser = Parser<unit, unit> 
type Project = Version of int

module test1 =
   let ws : uparser = skipManySatisfy (fun c -> c = ' ' || c = '\t') // whitespace
   let ch c = pchar c .>> ws
   let str s = pstring s .>> ws
   let skipStr s = skipString s .>> ws
   let eol = skipNewline
   let VersionValue : sparser = many1Chars digit
   let ProjectKeyword = skipStr "PROJECT FILE"
   let VersionKeyword = skipStr "VERSION:"
   let literal : sparser = asciiLetter <|> digit >>. manyChars (anyOf ":_\\/." <|> asciiLetter <|> digit)
   let versionval = VersionValue |>> (fun version ->
                                             Version(System.Convert.ToInt32(version)))
   let projFile = ProjectKeyword .>> eol .>> VersionKeyword .>> versionval .>> (eol <|> spaces)
   let file = (spaces >>. projFile) .>> eof
   let parseProjectFile fileName = runParserOnFile file () fileName System.Text.Encoding.Default

let testString2 = "PROJECT FILE\nVERSION: 5\n"
printfn "%A" (FParsec.CharParsers.run test1.file testString2)
By on 11/2/2009 9:50 AM ()

You need to replace

VersionKeyword .>> versionval

with

VersionKeyword >>. versionval

i.e. move the dot to the right-hand side. The dot marks the parser whose result is returned (and the operators .>> and >>. are both left-associative). Currently projFile returns unit because it returns the result of ProjectKeyword instead of versionval.
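
The effect of the dot can be seen in isolation with a sketch like the following, assuming a recent FParsec release in an F# script. The names `keyword`, `number`, `dotLeft`, `dotRight` and `chain` are illustrative only.

```fsharp
#r "nuget: FParsec"   // assumption: run as a script with a recent .NET SDK
open FParsec

let keyword : Parser<unit, unit> = skipString "VERSION:" .>> spaces
let number  : Parser<int, unit>  = many1Chars digit |>> int

// Dot on the left: the number is parsed but its value is dropped.
let dotLeft  = keyword .>> number    // Parser<unit, unit>
// Dot on the right: the number becomes the result.
let dotRight = keyword >>. number    // Parser<int, unit>

// Left-associative chain: ((a .>> b) >>. c) .>> d returns the result of c.
let chain : Parser<string, unit> =
    pstring "a" .>> pstring "b" >>. pstring "c" .>> pstring "d"

printfn "%A" (run dotLeft  "VERSION: 5")
printfn "%A" (run dotRight "VERSION: 5")
printfn "%A" (run chain "abcd")
```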

Some more comments:

  • eol <|> spaces can be simplified to spaces. (spaces skips any space, tab or newline chars.)

  • If the version number is indeed a simple integer, it is better to use pint32 or a similar number parser in the VersionValue parser.

  • You should prefer >>. to .>> if you don't need either of the results, e.g. use

    let projFile = ProjectKeyword >>. eol >>. VersionKeyword >>. versionval .>> spaces

  • There's no performance difference between pstring and skipString, so there isn't really a need for your skipStr parser.
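
Putting these comments together, the parser might end up looking roughly like this. It is only a sketch, assuming a recent FParsec release in an F# script; the lower-case names are adapted from the code in the question rather than taken from it verbatim.

```fsharp
#r "nuget: FParsec"   // assumption: run as a script with a recent .NET SDK
open FParsec

type Project = Version of int

let ws : Parser<unit, unit> = skipManySatisfy (fun c -> c = ' ' || c = '\t')
let str s = pstring s .>> ws               // pstring is enough, no skipStr needed

let projectKeyword = str "PROJECT FILE"
let versionKeyword = str "VERSION:"

// pint32 parses the integer directly, so no string conversion is needed.
let versionval : Parser<Project, unit> = pint32 |>> Version

// Only versionval has the dot pointing at it, so its result is returned.
let projFile = projectKeyword >>. skipNewline >>. versionKeyword >>. versionval .>> spaces
let file = spaces >>. projFile .>> eof

printfn "%A" (run file "PROJECT FILE\nVERSION:    5\n")
```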
By on 11/2/2009 12:55 PM ()

And I hope this is the last question. I appreciate your patience and especially your answers.

Using the latest example above where I have a file with a header like this...

PROJECT FILE
VERSION: 5
NAME: TEST
PLATFORM: PC

It has to be in this order in the file and I want to populate a class called ProjectHeader with the version, name, and platform data. How do I go about this? In P-Code I would

  1. Create new ProjectHeader instance when I hit the text PROJECT FILE. Save that in the parser state.
  2. Parse the VERSION FIELD. Grab the parser state's ProjectHeader. Set the mutable field for version.
  3. Do the same for NAME and Platform.

However, I am not quite following how to set up the hierarchy. It seems like I want a ProjectHeader parser that contains one or more field parsers, where the field parsers have access to the ProjectHeader parser so they can grab that state.

Also, if there is a more functional way to do this so I don't need to worry about state I am all ears as well.

By on 11/3/2009 5:49 AM ()

Couldn't you just use something like the following?

let header = tuple4 projectType projectVersion projectName projectPlatform

let body = (* parser for project body*)

let project = pipe2 header body 
                    (fun (ty, ver, name, platform) body -> new Project(...))

let projects = many project
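
For the four-line header from the question, a sketch along these lines might work without any mutable state. It assumes a recent FParsec release in an F# script; the `ProjectHeader` record, its field names and the `restOfField` helper are assumptions made for this example.

```fsharp
#r "nuget: FParsec"   // assumption: run as a script with a recent .NET SDK
open FParsec

// Hypothetical header type; the field names are assumptions for this sketch.
type ProjectHeader = { Version: int; Name: string; Platform: string }

let ws : Parser<unit, unit> = skipManySatisfy (fun c -> c = ' ' || c = '\t')
let str s = pstring s .>> ws
let eol : Parser<unit, unit> = skipNewline <|> eof
let restOfField : Parser<string, unit> = many1Satisfy (fun c -> c <> '\n')

// One parser per header line; each returns only the value it cares about.
let projectType     = str "PROJECT FILE" .>> eol
let projectVersion  = str "VERSION:"  >>. pint32 .>> ws .>> eol
let projectName     = str "NAME:"     >>. restOfField .>> eol
let projectPlatform = str "PLATFORM:" >>. restOfField .>> eol

// tuple4 enforces the order and collects the results; |>> builds the record.
let header : Parser<ProjectHeader, unit> =
    tuple4 projectType projectVersion projectName projectPlatform
    |>> (fun (_, version, name, platform) ->
            { Version = version; Name = name; Platform = platform })

printfn "%A" (run header "PROJECT FILE\nVERSION: 5\nNAME: TEST\nPLATFORM: PC\n")
```

Because each field parser simply returns its value, the record is built once at the end from immutable pieces, which avoids threading a half-filled object through the parser state.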
By on 11/3/2009 6:15 AM ()

Sweet!!! Thanks for the help. The light bulb (finally :) clicked on and I have written most of my parser, which is more detailed than the snippets above. Yeah!!! I now feel that I get most of the patterns.

What do you recommend doing if you have more than 5 parsers to pipe as there is only a pipe5?

By on 11/4/2009 6:26 AM ()

You could for example group arguments as tuples. For example:

let pipe6 p1 p2 p3 p4 p5 p6 f = 
    pipe5 p1 p2 p3 p4 (tuple2 p5 p6)
          (fun x1 x2 x3 x4 (x5, x6) -> f x1 x2 x3 x4 x5 x6)
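
A usage sketch for such a helper, assuming a recent FParsec release in an F# script. The helper is repeated so the snippet runs on its own; `num` and `sum6` are purely illustrative.

```fsharp
#r "nuget: FParsec"   // assumption: run as a script with a recent .NET SDK
open FParsec

// pipe6 built from pipe5 by grouping the last two parsers into a tuple.
let pipe6 p1 p2 p3 p4 p5 p6 f =
    pipe5 p1 p2 p3 p4 (tuple2 p5 p6)
          (fun x1 x2 x3 x4 (x5, x6) -> f x1 x2 x3 x4 x5 x6)

// Six whitespace-separated integers, summed.
let num : Parser<int, unit> = pint32 .>> spaces
let sum6 =
    pipe6 num num num num num num
          (fun n1 n2 n3 n4 n5 n6 -> n1 + n2 + n3 + n4 + n5 + n6)

printfn "%A" (run sum6 "1 2 3 4 5 6")
```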
By on 11/4/2009 1:28 PM ()