The day 4 puzzle… wasn’t my favorite, I have to admit, as it seemed like it was just parsing and validation. We get data for each “passport” in the following form:
ecl:gry pid:860033327 eyr:2020 hcl:#fffffd byr:1937 iyr:2017 cid:147 hgt:183cm
which is meant to represent fields and their values on the passport. Then we need to validate that the appropriate fields are present and meet specified criteria.
As I alluded to in a previous post, for “real” parsing scenarios, I’d use a parser combinator library like Megaparsec. Let’s jump into it!
import Text.Megaparsec import Text.Megaparsec.Char import Text.Megaparsec.Char.Lexer type Parser = Parsec Void Text
It’s typical to define a type synonym for your Parser monad (I named mine
Parser) to sidestep some type inference ambiguity issues. The
Void bit there means that we aren’t extending Megaparsec’s native error type in any way, and the
Text bit means that our input stream is
Text (as opposed to some other stream of tokens like
Let’s define the set of allowable fields:
data FieldType = Byr | Iyr | Eyr | Hgt | Hcl | Ecl | Pid | Cid deriving (Show, Eq, Generic) instance Hashable FieldType
I named each field data constructor after the three-letter code for it that appears in the passport data. With that we can look at our first actual parsing function.
pFieldType :: Parser FieldType = pFieldType "byr" $> Byr string <|> string "iyr" $> Iyr <|> string "eyr" $> Eyr <|> string "hgt" $> Hgt <|> string "hcl" $> Hcl <|> string "ecl" $> Ecl <|> string "pid" $> Pid <|> string "cid" $> Cid
The tutorial that I learned Megaparsec from follows a naming conventiom of prepending
p to the thing you’re parsing, and it stuck with me. As we said before,
Parser is a
Monad, but it also has an
Alternative instance, meaning we can combine two parsers
p <|> q to produce a new parser that first tries to parse input as
p, and if
p fails, falls back to trying to parse as
type Field = (FieldType, Text) pField :: Parser Field = do pField <- pFieldType fieldType <- char ':' _ <- takeWhile1P (Just "non-whitespace") (\c -> c /= ' ' && c /= '\n') txt return (fieldType, txt)
Parser is a monad, we can use do-notation to sequence multiple
Parser actions together in order to parse a larger construct. This also makes it easy to define handy generic combinators for parsers:
type Passport = [Field] pNonemptyList :: Parser a -> Parser b -> Parser [a] = do pNonemptyList pElem pSep <- pElem h <- many $ try $ pSep *> pElem tl return $ h : tl pPassport :: Parser Passport = pNonemptyList pField (char ' ' <|> char '\n') pPassport pInput :: Parser [Passport] = pNonemptyList pPassport (string "\n\n")pInput
We can define a
pNonemptyList combinator that takes parsers for list elements and the separators between them, then use it to parse fields into a passport (separated by a single space or newline), and also entire passports (separated by double newlines).
try bit you see in the line
tl <- many $ try $ pSep *> pElem is to tell Megaparsec that if it fails while parsing
pSep *> pElem, it should backtrack to before it consumed any input for
pSep. (“Backtrack” means rewinding the input stream so that the next parser gets to look at it freshly.
a <|> b might reject some cases where
a partially consumed the input and then failed, while
try a <|> b may parse those cases.) By default, Megaparsec won’t backtrack – that performs better and can help avoid nonterminating parses due to recursive grammars. We don’t have to worry about that in this case, though.)
I’m going to leave it there for now! Part 2 of the challenge involves validating specific business rules for each field. For example, for the
hgt (Height) field:
data HeightUnit = Cm | In pHeight :: Parser (Int, HeightUnit) = do pHeight <- decimal num <- (string "cm" $> Cm) <|> (string "in" $> In) unit return (num, unit) -- and then inside the full passport validator... hgt :: (Int, HeightUnit) <- Hgt `lookup` fields >>= parseMaybe pHeight $ case snd hgt of check Cm -> fst hgt `isInRange` (150, 193) In -> fst hgt `isInRange` (59, 76)
As always, feel free to check out my full solution on GitHub: https://github.com/ewilden/aoc-2020/blob/main/src/Day04.hs