Evan Wildenhain - Advent of Code Day 4: Passport Validation

The day 4 puzzle… wasn’t my favorite, I have to admit, as it seemed like it was just parsing and validation. We get data for each “passport” in the following form:

ecl:gry pid:860033327 eyr:2020 hcl:#fffffd
byr:1937 iyr:2017 cid:147 hgt:183cm

which is meant to represent fields and their values on the passport. Then we need to validate that the appropriate fields are present and meet specified criteria.

As I alluded to in a previous post, for “real” parsing scenarios, I’d use a parser combinator library like Megaparsec. Let’s jump into it!

import Text.Megaparsec
import Text.Megaparsec.Char
import Text.Megaparsec.Char.Lexer

type Parser = Parsec Void Text

It’s typical to define a type synonym for your Parser monad (I named mine Parser) to sidestep some type inference ambiguity issues. The Void bit there means that we aren’t extending Megaparsec’s native error type in any way, and the Text bit means that our input stream is Text (as opposed to some other stream of tokens like ByteString).

Let’s define the set of allowable fields:

data FieldType
  = Byr
  | Iyr
  | Eyr
  | Hgt
  | Hcl
  | Ecl
  | Pid
  | Cid
  deriving (Show, Eq, Generic)

instance Hashable FieldType

I named each field data constructor after the three-letter code for it that appears in the passport data. With that we can look at our first actual parsing function.

pFieldType :: Parser FieldType
pFieldType =
  string "byr" $> Byr
    <|> string "iyr" $> Iyr
    <|> string "eyr" $> Eyr
    <|> string "hgt" $> Hgt
    <|> string "hcl" $> Hcl
    <|> string "ecl" $> Ecl
    <|> string "pid" $> Pid
    <|> string "cid" $> Cid

The tutorial that I learned Megaparsec from follows a naming conventiom of prepending p to the thing you’re parsing, and it stuck with me. As we said before, Parser is a Monad, but it also has an Alternative instance, meaning we can combine two parsers p and q as p <|> q to produce a new parser that first tries to parse input as p, and if p fails, falls back to trying to parse as q instead.

type Field = (FieldType, Text)

pField :: Parser Field
pField = do
  fieldType <- pFieldType
  _ <- char ':'
  txt <- takeWhile1P (Just "non-whitespace") (\c -> c /= ' ' && c /= '\n')
  return (fieldType, txt)

Again, since Parser is a monad, we can use do-notation to sequence multiple Parser actions together in order to parse a larger construct. This also makes it easy to define handy generic combinators for parsers:

type Passport = [Field]

pNonemptyList :: Parser a -> Parser b -> Parser [a]
pNonemptyList pElem pSep = do
  h <- pElem
  tl <- many $ try $ pSep *> pElem
  return $ h : tl

pPassport :: Parser Passport
pPassport = pNonemptyList pField (char ' ' <|> char '\n')

pInput :: Parser [Passport]
pInput = pNonemptyList pPassport (string "\n\n")

We can define a pNonemptyList combinator that takes parsers for list elements and the separators between them, then use it to parse fields into a passport (separated by a single space or newline), and also entire passports (separated by double newlines).

The try bit you see in the line tl <- many $ try $ pSep *> pElem is to tell Megaparsec that if it fails while parsing pSep *> pElem, it should backtrack to before it consumed any input for pSep. (“Backtrack” means rewinding the input stream so that the next parser gets to look at it freshly. a <|> b might reject some cases where a partially consumed the input and then failed, while try a <|> b may parse those cases.) By default, Megaparsec won’t backtrack – that performs better and can help avoid nonterminating parses due to recursive grammars. We don’t have to worry about that in this case, though.)

I’m going to leave it there for now! Part 2 of the challenge involves validating specific business rules for each field. For example, for the hgt (Height) field:

data HeightUnit = Cm | In

pHeight :: Parser (Int, HeightUnit)
pHeight = do
  num <- decimal
  unit <- (string "cm" $> Cm) <|> (string "in" $> In)
  return (num, unit)

-- and then inside the full passport validator...
  hgt :: (Int, HeightUnit) <- Hgt `lookup` fields >>= parseMaybe pHeight
    check $ case snd hgt of
      Cm -> fst hgt `isInRange` (150, 193)
      In -> fst hgt `isInRange` (59, 76)

As always, feel free to check out my full solution on GitHub: https://github.com/ewilden/aoc-2020/blob/main/src/Day04.hs