The day 4 puzzle… wasn’t my favorite, I have to admit, as it seemed like it was just parsing and validation. We get data for each “passport” in the following form:
ecl:gry pid:860033327 eyr:2020 hcl:#fffffd
byr:1937 iyr:2017 cid:147 hgt:183cm
which is meant to represent fields and their values on the passport. Then we need to validate that the appropriate fields are present and meet specified criteria.
As I alluded to in a previous post, for “real” parsing scenarios, I’d use a parser combinator library like Megaparsec. Let’s jump into it!
import Text.Megaparsec
import Text.Megaparsec.Char
import Text.Megaparsec.Char.Lexer
type Parser = Parsec Void Text
It’s typical to define a type synonym for your Parser monad (I named mine Parser
) to sidestep some type inference ambiguity issues. The Void
bit there means that we aren’t extending Megaparsec’s native error type in any way, and the Text
bit means that our input stream is Text
(as opposed to some other stream of tokens like ByteString
).
Let’s define the set of allowable fields:
data FieldType
= Byr
| Iyr
| Eyr
| Hgt
| Hcl
| Ecl
| Pid
| Cid
deriving (Show, Eq, Generic)
instance Hashable FieldType
I named each field data constructor after the three-letter code for it that appears in the passport data. With that we can look at our first actual parsing function.
pFieldType :: Parser FieldType
=
pFieldType "byr" $> Byr
string <|> string "iyr" $> Iyr
<|> string "eyr" $> Eyr
<|> string "hgt" $> Hgt
<|> string "hcl" $> Hcl
<|> string "ecl" $> Ecl
<|> string "pid" $> Pid
<|> string "cid" $> Cid
The tutorial that I learned Megaparsec from follows a naming conventiom of prepending p
to the thing you’re parsing, and it stuck with me. As we said before, Parser
is a Monad
, but it also has an Alternative
instance, meaning we can combine two parsers p
and q
as p <|> q
to produce a new parser that first tries to parse input as p
, and if p
fails, falls back to trying to parse as q
instead.
type Field = (FieldType, Text)
pField :: Parser Field
= do
pField <- pFieldType
fieldType <- char ':'
_ <- takeWhile1P (Just "non-whitespace") (\c -> c /= ' ' && c /= '\n')
txt return (fieldType, txt)
Again, since Parser
is a monad, we can use do-notation to sequence multiple Parser
actions together in order to parse a larger construct. This also makes it easy to define handy generic combinators for parsers:
type Passport = [Field]
pNonemptyList :: Parser a -> Parser b -> Parser [a]
= do
pNonemptyList pElem pSep <- pElem
h <- many $ try $ pSep *> pElem
tl return $ h : tl
pPassport :: Parser Passport
= pNonemptyList pField (char ' ' <|> char '\n')
pPassport
pInput :: Parser [Passport]
= pNonemptyList pPassport (string "\n\n") pInput
We can define a pNonemptyList
combinator that takes parsers for list elements and the separators between them, then use it to parse fields into a passport (separated by a single space or newline), and also entire passports (separated by double newlines).
The try
bit you see in the line tl <- many $ try $ pSep *> pElem
is to tell Megaparsec that if it fails while parsing pSep *> pElem
, it should backtrack to before it consumed any input for pSep
. (“Backtrack” means rewinding the input stream so that the next parser gets to look at it freshly. a <|> b
might reject some cases where a
partially consumed the input and then failed, while try a <|> b
may parse those cases.) By default, Megaparsec won’t backtrack – that performs better and can help avoid nonterminating parses due to recursive grammars. We don’t have to worry about that in this case, though.)
I’m going to leave it there for now! Part 2 of the challenge involves validating specific business rules for each field. For example, for the hgt
(Height) field:
data HeightUnit = Cm | In
pHeight :: Parser (Int, HeightUnit)
= do
pHeight <- decimal
num <- (string "cm" $> Cm) <|> (string "in" $> In)
unit return (num, unit)
-- and then inside the full passport validator...
hgt :: (Int, HeightUnit) <- Hgt `lookup` fields >>= parseMaybe pHeight
$ case snd hgt of
check Cm -> fst hgt `isInRange` (150, 193)
In -> fst hgt `isInRange` (59, 76)
As always, feel free to check out my full solution on GitHub: https://github.com/ewilden/aoc-2020/blob/main/src/Day04.hs