The textbook way to do this, assuming that you do need to handle
embedded CRLFs and/or other control characters, _and_ that you need to
handle files of arbitrary (i.e. large!) size, would probably be to
open the file in binary mode, read it in fixed-length chunks, and
"double-buffer" the input, carrying the tail of each chunk over into
the next, so that matches are guaranteed to be found even if they
straddle a chunk boundary. The
details of the most efficient way of doing this would depend on your
scanning algorithm.
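
As a rough illustration, here is a minimal Python sketch of the chunk-plus-overlap idea, assuming the simplest possible scanner: a search for one fixed byte string. The name `find_all`, the `chunk_size` default, and the sample data are all made up for the example; a different scanning algorithm would need a different carry-over strategy.

```python
import io


def find_all(stream, pattern, chunk_size=64 * 1024):
    """Yield the absolute byte offset of each occurrence of `pattern`.

    `stream` is any binary file-like object, e.g. open(path, "rb").
    Hypothetical helper for illustration only.
    """
    if not pattern:
        raise ValueError("pattern must be non-empty")
    overlap = len(pattern) - 1  # bytes carried over between chunks
    tail = b""                  # end of the previous window
    base = 0                    # absolute file offset of window[0]
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        # Prepend the carried-over tail so that a match straddling
        # the chunk boundary shows up whole in this window.
        window = tail + chunk
        pos = window.find(pattern)
        while pos != -1:
            yield base + pos
            pos = window.find(pattern, pos + 1)
        # Keep only the last `overlap` bytes. That is one byte too few
        # to hold a complete match, so nothing is reported twice.
        keep = min(overlap, len(window))
        tail = window[len(window) - keep:]
        base += len(window) - keep


if __name__ == "__main__":
    data = b"ab\r\ncd\x00ab\r\ncd"
    # A tiny chunk size forces the matches across chunk boundaries.
    print(list(find_all(io.BytesIO(data), b"\r\ncd", chunk_size=3)))
```

For a fixed pattern of length m it is enough to carry over m - 1 bytes. Patterns without a known maximum match length (many regular expressions, for instance) cannot be handled with a fixed overlap like this.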