Find UTF-8 byte order marks

In a templating application I just ran into ugly “ï»¿” characters at the beginning of the text. They are caused by the UTF-8 byte order mark (BOM), the byte sequence 0xEF 0xBB 0xBF, being displayed as if it were Latin-1 text. As more than one file contained the BOM, I ran a search:

find . -iname '*.css' -o -iname '*.html' -o -iname '*.js' -o -iname '*.pm' -o -iname '*.pl' -o -iname '*.xml' | xargs grep -rl $'\xEF\xBB\xBF'
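
One caveat with the grep above: it matches the three bytes anywhere in a file, not only at the very start. Here is a minimal sketch that looks at just the first three bytes instead. It assumes bash and the usual head, od and tr tools; the function names has_bom and list_bom_files are made up for this example.

```shell
#!/usr/bin/env bash

# has_bom FILE: succeed if FILE starts with the UTF-8 BOM (EF BB BF).
has_bom() {
    # Dump the first three bytes as hex and strip the whitespace od adds.
    [ "$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]
}

# list_bom_files DIR: print every matching file below DIR that starts with a BOM.
list_bom_files() {
    find "$1" -type f \( -iname '*.css' -o -iname '*.html' -o -iname '*.js' \
        -o -iname '*.pm' -o -iname '*.pl' -o -iname '*.xml' \) -print0 |
    while IFS= read -r -d '' f; do
        # NUL-separated names keep paths with spaces intact.
        if has_bom "$f"; then
            printf '%s\n' "$f"
        fi
    done
}
```

Calling list_bom_files . from the project root should then print only the files that really begin with a BOM.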

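To remove the BOM I followed the approach suggested at http://stackoverflow.com/questions/204765/elegant-way-to-search-for-utf-8-files-with-bom using sed. Note the parentheses: without them -exec would only apply to the last pattern. The 1 address limits the substitution to the first line of each file:

find . \( -iname '*.css' -o -iname '*.html' -o -iname '*.js' -o -iname '*.pm' -o -iname '*.pl' -o -iname '*.xml' \) -exec sed -i.bak '1s/^\xEF\xBB\xBF//' {} \; -exec rm {}.bak \;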
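
As far as I know, the \xEF style escapes in the sed expression are a GNU sed extension, so the command may not work with other sed implementations. A rough sketch of an alternative that avoids sed altogether, assuming bash plus head, od, tr, tail and mktemp; strip_bom is a hypothetical helper name, not something from the Stack Overflow answer:

```shell
#!/usr/bin/env bash

# strip_bom FILE: remove a leading UTF-8 BOM from FILE, if there is one.
strip_bom() {
    local f=$1 tmp
    # Only touch files whose first three bytes are EF BB BF.
    if [ "$(head -c 3 "$f" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]; then
        tmp=$(mktemp) || return 1
        # tail -c +4 copies everything from the fourth byte onward.
        if tail -c +4 "$f" > "$tmp"; then
            # Write back through cat so the original file keeps its permissions.
            cat "$tmp" > "$f"
        fi
        rm -f "$tmp"
    fi
}
```

It can be driven by the same find expression, for example with -print0 piped into a while IFS= read -r -d '' f; do strip_bom "$f"; done loop. Files without a BOM are left untouched.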

Tada, no more ugly BOMs!
