Remove ^M characters and more with repl.bash

Hey folks, this one’s a goodie, and a quick one.

First off, respect the character encoding of a file. I don’t know how many devs out there violate this rule, but if you’re like me and Joel on Software, you’ll agree that it matters.

If your file has picked up text from code page 1252 (a.k.a. Windows Latin 1), you’ll see a variety of stray characters like ^M, ?~@~Y, ?~@~\ or ?~@~].
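One way to confirm that a file really contains these bytes before changing it is to count the affected lines. The sketch below is not part of repl.bash; the function name `count_suspect_lines` is hypothetical, and it only checks for carriage returns and the curly apostrophe (bytes E2 80 99), assuming a POSIX shell with `grep`.

```shell
#!/bin/sh
# Hypothetical helper, not part of repl.bash: report how many lines of a file
# contain a carriage return or a UTF-8 curly apostrophe (bytes E2 80 99).
count_suspect_lines() {
    file=$1
    cr=$(printf '\r')               # carriage return, shown as ^M by many editors
    rsquo=$(printf '\342\200\231')  # octal escapes for bytes E2 80 99
    # LC_ALL=C makes grep compare raw bytes instead of decoding characters.
    cr_count=$(LC_ALL=C grep -c "$cr" "$file")
    rsquo_count=$(LC_ALL=C grep -c "$rsquo" "$file")
    echo "CR lines: $cr_count, curly-apostrophe lines: $rsquo_count"
}
```

Calling `count_suspect_lines notes.txt` (the file name is just an example) prints both counts; two zeros suggest the file does not need these particular fixes.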

Well, I wrote a script that removes these and makes sure the file ends up with Unix line endings. Here it is:

#!/bin/bash
#
# By: barce[a t]codebelay.com
# -------
# this script replaces microsoft special chars with plain ol' ascii
#
# usage: ./repl.bash filename
#

# replace ^M characters
perl -pi -e 's/\x{0D}\x{0A}/\x{0A}/g' "$1"

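# replace garbage with single-quotes
# ?~@~Y
perl -pi -e 's/\x{E2}\x{80}\x{99}/\x{27}/g' "$1"
# fallbacks for sequences that have already lost the leading E2 byte; the
# lookbehind leaves intact E2 80 9C / E2 80 9D for the double-quote rules below
perl -pi -e 's/\x{80}\x{99}/\x{27}/g' "$1"
perl -pi -e 's/(?<!\x{E2})\x{80}\x{9c}/\x{27}/g' "$1"
perl -pi -e 's/(?<!\x{E2})\x{80}\x{9d}/\x{27}/g' "$1"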

# replace garbage with asterisk
# ?~@?
# e2 80 a2
perl -pi -e 's/\x{E2}\x{80}\x{A2}/\x{2A}/g' "$1"

# replace garbage quotes with plain quotes
# start: ?~@~\
# close: ?~@~]
# e2 80 9c and e2 80 9d
perl -pi -e 's/\x{E2}\x{80}\x{9C}/\x{22}/g' "$1"
perl -pi -e 's/\x{E2}\x{80}\x{9D}/\x{22}/g' "$1"

# replace garbage hyphens with plain hyphens
perl -pi -e 's/\x{E2}\x{80}\x{93}/\x{2D}/g' "$1"

# replace the garbage ellipsis with three plain periods
perl -pi -e 's/\x{E2}\x{80}\x{A6}/\x{2E}\x{2E}\x{2E}/g' "$1"