python - Eliminating unwanted characters
How do I eliminate characters like, e.g., “it, in a word?
These characters are causing my Python program to fail. How do I handle these characters? The input file has lots of them.
Please help. Thanks.
That looks like UTF-8 being misinterpreted as a different encoding. Try:
fixed_input_string = input_string.decode('utf-8')
and see if that solves the problem.
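To make the diagnosis concrete, here is a minimal sketch in Python 3, where `decode` lives on `bytes` rather than on `str` as in the Python 2 style line above. The sample text and the choice of Windows-1252 (`cp1252`) as the wrong codec are assumptions for illustration; your file may have been misread with a different encoding.

```python
# Hypothetical sample: a left curly quote followed by "it", stored as UTF-8 bytes.
raw_bytes = "\u201cit".encode("utf-8")

# Decoding those bytes with the wrong codec (assumed here: cp1252)
# produces the kind of garbage shown in the question.
garbled = raw_bytes.decode("cp1252")

# Decoding with the codec the bytes were actually written in recovers the text.
fixed = raw_bytes.decode("utf-8")

# If only the garbled string is available, undoing the wrong decode and
# redoing it as UTF-8 can recover the text, provided no bytes were lost.
repaired = garbled.encode("cp1252").decode("utf-8")

print(repr(garbled), repr(fixed), repr(repaired))
```

When reading from a file in Python 3, passing the encoding explicitly, as in `open(path, encoding="utf-8")`, avoids relying on the platform default.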
By the way, if you have no idea what I just said, read http://www.joelonsoftware.com/articles/unicode.html right now. If you try to write software that only accepts "English" text (which means ASCII, and there are plenty of characters used in standard English text that aren't in ASCII), your software is going to fail in all kinds of "interesting" ways. Unicode isn't going away, and you'll have to learn it sometime, so now is the time to start.
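As a quick illustration of the point about English text, the following sketch lists the characters in a string that fall outside ASCII. The helper name `non_ascii_chars` and the sample strings are made up for this example.

```python
def non_ascii_chars(text):
    """Return the characters in text whose code point is above the ASCII range."""
    return [ch for ch in text if ord(ch) > 127]


# Hypothetical samples of ordinary English writing.
samples = ["plain text", "caf\u00e9", "\u201cquoted\u201d", "na\u00efve"]

for sample in samples:
    print(repr(sample), non_ascii_chars(sample))
```

Accented loanwords and typographic quotes both show up here, so a program that assumes pure ASCII will trip on fairly ordinary input.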