Handling “EOF inside string starting at line X” while using pandas.read_csv()

The sky is a little grey today because a program I wrote threw a strange error, and it took me a couple of hours to figure out.

1. The story behind this error

The error "EOF inside string starting at line X" occurred while I was using what may be the best file-reading function in the world, pandas.read_csv(…).

When I looked at the file, there was only a Chinese character at line X, and I could not find an EOF anywhere except at the end of the file (where, of course, there is one).

At first, I thought the file's encoding might be responsible, since the file is full of Chinese characters and Python 2.7 is really not good at handling them.

After checking the file over and over again, it turned out that there was a single double quotation mark (") on the next line. I think pd.read_csv() treated it as the start of a quoted string and kept looking for a second double quotation mark to close it, ignoring every column delimiter and line break along the way, until it unfortunately reached the very end of the file.

Therefore, we got an error “EOF inside string”.
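To make the story concrete, here is a minimal sketch that tries to reproduce the problem. The three-line SAMPLE text and the try_parse helper are made up for illustration (my real file was much larger and full of Chinese text), and it assumes a reasonably recent pandas with the default C parser.

```python
import io

import pandas as pd

# Hypothetical sample: the second line opens a quote that is never closed.
SAMPLE = 'first line\n"second line\nthird line\n'


def try_parse(text):
    """Return the parsed DataFrame, or the parser's error message as a string."""
    try:
        return pd.read_csv(io.StringIO(text), header=None)
    except pd.errors.ParserError as exc:
        return str(exc)


result = try_parse(SAMPLE)
print(result)
```

On my understanding of the parser, the printed message should mention "EOF inside string", because the quoted string that starts on the second line swallows everything up to the end of the input.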

2. The solution to this error

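Simply add the option quoting=csv.QUOTE_NONE to the pandas.read_csv(…) call (csv here is Python's standard csv module, so it needs import csv). With this option, the parser treats quotation marks as ordinary characters instead of the start of a quoted string.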

In my case, the solution looks like this:

df = pd.read_csv(f, delimiter='\n', header=None, encoding='utf-8', quoting=csv.QUOTE_NONE)
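For anyone who wants to try the fix without my file, here is a small self-contained sketch. The SAMPLE text is hypothetical, and the sketch relies on the default comma delimiter simply because the sample lines contain no commas, so each line ends up as one value.

```python
import csv
import io

import pandas as pd

# Hypothetical sample: the second line contains a lone double quotation mark.
SAMPLE = 'first line\n"second line\nthird line\n'

# QUOTE_NONE tells the parser to treat the quotation mark as a normal character.
df = pd.read_csv(io.StringIO(SAMPLE), header=None, quoting=csv.QUOTE_NONE)
print(df)
```

The lone quotation mark stays in the data as a literal character, so if it is unwanted it still has to be cleaned up afterwards.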

3. Acknowledgment

  1. Discussion in a GitHub issue
  2. Discussion on Stack Overflow