Reading doc and docx files with Python

Reading docx documents

Reading doc documents

1. Install antiword

brew install antiword

2. Use antiword to extract the text of the doc file (antiword outputs plain text, not a docx file)

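import subprocess

word = 'test.doc'

# antiword writes the document's plain text to stdout; check_output returns it as bytes
output = subprocess.check_output(['antiword', word])

# Decode before printing (UTF-8 is assumed here; the actual encoding depends on how antiword is configured)
print(output.decode('utf-8', errors='replace'))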
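
The call above can be wrapped in a small helper that fails with a clear message when antiword is not installed. This is only a sketch: the function name `doc_to_text` and its parameters are hypothetical, and UTF-8 output is an assumption that may not hold for every antiword setup.

```python
import shutil
import subprocess


def doc_to_text(path, antiword_cmd="antiword", encoding="utf-8"):
    """Return the plain text of a .doc file by calling antiword.

    doc_to_text, antiword_cmd and encoding are illustrative names,
    not part of antiword or of any library.
    """
    # Fail early with a readable error if the binary is not on PATH.
    if shutil.which(antiword_cmd) is None:
        raise RuntimeError(f"{antiword_cmd!r} not found on PATH")
    # check_output returns bytes and raises CalledProcessError
    # when the command exits with a non-zero status.
    raw = subprocess.check_output([antiword_cmd, path])
    # Assumption: the output is UTF-8; undecodable bytes are replaced.
    return raw.decode(encoding, errors="replace")
```

Calling `doc_to_text('test.doc')` would then return the text as a `str` rather than `bytes`.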

3. Install python-docx

sudo pip install python-docx

4. Use python-docx to read the docx file

import docx

doc = docx.Document('test.docx')

# Join the text of every paragraph, one paragraph per line
data = '\n'.join([paragraph.text for paragraph in doc.paragraphs])

# print(data)
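
The two approaches could be combined into one function that picks the reader from the file extension. This is a minimal sketch: `read_word_text` is a hypothetical name, antiword is assumed to be on PATH, and its output is assumed to be UTF-8.

```python
import os
import subprocess


def read_word_text(path):
    """Return the text of a .doc or .docx file as one string.

    read_word_text is an illustrative helper, not a library function.
    """
    ext = os.path.splitext(path)[1].lower()
    if ext == ".docx":
        # Imported lazily so .doc files work without python-docx installed.
        import docx
        document = docx.Document(path)
        return "\n".join(p.text for p in document.paragraphs)
    if ext == ".doc":
        # Assumption: antiword is installed and emits UTF-8 text.
        raw = subprocess.check_output(["antiword", path])
        return raw.decode("utf-8", errors="replace")
    raise ValueError(f"unsupported file type: {ext!r}")
```

For example, `read_word_text('test.docx')` goes through python-docx, while `read_word_text('test.doc')` shells out to antiword. Note that `doc.paragraphs` only covers body paragraphs, so text inside tables is not included in this sketch.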