Study notes on jieba, the Chinese word segmentation module


  • 1. The cut function

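    import jieba
    sentence = "我是长沙理工大学学生"
    # jieba.cut returns a generator of words (precise mode by default)
    generator = jieba.cut(sentence)
    # Join the words with spaces to make the word boundaries visible
    words = " ".join(generator)
    print(words)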
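    jieba's README describes precise mode roughly like this: scan a prefix dictionary to build a directed acyclic graph (DAG) of every word that can start at each character, then use dynamic programming to find the maximum-probability path based on word frequencies, with an HMM (Viterbi) handling words missing from the dictionary. The sketch below imitates only the DAG and dynamic-programming steps. The dictionary `FREQ` and its frequencies are made up, the helpers `build_dag` and `segment` are hypothetical names, and the HMM step is omitted, so this is an illustration of the idea rather than jieba's code.

```python
import math

# Hypothetical toy dictionary: word -> frequency (made-up numbers, not jieba's dict.txt)
FREQ = {
    "我": 50, "是": 60, "长沙": 20, "理工": 10, "大学": 30,
    "理工大学": 15, "学生": 25, "长": 5, "沙": 2, "理": 3,
    "工": 4, "大": 8, "学": 6, "生": 5,
}
TOTAL = sum(FREQ.values())


def build_dag(sentence):
    """Map each start index i to every end index j where sentence[i:j+1] is a dictionary word."""
    dag = {}
    n = len(sentence)
    for i in range(n):
        ends = [j for j in range(i, n) if sentence[i:j + 1] in FREQ]
        dag[i] = ends or [i]  # fall back to the single character if nothing matches
    return dag


def segment(sentence):
    """Choose the split with the largest sum of log word probabilities (right-to-left DP)."""
    n = len(sentence)
    dag = build_dag(sentence)
    log_total = math.log(TOTAL)
    # route[i] = (best score for sentence[i:], end index of the first word of that split)
    route = {n: (0.0, 0)}
    for i in range(n - 1, -1, -1):
        route[i] = max(
            (math.log(FREQ.get(sentence[i:j + 1], 1)) - log_total + route[j + 1][0], j)
            for j in dag[i]
        )
    # Walk the best path from the left to recover the words
    words, i = [], 0
    while i < n:
        j = route[i][1]
        words.append(sentence[i:j + 1])
        i = j + 1
    return words


print(segment("我是长沙理工大学学生"))  # → ['我', '是', '长沙', '理工大学', '学生']
```

    Each word adds log(freq / total) to a path's score, so with these toy numbers the single word 理工大学 scores higher than splitting it into 理工 + 大学.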
  • 2. Part-of-speech tagging

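    import jieba.posseg as jp
    sentence = "我是长沙理工大学学生"
    # jp.cut yields pair objects, each holding a word and its POS flag
    posseg = jp.cut(sentence)
    for pair in posseg:
        # __dict__ shows the pair's attributes, i.e. its word and flag
        print(pair.__dict__)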
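    Each pair exposes `word` and `flag`, and jieba's README iterates over `pseg.cut` results with `for word, flag in words`, so tagged output can be filtered like ordinary tuples. The helper `words_with_tags` below is a hypothetical sketch, and the tagged list it runs on was written by hand: the tags follow jieba's convention that `n`-prefixed tags are nouns (`ns` place name, `nt` organization), but they are assumptions, not real jieba output.

```python
from typing import Iterable, List, Tuple


def words_with_tags(pairs: Iterable[Tuple[str, str]], prefixes: Tuple[str, ...]) -> List[str]:
    """Return the words whose POS flag starts with one of the given prefixes.

    Works on anything that unpacks as (word, flag); plain tuples are used here.
    """
    return [word for word, flag in pairs if flag.startswith(prefixes)]


# Hypothetical tagged output, written by hand for illustration (not produced by jieba)
tagged = [("我", "r"), ("是", "v"), ("长沙", "ns"), ("理工大学", "nt"), ("学生", "n")]
print(words_with_tags(tagged, ("n",)))  # nouns and noun subtypes such as ns / nt
```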
  • 3. Adding and removing words from the dictionary

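    import jieba
    sentence = "天长地久有时尽"
    # Add "时尽" with frequency 999 and POS tag "nz" (other proper noun)
    jieba.add_word("时尽", 999, "nz")
    # lcut returns a list; printing jieba.cut directly only shows a generator object
    print(jieba.lcut(sentence))
    # Remove the word again and segment once more for comparison
    jieba.del_word("时尽")
    print(jieba.lcut(sentence))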
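    Why does the frequency argument matter? In a frequency-driven segmenter a word only appears in the output if the path through it scores higher than the alternatives. In jieba's source, `del_word(word)` simply calls `add_word(word, 0)`, so "deleting" means zeroing the frequency. The toy class below is a hypothetical model of that behavior (the class name, dictionary contents and frequencies are all made up); it is not jieba's implementation.

```python
import math


class ToySegmenter:
    """Hypothetical miniature segmenter: word frequencies + dynamic programming."""

    def __init__(self, freq):
        self.freq = dict(freq)

    def add_word(self, word, freq):
        # Insert or overwrite the word's frequency
        self.freq[word] = freq

    def del_word(self, word):
        # "Delete" by zeroing the frequency; zero-frequency words are skipped in cut()
        self.freq[word] = 0

    def cut(self, sentence):
        n = len(sentence)
        log_total = math.log(sum(self.freq.values()))
        # best[i] = (score of the best split of sentence[i:], end of its first word)
        best = [(0.0, n)] * (n + 1)
        for i in range(n - 1, -1, -1):
            candidates = []
            for j in range(i + 1, n + 1):
                f = self.freq.get(sentence[i:j], 0)
                if f > 0 or j == i + 1:  # single characters are always allowed
                    candidates.append((math.log(max(f, 1)) - log_total + best[j][0], j))
            best[i] = max(candidates)
        words, i = [], 0
        while i < n:
            j = best[i][1]
            words.append(sentence[i:j])
            i = j
        return words


seg = ToySegmenter({"天长地久": 100, "有": 80, "时": 40, "尽": 20, "有时": 50})
print(seg.cut("天长地久有时尽"))  # → ['天长地久', '有时', '尽']
seg.add_word("时尽", 999)
print(seg.cut("天长地久有时尽"))  # → ['天长地久', '有', '时尽']
seg.del_word("时尽")
print(seg.cut("天长地久有时尽"))  # → ['天长地久', '有时', '尽']
```

    With these made-up numbers, 时尽 at frequency 999 outweighs the 有时 + 尽 split; setting it back to 0 restores the original result.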
  • 4. Loading a custom dictionary

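    import os
    import jieba
    my_dict = "my_dict.txt"
    # One entry per line: word, frequency, POS tag (nr = person name, nz = other proper noun)
    with open(my_dict, "w", encoding="utf-8") as f:
        f.write("莫容紫英 9 nr\n云天河 9 nr\n天河剑 9 nz")
    sentence = "莫容紫英为云天河打造了天河剑"
    print("Before loading:", jieba.lcut(sentence))
    jieba.load_userdict(my_dict)
    print("After loading:", jieba.lcut(sentence))
    # Clean up the temporary dictionary file
    os.remove(my_dict)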
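    According to jieba's README, a user dictionary holds one entry per line with three space-separated fields in a fixed order: the word, an optional frequency and an optional POS tag, and the file should be UTF-8 encoded. The parser below is a simplified, hypothetical reader of that format (it assumes words contain no spaces and is not the parser jieba itself uses); it can help check a dictionary file before loading it.

```python
from typing import List, NamedTuple, Optional


class Entry(NamedTuple):
    word: str
    freq: Optional[int]  # None when the frequency field is omitted
    tag: Optional[str]   # None when the POS-tag field is omitted


def parse_userdict(text: str) -> List[Entry]:
    """Parse 'word [freq] [tag]' lines; blank lines are skipped."""
    entries = []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        word, rest = parts[0], parts[1:]
        freq = None
        if rest and rest[0].isdigit():
            freq = int(rest.pop(0))
        tag = rest[0] if rest else None
        entries.append(Entry(word, freq, tag))
    return entries


sample = "莫容紫英 9 nr\n云天河 9 nr\n天河剑 9 nz"
for entry in parse_userdict(sample):
    print(entry)
```

    Because entries are separated by newlines, the string written to `my_dict.txt` needs a `\n` between entries; otherwise the file holds a single line.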