Python File Reading and Writing

CW Lin

8 min read, Sep 10, 2019

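Reading and writing files is basic and used all the time, but since I still end up googling it every time I need it, I am writing it down so I can remember it and look it up easily.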

Reading and writing txt files:

Open a file with f = open('file', 'mode').
The commonly used modes are:

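r: read
w: write (if the file already exists, it is emptied and replaced)
a: append to the end of an existing file
r+: read, and also write starting from the beginning (this overwrites the existing text at the start)
w+: same as w, but also allows reading
a+: same as a, but also allows reading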
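
To make the differences between the modes concrete, here is a minimal sketch. The file name modes_demo.txt and the temporary directory are my own assumptions for illustration, so that no real file gets touched.

```python
import os
import tempfile

# Hypothetical file inside a throwaway directory (assumption for this demo).
path = os.path.join(tempfile.mkdtemp(), "modes_demo.txt")

with open(path, "w", encoding="utf-8") as f:   # w: create, or empty an existing file
    f.write("hello world\n")

with open(path, "a", encoding="utf-8") as f:   # a: keep the content, write at the end
    f.write("second line\n")

with open(path, "r+", encoding="utf-8") as f:  # r+: keep the content, write from the start
    f.write("HELLO")                           # overwrites the first 5 characters

with open(path, "r", encoding="utf-8") as f:   # r: read only
    after_r_plus = f.read()

with open(path, "w+", encoding="utf-8") as f:  # w+: empties the file first, then read/write
    f.write("fresh")
    f.seek(0)                                  # move back to the start before reading
    after_w_plus = f.read()

print(repr(after_r_plus))
print(repr(after_w_plus))
```

Note that w+ throws away the old content as soon as the file is opened, so use r+ or a+ when the existing text has to survive.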

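Reading and writing content:

f.read(size): reads text into a string (line breaks are included as \n); size is how much to read, and everything is read if it is omitted.
f.readline(): reads one line of text starting at the current cursor position
f.readlines(): reads multiple lines and returns a list whose elements are the strings of each line (each ending with \n)
f.write(string): writes the string
f.seek(offset, whence): controls the cursor position. offset is counted in bytes (not bits), and whence picks the reference point, 0: start of the file (default), 1: current cursor position, 2: end of the file. In text mode, a nonzero offset with whence 1 or 2 is not allowed; open the file in binary mode for that.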
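
The methods above can be tried out together in a short sketch. The file name read_demo.txt and its three-line content are assumptions made up for the demo; newline="\n" is passed when writing so the byte count is the same on every platform.

```python
import os
import tempfile

# Hypothetical demo file with three short lines (assumption for illustration).
path = os.path.join(tempfile.mkdtemp(), "read_demo.txt")
with open(path, "w", encoding="utf-8", newline="\n") as f:
    f.write("line1\nline2\nline3\n")

with open(path, "r", encoding="utf-8") as f:
    first_four = f.read(4)        # reads 4 characters
    rest_of_line = f.readline()   # continues from the cursor to the end of the line
    remaining = f.readlines()     # every remaining line, each ending with \n
    f.seek(0)                     # whence defaults to 0: back to the start
    everything = f.read()         # no size given: read the whole file

# Seeking relative to the end with a nonzero offset needs binary mode.
with open(path, "rb") as f:
    f.seek(-6, 2)                 # 6 bytes before the end of the file
    last_line = f.read()          # bytes, not str, because of binary mode

print(first_four, repr(rest_of_line), remaining, last_line)
```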

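#example1:
string = 'Text written by Python'
f = open("txt test.txt", 'w+')
f.write(string)
f.seek(0)  # move the cursor back to the start before reading
txt = f.read()
f.close()
print(txt)
#example2:
file = 'txt test.txt'  # the file written in example1
with open(file, 'r') as f:
    # splitlines() splits on line boundaries ('\r', '\r\n', '\n') and returns a list with one element per line
    content_list = f.read().splitlines()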

Reading and writing csv files:

This is usually done with pandas: pd.read_csv() reads the file into a DataFrame, and df.to_csv() writes it back out.

import pandas as pd
df = pd.read_csv('Book1.csv')
c = df['remark'].tolist()
c[0] = 'some revised'
df['remark'] = c

df.to_csv('Book1.csv', index=False)  # do not write the index column
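
If pandas is not available, roughly the same round trip can be sketched with the standard library csv module. This is an alternative I am swapping in, not the pandas approach above; the file name Book1_demo.csv and the sample rows are made up for illustration.

```python
import csv
import os
import tempfile

# Hypothetical csv file and rows (assumptions for this demo).
path = os.path.join(tempfile.mkdtemp(), "Book1_demo.csv")
fieldnames = ["name", "remark"]
rows = [
    {"name": "henry", "remark": "original"},
    {"name": "peter", "remark": "untouched"},
]

# newline="" lets the csv module handle line endings itself.
with open(path, "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)

# Read every row back as a dict keyed by the header line.
with open(path, newline="", encoding="utf-8") as f:
    loaded = list(csv.DictReader(f))

loaded[0]["remark"] = "some revised"   # same edit as in the pandas example

with open(path, "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(loaded)

with open(path, newline="", encoding="utf-8") as f:
    final_rows = list(csv.DictReader(f))

print(final_rows)
```

Unlike pandas, the csv module returns every value as a string, so numeric columns need converting by hand.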

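Reading and writing json files:

JSON (JavaScript Object Notation) is a common format that is often used when communicating with other programs. It has two main structures: the object (written with curly braces {}) and the array (written with square brackets []).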

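JSON example: '{"subject":"Math","score":80}'. It is actually very similar to a Python dict, and Python types map onto JSON types as follows: dict to object, list and tuple to array, str to string, int and float to number, True/False to true/false, and None to null.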
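
A quick way to see how Python values come out in JSON is to dump a dict that holds one value of each kind. This is only a small sketch; the keys other than subject and score are made up for illustration.

```python
import json

# One value per Python type (keys besides subject/score are hypothetical).
python_value = {
    "subject": "Math",   # str   -> string
    "score": 80,         # int   -> number
    "ratio": 0.5,        # float -> number
    "passed": True,      # True  -> true
    "remark": None,      # None  -> null
    "tags": ["a", "b"],  # list  -> array
    "pair": (1, 2),      # tuple -> array
}

json_text = json.dumps(python_value)
print(json_text)

# Loading it back gives a dict again, but the tuple has become a list.
round_trip = json.loads(json_text)
print(round_trip["pair"])
```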

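When you want to write Python results out as JSON, there are two ways. One is to build the JSON-formatted string yourself, such as '{"subject":"Math","score":80}'. The other is to use dumps() from Python's json package to convert a dictionary into JSON form.
Because Python has so many variable types, I recommend using dumps() to make sure the output is standard JSON. If it does not raise an error, your format is correct~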
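
Here is a small sketch of why building the string by hand is fragile. The subject value with embedded double quotes is an assumption I picked to trigger the problem.

```python
import json

subject = 'Math "advanced"'   # hypothetical value that contains double quotes
score = 80

# Hand-built string: the inner quotes are not escaped, so this is not valid JSON.
manual = '{"subject":"' + subject + '","score":' + str(score) + '}'
try:
    json.loads(manual)
    manual_is_valid = True
except json.JSONDecodeError:
    manual_is_valid = False

# dumps() escapes the quotes for us, so the result parses back cleanly.
safe = json.dumps({"subject": subject, "score": score})

print(manual_is_valid)
print(safe)
print(json.loads(safe))
```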

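The json package mainly comes down to these 4 functions: json.dumps() (Python object to JSON string), json.loads() (JSON string to Python object), json.dump() (write a Python object to a file as JSON), and json.load() (read JSON from a file into a Python object).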

When you dumps your dict to JSON you may run into errors, because the dict holds things that are not JSON-compatible, such as numpy types. Writing an encoder that does the type conversion solves this.

#example
import json
import numpy as np

class MyEncoder(json.JSONEncoder):
    # convert numpy types into plain Python types that json can serialize
    def default(self, obj):
        if isinstance(obj, np.integer):
            return int(obj)
        elif isinstance(obj, np.floating):
            return float(obj)
        elif isinstance(obj, np.ndarray):
            return obj.tolist()
        else:
            return super(MyEncoder, self).default(obj)

dic = {'name':['henry','peter'],'score':[78,87],'remark':['handsome',None]}

## dict to json
json_form = json.dumps(dic,cls = MyEncoder)
print(json_form) #{"name": ["henry", "peter"], "score": [78, 87], "remark": ["handsome", null]}
print(type(json_form)) #<class 'str'>

## json to dict
new_dict = json.loads(json_form)
print(new_dict) #{'name': ['henry', 'peter'], 'score': [78, 87], 'remark': ['handsome', None]}
print(type(new_dict)) #<class 'dict'>

#file read/write
#write
with open('test.json', 'w', encoding='utf-8') as f:
    json.dump(dic, f, ensure_ascii=False, indent=4,sort_keys=True)

#read (same encoding as the write above)
with open('test.json', encoding='utf-8') as f:
    json_from_file = json.load(f)

print(json_from_file)
print(type(json_from_file)) #<class 'dict'>

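Reading and writing xml files:

If you work on object detection, you are surely no stranger to VOC format files (.xml)!
Python has many XML-related packages. I personally use xml.etree.ElementTree, which is said to be lightweight and efficient. My only goal was to count how many objects were labeled for each class, so I only have a shallow understanding of it XD. If I end up using other features later, I will learn them and add them here~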

#get the root node:
import xml.etree.ElementTree as ET

in_file = 'example.xml'  # hypothetical path to an xml file
tree = ET.parse(in_file)
root = tree.getroot()

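Use root.iter('interest_name') to find the elements you are interested in. Each element has the following attributes and methods:

tag: string, the kind of data the element represents
attrib: dictionary, the element's attributes
text: string, the content of the element
tail: string, the text that follows the element's closing tag
find(match): searches the direct children for the first matching element and returns it, or None if there is no match
findall(match): searches the direct children and returns all matching elements as a list, in document order
get(key): returns the attribute named key, or None if it is not found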
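
The attributes and methods above can be tried on a tiny in-memory document. The XML below, including the id attribute and the "tail text", is made up for illustration and is not real VOC data.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML, parsed from a string instead of a file.
xml_text = """<annotation>
  <object id="1"><name>cat</name></object>tail text
  <object id="2"><name>dog</name></object>
</annotation>"""

root = ET.fromstring(xml_text)
first = root.find("object")             # first direct child named object

print(first.tag)                        # element name
print(first.attrib)                     # attributes as a dict
print(first.get("id"))                  # a single attribute
print(first.get("missing"))             # None when the attribute does not exist
print(first.find("name").text)          # content of the nested name element
print(first.tail.strip())               # text right after the closing tag

all_objects = root.findall("object")    # every direct child named object
direct_name = root.find("name")         # None: name is not a direct child of the root
all_names = [e.text for e in root.iter("name")]  # iter() searches the whole tree
print(len(all_objects), direct_name, all_names)
```

The last three lines show the difference that matters most in practice: find() and findall() only look at direct children, while iter() walks every level.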

Let's go straight to an example (walk through the VOC xml files and count how many times each class was labeled):

import xml.etree.ElementTree as ET
import os

classes = ['class1','class2','class3']
count_list = [0,0,0]

folder_path = 'Annotations'
xml_file_list = os.listdir(folder_path)

for image_id in xml_file_list:
    # ET.parse accepts a path directly, so no file handle is left open
    tree = ET.parse(os.path.join(folder_path,image_id))
    root = tree.getroot()

    for obj in root.iter('object'):
        cls = obj.find('name').text
        if cls not in classes:
            continue
        count_list[classes.index(cls)]+=1
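
The same counting idea can be checked without an Annotations folder by parsing strings and using collections.Counter. This is just a sketch under my own assumptions: the two annotation strings are made-up stand-ins for real VOC files.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical stand-ins for the content of two VOC xml files.
annotations = [
    "<annotation><object><name>class1</name></object>"
    "<object><name>class2</name></object></annotation>",
    "<annotation><object><name>class1</name></object>"
    "<object><name>other</name></object></annotation>",
]

classes = ["class1", "class2", "class3"]
counts = Counter()

for xml_text in annotations:
    root = ET.fromstring(xml_text)
    for obj in root.iter("object"):
        name = obj.findtext("name")    # text of the name child, or None if absent
        if name in classes:            # labels outside the class list are skipped
            counts[name] += 1

# Counter returns 0 for classes that never appeared.
count_list = [counts[c] for c in classes]
print(count_list)
```

Counter avoids the classes.index() lookup and keeps the label next to its count, which is handy when the class list gets long.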

For reference, here is the code in keras yolov3 that converts VOC format into the txt format it requires:

https://github.com/qqwweee/keras-yolo3/blob/master/voc_annotation.py
