更快读取Excel文件到pandas dataframe的方法
更快读取Excel文件到pandas dataframe的方法
我有一个包含五个工作表的14MB的Excel文件,我正在将其读入Pandas数据帧中,虽然下面的代码可以工作,但需要9分钟!
有没有人有加速它的建议?
import pandas as pd def OTT_read(xl,site_name): df = pd.read_excel(xl.io,site_name,skiprows=2,parse_dates=0,index_col=0, usecols=[0,1,2],header=None, names=['date_time','%s_depth'%site_name,'%s_temp'%site_name]) return df def make_OTT_df(FILEDIR,OTT_FILE): xl = pd.ExcelFile(FILEDIR + OTT_FILE) site_names = xl.sheet_names df_list = [OTT_read(xl,site_name) for site_name in site_names] return site_names,df_list FILEDIR='c:/downloads/' OTT_FILE='OTT_Data_All_stations.xlsx' site_names_OTT,df_list_OTT = make_OTT_df(FILEDIR,OTT_FILE)
admin 更改状态以发布 2023年5月24日
我使用了xlsx2csv来将Excel文件虚拟转换成内存中的CSV文件,这有助于将读取时间缩短约一半。
from xlsx2csv import Xlsx2csv from io import StringIO import pandas as pd def read_excel(path: str, sheet_name: str) -> pd.DataFrame: buffer = StringIO() Xlsx2csv(path, outputencoding="utf-8", sheet_name=sheet_name).convert(buffer) buffer.seek(0) df = pd.read_csv(buffer) return df
正如其他人所建议的,csv读取速度更快。因此,如果您在Windows上拥有Excel,则可以调用vbscript将Excel转换为csv,然后读取csv。我尝试了下面的脚本,它大约需要30秒。
# create a list with sheet numbers you want to process sheets = map(str,range(1,6)) # convert each sheet to csv and then read it using read_csv df={} from subprocess import call excel='C:\\Users\\rsignell\\OTT_Data_All_stations.xlsx' for sheet in sheets: csv = 'C:\\Users\\rsignell\\test' + sheet + '.csv' call(['cscript.exe', 'C:\\Users\\rsignell\\ExcelToCsv.vbs', excel, csv, sheet]) df[sheet]=pd.read_csv(csv)
这里是一个小的Python片段,用于创建ExcelToCsv.vbs脚本:
#write vbscript to file vbscript="""if WScript.Arguments.Count < 3 Then WScript.Echo "Please specify the source and the destination files. Usage: ExcelToCsv" Wscript.Quit End If csv_format = 6 Set objFSO = CreateObject("Scripting.FileSystemObject") src_file = objFSO.GetAbsolutePathName(Wscript.Arguments.Item(0)) dest_file = objFSO.GetAbsolutePathName(WScript.Arguments.Item(1)) worksheet_number = CInt(WScript.Arguments.Item(2)) Dim oExcel Set oExcel = CreateObject("Excel.Application") Dim oBook Set oBook = oExcel.Workbooks.Open(src_file) oBook.Worksheets(worksheet_number).Activate oBook.SaveAs dest_file, csv_format oBook.Close False oExcel.Quit """; f = open('ExcelToCsv.vbs','w') f.write(vbscript.encode('utf-8')) f.close()