Reading a CSV file as a grid with columns 1 and 2 as coordinates and column 3 as the value
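Hi all, I'm quite new to Python and still learning. I'm trying to read a CSV file with 3 columns: the first two are coordinates and the third is a value. Below is a sample of the CSV file contents.

I need to read it in the following layout:

(322000.235 582999.865 149.309 ) (322000.485 582999.865 149.249 ) (322000.735 582999.865 149.193 ) (322000.985 582999.865 149.156 )
(322000.235 582999.615 149.29 ) (322000.485 582999.615 149.217 ) (322000.735 582999.615 149.159 ) (322000.985 582999.615 149.128 )
(322000.235 582999.365 149.276 ) (322000.485 582999.365 149.224 ) (322000.735 582999.365 149.179 ) (322000.985 582999.365 149.16 )
...

I wrote some code that compares each value in the third column with the rows before and after it using .shift(1) and .shift(-1). That works up to a point, but it returns a lot of data I don't need. What I actually want is to compare each value with its neighbours in the grid, not just with the adjacent rows, which in most cases means 4 checks. As in the linked example, the red value is the one to be compared with all of the adjacent blue-marked values.

Here is my current code. I hope this is clear enough; I don't know whether it can be modified or whether I should start from scratch. Hopefully someone can help.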
    from __future__ import print_function
    import pandas as pd
    import os
    import re

    Dir = os.getcwd()
    Blks = []
    CSV = []

    for f in os.listdir(Dir):
        if re.search('.txt', f):
            Blks = [each for each in os.listdir(Dir) if each.endswith('.txt')]
            print(Blks)

    for f in os.listdir(Dir):
        if re.search('.csv', f):
            CSV = [each for each in os.listdir(Dir) if each.endswith('.csv')]
            print(CSV)

    limit = 3
    tries = 0

    while True:
        print("----------------------------------------------------")
        spikewell = float(raw_input("Please Enter Parameters: "))
        tries += 1
        if tries == 4:
            print("----------------------------------------------------")
            print("Entered incorrectly too many times.....Exiting")
            print("----------------------------------------------------")
            break
        else:
            if spikewell > 50:
                print("Parameters past limit (20)")
                print("----------------------------------------------------")
                print(tries)
                continue
            elif spikewell < 0:
                print("Parameters can't be negative")
                print("----------------------------------------------------")
                print(tries)
                continue
            else:
                spikewell
                print("Parameters are set")
                print(spikewell)
                print("Searching files")
                print("----------------------------------------------------")

        for z in Blks:
            df = pd.read_csv(z, sep=r'\s+', names=['X', 'Y', 'Z'])
            z = sum(df['Z'])
            average = z / len(df['Z'])

        for terrain in Blks:
            for df in terrain:
                df = pd.read_csv(terrain, sep=r'\s+', names=['X', 'Y', 'Z'])
                spike_zleft = df['Z'] - df['Z'].shift(1)
                spike_zright = df['Z'] - df['Z'].shift(-1)
                wzdown = -(df['Z'] - df['Z'].shift(-1))
                wzup_abs = abs(df['Z'] - df['Z'].shift(1))
                wzdown_abs = abs(wzdown)
                spikecsv = ('spikes.csv')
                wellcsv = ('wells.csv')

                spikes_search = df.loc[(spike_zleft > spikewell) & (spike_zright > spikewell)]
                with open(spikecsv, 'a') as f:
                    spikes_search[['X', 'Y', 'Z']].to_csv(f, sep='\t', index=False)

                well_search = df.loc[(wzup_abs > spikewell) & (wzdown > spikewell)]
                with open(wellcsv, 'a') as f:
                    well_search[['X', 'Y', 'Z']].to_csv(f, sep='\t', index=False)

                print("----------------------------------------------------")
                print('Search completed')
                if len(spikes_search) == 0:
                    print("0 SPIKES FOUND")
                elif len(spikes_search) > 0:
                    print(terrain)
                    print(str(len(spikes_search)) + " SPIKES FOUND")
                if len(well_search) == 0:
                    print("0 WELLS FOUND")
                elif len(well_search) > 0:
                    print(str(len(well_search)) + " WELLS FOUND")
                break
        break
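Cause of the problem:
1. The script does not make clear what operation it is performing on the data, which makes it hard to follow.
2. The file is not read with the csv module, which makes the reading code longer and more error-prone than it needs to be.
Solution:
1. Use the csv module to read the CSV file, for example: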
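    import csv

    # 'FILENAME' is a placeholder for the path to your file.
    with open('FILENAME', 'r', newline='') as f:
        # csv.reader splits on commas by default; pass delimiter=' ' if the
        # columns are separated by single spaces instead.
        reader = csv.reader(f)
        data = []
        for line in reader:
            if line:  # skip blank lines
                data.append([float(i) for i in line])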
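2. For numerical work, consider the numpy module. It already provides a great deal of functionality and may well include a function that fits your needs; see the numpy documentation (http://www.numpy.org/).
3. Once the data is in a numpy array, you can draw on existing solutions to the same kind of problem, such as finding local extrema: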
- [Find all local Maxima and Minima when x and y values are given as numpy arrays](https://stackoverflow.com/questions/31070563)
- [Get coordinates of local maxima in 2D array above certain value](https://stackoverflow.com/questions/9111711)
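
Items 2 and 3 presuppose that the three columns have already been arranged into a 2D array. The following is a minimal sketch of that step, not code from the original answer. It assumes the points form a complete regular grid (every X/Y combination occurs exactly once) and that equal coordinates parse to exactly equal floats; `points_to_grid` is a hypothetical helper name.

```python
import numpy as np


def points_to_grid(points):
    """Arrange rows of (x, y, z) into a 2D array of z values.

    Assumption: the points form a complete regular grid.
    Rows run from the largest y to the smallest (top to bottom),
    columns from the smallest x to the largest (left to right).
    """
    points = np.asarray(points, dtype=float)
    # np.unique returns the sorted distinct values plus, for each input
    # value, its index in that sorted list.
    xs, col = np.unique(points[:, 0], return_inverse=True)
    ys, row = np.unique(points[:, 1], return_inverse=True)
    grid = np.full((ys.size, xs.size), np.nan)
    # Flip the row index so that the largest y ends up in row 0.
    grid[ys.size - 1 - row, col] = points[:, 2]
    return xs, ys[::-1], grid


# A few of the points from the question. For a real whitespace-separated
# file, np.loadtxt('FILENAME') would produce the same kind of array.
sample = [
    (322000.235, 582999.865, 149.309), (322000.485, 582999.865, 149.249),
    (322000.235, 582999.615, 149.29), (322000.485, 582999.615, 149.217),
]
xs, ys, grid = points_to_grid(sample)
print(grid)
```

Any cell left as NaN afterwards indicates a missing point, which is a quick way to check the complete-grid assumption.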
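In a follow-up, the asker explained that the data is terrain data (GIS, tied to a coordinate system) and that the goal is to find spikes more than 5 metres high. He apologized for the overall state of the code, said he had noted the code in the linked answers, and would go through the additional links and try to solve the problem.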
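
The four-neighbour check described in the question can be sketched with numpy slicing once the heights are in a 2D array. This is one possible approach under stated assumptions, not the method from the linked answers: `find_spikes` is a hypothetical name, the grid values are assumed to be in metres, and cells on the border are compared only with the neighbours that exist.

```python
import numpy as np


def find_spikes(grid, threshold):
    """Return a boolean mask of cells that are higher than each of their
    up/down/left/right neighbours by more than `threshold`.

    Assumption: border cells are tested only against existing neighbours.
    """
    grid = np.asarray(grid, dtype=float)
    # Pad with -inf so that a missing neighbour never blocks a spike.
    padded = np.pad(grid, 1, mode='constant', constant_values=-np.inf)
    centre = padded[1:-1, 1:-1]
    neighbours = (
        padded[:-2, 1:-1],   # cell above
        padded[2:, 1:-1],    # cell below
        padded[1:-1, :-2],   # cell to the left
        padded[1:-1, 2:],    # cell to the right
    )
    mask = np.ones(grid.shape, dtype=bool)
    for neighbour in neighbours:
        mask &= (centre - neighbour) > threshold
    return mask


# Made-up heights with one obvious spike in the middle.
heights = np.array([
    [149.3, 149.2, 149.2],
    [149.3, 155.0, 149.2],
    [149.3, 149.2, 149.1],
])
mask = find_spikes(heights, 5.0)
print(np.argwhere(mask))  # (row, column) index of each spike
```

Wells (cells lower than all four neighbours by more than the threshold) can be found the same way by calling `find_spikes(-heights, 5.0)`.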