Python: Read the CSV files in a folder one by one and save the output as PNG files

Question

11 views · March 26, 2023

Anonymous · March 27, 2023

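I have a folder containing 50 csv files, such as countrieslist1.csv, countrieslist2.csv, countrieslist3.csv and so on. I have some code that uses pandas to read the values from a csv file and plot the chart I need from the data. What I want is for my code to read the first csv file, plot it and save the plot as a png file, then read the second csv file, and so on, so that I end up with 50 png files (one per csv file).

I tried the following code: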

import pandas as pd
import os
import matplotlib.pyplot as plt
folder_path = "C:/Users/xyz/Desktop/Countrieslist"
df=pd.read_csv(folder_path)
X=df.'columnname'.value_counts.(normalize=True).head(5)
X.plot.barh()
plt.ylabel()
plt.xlabel()
plt.title()
plt.savefig(folder_path[:-3]+'png')

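This produces output, but only for a single csv file. What I want is code that processes all the csv files in turn, plotting each one and saving it as a png file. How can I do that?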

3 Answers

Anonymous · Answer 1 · 2023-04-25T13:10:55+00:00

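import glob
import pandas as pd

folder = "C:/Users/xyz/Desktop/Countrieslist"  # folder path from the question
for file in glob.glob(folder + '/*.csv'):
    df = pd.read_csv(file)
    # continue with other code here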

The user wants to read every CSV file in a folder and save one PNG file per CSV. The question's code already imports the "os" module, which works with file and directory paths, but it passes the folder path itself to "pd.read_csv". What is missing is a loop over the contents of the directory that reads each CSV file in turn and skips any file that is not a CSV file.

One option is the "listdir" function from the "os" module, which returns the names of the entries in the specified folder. The user iterates over those names and checks each one with the "endswith" method. If the name does not end in ".csv", the loop continues to the next entry. If it does, the user joins the name to the folder path, reads the file into a DataFrame with "pd.read_csv", and carries on with the plotting and PNG-saving code.
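The "listdir" approach described above can be sketched as follows. This is a minimal illustration, not the asker's code: the helper name `list_csv_files` and the temporary demo folder are assumptions made so the example runs anywhere, and the case-insensitive extension check is an addition. Note that `os.listdir` returns bare names, so each one is joined back to the folder path.

```python
import os
import tempfile


def list_csv_files(folder):
    """Return the full paths of the .csv files in folder, sorted by name."""
    paths = []
    for name in sorted(os.listdir(folder)):
        # Skip anything that is not a CSV file (case-insensitive check).
        if not name.lower().endswith(".csv"):
            continue
        # os.listdir gives bare names, so rebuild the full path.
        paths.append(os.path.join(folder, name))
    return paths


# Demo in a throwaway folder (hypothetical file names).
with tempfile.TemporaryDirectory() as demo_folder:
    for name in ("countrieslist1.csv", "countrieslist2.csv", "notes.txt"):
        open(os.path.join(demo_folder, name), "w").close()
    found = [os.path.basename(p) for p in list_csv_files(demo_folder)]

print(found)
```

Each returned path can then be passed to `pd.read_csv` inside the loop.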

Alternatively, the "glob" module simplifies this. Calling "glob.glob" with the pattern "folder/*.csv" returns the paths of the matching files directly, so there is no need to check extensions or filter out non-CSV files by hand. The rest of the code stays the same: read each CSV file into a DataFrame, then process the data and save the PNG file.
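To see what the "glob" pattern returns, here is a small sketch using a temporary folder with made-up file names (both are assumptions for illustration). `glob.glob` does not promise any particular order, so the result is sorted before use.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as demo_folder:
    # Two CSV files and one unrelated file (hypothetical names).
    for name in ("countrieslist1.csv", "countrieslist2.csv", "notes.txt"):
        open(os.path.join(demo_folder, name), "w").close()

    # Only paths matching *.csv come back; notes.txt is never listed.
    pattern = os.path.join(demo_folder, "*.csv")
    matches = sorted(glob.glob(pattern))
    names = [os.path.basename(p) for p in matches]

print(names)
```

Plain sorting is lexicographic, so with 50 files countrieslist10.csv would come before countrieslist2.csv. That does not matter here, because each file is processed independently.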

In short, either "os.listdir" with an "endswith" check or "glob.glob" with a "*.csv" pattern gives the list of CSV files to loop over.

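Anonymous · Answer 2 · 2023-04-21T11:59:39+00:00

The problem comes from handling the file names incorrectly, which produces output file names that are not what is wanted. The fix is to set the working directory to the folder that contains the .csv files, loop over them, and save each .png output under the correct file name.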

First, we need to get the .csv files in the folder:

import glob, os
csv_files = []
os.chdir("C:/Users/xyz/Desktop/Countrieslist")
for file in glob.glob("*.csv"):
    csv_files.append(file)

Next, process them in a loop:

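import os
import pandas as pd
import matplotlib.pyplot as plt

for file in csv_files:
    df = pd.read_csv(file)
    X = df['columnname'].value_counts(normalize=True).head(5)
    X.plot.barh()
    # Placeholder labels: calling these with no argument raises a TypeError.
    plt.ylabel('columnname')
    plt.xlabel('Proportion')
    plt.title(file)
    # Replace the .csv extension instead of appending to it.
    plt.savefig(os.path.splitext(file)[0] + '.png')
    plt.close()  # start the next chart on a fresh figure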

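Why use chdir? The output file names will end in .csvpng, which is valid but almost certainly not the file name you want. That was my mistake; the answer has been updated.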

Did you notice the syntax error in this line: X=df.'columnname'.value_counts.(normalize=True).head(5)? It was just copied and pasted from the original post. My mistake, I thought you were posting an answer.
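The file-name point raised in the comments above can be checked without plotting anything. The sketch below compares plain string concatenation with two standard-library ways of swapping the extension; the sample file name is taken from the question, and the variable names are only illustrative.

```python
import os
from pathlib import Path

csv_name = "countrieslist1.csv"

# Appending to the full name keeps the old extension.
naive = csv_name + "png"        # 'countrieslist1.csvpng'
appended = csv_name + ".png"    # 'countrieslist1.csv.png'

# Two ways to replace the extension instead.
via_splitext = os.path.splitext(csv_name)[0] + ".png"
via_pathlib = Path(csv_name).with_suffix(".png").name

print(naive, appended, via_splitext, via_pathlib)
```

Both `os.path.splitext` and `Path.with_suffix` only touch the last extension, so they also work when the folder name itself contains dots.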

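Anonymous · Answer 3 · 2023-08-05T13:17:05+00:00

Cause of the problem:

The task is to read several CSV files from one folder, process the data in each file, and save each result as a PNG file. The code in the question uses the pandas library to read a CSV file and the matplotlib library to draw the chart and save it as a PNG file, but it handles only one file.

Solution:

The following code solves the problem: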

import pandas as pd
import pathlib
import matplotlib.pyplot as plt

folder_path = pathlib.Path("C:/Users/xyz/Desktop/Countrieslist")

def create_image(filename, columnname):
    """Plot the five most frequent values of one column and save them as a PNG."""
    df = pd.read_csv(filename)
    # normalize=True yields proportions rather than raw counts
    ax = (df[columnname].value_counts(normalize=True).head(5)
                        .plot.bar(ylabel='Proportion', xlabel='Country',
                                  title='Value counts',
                                  legend=False, rot=0))
    ax.figure.savefig(folder_path / f'{filename.stem}.png')
    plt.close(ax.figure)  # close the figure so charts do not pile up

for filename in folder_path.glob('*.csv'):
    create_image(filename, 'Country')

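This code imports the libraries it needs, uses `pathlib` to specify the folder path, and defines a function named `create_image` that reads one CSV file, draws the chart and saves it as a PNG file. The loop uses the folder's `glob` method to find every CSV file in it and calls `create_image` on each one.

Input data: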

import os
import numpy as np
import pandas as pd

REGIONS = ['AL', 'AT', 'BE', 'BG', 'CH', 'CZ', 'DE', 'DK',
           'EE', 'ES', 'FI', 'FR', 'GR', 'HR', 'HU', 'IE',
           'IT', 'LT', 'LU', 'LV', 'ME', 'NL', 'NO', 'PL',
           'PT', 'RO', 'RS', 'SE', 'SI', 'SK', 'UK']
os.makedirs('Countrieslist', exist_ok=True)  # to_csv does not create the folder
for i in range(1, 10):
    df = pd.DataFrame({'Country': np.random.choice(REGIONS, 200)})
    df.to_csv(f'Countrieslist/countrieslist{i}.csv', index=False)

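This code generates the input CSV files. It defines a list of country codes, `REGIONS`, and on each pass of the loop uses `np.random.choice` to draw 200 codes at random from `REGIONS` and saves them to a CSV file named `countrieslist{i}.csv`, where `{i}` is the value of the loop variable. With `range(1, 10)` that makes nine files.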

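The discussion of this code also raised the `plt.close()` method. Wrapping the plotting in a function does not close the figure when the function returns: pyplot keeps every figure open until it is closed explicitly, whether the chart is created inside a function or directly in the loop. Calling `plt.close()` after saving each chart avoids two problems. Charts from earlier files may otherwise be drawn onto the same axes, and matplotlib issues a warning once more than 20 figures are open at the same time.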
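A minimal sketch of closing each figure after saving it is shown below. It is an illustration under stated assumptions, not the answer's code: the helper `save_bar_chart`, the made-up sample data and the temporary output folder are all hypothetical, and the "Agg" backend is selected only so the example runs without a display.

```python
import tempfile
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import pandas as pd


def save_bar_chart(values, png_path):
    """Draw a bar chart on a fresh figure, save it, then close the figure."""
    fig, ax = plt.subplots()
    values.value_counts(normalize=True).head(5).plot.bar(ax=ax, rot=0)
    fig.savefig(png_path)
    plt.close(fig)  # release the figure so the next chart starts clean


with tempfile.TemporaryDirectory() as out_dir:
    for i in range(1, 4):
        # Made-up sample data standing in for one CSV column.
        sample = pd.Series(["DE", "FR", "DE", "IT", "FR", "DE"])
        save_bar_chart(sample, Path(out_dir) / f"chart{i}.png")
    saved = sorted(p.name for p in Path(out_dir).glob("*.png"))
    open_figures = plt.get_fignums()

print(saved, open_figures)
```

Creating the figure explicitly with `plt.subplots()` and passing `ax=ax` makes it unambiguous which figure each chart lands on, and `plt.get_fignums()` confirms that nothing is left open after the loop.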