Best way to find whether a string exists in a file

Question


Anonymous · April 22, 2023


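I have a text file with tens of thousands of lines. What is the best way to find out whether a particular string exists in the file?

One option is to read the whole file into a string and call String.contains on it. Another is to use Files.readAllLines to build a list of all the lines, then loop over the list and check whether each line contains the string.

Update: I am using Java 7. Each file (there are 10 files) gets 1-2 string searches. The string to search for changes from file to file. If the string is found, I want to stop searching.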


3 Answers

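Anonymous · Answer 1 · 2023-06-21T10:43:10+00:00

If you are on Java 8 and the file is large, the best approach is to use the Streams API to check whether a string exists in the file. There are two cases: either you want the first line that contains the search string stringToSearch, or you want to go through every line that contains stringToSearch. Example code: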

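import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Optional;
import java.util.stream.Stream;

String fileName = "c://SomeFile.txt";
String stringToSearch = "dummy";
// Case 1: find the first matching line. findFirst() short-circuits, so reading stops at the first match.
try (Stream<String> stream = Files.lines(Paths.get(fileName))) {
    Optional<String> lineHavingTarget = stream.filter(l -> l.contains(stringToSearch)).findFirst();
} catch (IOException e) {
    // log the exception
}
// Case 2: visit every matching line. A stream can be consumed only once, so open a new one.
try (Stream<String> stream = Files.lines(Paths.get(fileName))) {
    stream.filter(l -> l.contains(stringToSearch)).forEach(System.out::println);
} catch (IOException e) {
    // log the exception
}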

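So reading all of the file's lines into memory does not seem like a good idea; it is better to read the file line by line. If you are interested in the fastest string-search algorithms, see this link.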
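
Because the question only needs a yes/no answer, a boolean check with anyMatch may be a closer fit than findFirst. The following is a minimal sketch, assuming Java 8 and a UTF-8 text file; the class name StreamSearch and the method name containsString are made up for illustration and do not come from the original answer.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical helper for illustration only.
final class StreamSearch {

    // Returns true if at least one line of the file contains target.
    // anyMatch is a short-circuiting operation, so no further lines
    // are read once a match has been found.
    static boolean containsString(Path file, String target) throws IOException {
        // Files.lines(Path) decodes the file as UTF-8 (an assumption about the input).
        // try-with-resources closes the underlying file handle.
        try (Stream<String> lines = Files.lines(file)) {
            return lines.anyMatch(line -> line.contains(target));
        }
    }
}
```

A call would look like StreamSearch.containsString(Paths.get(fileName), stringToSearch). Like any line-based check, it cannot find a match that spans a line break.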

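Anonymous · Answer 2 · 2023-06-18T17:42:12+00:00

When a file has a large number of lines, it is better to read it line by line than to load the whole content into memory. In other words: read a line, check whether it contains the string, then move on to the next line.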

To implement this, we can use the `open()` function in Python to open the file and read it line by line. We can then use the `in` operator to check if the desired string is present in each line.

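def is_string_present(file_path, search_string):
    """Return True if any line of the file contains search_string."""
    with open(file_path, 'r') as file:
        # Iterating over the file object reads one line at a time.
        for line in file:
            if search_string in line:
                return True  # stop at the first match
    return False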

This function takes the file path and the search string as input parameters. It opens the file using the `open()` function in read mode ('r'). Then, it iterates over each line in the file and checks if the search string is present in that line using the `in` operator. If it finds a match, it returns True. If it reaches the end of the file without finding a match, it returns False.

This approach is more memory-efficient as it reads the file line by line instead of loading the entire content into memory. It is also time-efficient as it stops as soon as it finds a match, instead of checking the entire file.

In short, for large files, read line by line and test each line with the `in` operator. This keeps memory usage low and stops at the first match.
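
The question specifies Java 7, where streams are not available. The same line-by-line idea as the Python function above can be sketched with a BufferedReader. This is only an illustration under stated assumptions: the class name LineSearch, the method name containsString, and the choice to pass the charset explicitly are not from the original answer.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration only; compiles on Java 7.
final class LineSearch {

    // Reads the file one line at a time and returns true at the first
    // line that contains target, so the rest of the file is never read.
    static boolean containsString(Path file, String target, Charset charset) throws IOException {
        // try-with-resources (Java 7) closes the reader on every exit path.
        try (BufferedReader reader = Files.newBufferedReader(file, charset)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.contains(target)) {
                    return true; // stop searching as soon as a match is found
                }
            }
        }
        return false; // reached end of file without a match
    }
}
```

A call would look like LineSearch.containsString(Paths.get("SomeFile.txt"), "dummy", StandardCharsets.UTF_8), where the file name and charset are placeholders. Only one line is held in memory at a time, at the cost of missing a match that spans a line break.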

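Anonymous · Answer 3 · 2023-05-13T08:43:49+00:00

Question: What is the best way to find a string in a file?

There is little benefit in storing every line of the file in a list, and both of the approaches you propose have the same problem: they load the entire file into memory.

If you only care about particular lines in the file, you probably do not want to keep the lines you do not need in memory. If you are using Java 8, you can use Files.lines() to read the file lazily, line by line, as a stream. Otherwise, Guava's LineProcessor can do the same thing.

The example below uses a stream to find all lines that contain a string and collects them into a list.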

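List<String> lines;
try (Stream<String> stream = Files.lines(path)) { // close the file when done
    lines = stream
            .filter(line -> line.contains("foo"))
            // findFirst() can be used here instead to get the first match and stop.
            .collect(Collectors.toList());
}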

The example below does the same thing with Guava.

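import com.google.common.io.Files;
import com.google.common.io.LineProcessor;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// readLines needs a charset; UTF-8 is an assumption here, use the file's actual encoding.
List<String> lines = Files.readLines(file, StandardCharsets.UTF_8, new LineProcessor<List<String>>() {
    private final List<String> lines = new ArrayList<>();

    @Override
    public boolean processLine(String line) throws IOException {
        if (line.contains("foo")) {
            lines.add(line);
        }
        return true; // return false to stop
    }

    @Override
    public List<String> getResult() {
        return lines;
    }
});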