Concatenating text files creates a file with a space between every letter

Question

June 23, 2023

Anonymous · June 24, 2023


I'm trying to concatenate .txt files. Almost everything works, but the output file has a space between every letter, e.g. l o r e m i p s u m. Here is my code:

import glob
all = open("all.txt","a");
for f in glob.glob("*.txt"):
    print f
    t = open(f, "r")
    all.write(t.read())
    t.close()
all.close()

I'm using Windows 7 and Python 2.7.

Edit:

Is there a better way to concatenate files?

Edit 2:

Now I'm getting a decoding error:

Traceback (most recent call last):
  File "P:\bwiki\BWiki\MobileNotes\export\999.py", line 9, in 
    all.write( t.read())
  File "C:\Python27\lib\codecs.py", line 671, in read
    return self.reader.read(size)
  File "C:\Python27\lib\codecs.py", line 477, in read
    newchars, decodedbytes = self.decode(data, self.errors)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xf3 in position 18: invalid continuation byte

import codecs
import glob
all =codecs.open("all.txt", "a", encoding="utf-8")
for f in glob.glob("*.txt"):
    print f
    t = codecs.open(f, "r", encoding="utf-8")
    all.write( t.read())


3 Answers

Anonymous · Answer 1 · 2023-08-24T14:45:25+00:00

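The cause is that the files are UTF-16 encoded. In UTF-16 every ASCII character takes two bytes, one of which is a zero byte, and when the concatenated file is displayed as a single-byte encoding those zero bytes show up as spaces between the letters.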
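The effect is easy to reproduce. This small Python 3 sketch assumes little-endian UTF-16 (the byte order Windows uses) without a BOM, and shows the bytes of a UTF-16 string read back as if they were a single-byte encoding; replacing the NUL characters with spaces only imitates how many viewers display them.

```python
# Assumes little-endian UTF-16 without a BOM ("utf-16-le") for simplicity
text = "lorem"
raw = text.encode("utf-16-le")

# Each ASCII letter becomes two bytes: its character code, then 0x00
print(list(raw))  # [108, 0, 111, 0, 114, 0, 101, 0, 109, 0]

# Decoding as a single-byte encoding keeps the zero bytes as NUL characters;
# replacing them with spaces imitates how many viewers display them
as_latin1 = raw.decode("latin-1")
print(as_latin1.replace("\x00", " "))  # l o r e m  (note the trailing space)

# Decoding with the right codec restores the original text
print(raw.decode("utf-16-le"))  # lorem
```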

If all the files use the same character encoding, you can copy them byte for byte, as the `cat` command does. Here is the PowerShell equivalent of your Python code:

PS C:\> Get-Content *.txt | Add-Content all.txt

Unlike `cat *.txt >> all.txt` (in PowerShell, `cat` is an alias for `Get-Content`), this does not corrupt the character encoding.

Your code should work if you open the files in binary mode:

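from glob import glob
from shutil import copyfileobj

# Binary mode copies the raw bytes: no decoding, no newline translation
with open('all.txt', 'ab') as output_file:
    for filename in glob("*.txt"):
        if filename == 'all.txt':
            continue  # all.txt matches *.txt too; don't copy it into itself
        with open(filename, 'rb') as file:
            copyfileobj(file, output_file)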

Again, all the files must use the same character encoding; otherwise the output may contain garbage (a mix of encodings).
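Even with one shared encoding, a byte-for-byte copy has a side effect worth knowing: UTF-16 files (and UTF-8 files saved with a BOM) usually begin with a byte order mark, so the concatenated file contains a BOM at the start of every appended file, which decodes to an invisible U+FEFF character in the middle of the text. Below is a Python 3 sketch that keeps only the first file's BOM; the names `strip_bom` and `concat_same_encoding` are hypothetical, and it still assumes all inputs share one encoding.

```python
import codecs

# Byte order marks to strip. UTF-32 BOMs come first because the UTF-32-LE
# BOM starts with the same two bytes as the UTF-16-LE BOM.
BOMS = (codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE, codecs.BOM_UTF8,
        codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)


def strip_bom(data):
    """Return data without a leading byte order mark, if it has one."""
    for bom in BOMS:
        if data.startswith(bom):
            return data[len(bom):]
    return data


def concat_same_encoding(paths, output_path):
    """Concatenate files byte for byte, keeping only the first file's BOM.

    Assumes every input uses the same encoding, as the answer requires.
    """
    with open(output_path, "wb") as out:
        for i, path in enumerate(paths):
            with open(path, "rb") as f:
                data = f.read()
            # The first file keeps its BOM so readers can still detect the encoding
            out.write(data if i == 0 else strip_bom(data))
```

For example, with `import glob`, `concat_same_encoding([p for p in sorted(glob.glob("*.txt")) if p != "all.txt"], "all.txt")` builds the output from every other .txt file in the folder.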

Anonymous · Answer 2 · 2023-09-12T17:47:03+00:00

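Cause: the input files are UTF-16 encoded, but you are reading them as if they were ASCII, so the zero byte that UTF-16 stores next to each ASCII character shows up as a space. Fix: decode the files with the codecs module.

To do that, import codecs and open each file with the correct encoding: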

import codecs
...
for f in glob.glob("*.txt"):
    print f
    t = codecs.open(f, "r", encoding="utf-16")

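The code imports the codecs module, uses glob.glob to iterate over every .txt file in the current directory, and opens each file with codecs.open using the utf-16 encoding. The files are then decoded correctly and the spaces no longer appear.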
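Answer 2's snippet only opens the files. A minimal complete version might look like the sketch below, assuming every input is UTF-16 with a BOM and choosing UTF-8 for the output; the `convert_all` name and the output encoding are illustrative assumptions. It uses `io.open`, which accepts an `encoding` argument on both Python 2.7 and 3, and it excludes the output file from the inputs.

```python
import glob
import io


def convert_all(pattern="*.txt", output_path="all.txt",
                input_encoding="utf-16", output_encoding="utf-8"):
    """Decode every file matching pattern and write the text to output_path.

    Sketch: assumes all inputs use input_encoding. Returns the list of
    files that were copied.
    """
    # Glob before opening the output, and never read the output into itself
    paths = [p for p in sorted(glob.glob(pattern)) if p != output_path]
    # "w" rather than "a": rerunning does not append the same text again
    with io.open(output_path, "w", encoding=output_encoding) as out:
        for path in paths:
            # The utf-16 codec uses each file's BOM to pick the byte order
            with io.open(path, "r", encoding=input_encoding) as f:
                out.write(f.read())
    return paths
```

Calling `convert_all()` in the folder that holds the files writes all.txt as UTF-8. If any input is not actually UTF-16, it will be mis-decoded or raise `UnicodeDecodeError`, so it is worth checking each file's encoding first.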

Anonymous · Answer 3 · 2023-09-11T14:46:19+00:00

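The spaces between the letters point to the files' encoding, and the way to confirm that is to look at their raw bytes with a small hexdump script.

The steps are:

1. Define a hexdump function that formats each byte as two hex digits, with no spaces between bytes.

2. Print each .txt filename followed by the hex dump of its first 16 bytes.

3. Run the code and check the output.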

The code is as follows:

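import glob
import sys

def hexdump(s):
    # Two hex digits per byte, no separator between bytes;
    # bytearray() yields ints on both Python 2 and 3
    return "".join("{:02x}".format(b) for b in bytearray(s))

# Width of the longest filename, so the hex columns line up
l = 0
for f in glob.glob("*.txt"):
    l = max(l, len(f))

# Print each filename followed by its first 16 bytes in hex
for f in glob.glob("*.txt"):
    with open(f, "rb") as fp:
        sys.stdout.write("{0:<{1}}  {2}\n".format(f, l, hexdump(fp.read(16))))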

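When you run it, each file's first bytes are printed in hex without spaces between bytes. A file that starts with fffe (the UTF-16 little-endian byte order mark) and has 00 after every ASCII character is UTF-16 encoded, and those zero bytes are what appear as spaces in the concatenated output.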
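The dump is easiest to read with a few known patterns in mind: `efbbbf` at the start is the UTF-8 BOM, `fffe` the UTF-16 little-endian BOM, `feff` the big-endian one, and a `00` after every ASCII letter also points to UTF-16-LE. The helper below is a rough heuristic sketch applying those patterns to the first bytes of a file; the name `guess_encoding` and the `None` fallback are assumptions, it does not handle UTF-32, and it is not a real encoding detector.

```python
import codecs


def guess_encoding(head):
    """Rough guess at a file's encoding from its first few bytes.

    Heuristic sketch, not a real detector: returns a codec name,
    or None when the bytes do not match any pattern checked here.
    UTF-32 is not handled.
    """
    if not head:
        return None
    if head.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"  # the utf-16 codec reads the BOM to pick byte order
    data = bytearray(head)
    even, odd = data[0::2], data[1::2]
    # BOM-less, mostly-ASCII UTF-16: every other byte is zero
    if odd and all(b == 0 for b in odd) and 0 not in even:
        return "utf-16-le"
    if odd and all(b == 0 for b in even) and 0 not in odd:
        return "utf-16-be"
    if 0 in data:
        return None
    try:
        # final=False tolerates a multi-byte character cut off at the end
        codecs.getincrementaldecoder("utf-8")().decode(bytes(data), final=False)
        return "utf-8"
    except UnicodeDecodeError:
        return None  # possibly a single-byte code page such as cp1252
```

One way to use it: in the script's second loop, read `head = fp.read(16)` once and print both `hexdump(head)` and `guess_encoding(head)` for each file.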