Fixing garbled Chinese file names when extracting with zip4j

Date: 2020-03-23 20:14:52

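When using zip4j to extract uploaded zip files, Chinese file names kept coming out garbled after extraction. At first I handled this by inspecting the decoded characters to decide which encoding to use: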

File zipFile = new File(zip);
ZipFile zFile = new ZipFile(zipFile);
zFile.setFileNameCharset(getEncoding(zip));
if (!zFile.isValidZipFile()) {
    throw new ZipException("The archive is invalid and may be corrupted.");
}




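    /**
     * Decides which charset to use when extracting the archive.
     * @param path path to the zip file
     * @return "UTF-8" if decoding the entry names as GBK yields garbled text, otherwise "GBK"
     * @throws Exception if the archive headers cannot be read
     */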
    private static String getEncoding(String path) throws Exception {
        String encoding = "GBK";
        ZipFile zipFile = new ZipFile(path);
        zipFile.setFileNameCharset(encoding);
        List<FileHeader> list = zipFile.getFileHeaders();
        for (int i = 0; i < list.size(); i++) {
            FileHeader fileHeader = list.get(i);
            String fileName = fileHeader.getFileName();
            if (isMessyCode(fileName)) {
                encoding = "UTF-8";
                break;
            }
        }
        return encoding;
    }

    private static boolean isMessyCode(String str) {
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
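            // When converting from Unicode to a charset that has no encoding for the character, the result is 0x3f (the question mark '?').
            // When converting from another charset to Unicode, bytes that do not identify any character in that charset decode to 0xfffd.
            if ((int) c == 0xfffd) {
                // Garbled text found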
                return true;
            }
        }
        return false;
    }
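
One limitation of this heuristic is worth illustrating. The check only fires when the decoder emits U+FFFD, but UTF-8 bytes can happen to pair up into valid GBK codes, in which case the name is decoded as GBK into the wrong characters without any U+FFFD. The sketch below is not part of the original post; the class name `ReplacementCharDemo` is made up for illustration, and it assumes the JRE ships the GBK charset (standard JDK builds do).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class ReplacementCharDemo {
    // Same idea as isMessyCode: look for U+FFFD in the decoded text.
    static boolean hasReplacementChar(String s) {
        return s.indexOf('\uFFFD') >= 0;
    }

    public static void main(String[] args) {
        // Assumption: the GBK charset is available in this JRE.
        Charset gbk = Charset.forName("GBK");

        // U+4E2D U+6587 encoded as GBK is not valid UTF-8,
        // so decoding those bytes as UTF-8 produces U+FFFD and the check fires.
        String gbkAsUtf8 = new String("\u4E2D\u6587".getBytes(gbk), StandardCharsets.UTF_8);
        System.out.println("GBK bytes read as UTF-8 flagged: " + hasReplacementChar(gbkAsUtf8));

        // U+6587 U+6587 encoded as UTF-8 is six bytes that form three valid GBK codes,
        // so decoding them as GBK yields wrong characters but no U+FFFD.
        String original = "\u6587\u6587";
        String utf8AsGbk = new String(original.getBytes(StandardCharsets.UTF_8), gbk);
        System.out.println("UTF-8 bytes read as GBK flagged: " + hasReplacementChar(utf8AsGbk));
        System.out.println("Decoded text matches original: " + utf8AsGbk.equals(original));
    }
}
```

Whether this is what caused the approach to stop working in the author's case is not known; it is only one way the GBK-first probe can pick the wrong charset.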

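This approach worked at first but later stopped working, for reasons I could not determine. Today I finally found a thorough solution:

Reposted from: https://www.jianshu.com/p/5594952e43f7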

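// Imports used by this method (zip4j 1.x package names)
import java.io.File;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import net.lingala.zip4j.core.ZipFile;
import net.lingala.zip4j.exception.ZipException;
import net.lingala.zip4j.model.FileHeader;

public static File[] unzip(String zip, String dest, String passwd) throws Exception {
    File zipFile = new File(zip);
    ZipFile zFile = new ZipFile(zipFile);
    zFile.setFileNameCharset(StandardCharsets.UTF_8.name());
    if (!zFile.isValidZipFile()) {
        throw new ZipException("The archive is invalid and may be corrupted.");
    }
    File destDir = new File(dest);
    // Create the destination directory if it does not exist yet
    if (!destDir.exists()) {
        destDir.mkdirs();
    }
    if (zFile.isEncrypted()) {
        zFile.setPassword(passwd.toCharArray());
    }
    // Note: extractAll writes files using the header names decoded with the charset set above,
    // while the list returned below uses the Unicode path extra field when it is present
    zFile.extractAll(dest);

    // Collect the extracted regular files (directories are skipped)
    List<FileHeader> headerList = zFile.getFileHeaders();
    List<File> extractedFileList = new ArrayList<>();
    for (FileHeader fileHeader : headerList) {
        if (!fileHeader.isDirectory()) {
            extractedFileList.add(new File(destDir, getFileNameFromExtraData(fileHeader)));
        }
    }
    File[] extractedFiles = new File[extractedFileList.size()];
    extractedFileList.toArray(extractedFiles);
    return extractedFiles;
}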


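// Additional imports used by this method (zip4j 1.x package names)
import java.nio.ByteBuffer;
import net.lingala.zip4j.model.ExtraDataRecord;

public static String getFileNameFromExtraData(FileHeader fileHeader) {
    List<ExtraDataRecord> extraDataRecords = fileHeader.getExtraDataRecords();

    if (extraDataRecords != null && !extraDataRecords.isEmpty()) {
        for (ExtraDataRecord extraDataRecord : extraDataRecords) {
            long identifier = extraDataRecord.getHeader();
            // 0x7075 identifies the Info-ZIP Unicode Path Extra Field
            if (identifier == 0x7075) {
                byte[] bytes = extraDataRecord.getData();
                // Layout: version (1 byte), NameCRC32 (4 bytes), then the UTF-8 name
                if (bytes == null || bytes.length <= 5) {
                    continue;
                }
                ByteBuffer buffer = ByteBuffer.wrap(bytes);
                byte version = buffer.get();
                assert (version == 1);
                buffer.getInt(); // skip NameCRC32 so remaining() counts only the name bytes
                return new String(bytes, 5, buffer.remaining(), StandardCharsets.UTF_8);
            }
        }
    }
    // No Unicode path field present: fall back to the name decoded by zip4j
    return fileHeader.getFileName();
}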

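Reading the ZIP format specification shows that the Info-ZIP Unicode Path Extra Field (0x7075)
can solve this problem. In the original author's tests, archivers that encode file names in GBK,
such as WinRAR and Baidu Compression (百度压缩), store the UTF-8 encoded file name in this field.
Because the field is optional, tools that already encode file names in UTF-8, such as the
macOS and Deepin archivers, do not fill it in.
There is still so much to learn.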
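
To make the field layout concrete, here is a minimal stand-alone sketch that decodes a 0x7075 payload with only the JDK. It follows the layout in the ZIP specification: a 1-byte version, a 4-byte little-endian NameCRC32 computed over the raw header file name, then the UTF-8 name. Unlike the method above, it also verifies the CRC, which the specification recommends so that a stale field is ignored. The class `UnicodePathField` and its `encode` helper are hypothetical names introduced for illustration, and obtaining the raw (undecoded) header name bytes from zip4j is not shown.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

final class UnicodePathField {
    private static int crc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return (int) crc.getValue();
    }

    /**
     * Returns the UTF-8 name stored in the field, or null when the payload is too short,
     * has an unknown version, or its CRC does not match the raw header name.
     */
    static String decode(byte[] fieldData, byte[] rawHeaderName) {
        if (fieldData == null || fieldData.length < 5) {
            return null;
        }
        ByteBuffer buf = ByteBuffer.wrap(fieldData).order(ByteOrder.LITTLE_ENDIAN);
        if (buf.get() != 1) {
            return null; // only version 1 is handled in this sketch
        }
        if (buf.getInt() != crc32(rawHeaderName)) {
            return null; // header name changed after the field was written
        }
        return new String(fieldData, 5, fieldData.length - 5, StandardCharsets.UTF_8);
    }

    /** Builds a payload for testing; real archives get this from the archiver. */
    static byte[] encode(byte[] rawHeaderName, String unicodeName) {
        byte[] utf8 = unicodeName.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(5 + utf8.length).order(ByteOrder.LITTLE_ENDIAN);
        buf.put((byte) 1).putInt(crc32(rawHeaderName)).put(utf8);
        return buf.array();
    }

    public static void main(String[] args) {
        // Raw header name bytes as a GBK-writing archiver might store them (illustrative values)
        byte[] raw = {(byte) 0xD6, (byte) 0xD0, (byte) 0xCE, (byte) 0xC4};
        byte[] field = encode(raw, "\u4E2D\u6587");
        System.out.println(decode(field, raw));
    }
}
```

A caller would try `decode` first and fall back to the charset-decoded header name when it returns null.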

Original post: https://www.cnblogs.com/bfyq/p/12554239.html
