
Reading text from images with tesseract-ocr (repost)

Posted: 2021-09-06 06:38:41

1. Download and install Tesseract:

  https://digi.bib.uni-mannheim.de/tesseract/

2. Configure the environment variable:

  Add the tesseract-ocr installation directory to the PATH variable.
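The PATH entry can also be checked from Java by looking for the executable in each PATH directory. This is a minimal sketch, assuming the executable is named tesseract.exe on Windows and tesseract elsewhere; the class and method names are hypothetical.

```java
import java.io.File;
import java.util.Optional;

// Hypothetical helper: locates the tesseract executable by scanning PATH entries
class TesseractLocator {

    // Returns the executable found in the first matching PATH directory, if any.
    // Entries are separated by File.pathSeparator (";" on Windows, ":" elsewhere).
    static Optional<File> findOnPath(String pathValue, String exeName) {
        if (pathValue == null) {
            return Optional.empty();
        }
        for (String dir : pathValue.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue; // skip empty entries such as a trailing separator
            }
            File candidate = new File(dir, exeName);
            if (candidate.isFile()) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    // Assumed executable name per OS
    static String exeName() {
        boolean windows = System.getProperty("os.name").toLowerCase().startsWith("windows");
        return windows ? "tesseract.exe" : "tesseract";
    }

    // Looks up tesseract using the current process's PATH
    static Optional<File> locate() {
        return findOnPath(System.getenv("PATH"), exeName());
    }
}
```

If TesseractLocator.locate() comes back empty after editing PATH, note that already-open consoles and IDEs keep the old PATH until they are restarted.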

3. Run the tesseract command (for example, tesseract --version) to check that the installation succeeded.
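The same check can be run from Java before the OCR code relies on the executable. A minimal sketch, assuming tesseract is reachable through PATH; TesseractCheck and its methods are hypothetical names.

```java
import java.io.IOException;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: checks that a command such as "tesseract --version" runs
class TesseractCheck {

    // Returns true if the command starts and exits with status 0 within 10 seconds
    static boolean runsSuccessfully(String... command) {
        try {
            Process p = new ProcessBuilder(command)
                    .inheritIO() // show the version text on the console, like the manual check
                    .start();
            if (!p.waitFor(10, TimeUnit.SECONDS)) {
                p.destroyForcibly();
                return false;
            }
            return p.exitValue() == 0;
        } catch (IOException e) {
            return false; // executable not found on PATH or not launchable
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    static boolean tesseractInstalled() {
        return runsSuccessfully("tesseract", "--version");
    }
}
```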

4. Use it from the command line:

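  1. tesseract + image path + output base name + -l language (chi_sim is Simplified Chinese; its language data must be installed). The result is written to <output base name>.txt.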

  Example: tesseract 1606150081.png 1606150081 -l chi_sim
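A sketch of running this file-output form from Java and reading the file Tesseract produces (here 1606150081.txt). TesseractFileMode and its methods are hypothetical, and the result file is assumed to be UTF-8 text.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for "tesseract <image> <outputbase> -l <lang>"
class TesseractFileMode {

    // Each list element is passed to the OS as one argument
    static List<String> command(String image, String outputBase, String lang) {
        return Arrays.asList("tesseract", image, outputBase, "-l", lang);
    }

    // Tesseract appends ".txt" to the output base name for plain-text output
    static File resultFile(String outputBase) {
        return new File(outputBase + ".txt");
    }

    // Runs the command, waits for it, and returns the text it wrote
    static String run(String image, String outputBase, String lang)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command(image, outputBase, lang))
                .inheritIO() // show Tesseract's progress messages on the console
                .start();
        int exit = p.waitFor();
        if (exit != 0) {
            throw new IOException("tesseract exited with code " + exit);
        }
        byte[] bytes = Files.readAllBytes(resultFile(outputBase).toPath());
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```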

  2. tesseract + image path + stdout + -l language (prints the text to the console instead of writing a file)

  Example: tesseract D:\company\ruigushop\spring-2s\test.png stdout -l chi_sim
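The stdout form is the one the Java code in step 5 runs. When it is issued as a single string through Runtime.exec(String), the JDK splits the string with a StringTokenizer, so an image path containing a space becomes two arguments. A sketch contrasting that with an argument list; the class name and the path with a space are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical illustration for "tesseract <image> stdout -l <lang>"
class TesseractStdoutMode {

    // Argument list for ProcessBuilder: the image path stays one argument
    static List<String> command(String image, String lang) {
        return Arrays.asList("tesseract", image, "stdout", "-l", lang);
    }

    // Mimics Runtime.exec(String): new StringTokenizer(command) splits on whitespace
    static List<String> tokenizeLikeExec(String commandLine) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(commandLine);
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }
}
```

For D:\my images\test.png, command(...) yields five arguments with the path intact, while tokenizeLikeExec(...) on the formatted string yields six, splitting the path at the space.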

5. Java code (a Spring controller that accepts an uploaded image and returns the recognized text):

  

package com.lbh.web.controller;

/*
 * Copyright@lbhbinhao@163.com
 * Author:liubinhao
 * Date:2020/11/23
 * ++++ ______ @author       liubinhao   ______             ______
 * +++/     /|                         /     /|           /     /|
 * +/_____/  |                       /_____/  |         /_____/  |
 * |     |   |                      |     |   |        |     |   |
 * |     |   |                      |     |   |________|     |   |
 * |     |   |                      |     |  /         |     |   |
 * |     |   |                      |     |/___________|     |   |
 * |     |   |___________________   |     |____________|     |   |
 * |     |  /                  / |  |     |   |        |     |   |
 * |     |/ _________________/  /   |     |  /         |     |  /
 * |_________________________|/b    |_____|/           |_____|/
 */
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.multipart.MultipartFile;

import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

@RestController
public class LiteralExtractController {

    @PostMapping("/image/extract")
    public String reg(@RequestParam("file") MultipartFile file) throws IOException, InterruptedException {
        // Save the upload to a temp file rather than user.dir + "\\" + original name:
        // that path only works on Windows and trusts a client-supplied file name
        File save = File.createTempFile("ocr-", ".img");
        try {
            file.transferTo(save);
            // Pass each argument separately so paths containing spaces survive
            return cmd(Arrays.asList("tesseract", save.getAbsolutePath(), "stdout", "-l", "chi_sim"));
        } finally {
            save.delete(); // clean up the uploaded image
        }
    }

    public static String cmd(List<String> command) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        // Tesseract reports progress and warnings on stderr; forward them to the
        // server console so an unread stderr pipe cannot block the process
        pb.redirectError(ProcessBuilder.Redirect.INHERIT);
        Process p = pb.start();
        StringBuilder sb = new StringBuilder();
        // Tesseract writes UTF-8; the platform default charset (GBK on Chinese
        // Windows) would garble the recognized Chinese text
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        int exit = p.waitFor();
        if (exit != 0) {
            throw new IOException("tesseract exited with code " + exit);
        }
        return sb.toString();
    }
}
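To call the endpoint, a client sends a multipart/form-data POST whose part is named file, matching @RequestParam("file"). A minimal sketch using the JDK 11+ java.net.http client; OcrClient is a hypothetical name, and the URL in the usage note assumes Spring Boot's default port 8080.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical client for the /image/extract endpoint
class OcrClient {

    // Builds a multipart/form-data body with one part named "file"
    static byte[] multipartBody(String boundary, String filename, byte[] content) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        String head = "--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"file\"; filename=\"" + filename + "\"\r\n"
                + "Content-Type: application/octet-stream\r\n\r\n";
        out.write(head.getBytes(StandardCharsets.UTF_8));
        out.write(content);
        out.write(("\r\n--" + boundary + "--\r\n").getBytes(StandardCharsets.UTF_8));
        return out.toByteArray();
    }

    // Posts the image and returns the recognized text from the response body
    static String extract(String endpoint, Path image) throws IOException, InterruptedException {
        String boundary = "----ocr" + System.nanoTime();
        byte[] body = multipartBody(boundary, image.getFileName().toString(), Files.readAllBytes(image));
        HttpRequest request = HttpRequest.newBuilder(URI.create(endpoint))
                .header("Content-Type", "multipart/form-data; boundary=" + boundary)
                .POST(HttpRequest.BodyPublishers.ofByteArray(body))
                .build();
        // ofString() decodes using the charset in the response Content-Type (UTF-8 if absent)
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("HTTP " + response.statusCode());
        }
        return response.body();
    }
}
```

Usage, assuming a local server: OcrClient.extract("http://localhost:8080/image/extract", Path.of("test.png")).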

Reposted from: https://mp.weixin.qq.com/s/CvDF_AyxyOZftQvpub1A1Q


Original post: https://www.cnblogs.com/BobXie85/p/15227290.html
