Fetching an Online Video and Transcribing Its Speech to Text in Java

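I. Background

I have recently been working on a project related to large language models. One of its modules needs to extract the speech of an online video as text and output it to the user. As a pure back-end Java engineer, this was my first attempt at anything of the kind.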

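II. Research

The module described above breaks down into three main tasks: 1. extract the video from a web page, 2. convert the video to audio, 3. convert the audio to text.

First: I tried tools such as jsoup and webmagic, but in the end only Selenium (with plenty of pitfalls of its own) produced the result I wanted.

Second: this one took a lot of effort. My first choice was the open-source tool FFmpeg, but installing it from the command line kept failing. I therefore turned to alternatives and tried Xuggler, JAVE, JAVE2, and JavaCV, all of which ended in failure. In the end I decided to go back to FFmpeg. After some persistence it was finally installed: downloading it from the official site and unpacking it locally is all it takes.

Third: a senior teammate suggested a technical solution: https://www.funasr.com. Although it is a ready-made solution, putting it into practice still took considerable effort.

With these three steps in place, the whole flow could in theory be run end to end. In practice it did not go so smoothly: converting long videos to audio timed out, speech-to-text timed out, and so on. With persistence these problems were resolved one by one and the desired result was achieved. The concrete approach is as follows: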

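III. Practice

1. Extracting the video from a web page

a. Download the chromedriver plugin

I recommend downloading it from the website. It must match the version of your Chrome browser, otherwise it will not run. Download address: https://chromedriver.storage.googleapis.com/index.html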
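A quick sanity check can catch a mismatched driver before Selenium fails at startup. As I understand ChromeDriver's version-selection guidance, a driver release is meant for Chrome releases that share its major, minor, and build numbers (the first three components of the version string). The sketch below only compares two version strings under that assumption; `VersionCheck`, `buildPrefix`, and `isCompatible` are hypothetical names, and the version numbers in the usage note are purely illustrative.

```java
public class VersionCheck {

    /** Returns the first three dot-separated components, e.g. "114.0.5735" for "114.0.5735.90". */
    public static String buildPrefix(String version) {
        String[] parts = version.trim().split("\\.");
        if (parts.length < 3) {
            throw new IllegalArgumentException("Unexpected version format: " + version);
        }
        return parts[0] + "." + parts[1] + "." + parts[2];
    }

    /** True when browser and driver share the same major.minor.build prefix (assumed matching rule). */
    public static boolean isCompatible(String chromeVersion, String driverVersion) {
        return buildPrefix(chromeVersion).equals(buildPrefix(driverVersion));
    }
}
```

For example, `VersionCheck.isCompatible("114.0.5735.198", "114.0.5735.90")` compares the prefix `114.0.5735` on both sides. How you obtain the two version strings (for instance from the browser's "About" page and the driver's file name) is left open here.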

b. Add the Selenium dependency

    <dependency>
        <groupId>org.seleniumhq.selenium</groupId>
        <artifactId>selenium-java</artifactId>
        <version>3.1.0</version>
    </dependency>

c. Without further ado, here is the code:

    /**
     * Fetches the main video link from the given page.
     *
     * @param targetUrl the page URL
     * @return the main video link, or null if none was found
     */
    public static String catchMainVideo(String targetUrl) {
        // Point Selenium at the chromedriver binary; adjust the path to your environment
        System.setProperty("webdriver.chrome.driver", "xxx/driver/chromedriver");
        // Headless mode keeps a browser window from opening; remove these options to watch the browser
        ChromeOptions options = new ChromeOptions();
        options.addArguments("--headless");
        options.addArguments("--disable-gpu");

        // Start a Chrome instance
        WebDriver driver = new ChromeDriver(options);
        try {
            // Open the page
            driver.get(targetUrl);

            // Give the page a moment to load
            Thread.sleep(100);

            JavascriptExecutor js = (JavascriptExecutor) driver;

            @SuppressWarnings("unchecked")
            List<WebElement> elements = (List<WebElement>) js.executeScript("return document.querySelectorAll('.sgVideoWrapper video source')");

            // Return the src of the first <source> element whose type is video/mp4
            for (WebElement element : elements) {
                if ("video/mp4".equals(element.getAttribute("type"))) {
                    return element.getAttribute("src");
                }
            }
            return null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        } finally {
            // Always shut down the browser and the chromedriver process
            driver.quit();
        }
    }
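
The fixed 100 ms sleep in the method above may return before the page's scripts have inserted the video element. A common alternative is to poll until the lookup succeeds or a deadline passes. The sketch below shows that idea using only the JDK; `Poller` and `waitFor` are hypothetical names, not part of Selenium, and the supplier stands in for whatever query you run against the driver (Selenium's own `WebDriverWait` serves a similar purpose).

```java
import java.util.function.Supplier;

public class Poller {

    /**
     * Calls lookup repeatedly until it returns a non-null value or the timeout elapses.
     *
     * @return the first non-null value, or null on timeout or interruption
     */
    public static <T> T waitFor(Supplier<T> lookup, long timeoutMillis, long intervalMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            T value = lookup.get();
            if (value != null) {
                return value;
            }
            if (System.nanoTime() >= deadline) {
                return null;
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and give up
                Thread.currentThread().interrupt();
                return null;
            }
        }
    }
}
```

With Selenium, the lookup could be a lambda that runs the query and returns the `src` attribute or null, for example `Poller.waitFor(() -> findMp4Src(driver), 10_000, 200)`, where `findMp4Src` is a hypothetical helper wrapping the loop shown earlier.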

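2. Converting video to audio

a. Download ffmpeg first. Again I recommend downloading it from the website: installing from the command line failed n times, and upgrading Xcode did not help either. In the end the download from the website succeeded: https://ffmpeg.org/download.html

b. Without further ado, here is the code

On the first attempt, converting a large video to a single audio file worked fine, but the subsequent speech-to-text step timed out and failed, so I ended up splitting the audio into segments during conversion.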

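    /**
     * Extracts the audio track of a video and splits it into segment files.
     *
     * @param inputVideoPath       path of the input video file
     * @param outputAudioPrefix    prefix of the output audio files
     * @param segmentSizeInSeconds segment length in seconds
     */
    public static void video2audio(String inputVideoPath, String outputAudioPrefix, int segmentSizeInSeconds) {
        try {
            // -vn drops the video stream; -c:a copy keeps the audio stream without re-encoding,
            // so the .aac output assumes the source audio is already AAC
            ProcessBuilder pb = new ProcessBuilder("xxx/ffmpeg", "-i", inputVideoPath, "-vn", "-c:a", "copy", "-f", "segment", "-segment_time", String.valueOf(segmentSizeInSeconds), outputAudioPrefix + "%03d.aac");
            pb.inheritIO();
            Process process = pb.start();
            // Only report success when ffmpeg exits with code 0
            int exitCode = process.waitFor();
            if (exitCode == 0) {
                log.info("Audio splitting completed.");
            } else {
                log.error("ffmpeg exited with code " + exitCode);
            }
        } catch (Exception e) {
            log.error("video2audio error", e);
        }
    }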
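
Two details around the method above may be worth a sketch: bounding how long a conversion is allowed to run, and picking up the generated segments in order afterwards. The example below is a minimal, JDK-only sketch. `SegmentFiles`, `buildCommand`, `run`, and `listSegments` are hypothetical names; the `ffmpeg` arguments are the same ones used above; and `listSegments` assumes the output prefix has been split into a directory and a file-name prefix.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.stream.Stream;

public class SegmentFiles {

    /** Builds the same ffmpeg argument list as video2audio, as a list that is easy to log or test. */
    public static List<String> buildCommand(String ffmpegPath, String inputVideoPath,
                                            String outputAudioPrefix, int segmentSizeInSeconds) {
        return new ArrayList<>(Arrays.asList(
                ffmpegPath, "-i", inputVideoPath,
                "-vn", "-c:a", "copy",
                "-f", "segment", "-segment_time", String.valueOf(segmentSizeInSeconds),
                outputAudioPrefix + "%03d.aac"));
    }

    /** Runs the command; returns its exit code, or -1 if it did not finish within the timeout. */
    public static int run(List<String> command, long timeoutSeconds)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).inheritIO().start();
        if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            // Stop a conversion that is taking too long
            process.destroyForcibly();
            return -1;
        }
        return process.exitValue();
    }

    /** Returns the files in dir named prefix*.aac, sorted by file name. */
    public static List<Path> listSegments(Path dir, String prefix) {
        List<Path> segments = new ArrayList<>();
        try (Stream<Path> files = Files.list(dir)) {
            files.filter(p -> {
                String name = p.getFileName().toString();
                return name.startsWith(prefix) && name.endsWith(".aac");
            }).forEach(segments::add);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Zero-padded numbering (%03d) makes name order match segment order up to 999
        segments.sort(Comparator.comparing((Path p) -> p.getFileName().toString()));
        return segments;
    }
}
```

The sorted list can then be fed to the speech-to-text step one segment at a time, and the resulting texts joined in the same order.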

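3. Converting speech to text

This part is based on FunASR. I took its offline sample code, read through it, and simplified it into the code below. The wss address it uses must point to a service you deploy yourself; see the documentation for details: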

import com.google.common.collect.Maps;
import com.jd.store.common.util.JsonUtil;
import org.apache.commons.collections4.MapUtils;
import org.apache.commons.compress.utils.Lists;
import org.java_websocket.client.WebSocketClient;
import org.java_websocket.handshake.ServerHandshake;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.io.File;
import java.io.FileInputStream;
import java.net.URI;
import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeUnit;

public class FunasrWsClient extends WebSocketClient {

    private static final Logger log = LoggerFactory.getLogger(FunasrWsClient.class);

    String fileName;

    // Written on the WebSocket thread and read by the caller, hence volatile
    private volatile String fileContent;

    public String getFileContent() {
        return fileContent;
    }

    public void setFileContent(String fileContent) {
        this.fileContent = fileContent;
    }

    public FunasrWsClient(URI serverURI, String fileName) {
        super(serverURI);
        this.fileName = fileName;
    }

    public void sendJson(String mode, String strChunkSize, int chunkInterval, String wavName, boolean isSpeaking, String suffix) {
        try {
            Map<String, Object> obj = Maps.newHashMap();
            obj.put("mode", mode);

            String[] chunkList = strChunkSize.split(",");
            List<Integer> array = Lists.newArrayList();
            for (String s : chunkList) {
                array.add(Integer.parseInt(s.trim()));
            }

            obj.put("chunk_size", array);
            obj.put("chunk_interval", chunkInterval);
            obj.put("wav_name", wavName);

//            if (FunasrWsClient.hotwords.trim().length() > 0) {
//                String regex = "\\d+";
//                JSONObject jsonitems = new JSONObject();
//                String[] items = FunasrWsClient.hotwords.trim().split(" ");
//                Pattern pattern = Pattern.compile(regex);
//                StringBuilder tmpWords = new StringBuilder();
//                for (String item : items) {
//                    Matcher matcher = pattern.matcher(item);
//                    if (matcher.matches()) {
//                        jsonitems.put(tmpWords.toString().trim(), item.trim());
//                        tmpWords = new StringBuilder();
//                        continue;
//                    }
//                    tmpWords.append(item).append(" ");
//                }
//                obj.put("hotwords", jsonitems.toString());
//            }

//            if (suffix.equals("wav")) {
//                suffix = "mp3";
//            }
            obj.put("wav_format", suffix);
            obj.put("is_speaking", isSpeaking);
            log.info("sendJson: " + JsonUtil.toJsonString(obj));
            send(JsonUtil.toJsonString(obj));
        } catch (Exception e) {
            log.error("sendJson error", e);
        }
    }

    public void sendEof() {
        try {
            Map<String, Object> obj = Maps.newHashMap();

            obj.put("is_speaking", Boolean.FALSE);

            log.info("sendEof: " + JsonUtil.toJsonString(obj));
            send(JsonUtil.toJsonString(obj));
        } catch (Exception e) {
            log.error("sendEof", e);
        }
    }

    public void recWav() {
        // The file extension is sent to the server as wav_format
        String[] nameParts = fileName.split("\\.");
        String suffix = nameParts[nameParts.length - 1];
        sendJson(mode, strChunkSize, chunkInterval, fileName, true, suffix);
        File file = new File(fileName);

        int chunkSize = sendChunkSize;
        byte[] bytes = new byte[chunkSize];

        int readSize;
        try (FileInputStream fis = new FileInputStream(file)) {
            if (fileName.endsWith(".wav")) {
                fis.read(bytes, 0, 44);
            }
            readSize = fis.read(bytes, 0, chunkSize);
            while (readSize > 0) {
                // send when it is chunk size
                if (readSize == chunkSize) {
                    send(bytes);
                } else {
                    // send when at last or not is chunk size
                    byte[] tmpBytes = new byte[readSize];
                    System.arraycopy(bytes, 0, tmpBytes, 0, readSize);
                    send(tmpBytes);
                }
                if (!mode.equals("offline")) {
                    Thread.sleep(chunkSize / 32);
                }

                readSize = fis.read(bytes, 0, chunkSize);
            }

//            if (!mode.equals("offline")) {
//                // if not offline, we send eof and wait for 3 seconds to close
//                Thread.sleep(2000);
//                sendEof();
//                Thread.sleep(3000);
//                close();
//            }
//
//            else {
            // if offline, just send eof
            sendEof();
//            }

        } catch (Exception e) {
            log.error("recWav error", e);
        }
    }

    @Override
    public void onOpen(ServerHandshake handshake) {
        this.recWav();
    }

    @Override
    public void onMessage(String message) {
        log.info("received: " + message);

        Map<String, Object> jsonObject = JsonUtil.parseMap(message);
        if (MapUtils.isEmpty(jsonObject)) {
            return;
        }
        Object text = jsonObject.get("text");
        log.info("text: " + text);
        if (text == null) {
            return;
        }

        // Hand the recognized text back to the caller
        fileContent = text.toString();

        close();
    }

    @Override
    public void onClose(int code, String reason, boolean remote) {

    }

    @Override
    public void onError(Exception e) {
        log.error("onError ", e);
    }

    static String mode = "online";
    static String strChunkSize = "5,10,5";
    static int chunkInterval = 10;
    static int sendChunkSize = 1920;

    public static String execute(String fileName) {
        try {
            String wsAddress = "wss://xxx";

            FunasrWsClient c = new FunasrWsClient(new URI(wsAddress), fileName);

            c.connect();

            TimeUnit.SECONDS.sleep(5);
            return c.fileContent;
        } catch (Exception e) {
            log.error("execute error", e);
        }
        return null;
    }

}
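
The `execute` method above sleeps for a fixed five seconds and then reads `fileContent`, which yields null for any segment that takes longer and wastes time on shorter ones. One way to wait exactly as long as needed, up to a limit, is to hand the result over through a `CompletableFuture`. The sketch below is JDK-only; `TranscriptResult` and its methods are hypothetical names, and the callbacks they are meant to be called from (`onMessage`, `onError`) are those of the client above.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TranscriptResult {

    private final CompletableFuture<String> future = new CompletableFuture<>();

    /** Intended to be called from onMessage once the recognized text is known. */
    public void complete(String text) {
        future.complete(text);
    }

    /** Intended to be called from onError so the waiting thread is released early. */
    public void fail(Exception e) {
        future.completeExceptionally(e);
    }

    /** Blocks until a result arrives or the timeout elapses; returns null on timeout or failure. */
    public String await(long timeout, TimeUnit unit) {
        try {
            return future.get(timeout, unit);
        } catch (TimeoutException | ExecutionException e) {
            return null;
        } catch (InterruptedException e) {
            // Restore the interrupt flag for the caller
            Thread.currentThread().interrupt();
            return null;
        }
    }
}
```

In `execute`, the fixed sleep could then become something like `result.await(60, TimeUnit.SECONDS)`, with the limit chosen to fit the segment length used in the video-to-audio step.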