
Uploading Large Files

Uploading large files is a common requirement in frontend development, especially when handling video, audio, or large documents. A traditional single-request form upload runs into trouble with large files: browser memory usage spikes and transfers are slow and fragile. These problems can be solved by splitting the file into chunks and uploading the chunks separately.

Here are the key steps and concepts:
Hash computation: compute a hash over the entire file, used to verify file integrity
Chunked upload: split the large file into many small chunks and upload them one by one (several chunks can be uploaded in parallel to improve throughput)
Instant upload: using the file hash, if an identical file already exists on the server, return an upload-success response immediately without transferring any data
Resumable upload: a backend endpoint records upload progress, so an interrupted upload can continue from where it stopped

Implementation Steps

1. Compute the file hash and slice the file into chunks

ts
import SparkMD5 from "spark-md5";

// Slice the file and feed each chunk into an incremental MD5, so the whole
// file is never held in memory at once.
const buildFileDigestAndChunks = async (file: File, chunkSize: number, signal?: AbortSignal) => {
  const fileSpark = new SparkMD5.ArrayBuffer();
  const chunks: UploadChunk[] = [];
  let offset = 0;
  let index = 0;
  while (offset < file.size) {
    throwIfUploadAborted(signal);
    // File.slice is cheap: it creates a Blob view without reading any bytes yet
    const blob = file.slice(offset, offset + chunkSize);
    const buffer = await readBlobAsArrayBuffer(blob);
    fileSpark.append(buffer); // fold this chunk into the running digest
    chunks.push({ blob, index, size: blob.size });
    offset += chunkSize;
    index++;
  }
  // end() finalizes the MD5 over everything appended so far
  return { fileHash: fileSpark.end(), chunks };
};
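
The code above leans on a few small helpers that are not shown. Below is a minimal sketch of what they might look like; the names UploadChunk, readBlobAsArrayBuffer, and throwIfUploadAborted are taken from the code above, but these bodies are assumptions, not the original implementation:

ts
// Assumed shape of a chunk record used throughout this article.
interface UploadChunk {
  blob: Blob;
  index: number;
  size: number;
}

// Promise wrapper around FileReader for a single chunk.
const readBlobAsArrayBuffer = (blob: Blob): Promise<ArrayBuffer> =>
  new Promise((resolve, reject) => {
    const reader = new FileReader();
    reader.onload = () => resolve(reader.result as ArrayBuffer);
    reader.onerror = () => reject(reader.error);
    reader.readAsArrayBuffer(blob);
  });

// Turn an aborted AbortSignal into a thrown AbortError so callers can bail out.
const throwIfUploadAborted = (signal?: AbortSignal) => {
  if (signal?.aborted) {
    throw new DOMException("Upload aborted", "AbortError");
  }
};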

2. Upload the chunks and merge them once all chunks have been uploaded

ts
// Ask the backend which parts of this file (if any) have already been uploaded.
// Returns the list of chunk indexes that can be skipped.
const checkUploaded = async (
  enterpriseId: string,
  fileHash: string,
  file: File,
  uploadPath: string,
  chunkSize: number,
  signal?: AbortSignal
): Promise<number[]> => {
  try {
    throwIfUploadAborted(signal);
    const fd = new FormData();
    fd.append("fileHash", fileHash);
    fd.append("fileMd5", fileHash); // some endpoints expect fileMd5, others fileHash
    fd.append("fileName", file.name);
    fd.append("fileSize", String(file.size));
    fd.append("enterpriseId", String(enterpriseId || ""));
    fd.append("path", uploadPath);
    // Instant-upload check: if the whole file already exists on the server,
    // report every chunk as uploaded so the caller skips straight to "done".
    // const res: any = await uploadWorkpaperCheckFile(fd, { signal });
    // if (res?.data) {
    //   const totalChunks = Math.ceil(file.size / chunkSize);
    //   return Array.from({ length: totalChunks }, (_, i) => i);
    // }
    throwIfUploadAborted(signal);
    const partForm = new FormData();
    partForm.append("fileHash", fileHash);
    partForm.append("fileMd5", fileHash);
    partForm.append("enterpriseId", String(enterpriseId || ""));
    partForm.append("path", uploadPath);
    const partRes: any = await uploadWorkpaperWorkpaperParts(partForm, { signal });
    // Tolerate several response shapes: a bare array, { uploadedChunks }, or { parts }
    const list = partRes?.data ?? partRes ?? [];
    if (Array.isArray(list)) return list.map((x: any) => Number(x)).filter((x: number) => Number.isFinite(x));
    if (Array.isArray(list?.uploadedChunks)) return list.uploadedChunks;
    if (Array.isArray(list?.parts)) return list.parts;
    return [];
  } catch (err) {
    // Propagate cancellation; any other failure just means "nothing uploaded yet"
    if ((err as DOMException)?.name === "AbortError") throw err;
    return [];
  }
};

export async function uploadFileByChunk(
  enterpriseId: string,
  file: File,
  uploadPath: string,
  options?: number | UploadFileByChunkOptions
) {
  // options may be a bare chunk size (legacy call style) or an options object
  const { chunkSize: chunkSizeOpt, onProgress, signal } = resolveChunkOptions(options);
  const chunkSize = chunkSizeOpt ?? DEFAULT_CHUNK_SIZE;
  throwIfUploadAborted(signal);
  // Phase 1: hash the file and slice it into chunks
  const prepared = await buildFileDigestAndChunks(file, chunkSize, signal);
  const fileHash = prepared.fileHash;
  const chunks = prepared.chunks;
  const total = chunks.length;
  onProgress?.({
    phase: "preparing",
    fileName: file.name,
    uploadPath,
    chunksDone: 0,
    totalChunks: total,
    fileSize: file.size
  });
  throwIfUploadAborted(signal);
  // Phase 2: skip chunks the server already has, then upload the rest
  const uploaded = await checkUploaded(enterpriseId, fileHash, file, uploadPath, chunkSize, signal);
  const need = chunks.filter(c => !uploaded.includes(c.index));
  const already = uploaded.length;
  let sessionUploaded = 0;
  for (const chunk of need) {
    throwIfUploadAborted(signal);
    const fd = new FormData();
    fd.append("file", chunk.blob, file.name);
    fd.append("fileMd5", fileHash);
    fd.append("fileHash", fileHash);
    fd.append("path", uploadPath);
    // Redundant index fields keep the payload compatible with both
    // 0-based (partIndex/chunkIndex) and 1-based (partNumber) conventions
    fd.append("partIndex", String(chunk.index));
    fd.append("chunkIndex", String(chunk.index));
    fd.append("partNumber", String(chunk.index + 1));
    fd.append("totalParts", String(total));
    fd.append("totalChunks", String(total));
    fd.append("enterpriseId", String(enterpriseId || ""));
    await uploadWorkpaperFiles(fd, { signal });
    sessionUploaded += 1;
    onProgress?.({
      phase: "uploading",
      fileName: file.name,
      uploadPath,
      chunksDone: already + sessionUploaded,
      totalChunks: total,
      fileSize: file.size
    });
  }
  throwIfUploadAborted(signal);
  onProgress?.({
    phase: "merging",
    fileName: file.name,
    uploadPath,
    chunksDone: total,
    totalChunks: total,
    fileSize: file.size
  });
  // Phase 3: ask the server to assemble the uploaded parts into the final file
  const mergeFd = new FormData();
  mergeFd.append("fileMd5", fileHash);
  mergeFd.append("fileHash", fileHash);
  mergeFd.append("path", uploadPath);
  mergeFd.append("totalParts", String(total));
  mergeFd.append("totalChunks", String(total));
  mergeFd.append("enterpriseId", String(enterpriseId || ""));
  await uploadWorkpaperFilesMerge(mergeFd, { signal });
}
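
A call site might look like the sketch below. The options shape and progress fields follow the code above; the enterprise id, path, and 5 MB chunk size are made-up values for illustration:

ts
// Hypothetical call site: chunked upload with cancellation and progress logging.
async function handleUpload(file: File) {
  const controller = new AbortController();
  try {
    await uploadFileByChunk("ent-001", file, "/workpapers/2024", {
      chunkSize: 5 * 1024 * 1024, // 5 MB per chunk (illustrative value)
      signal: controller.signal,
      onProgress: p => console.log(`${p.phase}: ${p.chunksDone}/${p.totalChunks}`)
    });
    console.log("upload complete");
  } catch (err) {
    if ((err as DOMException)?.name === "AbortError") console.log("upload cancelled");
    else throw err;
  }
  // Calling controller.abort() mid-flight stops the loop; a later attempt
  // resumes from the chunks the server already has.
}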

3. Implementing resumable uploads

ts
// ========== Resume check ==========
// Component-level variant of checkUploaded: same protocol as above, but the
// path and enterpriseId come from component props and the current table row.
const checkUploaded = async (fileHash: string, file: File, row: FileRow): Promise<number[]> => {
  try {
    const relativePath = getUploadPath(file, row);
    const fd = new FormData();
    fd.append("fileHash", fileHash);
    fd.append("fileMd5", fileHash);
    fd.append("fileName", file.name);
    fd.append("fileSize", String(file.size));
    fd.append("enterpriseId", String(props.enterpriseId || ""));
    fd.append("path", relativePath);
    // Instant-upload check: a hit means every chunk can be skipped.
    // const res: any = await uploadWorkpaperCheckFile(fd);
    // ensureApiSuccess(res, "CHECK_FILE");
    // if (res.data) {
    //   const totalChunks = Math.ceil(file.size / props.chunkSize!);
    //   return Array.from({ length: totalChunks }, (_, i) => i);
    // }
    const partForm = new FormData();
    partForm.append("fileHash", fileHash);
    partForm.append("fileMd5", fileHash);
    partForm.append("enterpriseId", String(props.enterpriseId || ""));
    partForm.append("path", relativePath);
    const partRes: any = await uploadWorkpaperWorkpaperParts(partForm);
    ensureApiSuccess(partRes, "CHECK_PARTS");
    // Tolerate several response shapes: a bare array, { uploadedChunks }, or { parts }
    const list = partRes?.data ?? partRes ?? [];
    if (Array.isArray(list)) return list.map((x: any) => Number(x)).filter((x: number) => Number.isFinite(x));
    if (Array.isArray(list?.uploadedChunks)) return list.uploadedChunks;
    if (Array.isArray(list?.parts)) return list.parts;
    return [];
  } catch {
    // On any failure, fall back to uploading everything from scratch
    return [];
  }
};
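
Because the resume check runs at the start of every attempt, restarting after a dropped connection only re-sends the missing parts. A simple retry wrapper is enough to make this automatic (a sketch; uploadWithResume, the attempt count, and the delay are assumptions, not part of the original code):

ts
// Hypothetical wrapper: retry the whole chunked upload after transient failures.
// Each retry re-runs the resume check, so completed chunks are never re-sent.
async function uploadWithResume(
  enterpriseId: string,
  file: File,
  uploadPath: string,
  maxAttempts = 3
) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      await uploadFileByChunk(enterpriseId, file, uploadPath);
      return;
    } catch (err) {
      if ((err as DOMException)?.name === "AbortError") throw err; // user cancelled
      if (attempt === maxAttempts) throw err;
      await new Promise(r => setTimeout(r, attempt * 1000)); // back off, then retry
    }
  }
}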

Common pitfalls of large file uploads

  1. Lost chunks after the file has been sliced and uploaded

Limit the number of concurrent requests: 3~6
Too much concurrency invites lost chunks: browser per-host request limits, exhausted backend connections, timeouts, CORS failures, and out-of-memory errors
Automatically retry failed chunks (2~3 retries, with backoff between attempts); see the pool sketch below
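
The upload loop in step 2 is sequential; a bounded worker pool with per-chunk retry is one way to add parallelism safely. In this sketch, uploadChunk stands in for the per-chunk FormData + uploadWorkpaperFiles call shown earlier, and the limits follow the numbers above:

ts
// Hypothetical helper: upload chunks with bounded concurrency and retries.
async function uploadChunksWithPool(
  chunks: UploadChunk[],
  uploadChunk: (chunk: UploadChunk) => Promise<void>, // wraps the per-chunk POST
  concurrency = 4, // stay within the 3~6 range above
  maxRetries = 3   // attempts per chunk
) {
  const queue = [...chunks];
  const worker = async () => {
    let chunk: UploadChunk | undefined;
    while ((chunk = queue.shift())) {
      for (let attempt = 1; ; attempt++) {
        try {
          await uploadChunk(chunk);
          break; // this chunk is done; take the next one from the queue
        } catch (err) {
          if (attempt >= maxRetries) throw err; // give up on the whole upload
          // Exponential backoff between attempts: 1s, 2s, 4s, ...
          await new Promise(r => setTimeout(r, 1000 * 2 ** (attempt - 1)));
        }
      }
    }
  };
  // Start `concurrency` workers that drain the shared queue.
  await Promise.all(Array.from({ length: Math.min(concurrency, chunks.length) }, worker));
}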

  2. Slow hash computation for large files

Very large files (1 GB+) must not be read into memory in one shot
Required: compute the hash incrementally, chunk by chunk (SparkMD5), to avoid freezing the page or exhausting memory; the worker sketch below goes a step further
Forbidden: a single readAsArrayBuffer call over the entire file
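
Incremental hashing solves the memory problem, but the MD5 work itself still runs on the main thread. Moving it into a Web Worker keeps the page responsive while a large file is hashed. A minimal sketch, assuming a bundler that supports module workers so spark-md5 can be imported there:

ts
// hash-worker.ts (hypothetical file): incremental MD5 off the main thread.
import SparkMD5 from "spark-md5";

self.onmessage = async (e: MessageEvent<{ file: File; chunkSize: number }>) => {
  const { file, chunkSize } = e.data;
  const spark = new SparkMD5.ArrayBuffer();
  for (let offset = 0; offset < file.size; offset += chunkSize) {
    // Read one slice at a time; the whole file is never in memory at once.
    const buffer = await file.slice(offset, offset + chunkSize).arrayBuffer();
    spark.append(buffer);
  }
  self.postMessage({ fileHash: spark.end() });
};

// On the page (hypothetical wiring):
// const worker = new Worker(new URL("./hash-worker.ts", import.meta.url), { type: "module" });
// worker.postMessage({ file, chunkSize: 5 * 1024 * 1024 });
// worker.onmessage = e => console.log("hash:", e.data.fileHash);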