Dify Plugin Development: PDF to Image
2025-03-21 23:11:01

Preface

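Yesterday I received a request: a user needs to submit PDFs in a Dify workflow so that a large language model can recognize them and extract information. The PDFs the user provides are scanned, meaning their pages are essentially all images, and Dify's built-in document extractor cannot parse this kind of file. The PDF therefore has to be converted to images first, and the images are then handed to a model with multimodal capabilities for recognition.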
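One way to decide whether a PDF needs the image route at all is to look at how much text its text layer yields. The sketch below is an illustration, not part of the original plugin: the function name `looks_scanned` and the threshold of 20 characters per page are assumptions, and the per-page strings are expected to come from something like PyMuPDF's `page.get_text()`.

```python
def looks_scanned(page_texts, min_chars_per_page=20):
    """Return True when the extracted text layer is too thin to be useful.

    page_texts: one string per page, e.g. [page.get_text() for page in doc]
    when using PyMuPDF. min_chars_per_page is an arbitrary illustrative
    threshold, not a value taken from the original post.
    """
    if not page_texts:
        return True  # nothing extractable at all
    total_chars = sum(len(text.strip()) for text in page_texts)
    return total_chars / len(page_texts) < min_chars_per_page


if __name__ == "__main__":
    # Pages with only whitespace behave like a scanned document.
    print(looks_scanned(["", "  \n", ""]))
    # A page with a real text layer does not.
    print(looks_scanned(["Invoice No. 2025-001, total amount due: 1,280.00"]))
```

A workflow could use such a check to send text PDFs to the document extractor and only scanned ones to the conversion step.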

Getting Started: API Version

Let's start with an API version. I simply gave the requirement to Cursor and had it put one together:

from flask import Flask, request, Response
from PIL import Image
import fitz  # PyMuPDF
import io

app = Flask(__name__)


def pdf_to_image(pdf_bytes):
    # Open the PDF from raw bytes
    pdf_document = fitz.open(stream=pdf_bytes, filetype="pdf")
    images = []

    # Render each page to an image
    for page_number in range(len(pdf_document)):
        page = pdf_document.load_page(page_number)
        pix = page.get_pixmap()
        img = Image.frombytes("RGB", (pix.width, pix.height), pix.samples)
        images.append(img)

    # Release the document once all pages are rendered
    pdf_document.close()

    # Stitch all pages into one tall image
    total_width = max(img.width for img in images)
    total_height = sum(img.height for img in images)
    combined_image = Image.new('RGB', (total_width, total_height))

    y_offset = 0
    for img in images:
        combined_image.paste(img, (0, y_offset))
        y_offset += img.height

    return combined_image


@app.route('/convert', methods=['POST'])
def convert_pdf_to_image():
    if 'file' not in request.files:
        return "No file part", 400

    file = request.files['file']
    if file.filename == '':
        return "No selected file", 400

    # Read the uploaded file as bytes
    pdf_bytes = file.read()

    # Convert the PDF to a single image
    combined_image = pdf_to_image(pdf_bytes)

    # Save the image to an in-memory buffer
    img_io = io.BytesIO()
    combined_image.save(img_io, 'PNG')
    img_io.seek(0)

    # Build a file-download style response
    response = Response(
        img_io.getvalue(),
        mimetype='image/png',
        headers={
            'Content-Disposition': 'attachment; filename=combined_image.png'
        }
    )

    return response


if __name__ == '__main__':
    # debug=True is for local testing only
    app.run(debug=True, host='0.0.0.0', port=5000)
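The stitching step in `pdf_to_image` is plain arithmetic on page sizes: the canvas is as wide as the widest page and as tall as all pages together, and each page is pasted at the running height. The sketch below isolates that calculation so it can be checked without Pillow; `vertical_layout` is a hypothetical helper name, not something from the original code.

```python
def vertical_layout(page_sizes):
    """Compute the canvas size and per-page paste offsets for vertical stacking.

    page_sizes: list of (width, height) tuples in pixels, one per page.
    Returns ((canvas_width, canvas_height), [(x, y), ...]).
    """
    if not page_sizes:
        raise ValueError("at least one page is required")
    canvas_width = max(width for width, _ in page_sizes)
    offsets = []
    y_offset = 0
    for _, height in page_sizes:
        offsets.append((0, y_offset))  # pages are left-aligned
        y_offset += height
    return (canvas_width, y_offset), offsets


if __name__ == "__main__":
    # A portrait page followed by a landscape page.
    print(vertical_layout([(595, 842), (842, 595)]))
```

Because narrower pages are left-aligned on a canvas sized for the widest one, the area to their right keeps the canvas fill color (black for a default `Image.new('RGB', ...)`).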

The result looks like this: image-202503212221130101

Getting Started: Dify Plugin Version

1. First, read through the official Dify tutorial: Plugin Development.

2. Create a new project by following the official tutorial. The structure looks like this:

image-20250321222614418

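3. The code in the provider file can mostly be left untouched; the plugin's actual logic lives in tool/pdf2image.py. As before, I had Cursor write it. Note that the output must follow the format required by the official Dify documentation.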

import logging
from collections.abc import Generator
from typing import Any
import io

from dify_plugin import Tool
from dify_plugin.entities.tool import ToolInvokeMessage
from dify_plugin.file.file import File
from pydantic import BaseModel


logger = logging.getLogger(__name__)


class ToolParameters(BaseModel):
    files: list[File]


class Pdf2imageTool(Tool):
    """
    A tool for converting PDF files to images using PyMuPDF and Pillow
    """

    def _invoke(
        self, tool_parameters: dict[str, Any]
    ) -> Generator[ToolInvokeMessage, None, None]:
        if tool_parameters.get("files") is None:
            yield self.create_text_message("No files provided. Please upload PDF files for processing.")
            return

        params = ToolParameters(**tool_parameters)
        files = params.files

        try:
            # Try both import methods to ensure compatibility
            try:
                import pymupdf
                fitz_module = pymupdf
            except ImportError:
                import fitz
                fitz_module = fitz

            try:
                from PIL import Image
            except ImportError:
                error_msg = "Error: Pillow library not installed. Please install it with 'pip install Pillow'."
                logger.error(error_msg)
                yield self.create_text_message(error_msg)
                return

            for file in files:
                try:
                    logger.info(f"Processing file: {file.filename}")

                    # Process PDF file
                    file_bytes = io.BytesIO(file.blob)
                    doc = fitz_module.open(stream=file_bytes, filetype="pdf")

                    page_count = doc.page_count
                    images = []

                    # Convert each page to an image
                    for page_num in range(page_count):
                        page = doc.load_page(page_num)
                        pix = page.get_pixmap()
                        img = Image.frombytes("RGB", [pix.width, pix.height], pix.samples)
                        images.append(img)

                    # Close the document to free resources
                    doc.close()

                    if not images:
                        yield self.create_text_message(f"No pages found in {file.filename}")
                        continue

                    # Merge all images vertically
                    total_width = max(img.width for img in images)
                    total_height = sum(img.height for img in images)
                    combined_image = Image.new('RGB', (total_width, total_height))

                    y_offset = 0
                    for img in images:
                        combined_image.paste(img, (0, y_offset))
                        y_offset += img.height

                    # Save the combined image to a bytes buffer
                    img_buffer = io.BytesIO()
                    combined_image.save(img_buffer, format='PNG')
                    img_buffer.seek(0)
                    image_bytes = img_buffer.getvalue()

                    # Yield image as blob with mime type
                    yield self.create_blob_message(
                        image_bytes,
                        meta={
                            "mime_type": "image/png",
                            "filename": f"{file.filename.rsplit('.', 1)[0]}.png"
                        },
                    )

                except Exception as e:
                    error_msg = f"Error processing {file.filename}: {str(e)}"
                    logger.error(error_msg)
                    yield self.create_text_message(error_msg)
                    yield self.create_json_message({
                        file.filename: {"error": str(e)}
                    })

        except ImportError as e:
            error_msg = f"Error: Required library not installed. {str(e)}"
            logger.error(error_msg)
            yield self.create_text_message(error_msg)
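The output filename in the blob message is derived by cutting the last extension off the uploaded name and appending `.png`. A slightly more defensive version of that step might look like the sketch below; `png_name` and its `fallback` argument are hypothetical and only illustrate how a missing or extension-only filename could be handled.

```python
def png_name(filename, fallback="output"):
    """Derive the output image name from the uploaded file name.

    Mirrors the rsplit('.', 1)[0] logic used in the plugin, but falls back
    to a default stem when the name is missing or empty.
    """
    stem = (filename or "").rsplit(".", 1)[0]
    return f"{stem or fallback}.png"


if __name__ == "__main__":
    for name in ["scan.pdf", "report.v2.pdf", "noext", None]:
        print(name, "->", png_name(name))
```

Only the final extension is removed, so a name such as `report.v2.pdf` keeps its inner dot.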

4. Because the pdf2image plugin takes a files-type input, tool/pdf2image.yaml needs to be modified. The default input parameter is of type string; change it to files:

image-20250321223240521
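For readers who cannot see the screenshot, the change amounts to something like the fragment below. This is a sketch based on my understanding of the scaffolded tool YAML, not a copy of the author's file: the labels and descriptions are placeholders, and the exact set of fields should be checked against the file the Dify scaffold generates.

```yaml
parameters:
  - name: files            # must match the key read in _invoke ("files")
    type: files            # the scaffold default here is string
    required: true
    label:
      en_US: PDF files     # placeholder label
    human_description:
      en_US: PDF files to convert to images   # placeholder text
    llm_description: PDF files to convert to images
    form: llm
```

The parameter name matters because the tool code looks up `tool_parameters.get("files")`.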

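5. The pdf2image plugin depends on PyMuPDF and Pillow, so requirements.txt also needs to be modified to list both packages. That way, the two dependencies are downloaded automatically when the packaged plugin is uploaded to Dify.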

image-20250321223410902
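Before packaging, it can help to confirm locally that both dependencies are importable. Note that the import names differ from the package names: PyMuPDF is imported as `fitz` (or `pymupdf` in newer releases) and Pillow as `PIL`. The helper below is a hypothetical sketch that uses only the standard library.

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names that cannot be found for import."""
    return [
        name for name in module_names
        if importlib.util.find_spec(name) is None
    ]


if __name__ == "__main__":
    # An empty list means both plugin dependencies are installed.
    print(missing_modules(["fitz", "PIL"]))
```

If the printed list is not empty, installing the packages named in requirements.txt (PyMuPDF, Pillow) should resolve it.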

6. Package, upload, and use it:

image-20250321224730833

Summary

Plugin source code: https://github.com/Run1997/pdf2image-dify-plugin

Packaged plugin: https://github.com/Run1997/dify-plugin
