```python
import base64


def encode_image(image_path):
    """Encode the image to base64."""
    try:
        with open(image_path, "rb") as image_file:
            return base64.b64encode(image_file.read()).decode("utf-8")
    except FileNotFoundError:
        print(f"Error: The file {image_path} was not found.")
        return None
    except Exception as e:  # Added general exception handling
        print(f"Error: {e}")
        return None
```
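The post does not show how the base64 string is sent to the VLM. A minimal sketch, assuming an OpenAI-style chat API that accepts images as `data:` URLs, is a hypothetical helper `image_to_data_url` that does the same encoding and prefixes a MIME type guessed from the file extension:

```python
import base64
import mimetypes


def image_to_data_url(image_path: str) -> str:
    """Hypothetical helper: return the image file as a base64 data URL.

    Assumes the downstream VLM API accepts images as data URLs
    (as OpenAI-style chat APIs do); adjust for other providers.
    """
    # Guess the MIME type from the file extension; fall back to a generic type.
    mime, _ = mimetypes.guess_type(image_path)
    if mime is None:
        mime = "application/octet-stream"
    with open(image_path, "rb") as image_file:
        b64 = base64.b64encode(image_file.read()).decode("utf-8")
    return f"data:{mime};base64,{b64}"
```

For a `.png` file (the filename is up to the caller), the result starts with `data:image/png;base64,` followed by the encoded bytes.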
Prompt
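Prompt for VLM object description:
Core requirements: accurately locate where the object is, avoid identifying distant background scenery as an object, and reduce false positives.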
```text
Focus on the area highlighted in green in the image.

Step 1: Determine if the highlighted area represents a distinct, identifiable object or instance:
- If the highlighted area is clearly a distinct object, proceed to Step 2.
- If the highlighted area is abstract, ambiguous, or you cannot confidently identify it as a specific object (e.g., part of background, texture, partial view), respond with "Valid: No".

Step 2: If the highlighted area is a distinct object, provide:
1. The specific name of the object (be precise and use technical terms when appropriate)
2. The primary function or purpose of this object
3. Any notable features visible in the highlighted area (no color description)
4. If there is text visible on the object, include what it says

Remember, if you're uncertain about the highlighted area being a distinct object, respond only with "Valid: No".
```
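The prompt assumes the candidate region has already been marked in green, but the post does not show how. One minimal sketch, assuming a boolean mask per candidate (for example from a segmentation model) and RGB images as numpy arrays, is to alpha-blend pure green into the masked pixels. The function name `highlight_mask` and the default `alpha` are hypothetical choices, not taken from the post.

```python
import numpy as np


def highlight_mask(image: np.ndarray, mask: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Hypothetical helper: blend pure green into the pixels selected by a mask.

    image: H x W x 3 uint8 RGB array.
    mask:  H x W boolean array marking the candidate object region.
    alpha: blend weight of the green overlay (assumed default, tune as needed).
    """
    if image.shape[:2] != mask.shape:
        raise ValueError("mask must match the image height and width")
    out = image.astype(np.float32)
    green = np.array([0.0, 255.0, 0.0], dtype=np.float32)
    # Only masked pixels are modified; the rest of the image stays unchanged.
    out[mask] = (1.0 - alpha) * out[mask] + alpha * green
    return np.clip(out, 0, 255).astype(np.uint8)
```

Blending instead of painting solid green keeps the object's shape and texture visible to the VLM while still marking which region the prompt refers to.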
Example outputs:
Valid
```text
Valid: Yes

1. The specific name of the object: Soap dispenser
2. The primary function or purpose of this object: To dispense liquid soap or hand sanitizer.
3. Notable features visible in the highlighted area:
   - The dispenser has a pump mechanism at the top.
   - The body of the dispenser is cylindrical.
   - The material appears to be translucent plastic.
4. There is no visible text on the object.
```
Invalid
```text
Valid: No
```
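Because the prompt forces a fixed "Valid: Yes/No" first line, rejected regions can be dropped cheaply before any further parsing. The sketch below is a rule-based alternative, not the author's method: the function name `parse_vlm_output` and the output keys are hypothetical, and it assumes the VLM echoes the numbered labels verbatim, which makes it brittle. The LLM-based parsing described next is likely more tolerant of format drift.

```python
import re
from typing import Optional


def parse_vlm_output(text: str) -> Optional[dict]:
    """Hypothetical rule-based parser for the answer format the prompt requests.

    Returns None when the answer is "Valid: No" (or the Valid line is missing),
    so those regions can be discarded as likely false positives.
    """
    valid = re.search(r"^\s*Valid:\s*(Yes|No)\b", text, re.IGNORECASE | re.MULTILINE)
    if valid is None or valid.group(1).lower() != "yes":
        return None

    def field(label_pattern: str) -> Optional[str]:
        # Capture the rest of the line after a numbered item's label.
        m = re.search(label_pattern + r"\s*(.+)$", text, re.MULTILINE)
        return m.group(1).strip() if m else None

    return {
        "name": field(r"^\s*1\.\s*The specific name of the object:"),
        "function": field(r"^\s*2\.\s*The primary function or purpose of this object:"),
        # All "- " bullet lines are treated as the feature list of item 3.
        "features": [f.strip() for f in re.findall(r"^\s*-\s+(.+)$", text, re.MULTILINE)],
        "text": field(r"^\s*4\.\s*"),
    }
```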
VLM Output -> Structured Output
A second LLM parses the VLM output and converts it into a JSON file; this is implemented through the API provided by Mistral AI: