Using an Example to Explain the Practical Application and Interaction Principles of the MCP Protocol


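An ordinary AI is a beast trapped in a box. Limited permissions confine it to a chat window, like a patient in a vegetative state groaning on a hospital bed. The MCP protocol acts as the AI's "blood vessels" and "neurons": the signal pathway through which the AI's instructions actually get carried out.

Here I use VS Code with the Cline client, paired with Google Gemini as the underlying large-model "kernel". I'll skip the configuration process.

Cline UI Interaction

After configuring Google Gemini, integrate the filesystem MCP server: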

"github.com/modelcontextprotocol/servers/tree/main/src/filesystem": {
      "command": "npx",
      "args": [
        "-y",
        "@modelcontextprotocol/server-filesystem",
        "/Users/zeusro/Desktop"
      ],
      "disabled": false,
      "autoApprove": [
        "read_file",
        "write_file",
        "read_multiple_files",
        "list_directory",
        "directory_tree",
        "get_file_info",
        "list_allowed_directories"
      ]
    }

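Once it is enabled, you will find that an mcp-server-filesystem process has been started locally via npx.

ps aux | grep "npx.*server-filesystem"

Type "delete all screenshots on my desktop" directly into the Cline chat box.
Click Approve to let Cline read the directory listing.

Click Run Command to execute the command; VS Code then types the corresponding command into the terminal and runs it.

This result is produced by function calling working together with the MCP server.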

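Three-way Interaction Between the Client, Google Gemini, and the MCP Server

1. MCP Server Analysis

Source: github.com/modelcontextprotocol/servers/tree/main/src/filesystem
Function: provides file system operation tools, including:
- list_files: list the files in a specified directory.
- read_file: read a file's content.
- write_file: write content to a file.
- delete_file: delete a specified file.
The names list_files and delete_file are assumptions made for this walkthrough; the autoApprove list in the configuration above shows the server's own tool names, such as list_directory.
Protocol: JSON-RPC 2.0 over HTTP or STDIO, with support for tool discovery (the ListTools request, tools/list) and tool invocation (tools/call).
Key point: there is no direct delete_screenshots tool, but list_files can fetch the file list, and repeated delete_file calls can then remove the .png and .jpg files.

2. Adjusted Assumptions

The user prompt "delete all screenshots on the desktop" is parsed by Gemini as:
1. Call list_files to get the desktop file list.
2. Filter for .png and .jpg files and call delete_file on each one.
The MCP Server runs locally and listens on http://localhost:8080. (With the npx configuration above, Cline actually talks to the server over STDIO; HTTP is assumed here so the requests are easy to show.)
Example desktop path: /home/user/Desktop.

3. Process

The user sends a prompt through the Gemini API.
Gemini generates a list_files call to get the desktop file list.
The MCP client picks out the screenshot files and issues one delete_file call per file.
The MCP Server performs the deletions and returns the results.
Gemini summarizes the results and generates the final response.

Complete Web Request Example

The following is the web request flow based on the Gemini API and the MCP Server specified above.

1. The User Sends a Request to the Gemini API

The user sends the prompt to the Gemini API in an HTTP POST request.

Request: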

POST https://us-central1-aiplatform.googleapis.com/v1/projects/your-project-id/locations/us-central1/publishers/google/models/gemini-1.5-flash-002:generateContent
Authorization: Bearer your-access-token
Content-Type: application/json

{
  "contents": [
    {
      "role": "user",
      "parts": [
        {
          "text": "删除桌面上所有截图"
        }
      ]
    }
  ],
  "tools": [
    {
      "function_declarations": [
        {
          "name": "list_files",
          "description": "列出指定目录中的文件和子目录。",
          "parameters": {
            "type": "object",
            "properties": {
              "path": {
                "type": "string",
                "description": "要列出文件的目录路径，例如 '/home/user/Desktop'。"
              }
            },
            "required": ["path"]
          }
        },
        {
          "name": "delete_file",
          "description": "删除指定路径的文件。",
          "parameters": {
            "type": "object",
            "properties": {
              "path": {
                "type": "string",
                "description": "要删除的文件的完整路径，例如 '/home/user/Desktop/screenshot.png'。"
              }
            },
            "required": ["path"]
          }
        }
      ]
    }
  ],
  "generation_config": {
    "temperature": 0.4,
    "top_p": 1,
    "top_k": 32,
    "max_output_tokens": 2048
  }
}

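Explanation:

tools: declares the list_files and delete_file tools, modeled on the filesystem MCP Server (see the functionality in src/filesystem).
path: assumes the desktop path is /home/user/Desktop; the actual path is supplied by the MCP client or the context.
Gemini will call list_files first to get the file list.

2. Gemini API Response (Generates the list_files Call)

Gemini parses the user prompt and decides to call list_files first to get the desktop file list.

Response: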

{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "function_call": {
              "name": "list_files",
              "args": {
                "path": "/home/user/Desktop"
              }
            }
          }
        ]
      },
      "finish_reason": "STOP"
    }
  ]
}

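Explanation:

Gemini generates a list_files call that specifies the desktop path.
The MCP client handles this call and communicates with the MCP Server.

3. The MCP Client Sends a list_files Request to the MCP Server

The MCP client sends the list_files request to the filesystem MCP Server over JSON-RPC.

Request: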

POST http://localhost:8080/mcp
Content-Type: application/json

{
  "jsonrpc": "2.0",
  "method": "list_files",
  "params": {
    "path": "/home/user/Desktop"
  },
  "id": "call_001"
}

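Explanation:

URL: the MCP Server runs locally at http://localhost:8080/mcp.
method: the tool name list_files is used as the JSON-RPC method to keep the example short; in the MCP specification a tool is invoked with method tools/call, with the tool name in params.name and its inputs in params.arguments.

4. The MCP Server Responds to list_files

The MCP Server executes list_files and returns the desktop file list. Assume the desktop contains the following files:

screenshot1.png
screenshot2.jpg
document.txt

Response: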

{
  "jsonrpc": "2.0",
  "result": {
    "files": [
      {"path": "/home/user/Desktop/screenshot1.png", "type": "file"},
      {"path": "/home/user/Desktop/screenshot2.jpg", "type": "file"},
      {"path": "/home/user/Desktop/document.txt", "type": "file"}
    ]
  },
  "id": "call_001"
}

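Explanation:

The returned files array lists every file on the desktop.
The MCP client will pick out the .png and .jpg files.

5. The MCP Client Filters Screenshots and Issues delete_file Requests

The MCP client parses the list_files result, selects the screenshot files (screenshot1.png and screenshot2.jpg), and sends a delete_file request for each.

Request 1 (delete screenshot1.png):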

POST http://localhost:8080/mcp
Content-Type: application/json

{
  "jsonrpc": "2.0",
  "method": "delete_file",
  "params": {
    "path": "/home/user/Desktop/screenshot1.png"
  },
  "id": "call_002"
}

{
  "jsonrpc": "2.0",
  "result": {
    "success": true,
    "message": "文件 /home/user/Desktop/screenshot1.png 已删除"
  },
  "id": "call_002"
}

Request 2 (delete screenshot2.jpg):

POST http://localhost:8080/mcp
Content-Type: application/json

{
  "jsonrpc": "2.0",
  "method": "delete_file",
  "params": {
    "path": "/home/user/Desktop/screenshot2.jpg"
  },
  "id": "call_003"
}

{
  "jsonrpc": "2.0",
  "result": {
    "success": true,
    "message": "文件 /home/user/Desktop/screenshot2.jpg 已删除"
  },
  "id": "call_003"
}

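Explanation:

The MCP client calls delete_file once per screenshot, deleting one file each time.

The MCP Server performs the actual deletion (the server is a Node.js package launched via npx, so presumably through something like Node's fs.unlink).

6. The MCP Client Collects the Results and Returns Them to Gemini

The MCP client collects all the delete_file results, builds a response, and sends it back to Gemini in a new Gemini API request. Request: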

POST https://us-central1-aiplatform.googleapis.com/v1/projects/your-project-id/locations/us-central1/publishers/google/models/gemini-1.5-flash-002:generateContent
Authorization: Bearer your-access-token
Content-Type: application/json

{
  "contents": [
    {
      "role": "user",
      "parts": [
        {
          "text": "删除桌面上所有截图"
        }
      ]
    },
    {
      "role": "model",
      "parts": [
        {
          "function_call": {
            "name": "list_files",
            "args": {
              "path": "/home/user/Desktop"
            }
          }
        }
      ]
    },
    {
      "role": "function",
      "parts": [
        {
          "function_response": {
            "name": "list_files",
            "response": {
              "files": [
                {"path": "/home/user/Desktop/screenshot1.png", "type": "file"},
                {"path": "/home/user/Desktop/screenshot2.jpg", "type": "file"},
                {"path": "/home/user/Desktop/document.txt", "type": "file"}
              ]
            }
          }
        }
      ]
    },
    {
      "role": "model",
      "parts": [
        {
          "function_call": {
            "name": "delete_file",
            "args": {
              "path": "/home/user/Desktop/screenshot1.png"
            }
          }
        },
        {
          "function_call": {
            "name": "delete_file",
            "args": {
              "path": "/home/user/Desktop/screenshot2.jpg"
            }
          }
        }
      ]
    },
    {
      "role": "function",
      "parts": [
        {
          "function_response": {
            "name": "delete_file",
            "response": {
              "success": true,
              "message": "文件 /home/user/Desktop/screenshot1.png 已删除"
            }
          }
        },
        {
          "function_response": {
            "name": "delete_file",
            "response": {
              "success": true,
              "message": "文件 /home/user/Desktop/screenshot2.jpg 已删除"
            }
          }
        }
      ]
    }
  ]
}

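Explanation: the request contains the entire conversation history: the user prompt, the list_files call and its result, and the delete_file calls and their results.

Gemini generates the final response from this information.

7. The Gemini API Generates the Final Response

Gemini receives the results of all the tool calls and generates a user-friendly response. Response: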

{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "text": "已成功删除桌面上 2 个截图文件：screenshot1.png 和 screenshot2.jpg。"
          }
        ]
      },
      "finish_reason": "STOP"
    }
  ]
}

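Explanation: Gemini summarizes the delete_file results and generates a natural-language response.

Summary

With the MCP Server from github.com/modelcontextprotocol/servers/tree/main/src/filesystem, carrying out "delete all screenshots on the desktop" through the Gemini API and the MCP protocol takes the following steps:

Gemini calls list_files to get the desktop file list.
The MCP client picks out the .png and .jpg files and calls delete_file on each.
The MCP Server performs the deletions and returns the results.
Gemini summarizes the results and generates the final response.

How Google Gemini Function Calling Works, Simplified

The official documentation, "How function calling works", explains it very clearly, so here I only include a diagram:

An MCP server describes its own capabilities to the client through discovery requests (the MCP documentation covers this for prompts under "Discovering prompts"; tools are listed the same way with tools/list), and the client passes those descriptions on to the model. Take an MCP time server as an example.

1. The MCP Client Queries the Tool List (Tool Discovery)

The MCP client fetches the TimeServer's tool list with a ListToolsRequest.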

POST http://localhost:8081/mcp
Content-Type: application/json

{
  "jsonrpc": "2.0",
  "method": "tools/list",
  "id": "tools_001"
}

Response:

{
  "jsonrpc": "2.0",
  "result": {
    "tools": [
      {
        "name": "get_current_time",
        "description": "获取当前系统时间，返回 ISO 8601 格式的字符串。",
        "inputSchema": {
          "type": "object",
          "properties": {},
          "required": []
        }
      },
      {
        "name": "convert_timezone",
        "description": "将时间转换为指定时区。",
        "inputSchema": {
          "type": "object",
          "properties": {
            "time": { "type": "string", "description": "ISO 8601 时间字符串" },
            "timezone": { "type": "string", "description": "目标时区，如 'America/New_York'" }
          },
          "required": ["time", "timezone"]
        }
      }
    ]
  },
  "id": "tools_001"
}

2. The User Sends a Prompt to the Gemini API

The user sends a prompt to the Gemini API in an HTTP POST request, asking for the current time.

POST https://us-central1-aiplatform.googleapis.com/v1/projects/your-project-id/locations/us-central1/publishers/google/models/gemini-1.5-flash-002:generateContent
Authorization: Bearer your-access-token
Content-Type: application/json

{
  "contents": [
    {
      "role": "user",
      "parts": [
        {
          "text": "告诉我当前时间"
        }
      ]
    }
  ],
  "tools": [
    {
      "function_declarations": [
        {
          "name": "get_current_time",
          "description": "获取当前系统时间，返回 ISO 8601 格式的字符串。",
          "parameters": {
            "type": "object",
            "properties": {},
            "required": []
          }
        },
        {
          "name": "convert_timezone",
          "description": "将时间转换为指定时区。",
          "parameters": {
            "type": "object",
            "properties": {
              "time": { "type": "string" },
              "timezone": { "type": "string" }
            },
            "required": ["time", "timezone"]
          }
        }
      ]
    }
  ],
  "generation_config": {
    "temperature": 0.4,
    "top_p": 1,
    "top_k": 32,
    "max_output_tokens": 2048
  }
}

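tools: contains the TimeServer's tool schemas, supplied by the MCP client.

Prompt: "告诉我当前时间" ("tell me the current time") explicitly asks for time information.

3. Gemini Matches a Tool (Semantic Matching)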

{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "function_call": {
              "name": "get_current_time",
              "args": {}
            }
          }
        ]
      },
      "finish_reason": "STOP"
    }
  ]
}

4. The MCP Client Calls the TimeServer Tool

POST http://localhost:8081/mcp
Content-Type: application/json

{
  "jsonrpc": "2.0",
  "method": "tools/call",
  "params": {
    "name": "get_current_time",
    "arguments": {}
  },
  "id": "call_001"
}

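5. The MCP Client Returns the Result to Gemini
6. Gemini Generates the Final Response

Gemini matches the prompt against the tool descriptions and picks a suitable tool. The MCP client is responsible for converting the tool information and issuing the service call, then post-processes the result and returns it to the user.

In this process the Cline plugin actually takes on several roles at once: AI client, MCP client, and host of the MCP server (it launches the local MCP server process). Some AI clients, to keep resource usage down and avoid becoming bloated, do not launch MCP server processes themselves and only allow remote MCP servers; that works too.

get_current_time is a deliberately simplified requirement. A real execution flow may also involve parallel calls and combinations of several functions, which I won't go into here.

Summary

From a human point of view, having an AI delete desktop files is a very roundabout process: we have to feed several pieces of context into the API calls as prompts so that the AI gradually works out the answer and carries it out. It is a bit like an open-book exam, where the teacher says the answers are all in the book and leaves the students to find them and write them down.

It looks a little clumsy, and can only be called a transitional solution under the limitations of the present moment. If a standard for a general-purpose knowledge base could be established, so that an offline AI could preload the knowledge base data in advance, later calls would be far more efficient.

References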

[1] Prompts https://modelcontextprotocol.io/docs/concepts/prompts#discovering-prompts

[2] How does OpenAI Function Calling work? https://www.youtube.com/watch?v=Qor2VZoBib0&ab_channel=LearnDatawithMark

An ordinary AI is a beast trapped in a box. Limited permissions confine it to a chat window, like a patient in a vegetative state groaning on a hospital bed. The MCP protocol acts as the AI's "blood vessels" and "neurons": the signal pathway through which the AI's instructions actually get carried out.

Here I use VS Code with the Cline client, paired with Google Gemini as the underlying large-model "kernel". I'll skip the configuration process.

Cline UI Interaction

After configuring Google Gemini, integrate the filesystem MCP server:

"github.com/modelcontextprotocol/servers/tree/main/src/filesystem": {
      "command": "npx",
      "args": [
        "-y",
        "@modelcontextprotocol/server-filesystem",
        "/Users/zeusro/Desktop"
      ],
      "disabled": false,
      "autoApprove": [
        "read_file",
        "write_file",
        "read_multiple_files",
        "list_directory",
        "directory_tree",
        "get_file_info",
        "list_allowed_directories"
      ]
    }

Once it is enabled, you will find that an mcp-server-filesystem process has been started locally via npx.

ps aux | grep "npx.*server-filesystem"

Type "delete all screenshots on my desktop" directly into the Cline chat box.
Click Approve to let Cline read the directory listing.

Click Run Command to execute the command; VS Code then types the corresponding command into the terminal and runs it.

This result is produced by function calling working together with the MCP server.

Three-way Interaction Between the Client, Google Gemini, and the MCP Server

1. MCP Server Analysis

Source: github.com/modelcontextprotocol/servers/tree/main/src/filesystem
Function: provides file system operation tools, including:
- list_files: list the files in a specified directory.
- read_file: read a file's content.
- write_file: write content to a file.
- delete_file: delete a specified file.
The names list_files and delete_file are assumptions made for this walkthrough; the autoApprove list in the configuration above shows the server's own tool names, such as list_directory.
Protocol: JSON-RPC 2.0 over HTTP or STDIO, with support for tool discovery (the ListTools request, tools/list) and tool invocation (tools/call).
Key point: there is no direct delete_screenshots tool, but list_files can fetch the file list, and repeated delete_file calls can then remove the .png and .jpg files.

2. Adjusted Assumptions

The user prompt "delete all screenshots on the desktop" is parsed by Gemini as:
1. Call list_files to get the desktop file list.
2. Filter for .png and .jpg files and call delete_file on each one.
The MCP Server runs locally and listens on http://localhost:8080. (With the npx configuration above, Cline actually talks to the server over STDIO; HTTP is assumed here so the requests are easy to show.)
Example desktop path: /home/user/Desktop.

3. Process

The user sends a prompt through the Gemini API.
Gemini generates a list_files call to get the desktop file list.
The MCP client picks out the screenshot files and issues one delete_file call per file.
The MCP Server performs the deletions and returns the results.
Gemini summarizes the results and generates the final response.
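
The process above can be sketched as a small client loop. This is only an illustration under stated assumptions: fake_model_first_turn, fake_model_final_turn, and fake_server are hypothetical in-process stand-ins for Gemini and the MCP server, and a temporary directory plays the role of the desktop so that nothing real is deleted.

```python
import os
import tempfile

SCREENSHOT_EXTS = (".png", ".jpg")

def fake_server(name, args):
    """Hypothetical stand-in for the MCP server's list_files / delete_file tools."""
    if name == "list_files":
        names = sorted(os.listdir(args["path"]))
        return {"files": [os.path.join(args["path"], n) for n in names]}
    if name == "delete_file":
        os.remove(args["path"])
        return {"success": True, "path": args["path"]}
    raise ValueError(f"unknown tool: {name}")

def fake_model_first_turn(desktop):
    """Hypothetical stand-in for Gemini's first reply: a list_files call."""
    return {"name": "list_files", "args": {"path": desktop}}

def fake_model_final_turn(results):
    """Hypothetical stand-in for Gemini's final reply: a summary of the results."""
    deleted = [os.path.basename(r["path"]) for r in results if r["success"]]
    return f"Deleted {len(deleted)} screenshot(s): {', '.join(deleted)}"

def run(desktop):
    # The model asks for the directory listing; the client forwards the call.
    call = fake_model_first_turn(desktop)
    listing = fake_server(call["name"], call["args"])
    # The client picks out the screenshot files by extension.
    targets = [p for p in listing["files"] if p.lower().endswith(SCREENSHOT_EXTS)]
    # One delete_file call per screenshot; the server performs the deletion.
    results = [fake_server("delete_file", {"path": p}) for p in targets]
    # The model turns the collected results into a final message.
    return fake_model_final_turn(results)

def demo():
    with tempfile.TemporaryDirectory() as desktop:
        for name in ("screenshot1.png", "screenshot2.jpg", "document.txt"):
            open(os.path.join(desktop, name), "w").close()
        return run(desktop), sorted(os.listdir(desktop))
```

Calling demo() runs the whole loop against the temporary directory and returns the summary text together with whatever files are left.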

Complete Web Request Example

The following is the web request flow based on the Gemini API and the MCP Server specified above.

1. The User Sends a Request to the Gemini API

The user sends the prompt to the Gemini API in an HTTP POST request.

Request:

POST https://us-central1-aiplatform.googleapis.com/v1/projects/your-project-id/locations/us-central1/publishers/google/models/gemini-1.5-flash-002:generateContent
Authorization: Bearer your-access-token
Content-Type: application/json

{
  "contents": [
    {
      "role": "user",
      "parts": [
        {
          "text": "删除桌面上所有截图"
        }
      ]
    }
  ],
  "tools": [
    {
      "function_declarations": [
        {
          "name": "list_files",
          "description": "列出指定目录中的文件和子目录。",
          "parameters": {
            "type": "object",
            "properties": {
              "path": {
                "type": "string",
                "description": "要列出文件的目录路径，例如 '/home/user/Desktop'。"
              }
            },
            "required": ["path"]
          }
        },
        {
          "name": "delete_file",
          "description": "删除指定路径的文件。",
          "parameters": {
            "type": "object",
            "properties": {
              "path": {
                "type": "string",
                "description": "要删除的文件的完整路径，例如 '/home/user/Desktop/screenshot.png'。"
              }
            },
            "required": ["path"]
          }
        }
      ]
    }
  ],
  "generation_config": {
    "temperature": 0.4,
    "top_p": 1,
    "top_k": 32,
    "max_output_tokens": 2048
  }
}

Explanation:

tools: declares the list_files and delete_file tools, modeled on the filesystem MCP Server (see the functionality in src/filesystem).
path: assumes the desktop path is /home/user/Desktop; the actual path is supplied by the MCP client or the context.
Gemini will call list_files first to get the file list.
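
As a rough sketch, a client could assemble a request body of this shape programmatically. The helper names function_declaration and build_request are made up for this illustration, and the English tool descriptions are placeholders; only the field layout is taken from the request shown above.

```python
import json

def function_declaration(name, description, properties, required):
    """Build one entry for tools[].function_declarations."""
    return {
        "name": name,
        "description": description,
        "parameters": {"type": "object", "properties": properties, "required": required},
    }

def build_request(prompt, declarations):
    """Assemble a generateContent body shaped like the request above."""
    return {
        "contents": [{"role": "user", "parts": [{"text": prompt}]}],
        "tools": [{"function_declarations": declarations}],
        "generation_config": {
            "temperature": 0.4, "top_p": 1, "top_k": 32, "max_output_tokens": 2048,
        },
    }

# Both tools take a single string argument called "path".
path_property = {"path": {"type": "string", "description": "Path under the desktop"}}
body = build_request(
    "删除桌面上所有截图",
    [
        function_declaration("list_files", "List files in a directory.", path_property, ["path"]),
        function_declaration("delete_file", "Delete the file at a path.", path_property, ["path"]),
    ],
)
# ensure_ascii=False keeps the Chinese prompt readable in the serialized JSON.
payload = json.dumps(body, ensure_ascii=False, indent=2)
```

The payload string is what would be sent as the POST body; authentication and the HTTP call itself are left out of the sketch.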

2. Gemini API Response (Generates the list_files Call)

Gemini parses the user prompt and decides to call list_files first to get the desktop file list.

Response:

{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "function_call": {
              "name": "list_files",
              "args": {
                "path": "/home/user/Desktop"
              }
            }
          }
        ]
      },
      "finish_reason": "STOP"
    }
  ]
}

Explanation:

Gemini generates a list_files call that specifies the desktop path.
The MCP client handles this call and communicates with the MCP Server.
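
To handle the call, the client first has to pull the function_call parts out of the response. A minimal sketch, assuming the response has exactly the candidates / content / parts layout shown above (extract_function_calls is a name invented here):

```python
def extract_function_calls(response):
    """Return (name, args) pairs for every function_call part in the response."""
    calls = []
    for candidate in response.get("candidates", []):
        for part in candidate.get("content", {}).get("parts", []):
            call = part.get("function_call")
            if call is not None:
                # A call without arguments is treated as an empty dict.
                calls.append((call["name"], call.get("args", {})))
    return calls

sample_response = {
    "candidates": [
        {
            "content": {
                "role": "model",
                "parts": [
                    {"function_call": {"name": "list_files",
                                       "args": {"path": "/home/user/Desktop"}}}
                ],
            },
            "finish_reason": "STOP",
        }
    ]
}
calls = extract_function_calls(sample_response)
```

A response that contains only text parts yields an empty list, which the client can take as the signal that the model has finished calling tools.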

3. The MCP Client Sends a list_files Request to the MCP Server

The MCP client sends the list_files request to the filesystem MCP Server over JSON-RPC.

Request:

POST http://localhost:8080/mcp
Content-Type: application/json

{
  "jsonrpc": "2.0",
  "method": "list_files",
  "params": {
    "path": "/home/user/Desktop"
  },
  "id": "call_001"
}

Explanation:

URL: the MCP Server runs locally at http://localhost:8080/mcp.
method: the tool name list_files is used as the JSON-RPC method to keep the example short; in the MCP specification a tool is invoked with method tools/call, with the tool name in params.name and its inputs in params.arguments.

4. The MCP Server Responds to list_files

The MCP Server executes list_files and returns the desktop file list. Assume the desktop contains the following files:

screenshot1.png
screenshot2.jpg
document.txt

Response:

{
  "jsonrpc": "2.0",
  "result": {
    "files": [
      {"path": "/home/user/Desktop/screenshot1.png", "type": "file"},
      {"path": "/home/user/Desktop/screenshot2.jpg", "type": "file"},
      {"path": "/home/user/Desktop/document.txt", "type": "file"}
    ]
  },
  "id": "call_001"
}

Explanation:

The returned files array lists every file on the desktop.
The MCP client will pick out the .png and .jpg files.
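
The filtering step might look like the following sketch. It assumes the files array has the path / type layout of the response above; select_screenshots is a hypothetical helper name.

```python
import os

def select_screenshots(files, extensions=(".png", ".jpg")):
    """Return the paths of regular files whose extension marks them as screenshots."""
    selected = []
    for entry in files:
        if entry.get("type") != "file":
            continue  # skip directories and anything else that is not a file
        _, ext = os.path.splitext(entry["path"])
        if ext.lower() in extensions:
            selected.append(entry["path"])
    return selected

listing = {
    "files": [
        {"path": "/home/user/Desktop/screenshot1.png", "type": "file"},
        {"path": "/home/user/Desktop/screenshot2.jpg", "type": "file"},
        {"path": "/home/user/Desktop/document.txt", "type": "file"},
    ]
}
screenshots = select_screenshots(listing["files"])
```

Matching on the extension alone treats every .png or .jpg file as a screenshot, which is the same simplification the walkthrough makes; a stricter client could also check the file name.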

5. The MCP Client Filters Screenshots and Issues delete_file Requests

The MCP client parses the list_files result, selects the screenshot files (screenshot1.png and screenshot2.jpg), and sends a delete_file request for each.

Request 1 (delete screenshot1.png):

POST http://localhost:8080/mcp
Content-Type: application/json

{
  "jsonrpc": "2.0",
  "method": "delete_file",
  "params": {
    "path": "/home/user/Desktop/screenshot1.png"
  },
  "id": "call_002"
}

{
  "jsonrpc": "2.0",
  "result": {
    "success": true,
    "message": "文件 /home/user/Desktop/screenshot1.png 已删除"
  },
  "id": "call_002"
}

Request 2 (Delete screenshot2.jpg):

POST http://localhost:8080/mcp
Content-Type: application/json

{
  "jsonrpc": "2.0",
  "method": "delete_file",
  "params": {
    "path": "/home/user/Desktop/screenshot2.jpg"
  },
  "id": "call_003"
}

{
  "jsonrpc": "2.0",
  "result": {
    "success": true,
    "message": "文件 /home/user/Desktop/screenshot2.jpg 已删除"
  },
  "id": "call_003"
}

Explanation:

The MCP client calls delete_file once per screenshot, deleting one file each time.

The MCP Server performs the actual deletion (the server is a Node.js package launched via npx, so presumably through something like Node's fs.unlink).
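
For illustration only, here is how a delete_file handler could be written so that it refuses paths outside the directory the server was started with (the /Users/zeusro/Desktop argument in the configuration). This is a Python sketch of the idea, not the server's actual code; the function name, the allowed_root parameter, and the message texts are assumptions.

```python
import os
import tempfile
from pathlib import Path

def delete_file(path, allowed_root):
    """Delete `path` only if it is a regular file inside `allowed_root`."""
    root = Path(allowed_root).resolve()
    target = Path(path).resolve()
    # Resolving first means "../" tricks cannot escape the allowed directory.
    if root not in target.parents:
        return {"success": False, "message": f"{target} is outside the allowed directory"}
    if not target.is_file():
        return {"success": False, "message": f"{target} is not a regular file"}
    target.unlink()
    return {"success": True, "message": f"{target} deleted"}

def demo():
    with tempfile.TemporaryDirectory() as desktop:
        shot = os.path.join(desktop, "screenshot1.png")
        open(shot, "w").close()
        first = delete_file(shot, desktop)    # the file exists and is deleted
        second = delete_file(shot, desktop)   # the file is already gone
        outside = delete_file(os.path.join(desktop, "..", "x.png"), desktop)
        return first["success"], second["success"], outside["success"]
```

The result dictionary mirrors the success / message shape of the responses shown above, so a client could pass it straight back to the model.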

6. The MCP Client Collects the Results and Returns Them to Gemini

The MCP client collects all the delete_file results, builds a response, and sends it back to Gemini in a new Gemini API request. Request:

POST https://us-central1-aiplatform.googleapis.com/v1/projects/your-project-id/locations/us-central1/publishers/google/models/gemini-1.5-flash-002:generateContent
Authorization: Bearer your-access-token
Content-Type: application/json

{
  "contents": [
    {
      "role": "user",
      "parts": [
        {
          "text": "删除桌面上所有截图"
        }
      ]
    },
    {
      "role": "model",
      "parts": [
        {
          "function_call": {
            "name": "list_files",
            "args": {
              "path": "/home/user/Desktop"
            }
          }
        }
      ]
    },
    {
      "role": "function",
      "parts": [
        {
          "function_response": {
            "name": "list_files",
            "response": {
              "files": [
                {"path": "/home/user/Desktop/screenshot1.png", "type": "file"},
                {"path": "/home/user/Desktop/screenshot2.jpg", "type": "file"},
                {"path": "/home/user/Desktop/document.txt", "type": "file"}
              ]
            }
          }
        }
      ]
    },
    {
      "role": "model",
      "parts": [
        {
          "function_call": {
            "name": "delete_file",
            "args": {
              "path": "/home/user/Desktop/screenshot1.png"
            }
          }
        },
        {
          "function_call": {
            "name": "delete_file",
            "args": {
              "path": "/home/user/Desktop/screenshot2.jpg"
            }
          }
        }
      ]
    },
    {
      "role": "function",
      "parts": [
        {
          "function_response": {
            "name": "delete_file",
            "response": {
              "success": true,
              "message": "文件 /home/user/Desktop/screenshot1.png 已删除"
            }
          }
        },
        {
          "function_response": {
            "name": "delete_file",
            "response": {
              "success": true,
              "message": "文件 /home/user/Desktop/screenshot2.jpg 已删除"
            }
          }
        }
      ]
    }
  ]
}

Explanation: the request contains the entire conversation history: the user prompt, the list_files call and its result, and the delete_file calls and their results.

Gemini generates the final response from this information.
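
Building that history is mostly bookkeeping: each round of tool use adds one model turn holding the calls and one function turn holding the matching results. A minimal sketch, with append_tool_round as a hypothetical helper and the role names copied from the request above:

```python
def append_tool_round(history, calls, results):
    """Return a new history extended with the calls and their results."""
    if len(calls) != len(results):
        raise ValueError("each function call needs exactly one result")
    model_turn = {
        "role": "model",
        "parts": [{"function_call": {"name": name, "args": args}} for name, args in calls],
    }
    function_turn = {
        "role": "function",
        "parts": [
            {"function_response": {"name": name, "response": result}}
            for (name, _), result in zip(calls, results)
        ],
    }
    # A new list is returned; the caller's history is left untouched.
    return history + [model_turn, function_turn]

history = [{"role": "user", "parts": [{"text": "删除桌面上所有截图"}]}]
history = append_tool_round(
    history,
    [("list_files", {"path": "/home/user/Desktop"})],
    [{"files": []}],
)
```

Calling the helper again with the two delete_file calls and their results would produce the five-entry contents array of the request above.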

7. The Gemini API Generates the Final Response

Gemini receives the results of all the tool calls and generates a user-friendly response. Response:

{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "text": "已成功删除桌面上 2 个截图文件：screenshot1.png 和 screenshot2.jpg。"
          }
        ]
      },
      "finish_reason": "STOP"
    }
  ]
}

Explanation: Gemini summarizes the delete_file results and generates a natural-language response.

Summary

With the MCP Server from github.com/modelcontextprotocol/servers/tree/main/src/filesystem, carrying out "delete all screenshots on the desktop" through the Gemini API and the MCP protocol takes the following steps:

Gemini calls list_files to get the desktop file list.
The MCP client picks out the .png and .jpg files and calls delete_file on each.
The MCP Server performs the deletions and returns the results.
Gemini summarizes the results and generates the final response.

How Google Gemini Function Calling Works, Simplified

The official documentation, "How function calling works", explains it very clearly, so here I only include a diagram:

An MCP server describes its own capabilities to the client through discovery requests (the MCP documentation covers this for prompts under "Discovering prompts"; tools are listed the same way with tools/list), and the client passes those descriptions on to the model. Take an MCP time server as an example.

1. The MCP Client Queries the Tool List (Tool Discovery)

The MCP client fetches the TimeServer's tool list with a ListToolsRequest.

POST http://localhost:8081/mcp
Content-Type: application/json

{
  "jsonrpc": "2.0",
  "method": "tools/list",
  "id": "tools_001"
}

Response:

{
  "jsonrpc": "2.0",
  "result": {
    "tools": [
      {
        "name": "get_current_time",
        "description": "获取当前系统时间，返回 ISO 8601 格式的字符串。",
        "inputSchema": {
          "type": "object",
          "properties": {},
          "required": []
        }
      },
      {
        "name": "convert_timezone",
        "description": "将时间转换为指定时区。",
        "inputSchema": {
          "type": "object",
          "properties": {
            "time": { "type": "string", "description": "ISO 8601 时间字符串" },
            "timezone": { "type": "string", "description": "目标时区，如 'America/New_York'" }
          },
          "required": ["time", "timezone"]
        }
      }
    ]
  },
  "id": "tools_001"
}

2. The User Sends a Prompt to the Gemini API

The user sends a prompt to the Gemini API in an HTTP POST request, asking for the current time.

POST https://us-central1-aiplatform.googleapis.com/v1/projects/your-project-id/locations/us-central1/publishers/google/models/gemini-1.5-flash-002:generateContent
Authorization: Bearer your-access-token
Content-Type: application/json

{
  "contents": [
    {
      "role": "user",
      "parts": [
        {
          "text": "告诉我当前时间"
        }
      ]
    }
  ],
  "tools": [
    {
      "function_declarations": [
        {
          "name": "get_current_time",
          "description": "获取当前系统时间，返回 ISO 8601 格式的字符串。",
          "parameters": {
            "type": "object",
            "properties": {},
            "required": []
          }
        },
        {
          "name": "convert_timezone",
          "description": "将时间转换为指定时区。",
          "parameters": {
            "type": "object",
            "properties": {
              "time": { "type": "string" },
              "timezone": { "type": "string" }
            },
            "required": ["time", "timezone"]
          }
        }
      ]
    }
  ],
  "generation_config": {
    "temperature": 0.4,
    "top_p": 1,
    "top_k": 32,
    "max_output_tokens": 2048
  }
}

tools: contains the TimeServer's tool schemas, supplied by the MCP client.

Prompt: "告诉我当前时间" ("tell me the current time") explicitly asks for time information.
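
The hand-off from tool discovery to the Gemini request can be sketched as a small conversion. The function name to_function_declarations is invented for this example; it accepts the schema under either inputSchema (the field name used by the MCP specification) or parameters, and falls back to an empty object schema when neither is present.

```python
def to_function_declarations(tools_list_result):
    """Convert the result of tools/list into Gemini function_declarations."""
    declarations = []
    for tool in tools_list_result.get("tools", []):
        schema = (
            tool.get("inputSchema")
            or tool.get("parameters")
            or {"type": "object", "properties": {}}
        )
        declarations.append({
            "name": tool["name"],
            "description": tool.get("description", ""),
            "parameters": schema,
        })
    return declarations

tools_list_result = {
    "tools": [
        {
            "name": "get_current_time",
            "description": "Get the current system time as an ISO 8601 string.",
            "inputSchema": {"type": "object", "properties": {}, "required": []},
        }
    ]
}
request_tools = [{"function_declarations": to_function_declarations(tools_list_result)}]
```

The resulting request_tools value has the same shape as the tools field in the request above.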

3. Gemini Matches Tool (Semantic Matching)

{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "function_call": {
              "name": "get_current_time",
              "args": {}
            }
          }
        ]
      },
      "finish_reason": "STOP"
    }
  ]
}

4. MCP Client Calls TimeServer Tool

POST http://localhost:8081/mcp
Content-Type: application/json

{
  "jsonrpc": "2.0",
  "method": "tools/call",
  "params": {
    "name": "get_current_time",
    "arguments": {}
  },
  "id": "call_001"
}

5. The MCP Client Returns the Result to Gemini
6. Gemini Generates the Final Response

Gemini matches the prompt against the tool descriptions and picks a suitable tool. The MCP client is responsible for converting the tool information and issuing the service call, then post-processes the result and returns it to the user.
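
The conversion in the other direction, from a Gemini function_call to an MCP tools/call request, is equally mechanical. A sketch with a hypothetical helper name, to_tools_call, and a caller-supplied request id:

```python
import json

def to_tools_call(function_call, request_id):
    """Wrap a Gemini function_call in a JSON-RPC 2.0 tools/call request."""
    return {
        "jsonrpc": "2.0",
        "method": "tools/call",
        "params": {
            "name": function_call["name"],
            # Gemini calls the inputs "args"; MCP calls them "arguments".
            "arguments": function_call.get("args", {}),
        },
        "id": request_id,
    }

request = to_tools_call({"name": "get_current_time", "args": {}}, "call_001")
print(json.dumps(request, indent=2))
```

The printed request matches the tools/call body shown in step 4.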

In this process the Cline plugin actually takes on several roles at once: AI client, MCP client, and host of the MCP server (it launches the local MCP server process). Some AI clients, to keep resource usage down and avoid becoming bloated, do not launch MCP server processes themselves and only allow remote MCP servers; that works too.

get_current_time is a deliberately simplified requirement. A real execution flow may also involve parallel calls and combinations of several functions, which I won't go into here.

Summary

From a human point of view, having an AI delete desktop files is a very roundabout process: we have to feed several pieces of context into the API calls as prompts so that the AI gradually works out the answer and carries it out. It is a bit like an open-book exam, where the teacher says the answers are all in the book and leaves the students to find them and write them down.

It looks a little clumsy, and can only be called a transitional solution under the limitations of the present moment. If a standard for a general-purpose knowledge base could be established, so that an offline AI could preload the knowledge base data in advance, later calls would be far more efficient.

References

[1] Prompts https://modelcontextprotocol.io/docs/concepts/prompts#discovering-prompts

[2] How does OpenAI Function Calling work? https://www.youtube.com/watch?v=Qor2VZoBib0&ab_channel=LearnDatawithMark

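An ordinary AI is a beast trapped in a box. Limited permissions confine it to a chat window, like a patient in a vegetative state groaning on a hospital bed. The MCP protocol acts as the AI's "blood vessels" and "neurons": the signal pathway through which the AI's instructions actually get carried out.

Here I use VS Code with the Cline client, paired with Google Gemini as the underlying large-model "kernel". I'll skip the configuration process.

Cline UI Interaction

After configuring Google Gemini, integrate the filesystem MCP server: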

"github.com/modelcontextprotocol/servers/tree/main/src/filesystem": {
      "command": "npx",
      "args": [
        "-y",
        "@modelcontextprotocol/server-filesystem",
        "/Users/zeusro/Desktop"
      ],
      "disabled": false,
      "autoApprove": [
        "read_file",
        "write_file",
        "read_multiple_files",
        "list_directory",
        "directory_tree",
        "get_file_info",
        "list_allowed_directories"
      ]
    }

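Once it is enabled, you will find that an mcp-server-filesystem process has been started locally via npx.

ps aux | grep "npx.*server-filesystem"

Type "delete all screenshots on my desktop" directly into the Cline chat box.
Click Approve to let Cline read the directory listing.

Click Run Command to execute the command; VS Code then types the corresponding command into the terminal and runs it.

This result is produced by function calling working together with the MCP server.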

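Three-way Interaction Between the Client, Google Gemini, and the MCP Server

1. MCP Server Analysis

Source: github.com/modelcontextprotocol/servers/tree/main/src/filesystem
Function: provides file system operation tools, including:
- list_files: list the files in a specified directory.
- read_file: read a file's content.
- write_file: write content to a file.
- delete_file: delete a specified file.
The names list_files and delete_file are assumptions made for this walkthrough; the autoApprove list in the configuration above shows the server's own tool names, such as list_directory.
Protocol: JSON-RPC 2.0 over HTTP or STDIO, with support for tool discovery (the ListTools request, tools/list) and tool invocation (tools/call).
Key point: there is no direct delete_screenshots tool, but list_files can fetch the file list, and repeated delete_file calls can then remove the .png and .jpg files.

2. Adjusted Assumptions

The user prompt "delete all screenshots on the desktop" is parsed by Gemini as:
1. Call list_files to get the desktop file list.
2. Filter for .png and .jpg files and call delete_file on each one.
The MCP Server runs locally and listens on http://localhost:8080. (With the npx configuration above, Cline actually talks to the server over STDIO; HTTP is assumed here so the requests are easy to show.)
Example desktop path: /home/user/Desktop.

3. Process

The user sends a prompt through the Gemini API.
Gemini generates a list_files call to get the desktop file list.
The MCP client picks out the screenshot files and issues one delete_file call per file.
The MCP Server performs the deletions and returns the results.
Gemini summarizes the results and generates the final response.

Complete Web Request Example

The following is the web request flow based on the Gemini API and the MCP Server specified above.

1. The User Sends a Request to the Gemini API

The user sends the prompt to the Gemini API in an HTTP POST request.

Request: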

POST https://us-central1-aiplatform.googleapis.com/v1/projects/your-project-id/locations/us-central1/publishers/google/models/gemini-1.5-flash-002:generateContent
Authorization: Bearer your-access-token
Content-Type: application/json

{
  "contents": [
    {
      "role": "user",
      "parts": [
        {
          "text": "删除桌面上所有截图"
        }
      ]
    }
  ],
  "tools": [
    {
      "function_declarations": [
        {
          "name": "list_files",
          "description": "列出指定目录中的文件和子目录。",
          "parameters": {
            "type": "object",
            "properties": {
              "path": {
                "type": "string",
                "description": "要列出文件的目录路径，例如 '/home/user/Desktop'。"
              }
            },
            "required": ["path"]
          }
        },
        {
          "name": "delete_file",
          "description": "删除指定路径的文件。",
          "parameters": {
            "type": "object",
            "properties": {
              "path": {
                "type": "string",
                "description": "要删除的文件的完整路径，例如 '/home/user/Desktop/screenshot.png'。"
              }
            },
            "required": ["path"]
          }
        }
      ]
    }
  ],
  "generation_config": {
    "temperature": 0.4,
    "top_p": 1,
    "top_k": 32,
    "max_output_tokens": 2048
  }
}

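Explanation:

tools: declares the list_files and delete_file tools, modeled on the filesystem MCP Server (see the functionality in src/filesystem).
path: assumes the desktop path is /home/user/Desktop; the actual path is supplied by the MCP client or the context.
Gemini will call list_files first to get the file list.

2. Gemini API Response (Generates the list_files Call)

Gemini parses the user prompt and decides to call list_files first to get the desktop file list.

Response: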

{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "function_call": {
              "name": "list_files",
              "args": {
                "path": "/home/user/Desktop"
              }
            }
          }
        ]
      },
      "finish_reason": "STOP"
    }
  ]
}

説明：

Geminiがlist_files呼び出しを生成し、デスクトップパスを指定。
MCPクライアントがこの呼び出しを処理し、MCP Serverと通信します。

3. The MCP client sends a list_files request to the MCP Server

The MCP client sends the list_files request to the filesystem MCP Server over JSON-RPC.

Request:

POST http://localhost:8080/mcp
Content-Type: application/json

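{
  "jsonrpc": "2.0",
  "method": "tools/call",
  "params": {
    "name": "list_files",
    "arguments": {
      "path": "/home/user/Desktop"
    }
  },
  "id": "call_001"
}

Explanation:

URL: the MCP Server is assumed to be running locally at http://localhost:8080/mcp.
method: MCP invokes tools through the tools/call method; params.name selects the list_files tool and params.arguments carries its inputs.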
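MCP invokes tools through the tools/call method, with the tool name and its arguments inside params. A minimal sketch of building that JSON-RPC 2.0 envelope follows; `tools_call` and the call_NNN id scheme are illustrative assumptions, and actually sending the request is left out.

```python
import itertools
import json

# Monotonic counter so every request gets a distinct id (assumed id scheme).
_ids = itertools.count(1)


def tools_call(name: str, arguments: dict) -> dict:
    """Wrap a tool invocation in a JSON-RPC 2.0 tools/call request."""
    return {
        "jsonrpc": "2.0",
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
        "id": f"call_{next(_ids):03d}",
    }


request = tools_call("list_files", {"path": "/home/user/Desktop"})
print(json.dumps(request, indent=2))
```

The id matters because the server echoes it back, which is how the client pairs each response with its request.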

4. The MCP Server responds to list_files

The MCP Server runs list_files and returns the list of files on the desktop. Assume the desktop contains the following files:

screenshot1.png
screenshot2.jpg
document.txt

Response:

{
  "jsonrpc": "2.0",
  "result": {
    "files": [
      {"path": "/home/user/Desktop/screenshot1.png", "type": "file"},
      {"path": "/home/user/Desktop/screenshot2.jpg", "type": "file"},
      {"path": "/home/user/Desktop/document.txt", "type": "file"}
    ]
  },
  "id": "call_001"
}

Explanation:

The returned files array lists every file on the desktop.
The MCP client filters for the .png and .jpg files.
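The filtering step is simple enough to sketch directly. This assumes the result shape shown above (a files array of path/type entries); `pick_screenshots` is a hypothetical helper name.

```python
from pathlib import PurePosixPath

# Extensions treated as screenshots in this walkthrough.
SCREENSHOT_SUFFIXES = {".png", ".jpg"}


def pick_screenshots(result: dict) -> list:
    """Return the paths of regular files whose extension is .png or .jpg."""
    return [
        entry["path"]
        for entry in result.get("files", [])
        if entry.get("type") == "file"
        and PurePosixPath(entry["path"]).suffix.lower() in SCREENSHOT_SUFFIXES
    ]


result = {
    "files": [
        {"path": "/home/user/Desktop/screenshot1.png", "type": "file"},
        {"path": "/home/user/Desktop/screenshot2.jpg", "type": "file"},
        {"path": "/home/user/Desktop/document.txt", "type": "file"},
    ]
}
screenshots = pick_screenshots(result)
print(screenshots)
```

Matching on extension alone treats every .png or .jpg as a screenshot, so a real client would probably also want a filename pattern or a confirmation step before deleting anything.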

5. The MCP client filters the screenshots and starts the delete_file requests

The MCP client parses the list_files result, picks out the screenshot files (screenshot1.png and screenshot2.jpg), and sends the delete_file requests one at a time.

Request 1 (delete screenshot1.png):

POST http://localhost:8080/mcp
Content-Type: application/json

{
  "jsonrpc": "2.0",
  "method": "tools/call",
  "params": {
    "name": "delete_file",
    "arguments": {
      "path": "/home/user/Desktop/screenshot1.png"
    }
  },
  "id": "call_002"
}

Response:

{
  "jsonrpc": "2.0",
  "result": {
    "success": true,
    "message": "文件 /home/user/Desktop/screenshot1.png 已删除"
  },
  "id": "call_002"
}

Request 2 (delete screenshot2.jpg):

POST http://localhost:8080/mcp
Content-Type: application/json

{
  "jsonrpc": "2.0",
  "method": "tools/call",
  "params": {
    "name": "delete_file",
    "arguments": {
      "path": "/home/user/Desktop/screenshot2.jpg"
    }
  },
  "id": "call_003"
}

Response:

{
  "jsonrpc": "2.0",
  "result": {
    "success": true,
    "message": "文件 /home/user/Desktop/screenshot2.jpg 已删除"
  },
  "id": "call_003"
}

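Explanation:

The MCP client calls delete_file sequentially, deleting one screenshot file per call.

The MCP Server performs the actual deletion. Because the server is a Node.js package started with npx, this would presumably go through Node's fs API (for example fs.unlink) rather than Python's os.remove.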
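The sequential deletion can be sketched as a loop that sends one request per path and collects the results. Everything here is illustrative: `delete_screenshots` is a hypothetical helper, each call is wrapped in MCP's tools/call envelope, and `fake_send` stands in for the real transport so that no file is touched.

```python
def delete_screenshots(paths, send):
    """Call delete_file once per path, in order, and collect each result."""
    results = []
    # Ids start at 2 because call_001 was the list_files request.
    for index, path in enumerate(paths, start=2):
        request = {
            "jsonrpc": "2.0",
            "method": "tools/call",
            "params": {"name": "delete_file", "arguments": {"path": path}},
            "id": f"call_{index:03d}",
        }
        response = send(request)
        results.append(response["result"])
    return results


def fake_send(request):
    """Hypothetical stand-in for the transport; pretends every delete succeeds."""
    path = request["params"]["arguments"]["path"]
    return {
        "jsonrpc": "2.0",
        "result": {"success": True, "message": f"deleted {path}"},
        "id": request["id"],
    }


results = delete_screenshots(
    ["/home/user/Desktop/screenshot1.png", "/home/user/Desktop/screenshot2.jpg"],
    fake_send,
)
print(results)
```

Passing the transport in as `send` keeps the loop testable without a running server.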

6. The MCP client collects the results and returns them to Gemini

The MCP client collects all the delete_file results, builds a response, and sends it to Gemini in a new Gemini API request. Request:

POST https://us-central1-aiplatform.googleapis.com/v1/projects/your-project-id/locations/us-central1/publishers/google/models/gemini-1.5-flash-002:generateContent
Authorization: Bearer your-access-token
Content-Type: application/json

{
  "contents": [
    {
      "role": "user",
      "parts": [
        {
          "text": "删除桌面上所有截图"
        }
      ]
    },
    {
      "role": "model",
      "parts": [
        {
          "function_call": {
            "name": "list_files",
            "args": {
              "path": "/home/user/Desktop"
            }
          }
        }
      ]
    },
    {
      "role": "function",
      "parts": [
        {
          "function_response": {
            "name": "list_files",
            "response": {
              "files": [
                {"path": "/home/user/Desktop/screenshot1.png", "type": "file"},
                {"path": "/home/user/Desktop/screenshot2.jpg", "type": "file"},
                {"path": "/home/user/Desktop/document.txt", "type": "file"}
              ]
            }
          }
        }
      ]
    },
    {
      "role": "model",
      "parts": [
        {
          "function_call": {
            "name": "delete_file",
            "args": {
              "path": "/home/user/Desktop/screenshot1.png"
            }
          }
        },
        {
          "function_call": {
            "name": "delete_file",
            "args": {
              "path": "/home/user/Desktop/screenshot2.jpg"
            }
          }
        }
      ]
    },
    {
      "role": "function",
      "parts": [
        {
          "function_response": {
            "name": "delete_file",
            "response": {
              "success": true,
              "message": "文件 /home/user/Desktop/screenshot1.png 已删除"
            }
          }
        },
        {
          "function_response": {
            "name": "delete_file",
            "response": {
              "success": true,
              "message": "文件 /home/user/Desktop/screenshot2.jpg 已删除"
            }
          }
        }
      ]
    }
  ]
}

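Explanation: the request carries the whole conversation history: the user prompt, the list_files call and its result, and the delete_file calls and their results.

Gemini generates the final response from this information.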
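Because the API is stateless, the client has to rebuild this history on every request. A minimal sketch of appending turns in the shape shown above; `model_turn` and `function_turn` are hypothetical helper names, and the result payloads are shortened.

```python
def model_turn(calls):
    """One model turn holding one function_call part per (name, args) pair."""
    return {"role": "model", "parts": [{"function_call": {"name": n, "args": a}} for n, a in calls]}


def function_turn(responses):
    """One function turn holding one function_response part per (name, result) pair."""
    return {
        "role": "function",
        "parts": [{"function_response": {"name": n, "response": r}} for n, r in responses],
    }


desktop = "/home/user/Desktop"
history = [
    {"role": "user", "parts": [{"text": "删除桌面上所有截图"}]},
    model_turn([("list_files", {"path": desktop})]),
    function_turn([("list_files", {"files": [{"path": f"{desktop}/screenshot1.png", "type": "file"}]})]),
    model_turn([
        ("delete_file", {"path": f"{desktop}/screenshot1.png"}),
        ("delete_file", {"path": f"{desktop}/screenshot2.jpg"}),
    ]),
    function_turn([
        ("delete_file", {"success": True}),
        ("delete_file", {"success": True}),
    ]),
]
print([turn["role"] for turn in history])
```

Each function turn answers the model turn directly before it, with one function_response part per function_call part and in the same order.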

7. The Gemini API generates the final response

Gemini receives all the tool call results and produces a user-friendly reply. Response:

{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "text": "已成功删除桌面上 2 个截图文件：screenshot1.png 和 screenshot2.jpg。"
          }
        ]
      },
      "finish_reason": "STOP"
    }
  ]
}

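Explanation: Gemini summarizes the delete_file results and produces a natural-language reply.

Summary

Using the MCP Server from github.com/modelcontextprotocol/servers/tree/main/src/filesystem, implementing "delete all screenshots on the desktop" through the Gemini API and the MCP protocol takes the following steps:

Gemini calls list_files to get the list of files on the desktop.
The MCP client filters for the .png and .jpg files and calls delete_file for each one.
The MCP Server performs the deletions and returns the results.
Gemini summarizes the results and generates the final response.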
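The whole loop can be exercised end to end with stubs. This is a sketch under stated assumptions, not the real client: `scripted_model` is a hypothetical stand-in for Gemini, `call_tool` is an in-memory stand-in for the MCP Server, and no network or filesystem access happens. It follows the conversation history in step 6, where the delete_file calls appear in a model turn.

```python
def run_agent(prompt, model, call_tool, max_turns=5):
    """Alternate model turns and tool results until the model answers in text."""
    history = [{"role": "user", "parts": [{"text": prompt}]}]
    for _ in range(max_turns):
        parts = model(history)
        history.append({"role": "model", "parts": parts})
        calls = [p["function_call"] for p in parts if "function_call" in p]
        if not calls:
            return "".join(p.get("text", "") for p in parts)
        history.append({
            "role": "function",
            "parts": [
                {"function_response": {"name": c["name"], "response": call_tool(c["name"], c["args"])}}
                for c in calls
            ],
        })
    raise RuntimeError("model did not finish within max_turns")


# In-memory "desktop" so that nothing real is deleted.
desktop = {
    "/home/user/Desktop/screenshot1.png",
    "/home/user/Desktop/screenshot2.jpg",
    "/home/user/Desktop/document.txt",
}


def call_tool(name, args):
    """Stub MCP server exposing the two tools assumed in this walkthrough."""
    if name == "list_files":
        return {"files": [{"path": p, "type": "file"} for p in sorted(desktop)]}
    if name == "delete_file":
        desktop.discard(args["path"])
        return {"success": True}
    raise ValueError(f"unknown tool: {name}")


def scripted_model(history):
    """Stub for Gemini: list first, then delete the images, then summarize."""
    last = history[-1]
    if last["role"] == "user":
        return [{"function_call": {"name": "list_files", "args": {"path": "/home/user/Desktop"}}}]
    responses = [p["function_response"] for p in last["parts"]]
    if responses[0]["name"] == "list_files":
        files = responses[0]["response"]["files"]
        targets = [f["path"] for f in files if f["path"].endswith((".png", ".jpg"))]
        return [{"function_call": {"name": "delete_file", "args": {"path": t}}} for t in targets]
    return [{"text": f"Deleted {len(responses)} screenshot(s)."}]


answer = run_agent("删除桌面上所有截图", scripted_model, call_tool)
print(answer)
```

The `max_turns` cap is a design choice: it stops a model that keeps requesting tools from looping forever.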

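How Google Gemini function calling works, simplified

The official documentation, "How function calling works", is very clear. Only one diagram is included here:

Through Discovering prompts, an MCP server summarizes its own capabilities as prompts. Take MCP-timeserver as an example.

1. The MCP client queries the tool list (tool discovery)

The MCP client retrieves TimeServer's tool list with a ListToolsRequest.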

POST http://localhost:8081/mcp
Content-Type: application/json

{
  "jsonrpc": "2.0",
  "method": "tools/list",
  "id": "tools_001"
}

Response:
{
  "jsonrpc": "2.0",
  "result": {
    "tools": [
      {
        "name": "get_current_time",
        "description": "获取当前系统时间，返回 ISO 8601 格式的字符串。",
        "inputSchema": {
          "type": "object",
          "properties": {},
          "required": []
        }
      },
      {
        "name": "convert_timezone",
        "description": "将时间转换为指定时区。",
        "inputSchema": {
          "type": "object",
          "properties": {
            "time": { "type": "string", "description": "ISO 8601 时间字符串" },
            "timezone": { "type": "string", "description": "目标时区，如 'America/New_York'" }
          },
          "required": ["time", "timezone"]
        }
      }
    ]
  },
  "id": "tools_001"
}

2. The user sends a prompt to the Gemini API

The user sends a prompt to the Gemini API in an HTTP POST request, asking for the current time.

POST https://us-central1-aiplatform.googleapis.com/v1/projects/your-project-id/locations/us-central1/publishers/google/models/gemini-1.5-flash-002:generateContent
Authorization: Bearer your-access-token
Content-Type: application/json

{
  "contents": [
    {
      "role": "user",
      "parts": [
        {
          "text": "告诉我当前时间"
        }
      ]
    }
  ],
  "tools": [
    {
      "function_declarations": [
        {
          "name": "get_current_time",
          "description": "获取当前系统时间，返回 ISO 8601 格式的字符串。",
          "parameters": {
            "type": "object",
            "properties": {},
            "required": []
          }
        },
        {
          "name": "convert_timezone",
          "description": "将时间转换为指定时区。",
          "parameters": {
            "type": "object",
            "properties": {
              "time": { "type": "string" },
              "timezone": { "type": "string" }
            },
            "required": ["time", "timezone"]
          }
        }
      ]
    }
  ],
  "generation_config": {
    "temperature": 0.4,
    "top_p": 1,
    "top_k": 32,
    "max_output_tokens": 2048
  }
}

tools: contains TimeServer's tool schemas, supplied by the MCP client.

Prompt: "告诉我当前时间" ("Tell me the current time") explicitly asks for time information.
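Turning the tools/list result into Gemini's function_declarations is mostly a renaming exercise. A minimal sketch, assuming the result shape from step 1: the MCP specification puts each tool's JSON Schema under inputSchema, so the hypothetical helper `to_function_declarations` accepts that key and falls back to parameters.

```python
def to_function_declarations(tools_list_result: dict) -> list:
    """Map MCP tool descriptions onto Gemini function declarations."""
    declarations = []
    for tool in tools_list_result.get("tools", []):
        # Prefer the MCP key, fall back to "parameters", else an empty object schema.
        schema = (
            tool.get("inputSchema")
            or tool.get("parameters")
            or {"type": "object", "properties": {}}
        )
        declarations.append({
            "name": tool["name"],
            "description": tool.get("description", ""),
            "parameters": schema,
        })
    return declarations


tools_list_result = {
    "tools": [
        {
            "name": "get_current_time",
            "description": "Get the current system time as an ISO 8601 string.",
            "inputSchema": {"type": "object", "properties": {}, "required": []},
        },
        {
            "name": "convert_timezone",
            "description": "Convert a time to the given timezone.",
            "inputSchema": {
                "type": "object",
                "properties": {"time": {"type": "string"}, "timezone": {"type": "string"}},
                "required": ["time", "timezone"],
            },
        },
    ]
}
declarations = to_function_declarations(tools_list_result)
print([d["name"] for d in declarations])
```

The output can be placed directly under `tools[0].function_declarations` in the request body.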

3. Gemini matches a tool (semantic matching)

{
  "candidates": [
    {
      "content": {
        "role": "model",
        "parts": [
          {
            "function_call": {
              "name": "get_current_time",
              "args": {}
            }
          }
        ]
      },
      "finish_reason": "STOP"
    }
  ]
}

4. The MCP client calls the TimeServer tool

POST http://localhost:8081/mcp
Content-Type: application/json

{
  "jsonrpc": "2.0",
  "method": "tools/call",
  "params": {
    "name": "get_current_time",
    "arguments": {}
  },
  "id": "call_001"
}

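5. The MCP client returns the result to Gemini
6. Gemini generates the final response

Gemini matches a suitable tool based on the prompt and the tool descriptions. The MCP client copies and converts that tool information, makes the service call, then processes the result and returns it to the user.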
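The last two steps have no request shown, so here is a sketch of what they might look like. Assumptions are labeled in the code: `handle` is a hypothetical in-process stand-in for TimeServer, its result follows MCP's tools/call result shape (a content list of text items plus isError), the -32602 error code is an illustrative choice, and the `time` key in the function response is made up for this example.

```python
from datetime import datetime, timezone


def handle(request: dict) -> dict:
    """Answer one JSON-RPC request the way a tiny time server might."""
    base = {"jsonrpc": "2.0", "id": request.get("id")}
    params = request.get("params", {})
    if request.get("method") == "tools/call" and params.get("name") == "get_current_time":
        now = datetime.now(timezone.utc).isoformat()
        return {**base, "result": {"content": [{"type": "text", "text": now}], "isError": False}}
    # Anything else is rejected with a JSON-RPC error object (assumed code).
    return {**base, "error": {"code": -32602, "message": "unknown method or tool"}}


reply = handle({
    "jsonrpc": "2.0",
    "method": "tools/call",
    "params": {"name": "get_current_time", "arguments": {}},
    "id": "call_001",
})
text = reply["result"]["content"][0]["text"]

# Step 5: wrap the tool output as a function_response turn for the next Gemini request.
function_turn = {
    "role": "function",
    "parts": [{"function_response": {"name": "get_current_time", "response": {"time": text}}}],
}
print(function_turn)
```

Appending `function_turn` to the conversation history and calling generateContent again lets Gemini produce the final answer in step 6.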

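The cline plugin actually plays several roles in this process: AI client, MCP client, and MCP server (it launches the local MCP server service). Of course, to limit resource consumption and avoid bloat, some AI clients may decline to launch MCP server microservices and restrict themselves to remote MCP servers only.

get_current_time is only a simplified requirement. A real execution flow may also involve parallel calls, combined calls to multiple functions, and similar scenarios, which are not covered here.

Summary

From a human point of view, having an AI delete desktop files is a very cumbersome process. Several pieces of context must be fed into the API calls as prompts, guiding the AI step by step to find the answer and act on it. It is really like an open-book exam: the teacher says every answer is in the book and leaves the candidates to find them and write them out.

It looks a little silly, and it can only be called a transitional solution under today's historical constraints. If a standard for a universal knowledge base could be established so that an offline AI could pre-warm its knowledge-base data, later calls would be far more efficient.

References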

[1] Prompts https://modelcontextprotocol.io/docs/concepts/prompts#discovering-prompts

[2] How does OpenAI Function Calling work? https://www.youtube.com/watch?v=Qor2VZoBib0&ab_channel=LearnDatawithMark

