The HTTP multipart/form-data Format

2022/08/02 10:16

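I recently had to implement file upload from a web page, and along the way I ran into the multipart/form-data form encoding. It turned out to be interesting and very widely used, so it seemed worth studying in a little more depth.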

1. Form

<form method="post" action="/form">
    <label for="fname">First name:</label><br>
    <input type="text" id="fname" name="fname"><br>
    <label for="lname">Last name:</label><br>
    <input type="text" id="lname" name="fname"> <br>
    <button type="submit">submit</button>
</form>

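This is the most basic kind of HTML form. The for attribute of a label specifies the id of the element the label is bound to. The name attribute of an input is the name under which its data is submitted, and the value is whatever was typed into the input. Both inputs here use name="fname", so that name appears twice in the submitted data. Below is the HTTP packet of the POST request, with some unimportant request headers omitted.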

POST /form HTTP/1.0
Host: localhost:8000
Content-Type: application/x-www-form-urlencoded
Content-Length: 13

fname=&fname=

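The Content-Type in the HTTP request headers is application/x-www-form-urlencoded, which is the default type for form submissions. The other type is multipart/form-data, the main subject of this article. As its name suggests, the former relies on URL encoding (percent-encoding): every byte outside a small set of safe ASCII characters is sent as %XX, three characters per byte. That makes it inefficient for non-ASCII text and especially for binary files, so the latter should be used when a form uploads files. In the x-www-form-urlencoded format the body is laid out as key1={value1}&key2={value2}: each key is joined to its value with an equals sign, and the key-value pairs are separated by &.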
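To make the cost of URL encoding concrete, here is a minimal Python sketch using only the standard library. The field values 东 and 流 are borrowed from the multipart request shown later in this article, and the four raw bytes are an arbitrary stand-in for binary file content; both are assumptions chosen for illustration.

```python
from urllib.parse import urlencode, quote_from_bytes

# Two fields encoded the way a default form submission encodes them.
# A list of pairs is used because the form above repeats the name "fname".
fields = [("fname", "东"), ("fname", "流")]
body = urlencode(fields)
print(body)  # fname=%E4%B8%9C&fname=%E6%B5%81

# Every byte outside the safe ASCII set becomes three characters ("%XX"),
# so arbitrary binary data can grow to three times its original size.
raw = bytes([0x00, 0xFF, 0x80, 0x10])  # stand-in for binary file content
encoded = quote_from_bytes(raw)
print(encoded, len(raw), len(encoded))  # %00%FF%80%10 4 12
```

Each Chinese character occupies three bytes in UTF-8, so it takes nine characters on the wire once encoded.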

2. Multipart Form [1]

Sending data as multipart/form-data is just as simple: add enctype="multipart/form-data" to the form tag. The HTML above then becomes:

<form method="post" action="/form" enctype="multipart/form-data">
    <label for="fname">First name:</label><br>
    <input type="text" id="fname" name="fname"><br>
    <label for="lname">Last name:</label><br>
    <input type="text" id="lname" name="fname"> <br>
    <button type="submit">submit</button>
</form>

Below is the HTTP packet, again with some unimportant request headers omitted.

POST /form HTTP/1.0
Host: localhost:8000
Content-Type: multipart/form-data; boundary=---------------------------7110427581564010345444571575
Content-Length: 285

-----------------------------7110427581564010345444571575
Content-Disposition: form-data; name="fname"

东
-----------------------------7110427581564010345444571575
Content-Disposition: form-data; name="fname"

流
-----------------------------7110427581564010345444571575--

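The Content-Type has changed to multipart/form-data and gained a boundary=... parameter, and the body looks completely different from the one above. The boundary is a delimiter chosen by the sender, and it must not appear anywhere in the data itself. Inside the body, each part is introduced by a line consisting of -- followed by the boundary, which is why the delimiter lines have two more dashes than the header value. When a form is submitted from a browser, the browser generates the boundary automatically. Below is the explanation of the boundary given in the RFC [2]: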

The boundary delimiter MUST NOT appear inside any of the encapsulated parts, on a line by itself or as the prefix of any line. This implies that it is crucial that the composing agent be able to choose and specify a unique boundary parameter value that does not contain the boundary parameter value of an enclosing multipart as a prefix.

NOTE: Because boundary delimiters must not appear in the body parts being encapsulated, a user agent must exercise care to choose a unique boundary parameter value. The boundary parameter value in the example above could have been the result of an algorithm designed to produce boundary delimiters with a very low probability of already existing in the data to be encapsulated without having to prescan the data. Alternate algorithms might result in more "readable" boundary delimiters for a recipient with an old user agent, but would require more attention to the possibility that the boundary delimiter might appear at the beginning of some line in the encapsulated part. The simplest boundary delimiter line possible is something like "---", with a closing boundary delimiter line of "-----".
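The layout of the request above can be reproduced by hand, which is a useful way to check one's understanding of the format. The following is a minimal Python sketch, not a general-purpose encoder: the helper name build_multipart is hypothetical, it handles only text fields, and it assumes UTF-8 values and CRLF line endings.

```python
# Value of the boundary parameter from the Content-Type header above:
# 27 dashes followed by a run of digits.
boundary = "-" * 27 + "7110427581564010345444571575"
fields = [("fname", "东"), ("fname", "流")]


def build_multipart(fields, boundary):
    """Encode text fields as a multipart/form-data body (hypothetical helper)."""
    out = bytearray()
    for name, value in fields:
        if boundary in value:
            # The delimiter must never occur inside the encapsulated data.
            raise ValueError("boundary appears in field value")
        # Each part starts with "--" + boundary, followed by its own headers.
        out += f"--{boundary}\r\n".encode("ascii")
        out += f'Content-Disposition: form-data; name="{name}"\r\n'.encode("ascii")
        out += b"\r\n"  # a blank line separates the part headers from the part body
        out += value.encode("utf-8") + b"\r\n"
    # The closing delimiter carries an extra trailing "--".
    out += f"--{boundary}--\r\n".encode("ascii")
    return bytes(out)


body = build_multipart(fields, boundary)
print(len(body))  # 285, the Content-Length of the request above
```

The length only comes out to 285 when the final delimiter line ends with the extra "--", which is how a receiver knows that no further parts follow.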

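The HTTP body is split by the boundary into several parts. Each part again has its own headers and body: the headers carry the field name and the data type, and the body carries the data itself. If the user submits a txt file and a gif file through multipart/form-data, the HTTP body looks like this: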

   Content-Type: multipart/form-data; boundary=AaB03x

   --AaB03x
   Content-Disposition: form-data; name="submit-name"

   Larry
   --AaB03x
   Content-Disposition: form-data; name="files"
   Content-Type: multipart/mixed; boundary=BbC04y

   --BbC04y
   Content-Disposition: file; filename="file1.txt"
   Content-Type: text/plain

   ... contents of file1.txt ...
   --BbC04y
   Content-Disposition: file; filename="file2.gif"
   Content-Type: image/gif
   Content-Transfer-Encoding: binary

   ...contents of file2.gif...
   --BbC04y--
   --AaB03x--

The server can use the Content-Type and Content-Transfer-Encoding headers of each part to process the different parts separately.
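To illustrate how a receiver can walk through the parts, here is a rough Python sketch that feeds the nested example above to the standard library's email package. This works because multipart/form-data reuses the MIME multipart syntax. It is only an illustration: the email package is a general MIME parser, and real web frameworks ship their own form parsers.

```python
from email import message_from_bytes

# The nested example from above, as it would arrive on the wire (CRLF line endings).
raw = (
    b"Content-Type: multipart/form-data; boundary=AaB03x\r\n"
    b"\r\n"
    b"--AaB03x\r\n"
    b'Content-Disposition: form-data; name="submit-name"\r\n'
    b"\r\n"
    b"Larry\r\n"
    b"--AaB03x\r\n"
    b'Content-Disposition: form-data; name="files"\r\n'
    b"Content-Type: multipart/mixed; boundary=BbC04y\r\n"
    b"\r\n"
    b"--BbC04y\r\n"
    b'Content-Disposition: file; filename="file1.txt"\r\n'
    b"Content-Type: text/plain\r\n"
    b"\r\n"
    b"... contents of file1.txt ...\r\n"
    b"--BbC04y\r\n"
    b'Content-Disposition: file; filename="file2.gif"\r\n'
    b"Content-Type: image/gif\r\n"
    b"Content-Transfer-Encoding: binary\r\n"
    b"\r\n"
    b"...contents of file2.gif...\r\n"
    b"--BbC04y--\r\n"
    b"--AaB03x--\r\n"
)

message = message_from_bytes(raw)
seen = []
for part in message.walk():
    if part.is_multipart():
        continue  # containers only hold other parts
    # The field name lives in Content-Disposition; a part without its own
    # Content-Type header is treated as text/plain.
    field = part.get_param("name", header="content-disposition")
    seen.append((field, part.get_filename(), part.get_content_type()))

for entry in seen:
    print(entry)
```

The loop reports one plain field (submit-name) and two file parts, each with the content type declared in its own headers, so a handler can decide per part whether to treat the payload as text or as binary data.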

3. Sending multipart/form-data with Go and Python

Both Go and Python make it very easy to send multipart/form-data forms.

3.1 Python

With requests [3], Python needs only a few lines to send a multipart/form-data form. Below is a simple example.

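import requests

# url, name, fname and lname are assumed to be defined by the caller.
# client is assumed to be a requests.Session; requests.post works the same way.
client = requests.Session()

# Form fields to upload: the key is the field name, the value is its value
upload_data = {
    'fname': fname,
    'lname': lname,
}

# Files to upload: the key is the form field name (not the file name),
# the value is the file object opened in binary mode
with open(name, 'rb') as f:
    upload_files = {
        'image': f,
    }
    # Whenever files= is given, requests sends the body as multipart/form-data
    resp = client.post(url, files=upload_files, data=upload_data)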

3.2 Go

In Go, the multipart [4] package of the standard library can build multipart/form-data bodies. Below is a simple example.

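import (
    "bytes"
    "io"
    "mime/multipart"
    "os"
    "path/filepath"
)

// structureMultiform builds a multipart/form-data body from the given fields and
// image file, and returns the body together with its Content-Type header value.
// checkErr is assumed to be a helper defined elsewhere that aborts on a non-nil error.
func structureMultiform(data map[string]io.Reader, imagePath string) (bytes.Buffer, string) {
    // Initialize the buffer that receives the encoded body
    var b bytes.Buffer
    w := multipart.NewWriter(&b)

    // Write the plain form fields
    for key, r := range data {
        fw, err := w.CreateFormField(key)
        checkErr(err)
        _, err = io.Copy(fw, r)
        checkErr(err)
    }

    // Write the image as a file part, sending only the base name as the filename
    img, err := os.Open(imagePath)
    checkErr(err)
    defer img.Close()
    fw, err := w.CreateFormFile("image", filepath.Base(img.Name()))
    checkErr(err)
    _, err = io.Copy(fw, img)
    checkErr(err)
    contentType := w.FormDataContentType()

    // Close the multipart writer so that the closing boundary is written
    checkErr(w.Close())
    return b, contentType
}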

  1. Forms in HTML documents (w3.org)

  2. RFC 2046: Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types (rfc-editor.org)

  3. Requests: HTTP for Humans™ - Requests 2.27.1 documentation (python-requests.org)

  4. multipart package - mime/multipart - pkg.go.dev