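Background
A server running one of our Go applications recently ran out of memory (OOM) because the application's memory usage grew too high.
Cause
Profiling with Pyroscope showed that the PutObject function in the MinIO Go SDK was responsible for most of the memory. Pyroscope was covered in an earlier post, so I won't introduce it again here.
Next, let's read the relevant source code to find out why.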
// PutObject creates an object in a bucket.
//
// You must have WRITE permissions on a bucket to create an object.
//
// - For size smaller than 16MiB PutObject automatically does a
// single atomic PUT operation.
//
// - For size larger than 16MiB PutObject automatically does a
// multipart upload operation.
//
// - For size input as -1 PutObject does a multipart Put operation
// until input stream reaches EOF. Maximum object size that can
// be uploaded through this operation will be 5TiB.
//
// WARNING: Passing down '-1' will use memory and these cannot
// be reused for best outcomes for PutObject(), pass the size always.
//
// NOTE: Upon errors during upload multipart operation is entirely aborted.
func (c *Client) PutObject(ctx context.Context, bucketName, objectName string, reader io.Reader, objectSize int64,
	opts PutObjectOptions,
) (info UploadInfo, err error) {
	if objectSize < 0 && opts.DisableMultipart {
		return UploadInfo{}, errors.New("object size must be provided with disable multipart upload")
	}
	err = opts.validate()
	if err != nil {
		return UploadInfo{}, err
	}
	return c.putObjectCommon(ctx, bucketName, objectName, reader, objectSize, opts)
}
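The doc comment tells us that when the size is passed as -1, PutObject performs a multipart upload and keeps uploading parts until the input stream reaches EOF. The largest object that can be uploaded this way is 5 TiB, and the comment warns that the memory used on this path cannot be reused, which is what leads to the high memory usage.
Digging deeper, let's look at the source of OptimalPartInfo, the function that decides the part size.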
func OptimalPartInfo(objectSize int64, configuredPartSize uint64) (totalPartsCount int, partSize int64, lastPartSize int64, err error) {
	// object size is '-1' set it to 5TiB.
	var unknownSize bool
	if objectSize == -1 {
		unknownSize = true
		objectSize = maxMultipartPutObjectSize
	}
	// object size is larger than supported maximum.
	if objectSize > maxMultipartPutObjectSize {
		err = errEntityTooLarge(objectSize, maxMultipartPutObjectSize, "", "")
		return
	}
	var partSizeFlt float64
	if configuredPartSize > 0 {
		if int64(configuredPartSize) > objectSize {
			err = errEntityTooLarge(int64(configuredPartSize), objectSize, "", "")
			return
		}
		if !unknownSize {
			if objectSize > (int64(configuredPartSize) * maxPartsCount) {
				err = errInvalidArgument("Part size * max_parts(10000) is lesser than input objectSize.")
				return
			}
		}
		if configuredPartSize < absMinPartSize {
			err = errInvalidArgument("Input part size is smaller than allowed minimum of 5MiB.")
			return
		}
		if configuredPartSize > maxPartSize {
			err = errInvalidArgument("Input part size is bigger than allowed maximum of 5GiB.")
			return
		}
		partSizeFlt = float64(configuredPartSize)
		if unknownSize {
			// If input has unknown size and part size is configured
			// keep it to maximum allowed as per 10000 parts.
			objectSize = int64(configuredPartSize) * maxPartsCount
		}
	} else {
		configuredPartSize = minPartSize
		// Use floats for part size for all calculations to avoid
		// overflows during float64 to int64 conversions.
		partSizeFlt = float64(objectSize / maxPartsCount)
		partSizeFlt = math.Ceil(partSizeFlt/float64(configuredPartSize)) * float64(configuredPartSize)
	}
	// Total parts count.
	totalPartsCount = int(math.Ceil(float64(objectSize) / partSizeFlt))
	// Part size.
	partSize = int64(partSizeFlt)
	// Last part size.
	lastPartSize = objectSize - int64(totalPartsCount-1)*partSize
	return totalPartsCount, partSize, lastPartSize, nil
}
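This function computes the optimal part size and the total number of parts. When the size is -1, the object is treated as if it were the maximum of 5 TiB. With no part size configured, the part size is then 5 TiB divided by the 10000-part limit, rounded up to a multiple of minPartSize, which is more than 500 MiB per part. Because an upload has to hold a part in memory, every upload without a known object size ends up using a very large amount of memory, no matter how small the actual object is.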
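To make the numbers concrete, here is a minimal sketch that repeats only the arithmetic of the else branch for the unknown-size case. The constant values are assumptions: 10000 parts and 5 TiB come from the comments and error messages quoted above, and 16 MiB for minPartSize is inferred from the PutObject doc comment, so check them against the constants in your minio-go version. The function name unknownSizePartInfo is made up for this illustration.

```go
package main

import (
	"fmt"
	"math"
)

// Assumed values; they are not shown in the excerpts above, so verify them
// against the constants of the minio-go version you actually use.
const (
	assumedMinPartSize   int64 = 16 << 20 // 16 MiB (assumption)
	assumedMaxPartsCount int64 = 10000    // from the error message above
	assumedMaxObjectSize int64 = 5 << 40  // 5 TiB, from the doc comment
)

// unknownSizePartInfo is a hypothetical helper that repeats the arithmetic
// OptimalPartInfo performs when objectSize is -1 and no part size is
// configured. All validation from the original function is omitted.
func unknownSizePartInfo() (totalParts int, partSize, lastPartSize int64) {
	objectSize := assumedMaxObjectSize

	// Smallest part size that fits the object into the part limit,
	// rounded up to a multiple of the minimum part size.
	partSizeFlt := float64(objectSize / assumedMaxPartsCount)
	partSizeFlt = math.Ceil(partSizeFlt/float64(assumedMinPartSize)) * float64(assumedMinPartSize)

	totalParts = int(math.Ceil(float64(objectSize) / partSizeFlt))
	partSize = int64(partSizeFlt)
	lastPartSize = objectSize - int64(totalParts-1)*partSize
	return totalParts, partSize, lastPartSize
}

func main() {
	parts, size, last := unknownSizePartInfo()
	fmt.Printf("parts=%d partSize=%d bytes (%d MiB) lastPartSize=%d bytes\n",
		parts, size, size>>20, last)
}
```

Under these assumptions the part size comes out at 553648128 bytes (528 MiB) spread over 9930 parts, which matches the order of magnitude seen in the profile.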
Summary
When calling PutObject in the MinIO SDK, pass the actual size of the object you are uploading whenever you can, so that memory is not wasted.