Scraping Videos with Python
Published: 2020-09-02 04:49:18  Category: Python  Source: Internet
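Summary: What can Python be used for? At companies it is mainly used to scrape data and then analyze and mine what was collected, but we can also use it to grab resources for our own use, such as a show we want to watch. In this article I'll share code for downloading videos, so save it and give it a try!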
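To download a file as a stream, all you need is to set stream=True on the request in the requests library; the requests documentation covers this. Let's first try it on a video URL:

import requests


def download_file(url, path):
    # stream=True: the body is fetched as we iterate over it, not all at once
    with requests.get(url, stream=True) as r:
        chunk_size = 1024
        content_size = int(r.headers['content-length'])
        print('Download started')
        with open(path, 'wb') as f:
            for chunk in r.iter_content(chunk_size=chunk_size):
                f.write(chunk)


if __name__ == '__main__':
    url = '...'   # the video URL is in the original post
    path = '...'  # save it wherever you like
    download_file(url, path)

And right away I got hit over the head:

AttributeError: __exit__

So even the documentation lies? Apparently the response object does not implement the __exit__ method that a with statement requires. (Response only became usable as a context manager in requests 2.18.0, so this error most likely means an older version is installed.) Since the with block is only there to guarantee that r is closed at the end and its connection goes back to the pool, contextlib.closing does the same job:

import requests
from contextlib import closing


def download_file(url, path):
    # closing() calls r.close() when the block exits, even if an exception is raised
    with closing(requests.get(url, stream=True)) as r:
        chunk_size = 1024
        content_size = int(r.headers['content-length'])
        print('Download started')
        with open(path, 'wb') as f:
            for chunk in r.iter_content(chunk_size=chunk_size):
                f.write(chunk)

The program now runs, but as I stared at the file its size never seemed to change. How much has actually been downloaded? Better to push each chunk to disk as soon as it arrives (the memory saving is small, though, because stream=True already avoids loading the whole body into memory):

import os
import requests
from contextlib import closing


def download_file(url, path):
    with closing(requests.get(url, stream=True)) as r:
        chunk_size = 1024
        content_size = int(r.headers['content-length'])
        print('Download started')
        with open(path, 'wb') as f:
            for chunk in r.iter_content(chunk_size=chunk_size):
                f.write(chunk)
                f.flush()             # hand Python's write buffer to the OS
                os.fsync(f.fileno())  # ask the OS to commit the data to disk

Now the file grows visibly, but syncing after every 1 KB chunk really hurts my poor hard drive. Better to go back to ordinary buffered writes and simply keep a count in the program:

def download_file(url, path):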
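    with closing(requests.get(url, stream=True)) as r:
        chunk_size = 1024
        content_size = int(r.headers['content-length'])
        print('Download started')
        with open(path, 'wb') as f:
            loaded = 0
            for chunk in r.iter_content(chunk_size=chunk_size):
                f.write(chunk)
                # count the bytes actually received; the last chunk is usually shorter than chunk_size
                loaded += len(chunk)
                print('Downloaded {0:%}'.format(loaded / content_size))

The output is now easy to follow:

Downloaded 2.579129%
Downloaded 2.581255%
Downloaded 2.583382%
Downloaded 2.585508%

But with ambitions as big as mine, how could I settle for just one download? Let's write a class that can be reused: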
import time
from contextlib import closing

import requests


def download_file(url, path):
    with closing(requests.get(url, stream=True)) as r:
        chunk_size = 1024 * 10
        content_size = int(r.headers['content-length'])
        print('Download started')
        with open(path, 'wb') as f:
            p = ProgressData(size=content_size, unit='KB', block=chunk_size)
            for chunk in r.iter_content(chunk_size=chunk_size):
                f.write(chunk)
                p.output()  # report progress after each chunk
class ProgressData:
    def __init__(self, block, size, unit, file_name=''):
        self.file_name = file_name
        self.block = block / 1024.0  # chunk size, converted from bytes to KB
        self.size = size / 1024.0    # total size, converted from bytes to KB
        self.unit = unit
        self.count = 0
        self.start = time.time()

    def output(self):
        self.end = time.time()
        self.count += 1
        # speed of the latest chunk: chunk size / time since the previous call
        elapsed = self.end - self.start
        speed = self.block / elapsed if elapsed > 0 else 0
        self.start = time.time()
        loaded = self.count * self.block
        progress = round(loaded / self.size, 4)
        if loaded >= self.size:
            print('%s download complete' % self.file_name)
        else:
            print('{0} progress {1:.2f}{2}/{3:.2f}{4} {5:.2%} speed {6:.2f}{7}/s'.format(
                self.file_name, loaded, self.unit,
                self.size, self.unit, progress, speed, self.unit))
            # text bar: the row of '/' shrinks as the download progresses
            print('%50s' % ('/' * int((1 - progress) * 50)))
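Two assumptions run through every version above: the server sends a Content-Length header (r.headers['content-length'] raises KeyError otherwise, for example on responses sent with chunked transfer encoding), and progress is estimated by multiplying a chunk counter by the chunk size, which assumes every chunk is full-sized. Below is a minimal sketch of more defensive bookkeeping. copy_with_progress is a hypothetical helper, not part of requests; it takes a plain iterable of byte chunks so it can be tried without a network connection.

```python
import io
from typing import BinaryIO, Iterable, List, Optional


def copy_with_progress(chunks: Iterable[bytes], out: BinaryIO,
                       total: Optional[int]) -> List[str]:
    """Write chunks to out, print one progress line per chunk, and return the lines.

    Hypothetical helper: chunks would normally be r.iter_content(chunk_size=...),
    and total the Content-Length as an int, or None if the header is missing.
    """
    messages = []
    loaded = 0
    for chunk in chunks:
        if not chunk:  # skip empty chunks (e.g. keep-alive)
            continue
        out.write(chunk)
        loaded += len(chunk)  # real byte count, so a short final chunk is handled
        if total:  # known, non-zero size: report a percentage
            msg = 'Downloaded {0:.2%}'.format(loaded / total)
        else:      # size unknown: report bytes only
            msg = 'Downloaded {0} bytes'.format(loaded)
        print(msg)
        messages.append(msg)
    return messages


if __name__ == '__main__':
    # Simulate a 2500-byte body delivered in 1024-byte chunks; the last one is 452 bytes.
    body = b'x' * 2500
    chunks = [body[i:i + 1024] for i in range(0, len(body), 1024)]
    copy_with_progress(chunks, io.BytesIO(), total=len(body))
    # prints: Downloaded 40.96%, Downloaded 81.92%, Downloaded 100.00%
```

Inside download_file it could be called as copy_with_progress(r.iter_content(chunk_size=chunk_size), f, int(size) if size else None), with size = r.headers.get('content-length'). One caveat: if the server gzip-compresses the body, iter_content() yields decompressed bytes while Content-Length counts compressed ones, so the percentage can run past 100%.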

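About the AttributeError: __exit__ detour: contextlib.closing(thing) is a small context manager whose exit step just calls thing.close(), so it works with any object that has a close() method, whether or not that object supports with by itself, and close() runs even if the block raises. The sketch below shows both halves of that without network access. FakeResponse is a hypothetical stand-in for an old requests Response, not a real class.

```python
from contextlib import closing


class FakeResponse:
    """Hypothetical stand-in for an old-style response: close(), but no __enter__/__exit__."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


def bare_with_error(resp):
    """Use resp directly in a with statement and report which exception that raises."""
    try:
        with resp:
            pass
    except (AttributeError, TypeError) as exc:
        # Older Pythons raise AttributeError naming __exit__ or __enter__;
        # Python 3.11+ raises TypeError instead.
        return type(exc).__name__
    return None


def closed_after_failure(resp):
    """Wrap resp in closing(); its close() still runs when the block raises."""
    try:
        with closing(resp):
            raise RuntimeError('simulated failure in the middle of a download')
    except RuntimeError:
        pass
    return resp.closed


if __name__ == '__main__':
    print(bare_with_error(FakeResponse()))       # AttributeError or TypeError, depending on version
    print(closed_after_failure(FakeResponse()))  # True
```

This is why wrapping the request in closing() was enough to fix the error: the with statement only needs the wrapper to be a context manager, not the response itself.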