

            Python Web Scraping: Scrapy's LinkExtractor Returns Link Objects

            Published: 2021-11-23  Views: 551

            LinkExtractor

            from scrapy.linkextractors import LinkExtractor

            Link

            from scrapy.link import Link

            The four attributes of Link

            url, text, fragment, nofollow

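            If the link text needs to be parsed out as well, the attrs parameter can be passed to LinkExtractor. Note that attrs names the tag attributes scanned for URLs (the default is ('href',)), while Link.text is filled from the text inside the link element:

            link_extractor = LinkExtractor(attrs=('href', 'text'))
            links = link_extractor.extract_links(response)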

            Usage example

            import scrapy
            from scrapy import cmdline
            from scrapy.linkextractors import LinkExtractor


            class DemoSpider(scrapy.Spider):
                name = 'spider'

                start_urls = [
                    "https://book.douban.com/"
                ]

                def parse(self, response):
                    # The argument is a regular expression
                    link_extractor = LinkExtractor(allow="https://www.tianyancha.com/brand/b.*")

                    links = link_extractor.extract_links(response)

                    for link in links:
                        print(link.text, link.url)


            if __name__ == '__main__':
                # Run from inside a Scrapy project so the spider can be found
                cmdline.execute("scrapy crawl spider".split())