I want to get the rows of the table on coinmarketcap.com with cheerio, but because the site loads its content dynamically, I only get the first 10 rows. How can I get all rows of this table?

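On this site, the table on the first page has 100 rows. I need the information for all 100 rows, but the code I wrote only returns the first 10. When the site loads, it initially shows only the first 10 rows; the rest are loaded and displayed afterwards.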

const express = require("express");
const axios = require("axios");
const cheerio = require("cheerio");

const PORT = 8000;
const link = "https://coinmarketcap.com";

const app = express();

axios.get(link).then((response) => {
  const html = response.data;
  const $ = cheerio.load(html);

  $(".coin-logo").each(function (i) {
    console.log($(this).attr("src"), i);
  });
});

app.listen(PORT, () => console.log(`server is running on PORT: ${PORT}`));

In the console:

server is running on PORT: 8000
https://s2.coinmarketcap.com/static/img/coins/64x64/1.png 0
https://s2.coinmarketcap.com/static/img/coins/64x64/1027.png 1
https://s2.coinmarketcap.com/static/img/coins/64x64/825.png 2
https://s2.coinmarketcap.com/static/img/coins/64x64/1839.png 3
https://s2.coinmarketcap.com/static/img/coins/64x64/3408.png 4
https://s2.coinmarketcap.com/static/img/coins/64x64/52.png 5
https://s2.coinmarketcap.com/static/img/coins/64x64/2010.png 6
https://s2.coinmarketcap.com/static/img/coins/64x64/3890.png 7
https://s2.coinmarketcap.com/static/img/coins/64x64/74.png 8
https://s2.coinmarketcap.com/static/img/coins/64x64/5426.png 9

It returns only the first ten rows, while the table has 100 rows.

Recommended answer

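This is a React/Next.js application, which means the full table isn't in the static HTML that axios receives; the rows are added to the DOM by JS after the page loads. In single-page applications (SPAs), the data usually arrives through an API endpoint, which you can often hit directly if it isn't secured.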

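In this case, the data is (fortunately) embedded in <script id="__NEXT_DATA__">, which JS uses after the page loads to create the visible elements you see in dev tools. You can get the data as follows: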

const axios = require("axios");
const cheerio = require("cheerio");
require("util").inspect.defaultOptions.depth = null;

const url = "<Your URL>"; // placeholder: set this to the page to scrape

axios.get(url).then(response => {
  const html = response.data;
  const $ = cheerio.load(html);
  const payload = $("#__NEXT_DATA__").first().text();
  const {data} = JSON.parse(JSON.parse(payload).props.initialState)
    .cryptocurrency.listingLatest;
  console.log(data);
});
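
The double JSON.parse above is needed because props.initialState is itself a JSON-encoded string inside the outer JSON. Here is a minimal sketch of that step on its own, with no network request. The sample payload is a hypothetical, heavily trimmed stand-in for the real script contents, and extractListing is an illustrative helper name, not part of any library:

```javascript
// Hypothetical, trimmed stand-in for the state found in __NEXT_DATA__.
const innerState = {
  cryptocurrency: {
    listingLatest: {
      data: [
        {keysArr: ["id", "name", "symbol"]},
        [1, "Bitcoin", "BTC"],
        [1027, "Ethereum", "ETH"],
      ],
    },
  },
};

// Assumption mirrored from the answer: initialState is stored as a JSON
// *string*, not as a nested object, so it is stringified twice here.
const samplePayload = JSON.stringify({
  props: {initialState: JSON.stringify(innerState)},
});

// Returns the listing array, or null if the expected path is missing
// (for example, if the site changes its page structure).
function extractListing(payloadText) {
  const outer = JSON.parse(payloadText); // first parse: the script text
  const rawState = outer?.props?.initialState;
  if (typeof rawState !== "string") return null;
  const state = JSON.parse(rawState); // second parse: the embedded string
  return state?.cryptocurrency?.listingLatest?.data ?? null;
}

const listing = extractListing(samplePayload);
console.log(listing.length); // 3 (one keysArr entry plus two rows)
```

Returning null instead of throwing on a missing path is a design choice for this sketch; scraped page structures tend to change, so a caller may want to detect that case explicitly.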

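The structure is compact and the rows carry no headers: the first element holds the key names (keysArr), and each remaining element is a plain array of values. If you want to map the headers onto the data to make it more readable, you can: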

const payload = $("#__NEXT_DATA__").first().text();
const {data} = JSON.parse(
  JSON.parse(payload).props.initialState
).cryptocurrency.listingLatest;
const [{keysArr}, ...rest] = data;
const withKeys = rest.map(row =>
  Object.fromEntries(
    // pair each value with the key at the same index
    row.map((value, i) => [keysArr[i] ?? "unknown", value])
  )
);
console.log(withKeys.slice(0, 10));
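
To see what the key mapping does without hitting the site, here is a small self-contained sketch on made-up data of the same shape. The sample values and the mapKeys helper name are assumptions for illustration only:

```javascript
// Hypothetical sample: first element carries the keys, the rest are rows.
const sampleData = [
  {keysArr: ["id", "name", "quote.USD.price"]},
  [1, "Bitcoin", 28422.37, "extra"], // one more value than there are keys
  [1027, "Ethereum", 1809.54],
];

// Pairs each row value with the key at the same index, as in the answer.
function mapKeys([{keysArr}, ...rows]) {
  return rows.map(row =>
    Object.fromEntries(
      row.map((value, i) => [keysArr[i] ?? "unknown", value])
    )
  );
}

const mapped = mapKeys(sampleData);
console.log(mapped[0]);
// { id: 1, name: 'Bitcoin', 'quote.USD.price': 28422.37, unknown: 'extra' }
```

Note that with the "unknown" fallback, a row with several surplus values would keep only the last of them, since they all share one key.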

Now the following code shows data similar to the first few columns on the site:

const summary = withKeys.map(e => ({
  "id": e.id,
  "name": e.name,
  "symbol": e.symbol,
  "price": e["quote.USD.price"],
  "1h": e["quote.USD.percentChange1h"],
  "24h": e["quote.USD.percentChange24h"],
  "marketCap": e["quote.USD.marketCap"],
}));
console.log(summary);
console.log(summary.length); // => 100

Output:

[
  {
    id: 1,
    name: 'Bitcoin',
    symbol: 'BTC',
    price: 28422.366435538406,
    '1h': -0.02014955,
    '24h': 5.28633725,
    marketCap: 549443765021.2035
  },
  {
    id: 1027,
    name: 'Ethereum',
    symbol: 'ETH',
    price: 1809.5420505966479,
    '1h': 0.00783376,
    '24h': 3.55526047,
    marketCap: 221440656815.19766
  },
  {
    id: 825,
    name: 'Tether',
    symbol: 'USDT',
    price: 0.9998792896202411,
    '1h': -0.00074249,
    '24h': -0.02658359,
    marketCap: 79511295635.83586
  },
  // ...
]
