Currently, when the URL format is not fixed, only a limited amount of sample data is captured after importing the URLs. Is there no way to capture more?
What do you mean, exactly? Post your sitemap so we can think over your problem.
Here is my sitemap. What I mean is that the plugin only crawls about 200+ records and then stops automatically. No matter how many URLs I provide, it won't continue running, so I have to crawl the data in batches of 200 URLs at a time. Is there a good solution for this?
{"_id":"shouyirenxinxi","startUrl":["http://10.3.1.22/#/meta-form/form/form-render/product_panorama_zq?productId=[10000-15479]&title=产品全景图&tabIndex=4"],"selectors":[{"id":"受益人账户全称","parentSelectors":["wrapper_for_受益人账户全称_受益人账户类型"],"type":"SelectorText","selector":"td:nth-child(1)","multiple":false,"regex":""},{"id":"受益人账户类型","parentSelectors":["wrapper_for_受益人账户全称_受益人账户类型"],"type":"SelectorText","selector":"td:nth-child(2)","multiple":false,"regex":""},{"id":"wrapper_for_受益人账户全称_受益人账户类型","parentSelectors":["_root"],"type":"SelectorElement","selector":"tr.el-table__row","multiple":true},{"id":"募集发行期次","parentSelectors":["wrapper_for_受益人账户全称_受益人账户类型"],"type":"SelectorText","selector":"td:nth-child(3)","multiple":false,"regex":""},{"id":"最终受益人类型","parentSelectors":["wrapper_for_受益人账户全称_受益人账户类型"],"type":"SelectorText","selector":" td:nth-child(4)","multiple":false,"regex":""},{"id":"最终受益人全称","parentSelectors":["wrapper_for_受益人账户全称_受益人账户类型"],"type":"SelectorText","selector":"td:nth-child(5)","multiple":false,"regex":""},{"id":"是否有关联关系","parentSelectors":["wrapper_for_受益人账户全称_受益人账户类型"],"type":"SelectorText","selector":"td:nth-child(6)","multiple":false,"regex":""},{"id":"初始投资金额(万元)","parentSelectors":["wrapper_for_受益人账户全称_受益人账户类型"],"type":"SelectorText","selector":"td:nth-child(7)","multiple":false,"regex":""},{"id":"初始受益权份额(份)","parentSelectors":["wrapper_for_受益人账户全称_受益人账户类型"],"type":"SelectorText","selector":"td:nth-child(8)","multiple":false,"regex":""},{"id":"行权金额(万元)","parentSelectors":["wrapper_for_受益人账户全称_受益人账户类型"],"type":"SelectorText","selector":"td:nth-child(9)","multiple":false,"regex":""},{"id":"行权份额(份)","parentSelectors":["wrapper_for_受益人账户全称_受益人账户类型"],"type":"SelectorText","selector":" 
td:nth-child(10)","multiple":false,"regex":""},{"id":"受益权转让金额(万元)","parentSelectors":["wrapper_for_受益人账户全称_受益人账户类型"],"type":"SelectorText","selector":"td:nth-child(11)","multiple":false,"regex":""},{"id":"受益权转让份额(万元)","parentSelectors":["wrapper_for_受益人账户全称_受益人账户类型"],"type":"SelectorText","selector":" td:nth-child(12)","multiple":false,"regex":""},{"id":"还本金额(万元)","parentSelectors":["wrapper_for_受益人账户全称_受益人账户类型"],"type":"SelectorText","selector":" td:nth-child(13)","multiple":false,"regex":""},{"id":"还本份额(份)","parentSelectors":["wrapper_for_受益人账户全称_受益人账户类型"],"type":"SelectorText","selector":" td:nth-child(14)","multiple":false,"regex":""},{"id":"现投资金额(万元)","parentSelectors":["wrapper_for_受益人账户全称_受益人账户类型"],"type":"SelectorText","selector":"td:nth-child(15)\t","multiple":false,"regex":""},{"id":"现投资份额(份)","parentSelectors":["wrapper_for_受益人账户全称_受益人账户类型"],"type":"SelectorText","selector":"td:nth-child(16)\t","multiple":false,"regex":""}]}
Your URL points to a local address, so I have no way to reach it... sorry.
It seems you are also using ranges. Maybe use pagination instead?
Ranges didn't work for me in the link itself.
You can generate all the links and input them into the Sitemap:
Example:
{
"_id": "shouyirenxinxi",
"startUrl": [
"http://10.3.1.22/#/meta-form/form/form-render/product_panorama_zq?productId=10000&title=产品全景图&tabIndex=4",
"http://10.3.1.22/#/meta-form/form/form-render/product_panorama_zq?productId=10001&title=产品全景图&tabIndex=4",
"http://10.3.1.22/#/meta-form/form/form-render/product_panorama_zq?productId=10002&title=产品全景图&tabIndex=4",
"http://10.3.1.22/#/meta-form/form/form-render/product_panorama_zq?productId=10003&title=产品全景图&tabIndex=4",
"http://10.3.1.22/#/meta-form/form/form-render/product_panorama_zq?productId=10004&title=产品全景图&tabIndex=4",
"http://10.3.1.22/#/meta-form/form/form-render/product_panorama_zq?productId=10005&title=产品全景图&tabIndex=4",
"http://10.3.1.22/#/meta-form/form/form-render/product_panorama_zq?productId=10006&title=产品全景图&tabIndex=4",
"http://10.3.1.22/#/meta-form/form/form-render/product_panorama_zq?productId=10007&title=产品全景图&tabIndex=4",
"http://10.3.1.22/#/meta-form/form/form-render/product_panorama_zq?productId=10008&title=产品全景图&tabIndex=4",
"http://10.3.1.22/#/meta-form/form/form-render/product_panorama_zq?productId=10009&title=产品全景图&tabIndex=4"
MORE HERE
]
}
Python code to generate the links (set end_id to 15479 to cover the full range [10000-15479]):
# Define the base URL for generating the product panorama URLs
base_url = "http://10.3.1.22/#/meta-form/form/form-render/product_panorama_zq"

# Product ID range, inclusive (10000 to 10009 here; raise end_id for more)
start_id = 10000
end_id = 10009

# Initialize an empty list to store the generated URLs
url_list = []

# Loop through the specified range of product IDs
for product_id in range(start_id, end_id + 1):
    # Format the complete URL with the current product ID
    url = f"{base_url}?productId={product_id}&title=产品全景图&tabIndex=4"
    # Add the generated URL to the list
    url_list.append(url)

# Print the generated URLs, one per line
for url in url_list:
    print(url)
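If pasting thousands of URLs by hand is a pain, the same idea can emit ready-to-import sitemap JSON directly. This is only a rough sketch, assuming the sitemap keeps the `_id` / `startUrl` / `selectors` layout of the one posted above. The helper names `build_urls` and `build_sitemaps`, the batch size of 200, and the `_1`, `_2`, ... suffix on `_id` are my own illustration and not anything the plugin requires. The `selectors` list is left empty here and would need to be filled in from the original sitemap.

```python
import json

BASE_URL = "http://10.3.1.22/#/meta-form/form/form-render/product_panorama_zq"


def build_urls(start_id, end_id):
    """Return one URL per product ID in the inclusive range."""
    return [
        f"{BASE_URL}?productId={pid}&title=产品全景图&tabIndex=4"
        for pid in range(start_id, end_id + 1)
    ]


def build_sitemaps(sitemap_id, urls, selectors, batch_size=None):
    """Return a list of sitemap dicts, one per batch of URLs.

    With batch_size=None, all URLs go into a single sitemap.
    """
    if not batch_size:
        batch_size = len(urls) or 1
    sitemaps = []
    for n, i in enumerate(range(0, len(urls), batch_size), start=1):
        sitemaps.append({
            "_id": f"{sitemap_id}_{n}",  # hypothetical naming scheme
            "startUrl": urls[i:i + batch_size],
            "selectors": selectors,
        })
    return sitemaps


if __name__ == "__main__":
    urls = build_urls(10000, 15479)
    selectors = []  # paste the "selectors" array from the original sitemap here
    # One JSON document per line; import each one as a separate sitemap
    for sitemap in build_sitemaps("shouyirenxinxi", urls, selectors, batch_size=200):
        print(json.dumps(sitemap, ensure_ascii=False))
```

Passing `batch_size=None` puts every URL into one sitemap, which is worth trying first; the batched form is only a fallback if the scrape still stops after about 200 records.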