I want to scrape a specific list of URLs. I have over 400 URLs whose IDs are non-consecutive and fall within the range [182819-291846]. Following the range syntax in the documentation guide, I'm trying to limit which entries I get results for with this start URL: https://www.netgalley.co.uk/catalog/book/[182819-291846]
I have tried scraping them individually, but with 444 URLs that is a big task. My plan was to extract the data for every ID in [182819-291846] and then cross-reference the results in Excel to fill in the missing fields. If there's a more efficient way of working, I would be happy to learn from you!
URL format (example page): The Good Part | Sophie Cousens | 9780593539897 | NetGalley
Sitemap:
{"_id":"netgalley","startUrl":["https://www.netgalley.co.uk/catalog/book/"],"selectors":[{"id":"element-title","multiple":false,"parentSelectors":["_root"],"regex":"","selector":"h1[itemprop='name']","type":"SelectorText"},{"id":"element-author","multiple":false,"parentSelectors":["_root"],"regex":"","selector":"[itemprop='author'] span","type":"SelectorText"},{"id":"element-publisher","multiple":false,"parentSelectors":["_root"],"regex":"","selector":"h6[itemprop='publisher']","type":"SelectorText"},{"id":"element-pubdate","multiple":false,"parentSelectors":["_root"],"regex":"","selector":".pb-1 strong:nth-of-type(1)","type":"SelectorText"},{"id":"element-archivedate","multiple":false,"parentSelectors":["_root"],"regex":"","selector":"strong:nth-of-type(2)","type":"SelectorText"},{"id":"Edition","multiple":false,"parentSelectors":["_root"],"regex":"","selector":".d-none tr:contains('EDITION') td.ps-2","type":"SelectorText"},{"id":"element-synopsis","multiple":false,"parentSelectors":["_root"],"regex":"","selector":"div#descriptionFullTextBox","type":"SelectorText"},{"id":"Length","multiple":false,"parentSelectors":["_root"],"regex":"","selector":".d-none tr:nth-of-type(5) td.ps-2, .d-none tr:contains('PAGES') td.ps-2","type":"SelectorText"},{"id":"Genre","multiple":false,"parentSelectors":["_root"],"regex":"","selector":"h6 span[itemprop='genre']","type":"SelectorText"},{"id":"element-hashtags","multiple":false,"parentSelectors":["_root"],"regex":"","selector":".mb-0 b","type":"SelectorText"},{"id":"element-image","multiple":false,"parentSelectors":["_root"],"selector":"img.cover-width","type":"SelectorImage"}]}
List of individual URL IDs: 182819, 184836, 184848, 188490, 192546, 192760, 195531, 195540, 199416, 202938, 203695, 206241, 206391, 209048, 209383, 209714, 209729, 210413, 210874, 210961, 210982, 211863, 211903, 212170, 212194, 213267, 213476, 213615, 213743, 214110, 214869, 215274, 215300, 215715, 216462, 216620, 216746, 216882, 217070, 217156, 217376, 217670, 217804, 218021, 218143, 218154, 218156, 218158, 218166, 218174, 218381, 218422, 218543, 218983, 219047, 219167, 219176, 219416, 219548, 219614, 220064, 220083, 220197, 220274, 220275, 220339, 220594, 220744, 221037, 221039, 221069, 221191, 221437, 221541, 221806, 221816, 222127, 222498, 222499, 222518, 222594, 222622, 222650, 222713, 222755, 223434, 223532, 223553, 223575, 223702, 224127, 224786, 224850, 224928, 224999, 225219, 225303, 225544, 225881, 226907, 227013, 227435, 227438, 227704, 227955, 228571, 229028, 229532, 230401, 230510, 230541, 230564, 230881, 230919, 230925, 231098, 231371, 231390, 231615, 231769, 231952, 232146, 232562, 232689, 232797, 232959, 233146, 233313, 233344, 233490, 233691, 233693, 233714, 233843, 234145, 234205, 234958, 235114, 235136, 235137, 235278, 235295, 235490, 235580, 235704, 235727, 235762, 235880, 235893, 236091, 236232, 236268, 236532, 236747, 236787, 237011, 237026, 237354, 237370, 237545, 238016, 238214, 238599, 238654, 239662, 240202, 240318, 240439, 240538, 240543, 240645, 240689, 240765, 240787, 241070, 241812, 241832, 242007, 242881, 242932, 243098, 243100, 243101, 243988, 244070, 244134, 244210, 244338, 244476, 244517, 244549, 245586, 246281, 246301, 246682, 246703, 246711, 246794, 246963, 247031, 247044, 247182, 247505, 247506, 247507, 247516, 247768, 247952, 247959, 248218, 248622, 248848, 249506, 249730, 249864, 250214, 250742, 251065, 251549, 252206, 253060, 253220, 253371, 253816, 254028, 254098, 254412, 254560, 254804, 254838, 254872, 255507, 255796, 255939, 256193, 256268, 256297, 256751, 256961, 257144, 257253, 257480, 257486, 257553, 257783, 257999, 
258139, 258235, 258712, 258721, 258735, 258969, 259213, 259340, 259456, 259532, 259874, 259968, 260918, 260983, 261002, 261003, 261104, 261111, 261133, 261272, 261284, 261436, 261549, 262072, 262202, 262910, 262932, 263244, 263482, 263497, 263724, 264183, 264188, 264559, 264707, 264914, 265036, 265169, 265221, 265276, 265569, 265634, 265656, 266024, 266066, 266095, 266096, 266097, 266177, 266296, 266476, 266630, 266803, 267260, 267434, 267451, 267655, 267660, 268234, 268398, 268743, 268804, 268807, 269107, 269371, 269465, 269502, 269633, 269679, 269803, 270304, 270362, 270374, 270490, 270596, 270684, 270685, 270716, 270883, 271035, 271107, 271251, 271378, 271440, 271515, 271646, 271789, 271846, 272177, 272495, 272532, 272623, 272669, 272753, 273064, 273248, 273503, 273532, 273535, 273558, 273926, 274124, 274302, 274542, 274839, 274846, 274908, 275125, 275174, 275190, 275219, 275558, 275643, 276075, 276078, 276171, 276437, 276445, 276717, 276722, 277065, 278077, 278578, 278612, 278728, 278734, 278853, 278967, 279022, 279238, 279290, 279397, 279483, 279504, 279652, 280157, 280265, 280337, 280439, 280440, 280554, 280570, 280680, 280707, 280860, 280954, 281100, 281243, 281260, 281294, 281737, 281871, 282036, 282860, 283402, 283616, 283641, 283792, 283813, 284018, 284055, 284083, 284090, 284656, 284800, 284900, 285303, 285444, 285469, 285605, 285738, 285772, 286357, 286611, 286750, 286976, 287028, 287119, 287196, 287199, 287559, 288200, 288565, 289004, 289097, 289132, 289156, 289555, 289638, 289844, 290231, 290496, 290728, 290758, 291305, 291788, 291846
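One alternative I am considering, instead of walking the whole range (which covers 291846 - 182819 + 1 = 109,028 IDs to find only a few hundred pages), is to generate one start URL per ID and paste them into the sitemap. Below is a rough Python sketch of that idea. It assumes that each book page is simply the base URL followed by the ID, and that Web Scraper accepts several entries in the `startUrl` array; the helper names (`parse_ids`, `build_start_urls`, `with_start_urls`) are my own and not part of any tool.

```python
import json

# Assumption: each book page is the base URL followed directly by the ID.
BASE_URL = "https://www.netgalley.co.uk/catalog/book/"


def parse_ids(text):
    """Turn a comma-separated ID string into a sorted list of unique ints."""
    tokens = text.replace("\n", " ").split(",")
    # Skip empty tokens left by trailing commas or line breaks.
    return sorted({int(tok) for tok in tokens if tok.strip()})


def build_start_urls(ids, base_url=BASE_URL):
    """Build one explicit URL per ID instead of a [start-end] range."""
    return [f"{base_url}{book_id}" for book_id in ids]


def with_start_urls(sitemap, urls):
    """Return a copy of the sitemap dict with its startUrl list replaced."""
    updated = dict(sitemap)
    updated["startUrl"] = list(urls)
    return updated


if __name__ == "__main__":
    # Short sample only; the full ID list from this post would go here.
    sample_ids = "182819, 184836, 184848, \n291846"
    # Selectors omitted for brevity; the real sitemap keeps them unchanged.
    sitemap = {"_id": "netgalley", "startUrl": [BASE_URL], "selectors": []}
    ids = parse_ids(sample_ids)
    print(json.dumps(with_start_urls(sitemap, build_start_urls(ids))))
```

If this works as I hope, the printed JSON could be imported as a sitemap, and the scraper would visit only the listed pages, so no cross-referencing in Excel would be needed afterwards.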
