Please help! Can't find a solution!

mvb · September 4, 2018, 8:10pm

Hello everybody!

Help needed! I really love this software and make good use of it, but I have one problem: I want to track the auction results of one specific site, https://www.troostwijkauctions.com

This site automatically loads new results/lot numbers when you scroll down. I've read for hours and tried almost every option, but I can't find a working solution. I really hope this is possible with this software, because everything else works great.

Is there somebody who can try this for me with a regular auction?

The 'link' selector works, but then it skips several results.

Thanks in advance, and greetings

bretfeig · September 5, 2018, 2:59am

Post your sitemap if you want help. You’ll get a quicker response.

mvb · September 5, 2018, 11:18am

{"_id":"twa","startUrl":["https://www.troostwijkauctions.com/nl/printing-industry/01-27213/"],"selectors":[{"id":"next","type":"SelectorElementClick","parentSelectors":["_root"],"selector":"a.next","multiple":true,"delay":0,"clickElementSelector":"a.next","clickType":"clickOnce","discardInitialElements":false,"clickElementUniquenessType":"uniqueText"},{"id":"obj","type":"SelectorLink","parentSelectors":["_root","next"],"selector":"a.title","multiple":true,"delay":0},{"id":"titel","type":"SelectorText","parentSelectors":["obj"],"selector":"h1","multiple":false,"regex":"","delay":0}]}

Here is the sitemap. I can't figure out how to navigate through the pages one by one. And when it does work, it skips several results. Hopefully somebody can give me the right instructions.

jeremyrem · September 5, 2018, 7:48pm

Man this is a tough one.

Looks like it will only scrape what is in view.

The alternative is to use their API to get all the listing data and parse that instead.

https://api.troostwijkauctions.com/lot/7/list?batchSize=999&listType=7&offset=0&sortOption=0&saleID=27213&parentID=0&relationID=0&buildversion=201807311

That will show you up to 999 listings in that category. From there you can use a script or cron job to pull the data, then feed it through jq (or another parser) to reformat it or extract the data you want (e.g. the title).

Hope that helps
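
For anyone who would rather not use jq, here is a minimal Python sketch of the same idea. It assumes the response is a JSON object with a `results` list whose entries carry the title under `t`, as described later in this thread. The function names and the sample data are made up for illustration, and the endpoint may not respond the same way for you.

```python
import json
import urllib.request


def fetch_listing_json(url: str) -> str:
    """Download the raw JSON text from the API URL (not called in this sketch)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")


def extract_titles(payload: str) -> list[str]:
    """Return the title ('t') of every entry under 'results'."""
    data = json.loads(payload)
    return [item["t"] for item in data.get("results", []) if "t" in item]


# Hypothetical sample shaped like the API response; real entries have more fields.
sample = '{"results": [{"t": "Offset press", "mf": "Acme"}, {"t": "Paper cutter"}]}'
titles = extract_titles(sample)
print(titles)
```

To run it against the live data, pass the result of `fetch_listing_json(...)` with the API URL above to `extract_titles` instead of the sample string.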

mvb · September 6, 2018, 7:03am

Thanks for your reply. It's bad news that it isn't possible to scrape the site directly. The dynamic loading makes it really difficult.

Your API 'solution' is hard for me to understand; I'm not familiar with that kind of technique.

Another question: if I use the link selector and select all the pages, it automatically includes the next button ("volgende"). Is it possible to undo this and select only the pages?

jeremyrem · September 6, 2018, 1:47pm

The problem is that it only scrapes the postings that are in view (6 posts, for me), so if I used the next button I would scrape items 1-6, 10-16, 20-26, etc.

Here's an easier way to use the API.

Go to https://api.troostwijkauctions.com/lot/7/list?batchSize=999&listType=7&offset=0&sortOption=0&saleID=27213&parentID=0&relationID=0&buildversion=2018073111 and copy the whole page.
Next, go to https://sqlify.io/convert/json/to/csv and click "Paste raw data".
Paste what we copied previously there, make sure CSV is selected, then click "Convert to CSV".
Go ahead and delete all of the source fields except for $.results[*].t

$.results[*].t = Title of the post

If you want more fields, these are the noteworthy ones, I guess:

$.results[*].lc = List Category
$.results[*].d = Item Description (under the post title on the listing page)
$.results[*].mf = Manufacturer
$.results[*]..srl = Serial Number of the device
$.results[*].typ = Type
$.results[*].yb = Year
$.results[*].url = URL of the detail page (minus https://www.troostwijkauctions.com/nl/)

After you have deleted the fields you do not want, click Save schema and continue
Then when it is finished click Download file
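
If the online converter is not an option, roughly the same conversion can be sketched in Python. This assumes the JSON has a `results` list and uses the short keys listed above; the readable column names, the function name and the sample data are my own choices for illustration, not something the API defines.

```python
import csv
import io
import json

# Short API keys (as listed in this thread) mapped to column names chosen here.
FIELDS = {
    "t": "title",
    "lc": "list_category",
    "d": "description",
    "mf": "manufacturer",
    "typ": "type",
    "yb": "year",
    "url": "url",
}


def results_to_csv(payload: str, fields: dict[str, str] = FIELDS) -> str:
    """Convert the 'results' list of the JSON text into CSV text."""
    data = json.loads(payload)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(fields.values())  # header row
    for item in data.get("results", []):
        # Missing keys become empty cells instead of raising an error.
        writer.writerow([item.get(key, "") for key in fields])
    return out.getvalue()


# Hypothetical sample data; real entries contain many more fields.
sample = '{"results": [{"t": "Offset press", "yb": 2004}, {"t": "Paper cutter", "mf": "Acme"}]}'
csv_text = results_to_csv(sample)
print(csv_text)
```

Dropping a key from `FIELDS` has the same effect as clicking the trashcan icon for that source field in the converter.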

mvb · September 11, 2018, 7:59am

Hi Jeremy,

Thanks for your help! I didn't succeed in extracting the data. This is totally new for me. The problem is that I can't open the API address.

How do you find this address for an auction?

bretfeig · September 11, 2018, 8:36am

I couldn't open it either, until I removed a "1" from the end. Then it worked:

https://api.troostwijkauctions.com/lot/7/list?batchSize=999&listType=7&offset=0&sortOption=0&saleID=27213&parentID=0&relationID=0&buildversion=201807311

jeremyrem · September 11, 2018, 2:22pm

So what you want to do is go to https://sqlify.io/convert/json/to/csv and click "Input a URL", paste https://api.troostwijkauctions.com/lot/7/list?batchSize=999&listType=7&offset=0&sortOption=0&saleID=27213&parentID=0&relationID=0&buildversion=201807311, select CSV, and then click Convert to CSV

That page contains all the information you wanted, title of the auction + all the data for each item (displayed on the items page)

After you click Convert to CSV, it will take you to a page to map the extracted data

It will have 3 columns, Source field, Result field name, Type

Do not touch Source field.

The Result field names are the column names, so you can edit those to whatever you want in the final CSV.

Go ahead and click the trashcan icon next to the lines you do not need/want

$.results[*].t = Title of the post
$.results[*].lc = List Category
$.results[*].d = Item Description (under the post title on the listing page)
$.results[*].mf = Manufacturer
$.results[*]..srl = Serial Number of the device
$.results[*].typ = Type
$.results[*].yb = Year
$.results[*].url = URL of the detail page (minus https://www.troostwijkauctions.com/nl/)

After you are done click Save schema and continue

Download the CSV

To resolve the URLs of the items, just prepend https://www.troostwijkauctions.com/nl/ to all of them

It might be easiest to copy the JSON into Notepad++ and use find/replace

Find: "url": "
Replace: "url": "https://www.troostwijkauctions.com/nl

Then copy that and convert it to CSV
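
The same prepend step can also be sketched in Python instead of find/replace. This assumes each entry under `results` has a relative `url` value. I strip any leading slash before joining so the result has exactly one slash after `nl`, whichever way the API writes the path. The function name and sample values are hypothetical.

```python
import json

BASE = "https://www.troostwijkauctions.com/nl/"


def absolute_urls(payload: str, base: str = BASE) -> list[str]:
    """Return the full detail-page URL for every entry that has a 'url'."""
    data = json.loads(payload)
    return [
        base + item["url"].lstrip("/")  # avoid a double slash after the base
        for item in data.get("results", [])
        if item.get("url")
    ]


# Hypothetical sample values; the real paths will look different.
sample = '{"results": [{"url": "example-lot-1/"}, {"url": "/example-lot-2/"}, {"t": "no url"}]}'
urls = absolute_urls(sample)
print(urls)
```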

mvb · September 17, 2018, 11:27am

Hi,

I managed to pull out the data. It's not a clean result yet, but I am working on it.

Can somebody please tell me how to find the API address for other auctions?

jeremyrem · September 19, 2018, 4:14pm

Go to any of the listing pages,
e.g.
https://www.troostwijkauctions.com/uk/food-processing/01-27046/

In Chrome/Opera, right-click anywhere and click "Inspect element"

Click on the Network tab and press F5

Find the request to "https://api.troostwijkauctions.com"
For me it was named "100?100&buildversion=201807311"

You can copy that URL, or you can click on the Response tab to get the JSON data

Copy all the data in the box and paste it into the converter to parse it, or just use that URL
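
As a possible shortcut, here is a small Python sketch that builds the list URL from a listing page address. It rests on an assumption I have not verified beyond the one example in this thread: that the number at the end of the listing URL (27213 in .../01-27213/) is the saleID. The other parameters are copied from the URL posted earlier, and the buildversion value will likely change over time, so the Network tab remains the reliable way to get the real address.

```python
import re
from urllib.parse import urlencode

API = "https://api.troostwijkauctions.com/lot/7/list"


def api_url_for(listing_url: str, build_version: str = "201807311") -> str:
    """Build the list URL, assuming the trailing number is the saleID."""
    match = re.search(r"-(\d+)/?$", listing_url)
    if match is None:
        raise ValueError(f"no sale number found in {listing_url!r}")
    params = {
        "batchSize": 999,
        "listType": 7,
        "offset": 0,
        "sortOption": 0,
        "saleID": match.group(1),  # assumption: same number as in the page URL
        "parentID": 0,
        "relationID": 0,
        "buildversion": build_version,  # copied from this thread, may be outdated
    }
    return f"{API}?{urlencode(params)}"


url = api_url_for("https://www.troostwijkauctions.com/nl/printing-industry/01-27213/")
print(url)
```

If the printed address does not load, compare it with the request shown in the Network tab and adjust the parameters to match.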

mvb · October 11, 2018, 7:23am

Thanks! I really appreciate your help. It's been a while, but I'm still working on it and making progress. Thanks so far!

mvb · April 24, 2024, 7:38am

Over the past few years, I have used the solution mentioned above extensively. Unfortunately, the website mentioned above has been reset. I am now unable to retrieve the data via an API. I am not very experienced. Can anyone help me figure out where I should now use the API and data?