Scrape an ASPX page that does not change the URL

Describe the problem.

Url: http://200.46.196.152/SISNIA/Forms/Regsanitario.aspx

Sitemap:
{"_id":"aupsasanitario","startUrl":["http://200.46.196.152/SISNIA/Forms/Regsanitario.aspx"],"selectors":[{"id":"link","type":"SelectorLink","selector":"td:nth-of-type(n+2) a","parentSelectors":["_root"],"multiple":true,"delay":0},{"id":"tabla","type":"SelectorTable","selector":"table#ContentPlaceHolder1_GVListadoNoti","parentSelectors":["link"],"multiple":true,"columns":[{"header":"No. Registro","name":"No. Registro","extract":true},{"header":"Producto","name":"Producto","extract":true},{"header":"Fabricante","name":"Fabricante","extract":true},{"header":"País","name":"País","extract":true},{"header":"Estatus","name":"Estatus","extract":true},{"header":"Arancel","name":"Arancel","extract":true},{"header":"Codigo Barra","name":"Codigo Barra","extract":false},{"header":"1","name":"1","extract":false},{"header":"2","name":"2","extract":false},{"header":"3","name":"3","extract":false},{"header":"4","name":"4","extract":false},{"header":"5","name":"5","extract":false},{"header":"6","name":"6","extract":false},{"header":"7","name":"7","extract":false},{"header":"8","name":"8","extract":false},{"header":"9","name":"9","extract":false},{"header":"10","name":"10","extract":false},{"header":"...","name":"...","extract":false}],"delay":0,"tableDataRowSelector":"tr:nth-of-type(n+2)","tableHeaderRowSelector":"tr:nth-of-type(1)"}]}

I am trying to get all the tables, and I delay the page load so that I can run a search first,
but I get this message:


Server error in the '/SISNIA' application.
Runtime error
Description: An application error occurred on the server. The current custom error settings for this application prevent the details of the application error from being viewed remotely (for security reasons). The details can, however, be viewed in browsers running locally on the server.

Details: To make the details of this specific error message visible on remote machines, create a tag in the "web.config" configuration file located in the root directory of the current web application. The tag must have its "mode" attribute set to "Off".

Notes: The error page you are currently seeing can be replaced with a custom error page by modifying the "defaultRedirect" attribute of the application's configuration tag so that it points to the URL of a custom error page.


thanks

Hi there!

Very interesting table. The pagination is a table too, a table inside a table. No errors so far.
You have to set the Page load delay to 8000 ms so that you have a chance to type something into the search field and hit 'Search'.

Draft sitemap (work in progress):
{"_id":"aupsasanitario","startUrl":["http://200.46.196.152/SISNIA/Forms/Regsanitario.aspx"],"selectors":[{"id":"rows","type":"SelectorElementClick","selector":"table[id=ContentPlaceHolder1_GVListadoNoti] tr:nth-child(n+2):not(:last-child)","parentSelectors":["_root"],"multiple":true,"delay":"3000","clickElementSelector":"table[id=ContentPlaceHolder1_GVListadoNoti] tr:last-child tbody tr td:nth-of-type(n+2) a","clickType":"clickOnce","discardInitialElements":false,"clickElementUniquenessType":"uniqueHTMLText"},{"id":"no_registro","type":"SelectorText","selector":"td:nth-of-type(1)","parentSelectors":["rows"],"multiple":false,"regex":"","delay":0},{"id":"Producto","type":"SelectorText","selector":"td:nth-of-type(2)","parentSelectors":["rows"],"multiple":false,"regex":"","delay":0},{"id":"Fabricante","type":"SelectorText","selector":"td:nth-of-type(3)","parentSelectors":["rows"],"multiple":false,"regex":"","delay":0},{"id":"País","type":"SelectorText","selector":"td:nth-of-type(4)","parentSelectors":["rows"],"multiple":false,"regex":"","delay":0},{"id":"Estatus","type":"SelectorText","selector":"td:nth-of-type(5)","parentSelectors":["rows"],"multiple":false,"regex":"","delay":0},{"id":"Arancel","type":"SelectorText","selector":"td:nth-of-type(6)","parentSelectors":["rows"],"multiple":false,"regex":"","delay":0},{"id":"Codigo Barra","type":"SelectorElementAttribute","selector":"img","parentSelectors":["rows"],"multiple":false,"extractAttribute":"src","delay":0},{"id":"Number in the end tha has no meaning","type":"SelectorText","selector":"td:nth-of-type(8)","parentSelectors":["rows"],"multiple":false,"regex":"","delay":0},{"id":"Link","type":"SelectorLink","selector":"a","parentSelectors":["rows"],"multiple":false,"delay":0}]}

Thanks, I will try it.

Right now I'm trying something like this, but it only clicks through to one more page.

{"_id":"aupsasanitario","startUrl":["http://200.46.196.152/SISNIA/Forms/Regsanitario.aspx"],"selectors":[{"id":"rows","type":"SelectorElementClick","selector":"table[id=ContentPlaceHolder1_GVListadoNoti] tr:nth-child(n+2):not(:last-child)","parentSelectors":["links"],"multiple":true,"delay":"3000","clickElementSelector":"table[id=ContentPlaceHolder1_GVListadoNoti] tr:last-child tbody tr td:nth-of-type(n+2) a","clickType":"clickOnce","discardInitialElements":false,"clickElementUniquenessType":"uniqueHTMLText"},{"id":"no_registro","type":"SelectorText","selector":"td:nth-of-type(1)","parentSelectors":["rows"],"multiple":false,"regex":"","delay":0},{"id":"Producto","type":"SelectorText","selector":"td:nth-of-type(2)","parentSelectors":["rows"],"multiple":false,"regex":"","delay":0},{"id":"Fabricante","type":"SelectorText","selector":"td:nth-of-type(3)","parentSelectors":["rows"],"multiple":false,"regex":"","delay":0},{"id":"País","type":"SelectorText","selector":"td:nth-of-type(4)","parentSelectors":["rows"],"multiple":false,"regex":"","delay":0},{"id":"Estatus","type":"SelectorText","selector":"td:nth-of-type(5)","parentSelectors":["rows"],"multiple":false,"regex":"","delay":0},{"id":"Arancel","type":"SelectorText","selector":"td:nth-of-type(6)","parentSelectors":["rows"],"multiple":false,"regex":"","delay":0},{"id":"links","type":"SelectorElementClick","selector":"div.Main","parentSelectors":["_root"],"multiple":true,"delay":0,"clickElementSelector":"td div td td:nth-of-type(1) a","clickType":"clickOnce","discardInitialElements":false,"clickElementUniquenessType":"uniqueText"}]}

You've made a little mistake and added another Element Click selector inside one that already works.
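
For anyone checking their own sitemap for the same mistake, here is a rough Python sketch that looks for an Element Click selector whose parent is also an Element Click selector. The `SITEMAP` below is a trimmed copy of the one posted above (only `id`, `type` and `parentSelectors` are kept), and `nested_click_selectors` is a made-up helper name, not part of Web Scraper.

```python
import json

# Trimmed copy of the sitemap posted above: only ids, types and parents are kept.
SITEMAP = json.loads("""
{"_id": "aupsasanitario",
 "selectors": [
  {"id": "rows", "type": "SelectorElementClick", "parentSelectors": ["links"]},
  {"id": "no_registro", "type": "SelectorText", "parentSelectors": ["rows"]},
  {"id": "links", "type": "SelectorElementClick", "parentSelectors": ["_root"]}
 ]}
""")


def nested_click_selectors(sitemap):
    """Return (child, parent) id pairs where both selectors are Element Click."""
    types = {sel["id"]: sel["type"] for sel in sitemap["selectors"]}
    pairs = []
    for sel in sitemap["selectors"]:
        if sel["type"] != "SelectorElementClick":
            continue
        for parent in sel["parentSelectors"]:
            # "_root" is not in the lookup, so top-level selectors are skipped.
            if types.get(parent) == "SelectorElementClick":
                pairs.append((sel["id"], parent))
    return pairs


print(nested_click_selectors(SITEMAP))
```

For the sitemap above it reports the `rows` selector nested under `links`, which is the pair to untangle.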

I've found the solution: I limited the pagination selector to real links only (a CSS selector that picks only elements that have an 'href' attribute).

Your fixed sitemap:
{"_id":"aupsasanitario2","startUrl":["http://200.46.196.152/SISNIA/Forms/Regsanitario.aspx"],"selectors":[{"id":"rows","type":"SelectorElement","selector":"table[id=ContentPlaceHolder1_GVListadoNoti] tr:nth-child(n+2):not(:last-child)","parentSelectors":["links"],"multiple":true,"delay":""},{"id":"no_registro","type":"SelectorText","selector":"td:nth-of-type(1)","parentSelectors":["rows"],"multiple":false,"regex":"","delay":0},{"id":"Producto","type":"SelectorText","selector":"td:nth-of-type(2)","parentSelectors":["rows"],"multiple":false,"regex":"","delay":0},{"id":"Fabricante","type":"SelectorText","selector":"td:nth-of-type(3)","parentSelectors":["rows"],"multiple":false,"regex":"","delay":0},{"id":"País","type":"SelectorText","selector":"td:nth-of-type(4)","parentSelectors":["rows"],"multiple":false,"regex":"","delay":0},{"id":"Estatus","type":"SelectorText","selector":"td:nth-of-type(5)","parentSelectors":["rows"],"multiple":false,"regex":"","delay":0},{"id":"Arancel","type":"SelectorText","selector":"td:nth-of-type(6)","parentSelectors":["rows"],"multiple":false,"regex":"","delay":0},{"id":"links","type":"SelectorElementClick","selector":"div.Main","parentSelectors":["_root"],"multiple":true,"delay":"2000","clickElementSelector":"table[id=ContentPlaceHolder1_GVListadoNoti] tr:last-child a[href]","clickType":"clickOnce","discardInitialElements":false,"clickElementUniquenessType":"uniqueText"}]}
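
To illustrate what the `a[href]` part of the click selector above does, here is a small Python sketch using only the standard library. It does not run a CSS engine; it just separates anchors that carry an `href` attribute from those that do not, which is the same distinction `a[href]` makes. The `SAMPLE` markup and the `#p2` style targets are invented for the example and are not taken from the real page.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect the text of every <a>, split by whether it has an href attribute."""

    def __init__(self):
        super().__init__()
        self.with_href = []
        self.without_href = []
        self._current = None  # list receiving the text of the open <a>, if any

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            has_href = any(name == "href" for name, _ in attrs)
            self._current = self.with_href if has_href else self.without_href
            self._current.append("")

    def handle_data(self, data):
        if self._current is not None:
            self._current[-1] += data

    def handle_endtag(self, tag):
        if tag == "a":
            self._current = None


# Hypothetical pager row: the first anchor has no href, the others do.
SAMPLE = (
    "<table><tr>"
    "<td><a>1</a></td>"
    '<td><a href="#p2">2</a></td>'
    '<td><a href="#p3">3</a></td>'
    "</tr></table>"
)

collector = AnchorCollector()
collector.feed(SAMPLE)
print("a[href] would match:", collector.with_href)
print("a[href] would skip: ", collector.without_href)
```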

Don't forget to raise the page load delay to 6000 ms so that you have time to set up your search.

Now it stops at page 20. In the second series of pages, the pager adds a "..." link back to the first 10.

{"_id":"aupsasanitario2","startUrl":["http://200.46.196.152/SISNIA/Forms/Regsanitario.aspx"],"selectors":[{"id":"rows","type":"SelectorElement","selector":"table[id=ContentPlaceHolder1_GVListadoNoti] tr:nth-child(n+2):not(:last-child)","parentSelectors":["links"],"multiple":true,"delay":""},{"id":"no_registro","type":"SelectorText","selector":"td:nth-of-type(1)","parentSelectors":["rows"],"multiple":false,"regex":"","delay":0},{"id":"Producto","type":"SelectorText","selector":"td:nth-of-type(2)","parentSelectors":["rows"],"multiple":false,"regex":"","delay":0},{"id":"Fabricante","type":"SelectorText","selector":"td:nth-of-type(3)","parentSelectors":["rows"],"multiple":false,"regex":"","delay":0},{"id":"País","type":"SelectorText","selector":"td:nth-of-type(4)","parentSelectors":["rows"],"multiple":false,"regex":"","delay":0},{"id":"Estatus","type":"SelectorText","selector":"td:nth-of-type(5)","parentSelectors":["rows"],"multiple":false,"regex":"","delay":0},{"id":"Arancel","type":"SelectorText","selector":"td:nth-of-type(6)","parentSelectors":["rows"],"multiple":false,"regex":"","delay":0},{"id":"links","type":"SelectorElementClick","selector":"div.Main","parentSelectors":["_root"],"multiple":true,"delay":"2000","clickElementSelector":"td:nth-of-type(n+2) a","clickType":"clickOnce","discardInitialElements":false,"clickElementUniquenessType":"uniqueHTMLText"}]}
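
Compared with the previous sitemap, this one changes two things in the `links` selector: `clickElementSelector` is now `td:nth-of-type(n+2) a`, and `clickElementUniquenessType` is `uniqueHTMLText` instead of `uniqueText`. One plausible reading, not confirmed anywhere in this thread, is that with text-only uniqueness the second "..." link looks like one that was already clicked, so the scraper never moves past page 20. The Python sketch below only models that idea. The `link` and `clickable` helpers and the `#page11` style targets are invented, and this is not Web Scraper's actual code.

```python
def link(text, page):
    """Build a hypothetical pager link; the href format is made up."""
    return {"text": text, "html": f'<a href="#page{page}">{text}</a>'}


def by_text(item):
    # Models uniqueness judged on the visible text only.
    return item["text"]


def by_html_text(item):
    # Models uniqueness judged on the markup together with the text.
    return (item["html"], item["text"])


def clickable(links, seen, key):
    """Return the links whose key is new, recording each key as seen."""
    fresh = []
    for item in links:
        k = key(item)
        if k not in seen:
            seen.add(k)
            fresh.append(item)
    return fresh


# First series ends with "..." leading to page 11,
# second series ends with "..." leading to page 21.
FIRST_SERIES = [link("2", 2), link("...", 11)]
SECOND_SERIES = [link("12", 12), link("...", 21)]

seen_text = set()
clickable(FIRST_SERIES, seen_text, by_text)
text_only = [l["text"] for l in clickable(SECOND_SERIES, seen_text, by_text)]

seen_html = set()
clickable(FIRST_SERIES, seen_html, by_html_text)
html_text = [l["text"] for l in clickable(SECOND_SERIES, seen_html, by_html_text)]

print("text-only uniqueness, still clickable:", text_only)
print("HTML+text uniqueness, still clickable:", html_text)
```

In this model the second "..." survives only when the markup is part of the key, because its target differs even though its label does not.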

This works... THANKS for the help!!! :grinning:
