|
Post by cinereus on Sept 30, 2019 10:53:47 GMT
Anyone thought of a secure way to scrape the members-only rate trend info?
|
|
|
Post by propman on Sept 30, 2019 11:54:07 GMT
Anyone thought of a secure way to scrape the members-only rate trend info? Not sure what you are asking about, please be more specific.
Thanks
- PM
|
|
|
Post by cinereus on Sept 30, 2019 12:50:39 GMT
|
|
corto
Member of DD Central
one-syllabistic
Posts: 850
Likes: 356
|
Post by corto on Sept 30, 2019 13:05:36 GMT
scrape
verb
1.
drag or pull a hard or sharp implement across (a surface or object) so as to remove dirt or other matter.
"remove the green tops from the carrots and scrape them"
2.
rub or cause to rub by accident against a rough or hard surface, causing damage or injury.
"he smashed into the wall and felt his teeth scrape against the plaster"
|
|
|
Post by Deleted on Sept 30, 2019 13:07:37 GMT
|
|
Stonk
Stonking
Posts: 735
Likes: 658
|
Post by Stonk on Sept 30, 2019 14:48:58 GMT
The URL is incomplete. Since you earlier said "rate trend info", are you referring to the historical market rates page (https://members.ratesetter.com/ratesetter_info/rate_trends.aspx) ?
If so, there's no need to scrape it: there are downloadable CSV files of the data further down the page.
|
|
cobi
Member of DD Central
Posts: 76
Likes: 80
|
Post by cobi on Sept 30, 2019 16:44:29 GMT
Unfortunately this has not been updated: ratesetterClient. I used to have something, but it hasn't worked for a while and I can't be bothered to keep up with the frequent website changes.
|
|
|
Post by cinereus on Sept 30, 2019 19:05:42 GMT
The URL is incomplete. Since you earlier said "rate trend info", are you referring to the historical market rates page (https://members.ratesetter.com/ratesetter_info/rate_trends.aspx) ?
If so, there's no need to scrape it: there are downloadable CSV files of the data further down the page.
I don't know why it got cut off. I mean the live market info at members.ratesetter.com/your_lending/lend_money/market_full.aspx?ID=1
|
|
Stonk
Stonking
Posts: 735
Likes: 658
|
Post by Stonk on Sept 30, 2019 20:45:24 GMT
The URL is incomplete. Since you earlier said "rate trend info", are you referring to the historical market rates page (https://members.ratesetter.com/ratesetter_info/rate_trends.aspx) ?
If so, there's no need to scrape it: there are downloadable CSV files of the data further down the page.
I don't know why it got cut off. I mean the live market info at members.ratesetter.com/your_lending/lend_money/market_full.aspx?ID=1
How's your coding? There's nothing too difficult there. It's a very simple web page, and easy to parse once you've got it. The fiddly bit (depending on the language and libraries you use) will be getting the HTML, for which you'll need to emulate being logged in (e.g., visit the login screen and login once, retain the cookies, and pass them with each request for the above URL).
It's been 3 or 4 years since I did such a thing. I imagine it has probably got even easier by now.
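A minimal sketch of that approach in Python, using only the standard library: log in once, keep the cookies in a CookieJar, and send them with each request. The login path, the form field names, and the ASP.NET hidden-field handling here are all assumptions; inspect the real login page to get them right.

```python
import re
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar

LOGIN_URL = "https://members.ratesetter.com/login.aspx"   # assumed path
MARKET_URL = ("https://members.ratesetter.com/your_lending/"
              "lend_money/market_full.aspx?ID=1")

# One opener shared across all requests: the CookieJar stores the cookies
# from each response and sends them back automatically, which is the
# "retain the cookies and pass them with each request" part.
cookies = CookieJar()
opener = urllib.request.build_opener(
    urllib.request.HTTPCookieProcessor(cookies))

def hidden_fields(html):
    """Collect the hidden inputs (__VIEWSTATE etc.) that an ASP.NET form
    expects to be posted back. Attribute order is assumed fixed."""
    return dict(re.findall(
        r'<input[^>]*type="hidden"[^>]*name="([^"]+)"[^>]*value="([^"]*)"',
        html))

def login(email, password):
    """Fetch the login page, post back its hidden fields plus the
    credentials. The credential field names are guesses."""
    form = hidden_fields(opener.open(LOGIN_URL).read().decode())
    form.update({"email": email, "password": password})
    opener.open(LOGIN_URL, urllib.parse.urlencode(form).encode())

if __name__ == "__main__":
    login("you@example.com", "not-my-real-password")
    print(opener.open(MARKET_URL).read()[:200])
```

The key design point is that all requests go through the same opener, so the server-issued cookies roll over from response to request without any manual bookkeeping.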
|
|
aju
Member of DD Central
Posts: 3,480
Likes: 917
|
Post by aju on Sept 30, 2019 21:38:44 GMT
There are tools that will scrape the screen, and some will let you grab the data quite quickly. I did play with them when we first started on RS late last year. One problem I seem to remember with the screens I wanted to scrape: when analysing the data, it was not just a case of getting the data but of patching it back together afterwards. For instance, £355.21 did not actually appear in the result as 355.21 but as "355 . 21" in some entries, "355. 21" in others, and so on; it didn't even appear consistently in each row. Of course one could easily trim the spaces out in that scenario, but the inconsistency was quite odd and kept changing each time I hit a new line.
As Stonk says, the other issue was getting it to pretend to be a legitimate user, log in, and then perform the relevant actions to reach the screen you want. I gave up and resorted to simply loading the screen myself and creating a Chrome extension (you can copy any that look close to what you want) that would reload the screen every minute or so, place it on the left-hand side, and keep it on screen watching for changes. I found I could be reading docs or working in other RS screens and still notice changes. Not what you want, but it did work for me to build up a pattern of changes. It worked well for spotting the last lending rate change until the time/date was removed; it still does to an extent, but it seemed the rate change was not always reporting new lending but relending as well, though that's another story.
One other looming problem: the screen you want to scrape is one RS looks likely to remove soon, if the new rate choice tools are anything to go by. I think it's highly likely the new rate-setting screen will incorporate the screen you are trying to scrape, blowing your work out of the water. I could be wrong, but RS is making changes in this area and looks to be consolidating a lot of this into single new screens, which would require potentially extensive further changes.
|
|
|
Post by cinereus on Oct 1, 2019 10:41:55 GMT
How's your coding? There's nothing too difficult there. It's a very simple web page, and easy to parse once you've got it. The fiddly bit (depending on the language and libraries you use) will be getting the HTML, for which you'll need to emulate being logged in (e.g., visit the login screen and login once, retain the cookies, and pass them with each request for the above URL).
It's been 3 or 4 years since I did such a thing. I imagine it has probably got even easier by now.
I'm fine with scraping in general (even with cookies) but these cookies expire.
|
|
|
Post by cinereus on Oct 1, 2019 10:42:30 GMT
There are tools that will scrape the screen, and some will let you grab the data quite quickly. I did play with them when we first started on RS late last year. One problem I seem to remember with the screens I wanted to scrape: when analysing the data, it was not just a case of getting the data but of patching it back together afterwards. For instance, £355.21 did not actually appear in the result as 355.21 but as "355 . 21" in some entries, "355. 21" in others, and so on; it didn't even appear consistently in each row. Of course one could easily trim the spaces out in that scenario, but the inconsistency was quite odd and kept changing each time I hit a new line.
No matter what language you use, parsing the data is the easy bit with a touch of regex.
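The spacing quirk aju describes (amounts coming back as "355 . 21" or "355. 21") is exactly the sort of thing a one-line regex can normalise. A sketch:

```python
import re

def clean_amount(text):
    """Rejoin an amount that the scraper has split around the decimal
    point, e.g. '£355 . 21', '£355. 21', or '£1,234 .56' -> a float.
    Strips everything except digits and the decimal point."""
    return float(re.sub(r"[^\d.]", "", text))
```

Because it simply discards currency symbols, thousands separators, and stray spaces, it handles all of the inconsistent variants in one pass; it does assume there is exactly one decimal point left after cleaning.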
|
|
Stonk
Stonking
Posts: 735
Likes: 658
|
Post by Stonk on Oct 1, 2019 11:00:15 GMT
How's your coding? There's nothing too difficult there. It's a very simple web page, and easy to parse once you've got it. The fiddly bit (depending on the language and libraries you use) will be getting the HTML, for which you'll need to emulate being logged in (e.g., visit the login screen and login once, retain the cookies, and pass them with each request for the above URL).
It's been 3 or 4 years since I did such a thing. I imagine it has probably got even easier by now.
I'm fine with scraping in general (even with cookies) but these cookies expire.
Each HTTP request returns you an updated set of cookies with a bit longer to live, so to speak. Just like if you were to manually refresh the page in a browser: it would keep you logged in.
(1) Send an HTTP request to simulate logging in (examine the login page HTML to determine how). Store the cookies from the response. (2) HTTP the market data URL, passing the stored cookies. Replace the previously stored cookies with those from the response. (3) Parse HTML etc. Yes, regex is probably best.
(4) Wait a bit. Please! Their systems are slow enough already.
(5) Go to (2).
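Steps (2) to (5) above amount to a simple polling loop. A sketch, with the fetching and parsing left as placeholders for whatever login and regex code you already have:

```python
import time

def poll(fetch, parse, interval=60):
    """Generator implementing steps (2)-(5): fetch the market page
    (a cookie-aware session refreshes the cookies automatically),
    parse it, wait a bit, loop. `fetch` and `parse` are callables you
    supply; `interval` is step (4), being kind to their servers."""
    while True:
        html = fetch()          # step (2): fetch, cookies roll over
        yield parse(html)       # step (3): pull the rates out
        time.sleep(interval)    # step (4): wait a bit
                                # step (5): back to step (2)
```

Writing it as a generator keeps the loop logic separate from whatever you do with each snapshot (log it, chart it, alert on changes).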
|
|
|
Post by cinereus on Oct 4, 2019 15:33:22 GMT
Got it working but the cookies still expire even if I scrape every 10 mins.
|
|
Stonk
Stonking
Posts: 735
Likes: 658
|
Post by Stonk on Oct 4, 2019 18:52:52 GMT
The cookies should not expire if you are accurately simulating the way a user/browser would handle them. The first HTTP request (to login) returns a response with cookies. Each and every subsequent HTTP request sends the cookies that were returned with the immediately previous response. I think that's what a browser does. It seemed to work when I used to poll FC's "loans on offer" page.
If you're doing that, and I think you probably are, then I don't know. When the cookies expire, does the response contain the HTML of the login page, or some other error page?
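One way to handle that case automatically: check each response for signs of the login page and, if found, log in again before retrying. The markers below are guesses; check what the real login page actually contains.

```python
def session_expired(html):
    """Heuristic: if the response looks like the login page rather than
    the market table, the session has lapsed. The marker strings are
    assumptions about what the login page contains."""
    page = html.lower()
    return "password" in page or "log in" in page

def fetch_market(get_page, do_login):
    """Fetch the market page via `get_page`; if we have been bounced to
    the login screen, call `do_login` and retry once. Both callables
    are placeholders for your existing session code."""
    html = get_page()
    if session_expired(html):
        do_login()
        html = get_page()
    return html
```

This makes the scraper self-healing: even if the cookies do lapse between polls, the next poll re-authenticates transparently instead of silently collecting login-page HTML.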
|
|