I’m not very familiar with PHP or the PHP scraperwiki library.
Looking at the code, I’m wondering whether some of the MD5 hashes are getting interpreted as DateTime values.
The next thing that catches my eye is the string interpolation used to create the SQL query here.
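Since I'm not confident in PHP, here is a minimal Python sketch (using the standard `sqlite3` module) of why interpolated SQL makes me nervous. The two-column table and the `notes` value are made up for illustration, and I can't say whether this is what is actually going wrong in your scraper; it only shows how a value containing a quote can make an interpolated statement fail while a bound parameter goes through.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, cut-down version of the table; the real one has more columns.
conn.execute("CREATE TABLE data (prikey TEXT PRIMARY KEY, notes TEXT)")

prikey = "9ba71fa19951ae2a2e23d2007a96b6ee"
notes = "customer's choice"  # made-up value that happens to contain an apostrophe

# Interpolated SQL: the apostrophe closes the string literal early,
# so SQLite rejects the statement and the record is never written.
interpolated_failed = False
try:
    conn.execute(
        f"INSERT INTO data (prikey, notes) VALUES ('{prikey}', '{notes}')"
    )
except sqlite3.OperationalError:
    interpolated_failed = True

# Bound parameters: the driver passes the values separately from the SQL text,
# so quotes inside the data cannot change the statement.
conn.execute(
    "INSERT OR REPLACE INTO data (prikey, notes) VALUES (?, ?)",
    (prikey, notes),
)
stored = conn.execute(
    "SELECT notes FROM data WHERE prikey = ?", (prikey,)
).fetchone()[0]

print("interpolated insert failed:", interpolated_failed)
print("stored via placeholders:", stored)
```

If the PHP code swallows errors from a failed statement, a record like this would simply go missing, which is why it seems worth ruling out.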
To get more detail about what’s happening, you can call R::debug(), which will make the SQL library print out debugging info about what it’s doing. I’ve pasted some sample output below, and I’ve got a sample at https://morph.io/jamezpolley/PA_v1.
Morph will only capture the first 10,000 lines of output. It looks like this is generating about 25-30 lines of output for every record, so the log will only cover roughly the first 330 to 400 records, and you might have to get a bit creative to get useful output that shows the records which are being skipped. If you can do anything to narrow it down (for example, look at the records that make it through and drop those from the source), that should help.
<br>
SELECT name FROM sqlite_master
WHERE type='table' AND name!='sqlite_sequence';<br>
Array
(
)
<br>
resultset: 1 rows<br>
INSERT or REPLACE INTO data (id, datestamp, page, phone, position, dealfeatures, dealheadlines, data, monthly, upfront, total, contract, networkbrand, fullcode, prikey, notes) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)<br>
Array
(
[0] => 1350
[1] => 19-06-2019 09:33
[2] => https://www.uswitch.com/mobiles/samsung-galaxy-note-9-deals/?variant=128gb-black&data=30000&monthly_cost=0-40&upfront_cost=0-150&sort_by=monthly_cost&resellers=true&networks=ee
[3] => SamsungGalaxy Note 9 deals
[4] => 6
[5] => Upfront costFreeTotal cost£804.00Contract length24 months£60 cashback
[6] => 30GB data £ 36.00 per month
[7] => 30GB data
[8] => £ 36.00 per month
[9] => Upfront costFree
[10] => Total cost£804.00
[11] => Contract length24 months
[12] => EE via Buy Mobile Phones
[13] => EE via Buy Mobile PhonesSee Deal 30GB data £ 36.00 per month Upfront costFreeTotal cost£804.00Contract length24 months£60 cashback
[14] => 9ba71fa19951ae2a2e23d2007a96b6ee
[15] =>
)
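
One way to narrow things down, following the idea above of looking at the records that make it through: compare the keys you expected to save with the keys that actually landed in the database. This is only a rough Python sketch; the `missing_keys` helper is hypothetical, and it assumes the table is called `data` with a `prikey` column, as in the debug output.

```python
import sqlite3


def missing_keys(conn, expected_keys):
    """Return the expected keys that are not present in data.prikey, in order."""
    stored = {row[0] for row in conn.execute("SELECT prikey FROM data")}
    return [key for key in expected_keys if key not in stored]


# Small in-memory demonstration with made-up keys.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (prikey TEXT PRIMARY KEY)")
conn.executemany(
    "INSERT INTO data (prikey) VALUES (?)",
    [("aaa111",), ("ccc333",)],
)

expected = ["aaa111", "bbb222", "ccc333"]
skipped = missing_keys(conn, expected)
print(skipped)
```

Once you have the list of skipped keys, you could run the scraper with debugging enabled on just those records, which should keep the output well under the 10,000-line limit.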