Avito real estate parser + proxy (version 1.0)

Avito real estate parser (version 1.0):

- Support for a list of proxy servers (to work correctly, you must supply your own proxy list)

- Can parse multiple pages of the real estate section

- The PHP code is thoroughly commented and should be clear to most programmers and administrators

The script is implemented strictly in accordance with the following terms of reference:


1. Task:

Write a parser for the «Real Estate - Residential - Purchase» section of the «Avito» site:

http://www.avito.ru/novosibirsk/kvartiry/prodam


2. Requirements:

The parser must be written in PHP5.


3. Proxy:

Parsing must be carried out through proxy servers whose addresses are read from the file proxies.txt.

Proxies must be selected in order.

Proxy address format: [IP-address]:[port].

Example:

187.85.3.3:3128

123.129.240.172:8081

122.96.59.102:81

222.180.173.2:8080
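
The proxy handling described above could be sketched roughly as follows. This is a minimal illustration, not the code shipped with the parser: the function names parse_proxy_list and proxy_for_request are hypothetical, and wrapping around to the first proxy once the list is exhausted is an assumption, since the terms of reference only say that proxies are taken in order.

```php
<?php
// Hypothetical helpers; the names are illustrative and not taken from the original script.

// Parse the contents of proxies.txt into a list of "ip:port" strings,
// skipping blank lines and lines that do not match the expected format.
function parse_proxy_list($text)
{
    $proxies = array();
    foreach (preg_split('/\R/', $text) as $line) {
        $line = trim($line);
        if (preg_match('/^\d{1,3}(\.\d{1,3}){3}:\d{1,5}$/', $line)) {
            $proxies[] = $line;
        }
    }
    return $proxies;
}

// Pick the proxy for the N-th request (0-based), going through the list in order.
// Assumption: once the list is exhausted, selection starts again from the first proxy.
function proxy_for_request(array $proxies, $requestIndex)
{
    if (count($proxies) === 0) {
        return null;
    }
    return $proxies[$requestIndex % count($proxies)];
}
```

In a real run the list would come from file_get_contents('proxies.txt'), and the selected address could be passed to cURL with curl_setopt($ch, CURLOPT_PROXY, $proxy).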


4. Links:

The site must be parsed after links have been collected and new listings identified.

Links should be stored in the file avito.txt, to which new links found on the site are continually appended. New listings are identified by comparing the found links with those already in avito.txt. New links must be parsed, and repeated ones skipped. After parsing, the link should be written to avito.txt for later comparison.

Format of the file avito.txt:

[link];[date the link was found on the site]

Example:

http://www.avito.ru/novosibirsk/kvartiry/1-k_kvartira_38_m_2424_et._286887052;26.02.2014

http://www.avito.ru/novosibirsk/kvartiry/1-k_kvartira_39_m_1010_et._286885150;26.02.2014

http://www.avito.ru/novosibirsk/kvartiry/1-k_kvartira_34_m_910_et._286885518;25.02.2014
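
One possible way to implement the comparison against avito.txt is sketched below. The function names are hypothetical, and splitting each line on its last semicolon is an assumption based on the example lines above, where the date is always the final field.

```php
<?php
// Hypothetical helpers; the names are illustrative and not taken from the original script.

// Build a lookup set of already known links from the contents of avito.txt.
// Each non-empty line is expected to look like "link;dd.mm.yyyy".
function known_links_from_text($text)
{
    $known = array();
    foreach (preg_split('/\R/', $text) as $line) {
        $line = trim($line);
        if ($line === '') {
            continue;
        }
        // Split on the last semicolon so the date is separated from the link.
        $pos = strrpos($line, ';');
        $link = ($pos === false) ? $line : substr($line, 0, $pos);
        $known[$link] = true;
    }
    return $known;
}

// Keep only links that are not yet known, preserving order and dropping duplicates.
function filter_new_links(array $found, array $known)
{
    $new = array();
    foreach ($found as $link) {
        if (!isset($known[$link]) && !in_array($link, $new, true)) {
            $new[] = $link;
        }
    }
    return $new;
}

// Format one avito.txt record for a link found at the given Unix timestamp.
function format_link_record($link, $timestamp)
{
    return $link . ';' . date('d.m.Y', $timestamp);
}
```

After a link has been parsed, its record could be appended with file_put_contents('avito.txt', $record . "\n", FILE_APPEND).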


5. Format of the parsed information:

Parsed ads should be stored in text format, as a semicolon-separated line:

[source];[agency or private];[phone];[district];[street];[house];[number of rooms];[total area];[living area];[kitchen area];[floor];[storeys];[material];[price];[contact];[comment];[link]

Example:

avito;Private;83482743;Leninsky;Vatutina;12;2;54;35;8;2;5;brick;2700;Igor Petrov;renovated;http://www.avito.ru/novosibirsk/kvartiry/...


Note: the first field of every ad is always «avito».

If a piece of information is not provided (for example, there is no house number), the field must be left blank, i.e. only its semicolon remains.
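
As an illustration of the rule above, the following sketch builds one output line from an associative array. The array keys and function names are hypothetical labels for the fields in the format line, and replacing semicolons or line breaks inside a value with a space is an assumption made so that a stray separator cannot shift the columns.

```php
<?php
// Hypothetical helpers; the names and array keys are illustrative only.

// Field order after the fixed first field "avito", following the format line in the spec.
function avito_field_order()
{
    return array(
        'agency_or_private', 'phone', 'district', 'street', 'house', 'rooms',
        'total_area', 'living_area', 'kitchen_area', 'floor', 'storeys',
        'material', 'price', 'contact', 'comment', 'link',
    );
}

// Build one semicolon-separated line; missing fields become empty strings,
// so their semicolons are still present in the output.
function format_ad_line(array $ad)
{
    $values = array('avito');
    foreach (avito_field_order() as $key) {
        $value = isset($ad[$key]) ? (string) $ad[$key] : '';
        // Assumption: semicolons and line breaks inside a value are replaced by a space.
        $values[] = trim(preg_replace('/\s*[;\r\n]+\s*/', ' ', $value));
    }
    return implode(';', $values);
}
```

An ad with no house number, for instance, still yields 17 fields, with an empty string in the sixth position.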


6. Parser report:

When the parser finishes, it must generate a report on the parsed information (the report is a text file with one line) in the following form:

[number of links at the time of parsing]; [number of links collected]; [number of new listings]; [number of new listings parsed]

Example:

22437; 22421; 146; 145

The report is best created in a separate folder «Reports», and its name should correspond to the date and time of report generation. For example, a report generated on March 12, 2014 at 14 hours 30 minutes 15 seconds is named: 20140312_143015

Format:

[year][month][day]_[hours][minutes][seconds]
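
A short sketch of how the report name and the report line might be produced is given below. The function names and parameter names are hypothetical; the date pattern 'Ymd_His' is the PHP date() equivalent of the format above.

```php
<?php
// Hypothetical helpers; the names are illustrative and not taken from the original script.

// Report name in the form [year][month][day]_[hours][minutes][seconds].
function report_file_name($timestamp)
{
    return date('Ymd_His', $timestamp);
}

// Single report line with the four counters, separated as in the example "22437; 22421; 146; 145".
function report_line($linksAtParsing, $linksCollected, $newListings, $newListingsParsed)
{
    return implode('; ', array($linksAtParsing, $linksCollected, $newListings, $newListingsParsed));
}
```

The line could then be written with file_put_contents('Reports/' . report_file_name(time()), report_line(...)), after creating the Reports folder with mkdir() if it does not exist. Whether the file carries an extension is not stated in the terms of reference.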