Gebruiker:RonaldB/Open proxy fighting (old)

From Wikipedia, the free encyclopedia
This page will be updated soon


I have been working for some months now to develop the most efficient and effective method to fight open proxies. After various trials, the following approach turned out to work best:

  • Acquire IPs from various internet sources and put them in a database (now >120k records). Some 20 sources are used, which show considerable (partial) overlap, so a lot of copy/paste is evidently taking place. The db is growing by some 500+ IPs/day.
  • Run scans repeatedly, checking those IPs, and store the results in the db as well. A hit is conclusive, but a non-hit says nothing: PCs may be on or off, and experience (querying the db) shows that it can occasionally take up to some 100 trials and 60 days to get a hit. Roughly 25% of the total has been confirmed at any moment (both the confirmed and the unconfirmed counts are growing continuously). Scanning runs at between 5 and 20 IPs per second, depending on various parameters. The principle is to request a specific page from a proprietary server, using the suspect IP as proxy. This page returns HTTP header information, which is interpreted by the scanning program.
  • The scanning system discovers cascaded proxies (open proxies using other (zombie) computers to connect to the server) and earmarks the exit nodes.
  • The same (automatic) grabbing is done for TOR, distinguishing exit and onion nodes.
  • TOR exit nodes are blocked as a precaution (> 10k now - see here for the list). The same for (cascaded) exit servers (some 1000 now - see here for the list). This blocking is automated (batch-wise).
  • Other open proxies are blocked on an ad-hoc basis (via template, etc.)
  • The database is also being expanded (manually) with provider proxies, proxies for mobile use, web-based proxies (anonymizers) and the like.
  • Because all info is stored in a db, some oddities become apparent. For example, some IP ranges of a mobile operator appear to be open to the entire internet.
  • The database, and in particular the ports (some tens are used more or less frequently, but the db contains >500 unique confirmed ports), is kept secret for obvious reasons.
  • Fed by the IRC rc channel, another program queries the database and reports each open proxy edit on a special page. Test runs on nl:w, en:w, de:w and fr:w gave the results shown below.
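The scanning principle from the list above (fetch a page on your own server through the suspect IP, then interpret what comes back) can be sketched roughly as follows. This is only an illustration, not the actual scanner: the echo URL, the JSON reply format with a `remote_addr` field, and the function names `probe` and `classify` are all assumptions made for this example.

```python
import http.client
import json
import urllib.request
from typing import Optional

# Hypothetical echo page on a server you control. It is ASSUMED to reply with
# JSON like {"remote_addr": "<address the server saw>", "headers": {...}}.
ECHO_URL = "http://echo.example.org/whoami"


def probe(ip: str, port: int, timeout: float = 10.0) -> Optional[dict]:
    """Request ECHO_URL with ip:port as HTTP proxy; return the echo or None."""
    proxy = urllib.request.ProxyHandler({"http": f"http://{ip}:{port}"})
    opener = urllib.request.build_opener(proxy)
    try:
        with opener.open(ECHO_URL, timeout=timeout) as response:
            echo = json.loads(response.read().decode("utf-8", "replace"))
    except (OSError, ValueError, http.client.HTTPException):
        return None  # no answer, or the answer was not our echo page
    return echo if isinstance(echo, dict) else None


def classify(suspect_ip: str, echo: Optional[dict]) -> dict:
    """Interpret one probe result.

    A missing echo is only a non-hit: the host may simply be switched off,
    so it should stay in the queue for later scans.
    """
    if echo is None or "remote_addr" not in echo:
        return {"status": "no-hit", "exit_ip": None}
    exit_ip = echo["remote_addr"]
    if exit_ip == suspect_ip:
        return {"status": "open-proxy", "exit_ip": exit_ip}
    # The server saw a different address: the proxy relays via another host,
    # and that other host is the exit node to earmark.
    return {"status": "cascaded", "exit_ip": exit_ip}
```

Keeping `classify` separate from the network call means the interpretation step can be re-run against results already stored in the db. The port is passed in as a parameter, since the list of ports worth trying is deliberately not published here.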

Since the systematic approach was initiated, spamming, trolling etc. on nl has dropped dramatically. I cannot prove that this is a direct result of the systematic approach, but there must be some correlation.


Test runs on some wikis

A bot monitors the rc feeds from IRC for several of the larger wikis and checks the IP edits against the open proxy database. These test runs started 25-12-06 03:30 (CET) and have been running around the clock, with only some minor interruptions.
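A minimal sketch of the lookup step such a bot performs might look like the following. The simplified line layout, the regular expression, the `proxies` table with its `ip` and `kind` columns, and the function names are all assumptions for illustration; the real feed format and database schema are not described here.

```python
import ipaddress
import re
import sqlite3
from typing import Optional, Tuple

# ASSUMED, simplified shape of one feed line after colour codes are stripped:
#   [[Title]] <url> * <user> * (<size change>) <comment>
LINE_RE = re.compile(r"\[\[(?P<title>.+?)\]\].*?\* (?P<user>.+?) \*")


def parse_ip_edit(line: str) -> Optional[Tuple[str, str]]:
    """Return (title, ip) for an anonymous edit, or None for anything else."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    try:
        ip = ipaddress.ip_address(match.group("user").strip())
    except ValueError:
        return None  # a registered user name, not an IP address
    return match.group("title"), str(ip)


def check_line(db: sqlite3.Connection, line: str) -> Optional[Tuple[str, str, str]]:
    """Return (title, ip, kind) when the editing IP is a known open proxy."""
    parsed = parse_ip_edit(line)
    if parsed is None:
        return None
    title, ip = parsed
    row = db.execute("SELECT kind FROM proxies WHERE ip = ?", (ip,)).fetchone()
    return (title, ip, row[0]) if row else None


# Tiny in-memory stand-in for the real database (hypothetical schema).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE proxies (ip TEXT PRIMARY KEY, kind TEXT)")
db.execute("INSERT INTO proxies VALUES ('192.0.2.7', 'tor-exit')")

sample = "[[Amsterdam]] http://nl.wikipedia.org/w/index.php?diff=1 * 192.0.2.7 * (+12) typo"
print(check_line(db, sample))
```

Because only edits whose user field parses as an IP address reach the database, most of the feed is discarded before any query is made.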

The following table gives the results. Numbers are averages per 24 hours.


Results from 24-12-06* till 24-01-07
Numbers are per 24 hrs        |    nl:w |    en:w |    de:w |    fr:w |    en:b |     com
Total edits                   |  15,200 | 223,000 |  46,000 |  27,000 |   1,300 |  11,900
IP edits                      |   1,500 |  51,000 |  10,400 |   4,100 |     170 |     420
  - as percentage of total    |    9.7% |     23% |     23% |   15.1% |   13.3% |    3.5%
Open proxy edits              |       2 |     135 |      44 |      15 |       1 |       3
  - as percentage of IP edits |   0.11% |   0.26% |   0.43% |   0.36% |    0.5% |   0.64%


Notes (*)
* en:b was added 27-12-06 03:15 (CET), commons was added 8-1-07 02:20 (CET)


  • Open proxies found include normal OPs, cascaded exit nodes and TOR exit nodes. The OPs are stored in a database to enable future analysis.
  • The difference between the results for nl:w and those for the other wikis seems significant. For the differences among the non-nl:w results, there may not yet be enough data to draw statistically significant conclusions.
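One way to put a number on "seems significant" is a pooled two-proportion z-test on the open proxy share of IP edits. The sketch below is not the analysis that was actually done. It scales the rounded daily averages from the table by an assumed 31 days of data, and it treats edits as independent, which they are not (one person behind a proxy can make many edits), so it will tend to overstate significance.

```python
import math


def two_proportion_z(hits_a, total_a, hits_b, total_b):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    if min(total_a, total_b) <= 0:
        raise ValueError("totals must be positive")
    p_a = hits_a / total_a
    p_b = hits_b / total_b
    pooled = (hits_a + hits_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / total_a + 1.0 / total_b))
    if se == 0.0:
        raise ValueError("no variation in the data; the test is undefined")
    z = (p_a - p_b) / se
    # Two-sided tail probability of a standard normal variable.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Rounded daily averages from the first table, times an ASSUMED 31 days.
DAYS = 31
nl = (2 * DAYS, 1500 * DAYS)      # (open proxy edits, IP edits) on nl:w
en = (135 * DAYS, 51000 * DAYS)   # the same for en:w
z, p = two_proportion_z(*nl, *en)
print(f"nl:w vs en:w: z = {z:.2f}, two-sided p = {p:.2g}")
```

The same function can be applied to any pair of columns, for example de:w against fr:w, to see which of the differences among the other wikis the available data can and cannot support.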


Same scans, some weeks later, spanning a period of 60 hours


Results from 14-02-07 till 17-02-07
Numbers are per 24 hrs        |    nl:w |    en:w |    de:w |    fr:w
Total edits                   |  13,000 | 282,000 |  49,000 |  31,000
IP edits                      |   1,500 |  65,000 |  12,200 |   4,700
  - as percentage of total    |   11.2% |     23% |     25% |   15.3%
Open proxy edits              |       1 |     143 |      85 |       9
  - as percentage of IP edits |   0.05% |   0.22% |    0.7% |   0.19%


Some statistical trends

Growth over time of the (normal) open proxies in the database

The graph at the right shows how the database of "normal" open proxies has grown over time. The trend up to January 2007 is easy to understand: as more IPs are already contained in the db, the probability that an IP is "new" decreases. This is a kind of saturation effect, assuming the total number of open proxies is finite. New ones may be added, while others may be reconfigured and no longer be open.
However, the upward trend from January onwards is remarkable. This could be a side effect of the "Happy New Year" worm (containing postcard.exe as attachment). This worm is believed to install a trojan horse on victims' computers, and it took some time before the major mail servers started to block it.
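The saturation effect described above can be illustrated with a deliberately idealized model: a fixed pool of open proxies, from which the sources hand over addresses uniformly at random, duplicates included. The pool size and the number of addresses seen per day below are made-up values, not measurements, and real sources certainly do not sample uniformly.

```python
def expected_known(pool_size: int, draws: int) -> float:
    """Expected number of distinct addresses after `draws` uniform draws."""
    return pool_size * (1.0 - (1.0 - 1.0 / pool_size) ** draws)


def expected_daily_gain(pool_size: int, draws_per_day: int, days: int) -> list:
    """Expected count of NEW addresses found on each successive day."""
    gains = []
    previous = 0.0
    for day in range(1, days + 1):
        current = expected_known(pool_size, day * draws_per_day)
        gains.append(current - previous)
        previous = current
    return gains


# Hypothetical numbers: a pool of 200,000 proxies, 2,000 addresses seen per day.
gains = expected_daily_gain(pool_size=200_000, draws_per_day=2_000, days=120)
for day in (1, 30, 60, 120):
    print(f"day {day:3d}: about {gains[day - 1]:.0f} new addresses")
```

In this model the daily gain can only shrink, so a sustained upturn like the one seen from January onwards means the pool itself is growing, which is consistent with a new source of infected machines.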


Issues (WIP)

  • Initially I assumed that the number of TOR exit nodes would saturate at a certain value. This appears not to be the case. Although there is a hard core of TOR exit nodes, there are apparently quite a few IPs that act as exit node for a while and then disappear. The database has therefore been extended with more date/time information, enabling the automatic unblocking of exit nodes that have not been seen for a certain period of time.
  • All programs are being adapted to enable the conversion of the database to a type that is more scalable.
  • The program monitoring the rc feeds from IRC is now capable of automatically reporting an edit by some type of open proxy (see example). This feature has been active on nl:w for some months. It will soon be implemented for the other major languages.
  • Some heuristic methods are being considered as extensions to this program, e.g. the detection of edits by multiple IPs on the same article within a short period of time.
  • The analysis tools are continuously being enhanced. The most recent extension involves the addition of suspect IPs to the database, which can then be inspected periodically on multiple ports.
  • The major challenge at the moment is the discovery of zombie networks, i.e. computers infected by some sort of virus (worm, trojan or backdoor). Some progress has been made on this issue.
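The heuristic mentioned in the list above, several IPs editing the same article within a short period, could be prototyped with a sliding window per article. This is a sketch of one possible approach, not the method under consideration: the class name, the 10-minute window and the threshold of three distinct IPs are arbitrary assumptions, and timestamps are assumed to arrive in non-decreasing order.

```python
from collections import defaultdict, deque


class BurstDetector:
    """Flag articles edited by many distinct IPs within a short time window."""

    def __init__(self, window_seconds: float = 600.0, min_distinct_ips: int = 3):
        self.window_seconds = window_seconds
        self.min_distinct_ips = min_distinct_ips
        self._recent = defaultdict(deque)  # article -> deque of (timestamp, ip)

    def observe(self, article: str, ip: str, timestamp: float) -> bool:
        """Record one IP edit; return True if the article now looks suspicious."""
        edits = self._recent[article]
        edits.append((timestamp, ip))
        # Drop edits that have fallen out of the window.
        while edits and edits[0][0] < timestamp - self.window_seconds:
            edits.popleft()
        distinct = {addr for _, addr in edits}
        return len(distinct) >= self.min_distinct_ips


detector = BurstDetector()
events = [
    ("Amsterdam", "192.0.2.1", 0.0),
    ("Amsterdam", "192.0.2.2", 120.0),
    ("Rotterdam", "192.0.2.1", 130.0),
    ("Amsterdam", "192.0.2.3", 240.0),
]
for article, ip, ts in events:
    if detector.observe(article, ip, ts):
        print(f"possible coordinated editing on {article} at t={ts}")
```

A flag from such a heuristic would only be a reason to scan the IPs involved with extra attention. Popular articles attract many unrelated anonymous editors, so on its own it proves nothing.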


To be continued ...