API Documentation to Detect and Prevent Proxy, VPN, Malicious IP and Users
GetIPIntel.net is a service that determines how
likely an IP address is a proxy / VPN / bad IP using advanced
mathematical and modern computing techniques.
Detect bot, proxy, and VPN traffic to:
- Greatly reduce fraud on e-commerce sites (anti-fraud)
- Protect your site from automated hacking attempts such as XSS,
SQLi, brute force attacks, application scanning and many others
- Protect your site from crawlers that steal your content
- Prevent users from abusing promotional offers / multiple
sign-ups / affiliate abuse
- Stop bots from scraping your content or bots spamming your
website
- Serve traffic / content to real users, not bots. Reduce fake
views, clicks, and activity that results in click fraud and view
fraud (anti-bot detection)
- Prevent trolls / people that are trying to bypass a ban
- Adjust your system to limit access (such as not allowing them
to change their password, their email, etc.) to prevent account
hijacking
- Since the system returns a real value and there are different
flag options, you can customize the level of protection for a
particular time frame and adjust accordingly
- Use it in combination with another fraud prevention service
to make it even better. Some fraud prevention services do not
explicitly look for proxy / VPN / bad IPs
The system is serving millions of API requests a week and growing
as more people find it useful in protecting their online
infrastructure. Our service is used by gaming communities,
e-commerce websites, research universities & institutions, law
enforcement, and large financial institutions. Not all proxy / VPN
detection services are the same. The techniques involved can be
vastly different and produce noticeable differences. Feel free to
compare the results from this service to any other, including paid
options from various vendors.
It is recommended that you
thoroughly read the information below before implementation.
Assumptions
The following assumptions must be met for the sake of
accuracy and correctness.
- It is assumed that the IP you’re looking up is making a
request to your services on an application level. If you
block IPs on a lower level, important services such as DNS may
be blocked, which is not desired. Be sure the source IP addresses
are correct (not spoofed) if you’re trying to protect a UDP
based service.
- If your online services involve multiple servers or external
services that interact with your online infrastructure, it is
assumed that you do not look up these IPs or the IPs are
whitelisted on your system.
- A valid email that is checked frequently must be used in the
contact field or else your service might be disabled without
notice because there is no way to contact you.
- If you are using the API interface, please do not exceed
500 queries per day & 15 queries per minute. Custom
packages are available if you contact me. More information is
available in the FAQs.
- If you believe the results are incorrect, please contact me
so I can look into it. I will happily correct any issue.
- By using this service, you agree to the Terms of
Service listed below.
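The assumption about not looking up your own infrastructure can be enforced before any query is sent. The sketch below is one possible pre-filter, not part of the service: the function name `should_lookup` and the `allowlist` parameter are hypothetical, and the check for non-global addresses is an added convenience that avoids queries that would come back as unroutable / private.

```python
import ipaddress


def should_lookup(ip_text, allowlist=()):
    """Return True if ip_text is worth sending to the proxy check.

    allowlist is a hypothetical list of CIDR strings for your own
    servers and trusted external services.
    """
    ip = ipaddress.ip_address(ip_text)  # raises ValueError on malformed input
    if not ip.is_global:
        # Private, loopback, link-local and similar addresses are skipped.
        return False
    for cidr in allowlist:
        net = ipaddress.ip_network(cidr)
        if ip.version == net.version and ip in net:
            return False  # one of your own or trusted IPs
    return True
```

For example, `should_lookup("10.0.0.1")` is skipped, while a public visitor address that is not in your allowlist is passed on to the lookup.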
API
Expected Input
The proxy check system takes in an input via HTTP GET request. The URL
is http://check.getipintel.net/check.php and the parameter is ip.
The system fully supports IPv4 with partial support for IPv6.
Include Your Contact Information
Include your contact information so I can notify you if a problem
arises or if there are core changes. In some situations, people
query the system in a wrong manner and assume everything is working,
but due to missing or improper handling of error codes, that is not
the case. Since I only have the connecting IP address, I cannot help
the person correct the error.
To include your contact information, add another parameter to your
request called contact and provide your email.
A typical query looks like:
http://check.getipintel.net/check.php?ip=IPHere&contact=YourEmailAddressHere
Do not use URL encoding on the input parameters.
All queries that do not contain accurate contact information will be
rejected with an error or dropped by the firewall.
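As a minimal sketch of the query format described above, the helper below assembles the URL by plain string formatting, since the parameters should not be URL encoded. The function name `build_query_url` and the simple email sanity check are assumptions made for illustration.

```python
import ipaddress

API_URL = "http://check.getipintel.net/check.php"


def build_query_url(ip_text, contact):
    """Build a lookup URL with the ip and contact parameters."""
    ip = ipaddress.ip_address(ip_text)  # raises ValueError on malformed input
    # Very rough sanity check only; it does not prove the address is real.
    if "@" not in contact or any(ch.isspace() for ch in contact):
        raise ValueError("contact must look like an email address")
    # No URL encoding is applied, as the documentation requests.
    return f"{API_URL}?ip={ip}&contact={contact}"
```

Calling `build_query_url("8.8.8.8", "me@example.com")` yields the same shape as the typical query shown above.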
Start with the flags=m option if only proxy / VPN detection is
needed. If flags=m does not have a noticeable impact, then use
flags=b.
The default query (no flags) is mostly used in front of a payment
gateway to protect against fraud because bad IP detection is
included.
If you are contacted, please respond within 2 days or the contact
information could be considered inaccurate. Your information
will only be used for the purpose of communication with
GetIPIntel.
Expected Output
On a valid request, the system will return a value between 0 and 1
(inclusive) indicating how likely the given IP is a proxy. On error,
a negative value will be returned. If format=json is used, valid
JSON will be returned with extra information; see below for
details.
Interpretation of the Results
If a value of 0.50 is returned, then it is as good as flipping a 2
sided fair coin, which implies it’s not very accurate. From my
personal experience, values > 0.95 should be looked at and values
> 0.99 are most likely proxies. Anything below the value of 0.90
is considered as “low risk”. Since a real value is returned,
different levels of protection can be implemented. It is best for
a system admin to test some sample datasets with this system and
adjust implementation accordingly.
I only recommend automated action on high values (> 0.99 or even
> 0.995), but it’s always best to manually review IPs that return
high values. For example, mark an order as “under manual review”
and don’t automatically provision the product for high proxy
values.
Be sure to experiment with the results of this system before you use
it live on your projects.
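The thresholds mentioned above can be turned into a small triage function. This is only a sketch under stated assumptions: the label names are made up for illustration, and the text does not say how to treat scores from 0.90 to 0.95, so that band is reported as "unclassified" here and left for you to decide.

```python
def classify_score(score):
    """Map a proxy score in [0, 1] to a hypothetical triage label."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1 inclusive")
    if score > 0.99:
        return "likely_proxy"    # most likely a proxy per the text
    if score > 0.95:
        return "manual_review"   # should be looked at
    if score < 0.90:
        return "low_risk"
    return "unclassified"        # 0.90 to 0.95: not covered by the text
```

Even for "likely_proxy", holding an order for manual review rather than rejecting it outright matches the recommendation above.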
If you believe the result is
wrong, don’t hesitate to contact me, I can tell you why. If it’s
an error on my end, I’ll correct it. If you email me, expect a
reply within 12 hours.
Optional settings
- flags=m is used
when you’re only looking for the value of “1” as the result. The
m flag skips the
dynamic checks and only uses dynamic ban lists. See Variations of Implementation and What are dynamic checks? for detailed
explanation.
- flags=b is used
when you want to use dynamic ban and dynamic checks with partial
bad IP check. See Variations of
Implementation for detailed explanation.
- flags=f is used
when you want to force the system to do a full lookup, which can
take up to 5 seconds. See Variations of
Implementation for detailed explanation.
- flags=n is used to
exclude real time block list. Append the character “n” if you’re
already using flags=m, b, or f. For example, flags=nm.
- oflags=b is used
when you want to see if the IP is considered as bad IP. Note that
when using the flags option, this result can vary due to the
included dataset. Please see the comparison table for more
information.
- oflags=c is used
when you want to see which country the IP came from / which
country the IP belongs to (GeoIP Location). Currently in alpha
testing.
- oflags=i is used
when you want to exclude iCloud Relay Egress IPs, Google Cloud
One VPN, or some other similar service. They are by definition
a proxy / VPN IP; however, having this additional data may help
you make a more informed decision.
- oflags=a is used
when you want to see the ASN number of the IP.
- format=json
returns the result in JSON format with extra
information.
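The optional settings above can be assembled into a query-string fragment as sketched below. The function and parameter names are hypothetical. The sketch accepts only one oflags value at a time because the text does not say whether several can be combined, and it places "n" first to match the flags=nm example.

```python
def build_options(flags=None, exclude_realtime=False, oflags=None,
                  json_format=False):
    """Return the optional part of the query string, e.g. 'flags=nm'."""
    if flags not in (None, "m", "b", "f"):
        raise ValueError("flags must be one of m, b, f or None")
    if oflags not in (None, "b", "c", "i", "a"):
        raise ValueError("oflags must be one of b, c, i, a or None")
    parts = []
    # "n" (exclude real time block list) is prepended to any other flag.
    flag_value = ("n" if exclude_realtime else "") + (flags or "")
    if flag_value:
        parts.append("flags=" + flag_value)
    if oflags:
        parts.append("oflags=" + oflags)
    if json_format:
        parts.append("format=json")
    return "&".join(parts)
```

The returned fragment would be appended to the base query after the ip and contact parameters, joined with "&".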
Variations of Implementation
flags=m
flags=b
Default Lookup (no flags)
flags=f
oflags=b
oflags=c
oflags=i
oflags=a
format=json
Comparing the Different Flags
Flag | Data Sets Used | Pros | Cons | Response Time (No Network Latency) | Suggested Use Based on Requirements |
---|---|---|---|---|---|
flags=m | dynamic ban lists | fast, small chance for false positives | IPs that are not on blocklists will get through | < 60 ms | least amount of false positives; fastest speeds; ok with letting some IPs through; only care about proxies & VPNs |
flags=b | dynamic ban lists, dynamic checks, some bad IP checks | fast, catches more proxy / VPN IPs than flags=m, skips some compromised system detection so complaints from residential users are reduced because most likely the user does not know they’re compromised or they received a dirty IP from their ISP | higher chance of false positives than flags=m | < 130 ms | fast speeds; want to let fewer proxy / VPN IPs through than flags=m; do not want to fully utilize bad IP detection; only care about proxies & VPNs |
no flags (default query) | dynamic ban lists, dynamic checks, full bad IP checks | fast, full IP check, a balance between speed and full IP check | higher chance of false positives than flags=m; might require 1 more query after 5 seconds to be sure | < 130 ms | fast speeds; ok with making multiple queries with the same IP |
flags=f | dynamic ban lists, dynamic checks, full bad IP checks | forces a full IP check which does not take additional queries to be sure | higher chance of false positives than flags=m, slowest | < 5000 ms | ok with waiting for a full lookup that can take up to 5 seconds |
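Because the response times in the table exclude network latency, a client timeout needs some extra margin on top of them. The sketch below derives a timeout from the table's upper bounds; the 1000 ms default margin and the function name are assumptions, not values recommended by the service.

```python
# Upper bounds on server response time from the comparison table, in ms.
# The None key stands for the default query with no flags.
SERVER_TIME_MS = {"m": 60, "b": 130, None: 130, "f": 5000}


def client_timeout_seconds(flags=None, network_margin_ms=1000):
    """Return a client-side timeout in seconds for the given flag."""
    if flags not in SERVER_TIME_MS:
        raise ValueError("flags must be one of m, b, f or None")
    return (SERVER_TIME_MS[flags] + network_margin_ms) / 1000.0
```

With the default margin, a flags=f lookup gets a 6 second timeout, while a flags=m lookup gets just over 1 second.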
Error Codes
The proxy
check system will return negative values on error. For standard
format (non-json), an additional HTTP 400 status code is returned.
- -1 Invalid no input
- -2 Invalid IP address
- -3 Unroutable address / private address
- -4 Unable to reach database, most likely the database is being
updated. Keep an eye on Twitter for more information.
- -5 Your connecting IP has been banned from the system or you
do not have permission to access a particular service. Did you
exceed your query limits? Did you use an invalid email address?
If you want more information, please use the contact links below.
- -6 You did not provide any contact information with your query
or the contact information is invalid.
- If you exceed the number of allowed queries, you’ll receive an
HTTP 429 error.
Be sure to implement exception
handling such as timeouts, HTTP 429 error, and the error codes
listed above.
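One way to handle the error codes above is to convert the status code and body of a standard (non-json) response into either a score or an exception. This is a sketch: the exception classes and the function name are hypothetical, and the messages are shortened from the list above.

```python
ERROR_MESSAGES = {
    -1: "Invalid no input",
    -2: "Invalid IP address",
    -3: "Unroutable address / private address",
    -4: "Unable to reach database",
    -5: "Connecting IP banned or no permission for this service",
    -6: "Contact information missing or invalid",
}


class RateLimited(Exception):
    """Raised when the server answers with HTTP 429."""


class LookupFailed(Exception):
    """Raised for negative error codes or unparseable responses."""

    def __init__(self, code, message):
        super().__init__(message)
        self.code = code  # None when no documented code applies


def interpret_response(status_code, body):
    """Return the score as a float, or raise on any error condition."""
    if status_code == 429:
        raise RateLimited("query limit exceeded (HTTP 429)")
    try:
        value = float(body.strip())
    except ValueError:
        raise LookupFailed(None, f"unexpected response (HTTP {status_code})") from None
    if value < 0:
        code = int(value)
        raise LookupFailed(code, ERROR_MESSAGES.get(code, "Unknown error"))
    if not 0.0 <= value <= 1.0:
        raise LookupFailed(None, "score outside the expected 0 to 1 range")
    return value
```

Network timeouts are raised by whatever HTTP client performs the request and need their own handler around the call.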
FAQs
I created this project because I couldn’t
find any good alternatives for a reasonable price. Since I have a
master’s degree in Computer Science specializing in Networking with
interests in Machine Learning and NetSec, it’s a fitting project
for me to embark on. Compared to a popular paid service, the
number of free queries that’s being served by GetIPIntel
translates to $60,000+/month and I’ve been told by a few people
that GetIPIntel catches more proxies / VPN / bad IPs than said
paid service. I’m offering it for free in the spirit of openness.
Just because it’s free, it does not mean it’s bad, inaccurate,
easy to develop, or easy to maintain. To keep things simple,
please do not abuse this service as a free user and if you need
more queries, contact me for a custom plan. If you’re feeling
generous and this API works well for you, please let your friends
know.
There are many other services like this one that use simple block
lists, meaning a particular IP / IP block is specifically added or
removed either manually or by code from various known/trusted
sources. During a lookup, if the IP is on the list, the service
simply returns the result accordingly. However, it’s a very limited
view because if the IP is not on a list, it doesn’t mean it’s not a
proxy / VPN / bad IP. It means that the simple block list system
does not know or has not come across that IP address. To claim an
IP address is not a proxy / VPN / bad IP just because the system
has never come across the IP is a logical fallacy (see Argument
from Ignorance). GetIPIntel uses Machine Learning &
Probability Theory techniques to infer on IPs it doesn’t have
explicit knowledge about (see What are dynamic checks?) and
computes the output when you request it
using up-to-date and large data sets. Thus, using a combination of
block lists with dynamic checks will produce a more accurate
result because the overall system is more intelligent.
Dynamic checks are used if the IP address is not explicitly listed
in the static and dynamic files. The system attempts to retrieve
characteristics (or attributes in Machine Learning terms) of the
given IP. Based on that data, it uses concepts from Probability
Theory and ML boosting techniques to generate an overall result.
All results from dynamic checks are computed in real time using
large & frequently updated datasets.
In short, dynamic checks
allow the system to use mathematics to infer whether an IP is a
proxy / VPN when it doesn’t explicitly know.
It refers to any combination of crawlers /
comment & email spammers / brute force attacks, that is, IPs that
are behaving “badly” in an automated manner. Networks that are
infected with malware / trojans / botnet / etc are also considered
“bad”. It may be possible that the user is not aware that their
systems are infected or they have received an IP by their ISP that
was recently infected with malicious code. If you wish to skip
this, see variations of implementation.
There’s a rate limit of 15 requests / minute to
prevent abuse, as well as a burst parameter set to ensure smoothing
of traffic. If you hit any of these limits, the web server will
return a 429 error. Please do not exceed 500 queries per day. The
limits may change based on abuse and/or server load; changes will be
posted on Twitter at least one week in advance. If you need
guaranteed resources and/or more queries, please contact me. In
most cases, the cost is significantly less than other paid
services.
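A client can track its own usage to stay under the 15 per minute and 500 per day limits. The sliding-window counter below is one possible sketch; the class name and the injectable clock are assumptions. It does not model the server's burst parameter, so an HTTP 429 is still possible and must be handled.

```python
import time
from collections import deque


class QueryBudget:
    """Client-side guard for the free per-minute and per-day limits."""

    def __init__(self, per_minute=15, per_day=500, clock=time.monotonic):
        self.per_minute = per_minute
        self.per_day = per_day
        self.clock = clock
        self.sent = deque()  # timestamps of queries from the last 24 hours

    def try_acquire(self):
        """Record a query and return True if it fits both limits."""
        now = self.clock()
        # Forget queries that are older than one day.
        while self.sent and now - self.sent[0] >= 86400:
            self.sent.popleft()
        if len(self.sent) >= self.per_day:
            return False
        in_last_minute = sum(1 for t in self.sent if now - t < 60)
        if in_last_minute >= self.per_minute:
            return False
        self.sent.append(now)
        return True
```

Call `try_acquire()` before each lookup and skip or delay the query when it returns False.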
With custom plans, I can provide any number
of queries as a query pack that does not expire. All custom plans
come with a default of 300 queries per minute instead of 15
(could go higher if you want), automatic fail-over (2N
redundancy), and dedicated resources. Please contact me via email
(which is listed below) with your requirements.
Of course, but I do not recommend caching a
particular value for more than 6 hours. The Internet drastically
changes over a short period of time. Hijacked networks pop up and
go away relatively quickly. A low scoring IP’s behavior can change
in a matter of seconds, as well as a high scoring IP. When the
system detects an IP with high variance in previous scores, the
probability will be recomputed live with the most up to date
dataset for accuracy.
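The 6 hour caching guideline can be implemented with a small time-limited cache. The sketch below is an in-memory example with hypothetical names; a real deployment might use a shared cache instead, with the same maximum age.

```python
import time

MAX_AGE_SECONDS = 6 * 60 * 60  # do not keep a score longer than 6 hours


class ScoreCache:
    """In-memory cache that forgets scores after max_age seconds."""

    def __init__(self, max_age=MAX_AGE_SECONDS, clock=time.monotonic):
        self.max_age = max_age
        self.clock = clock
        self._entries = {}  # ip -> (score, time stored)

    def put(self, ip, score):
        self._entries[ip] = (score, self.clock())

    def get(self, ip):
        """Return the cached score, or None if missing or expired."""
        entry = self._entries.get(ip)
        if entry is None:
            return None
        score, stored_at = entry
        if self.clock() - stored_at >= self.max_age:
            del self._entries[ip]  # stale: force a fresh lookup
            return None
        return score
```

A shorter `max_age` may suit cases where scores are expected to change quickly.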
The free API is on a shared resource pool
which means other people’s actions can have an effect on your
requests. All custom plans are on dedicated resources. If you’re
interested in one, please contact me.
If the business ISP provides hosting or
hosting related services on their network, then yes. See
Variations of Implementation for a solution or just whitelist the
IPs on your own system.
The IP is most likely a compromised system involved in spamming /
brute-forcing / etc. It falls under the category of Bad IPs or it
could perhaps be a public proxy that someone runs on their home
computer. To find out if it’s considered as a bad IP, use oflags=b.
It is a service offered by Apple called iCloud Private relay. From
a technical perspective, it is using an onion routing technique
with at least one hop. According to Apple, “no single party –
including apple – can view or collect the details of users’
browsing activity.” However, if the activity of an Apple private
relay user becomes extremely difficult to trace, it is even more
important to be aware of these IPs because one cannot and should
not assume that all iCloud Private relay users are using it in
good faith. Furthermore, this has to be explicitly turned on by
the user. To see if an IP is an Apple iCloud Private relay user /
iCloud Egress proxy, use oflags=i.
It is a VPN service offered by Google for Google One subscribers.
According to Google, “VPN by Google One leverages advanced
cryptographic techniques to ensure that no one, not even Google,
can associate your network traffic with your account or identity.”
It is important to be aware of these IPs because one cannot assume
that all Google One users are using the VPN in good faith. To see
if an IP is a Google One VPN IP, use oflags=i.
Known crawlers such as Google Bot, Yahoo
bot, Bing bot, etc., as well as some known DNS resolvers are
whitelisted and will return a result of 0. If you believe there’s
an error, please contact me.
Yes, but be aware that the time to set up an
SSL connection takes longer than a normal HTTP connection.
These values can easily be spoofed and
are therefore unreliable.
You might need to change the user-agent as
Cloudflare blocks certain connections with a weird user-agent. It
shows up as “Bad Browser.” If that doesn’t work, please contact me
and I’ll look into the issue.
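Setting an explicit user-agent with Python's standard library could look like the sketch below. The user-agent string "my-app/1.0" is a placeholder chosen for illustration, not a value known to be accepted.

```python
import urllib.request


def make_request(url, user_agent="my-app/1.0"):
    """Create a GET request object that carries an explicit User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})
```

The returned object can be passed to `urllib.request.urlopen(request, timeout=...)` to perform the lookup.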
There are some code samples on my GitHub.
Please see the privacy page for more information.
Disclaimer
No guarantees, warranties, etc. are provided or implied. Use at
your own risk. GetIPIntel is not liable for damages or claims of any
kind.
Terms of Service
By using this service, you agree to:
- Not sell this service or information generated from this
service, directly or indirectly, without explicit consent.
- Not use / reuse information generated from this service,
directly or indirectly, without giving credit to the source (this
website).
- Not exceed the query limits if you’re a free user.
- Not look up random IPs / incremental IP lookups. The database
changes very often so information becomes stale very fast. It just
causes a higher server load for no reason.
- The Terms of Service may change at any given time, without
prior notice.
You can find me on GitHub. If I do not respond
to your email within 24 hours, then something is wrong; check your
spam folder. Please send an email to my gmail address, or contact
me via Twitter. Ultimately, I want the system to be as accurate as
possible, so please let me know if there are any inaccuracies; I’d
like to fix the issue. Let me know if you have any custom
requirements such as more queries per minute, skipping the cache so
it always gets the latest data and recomputes the result, etc.