Setting Up Proxies in Google Colab and Jupyter
Right, let’s get straight to the business of running proxies in Google Colab or Jupyter—no faffing about. Whether you’re scraping data, bypassing geo-restrictions, or just wanting a bit of privacy, proxies are your go-to mates. There’s a knack to doing it right, though, especially on platforms like Colab and Jupyter that sometimes have their own quirks.
Why Use Proxies with Colab and Jupyter?
Scenario | Benefit of Proxy |
---|---|
Web scraping | Avoiding IP bans |
Accessing geo-blocked APIs | Unblocking content |
Research with anonymity | Masking your digital footprint |
Choosing Your Proxy Source
Now, before you go bush-bashing through the wilds of the internet looking for proxies, let’s make it easy. ProxyRoller is your mate here—offers fresh, free proxies ready to go. More on them in a tick.
Types of Proxies
Type | Description | Typical Use |
---|---|---|
HTTP/HTTPS | Standard web proxies | Web scraping, crawling |
SOCKS4/SOCKS5 | Lower-level, supports more protocols | Streaming, P2P, etc. |
Rotating | Changes IP frequently | Avoiding rate limits |
Residential | Real user IPs, harder to block | Scraping, automation |
For most Colab/Jupyter work, HTTP/HTTPS proxies will do the trick.
Getting Free Proxies from ProxyRoller
- Head over to ProxyRoller.
- Click on the “Get Free Proxies” button.
- Copy the proxy list, which looks something like ip:port.
Bit of Aussie advice: test your proxies, because free proxies can be fickle, like Melbourne weather.
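Before testing anything, it helps to turn the pasted list into proxy URLs and bin the obviously broken lines. Here's a minimal sketch, assuming the list is plain text with one IPv4 ip:port entry per line; the helper name parse_proxy_list and the sample addresses (documentation-only ranges) are made up for illustration.

```python
import ipaddress


def parse_proxy_list(raw_text, scheme="http"):
    """Turn pasted ip:port lines into proxy URLs, skipping malformed lines."""
    proxy_urls = []
    for line in raw_text.splitlines():
        line = line.strip()
        if not line:
            continue
        host, sep, port = line.rpartition(":")
        if not sep or not port.isdigit() or not 0 < int(port) < 65536:
            continue  # no port given, or port out of range
        try:
            ipaddress.IPv4Address(host)  # raises ValueError for anything that is not IPv4
        except ValueError:
            continue
        proxy_urls.append(f"{scheme}://{host}:{port}")
    return proxy_urls


# Hypothetical sample: two usable entries and two that should be dropped.
sample = """
203.0.113.10:8080
not-a-proxy
198.51.100.7:3128
203.0.113.11:99999
"""
print(parse_proxy_list(sample))
# ['http://203.0.113.10:8080', 'http://198.51.100.7:3128']
```

This only checks the format. Whether a proxy is actually alive still needs a real request through it.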
Configuring Proxies in Google Colab
Colab runs in a virtual machine, so you need to instruct Python (and related libraries) to use a proxy. Here’s how you do it, Arvid-style:
Setting HTTP/HTTPS Proxy for requests
import requests
proxies = {
    'http': 'http://username:password@proxy_ip:proxy_port',
    'https': 'http://username:password@proxy_ip:proxy_port'
}
response = requests.get('https://httpbin.org/ip', proxies=proxies)
print(response.json())
- If your proxy doesn’t need authentication, leave out username:password@.
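One thing that tends to trip people up: a password containing characters like @ or : will break the proxy URL unless it is percent-encoded. The sketch below shows one way to build the URL safely. The helper name build_proxy_url is hypothetical, and it assumes your client decodes percent-encoded credentials, as requests does.

```python
from urllib.parse import quote


def build_proxy_url(host, port, username=None, password=None, scheme="http"):
    """Build a proxy URL, percent-encoding credentials if they are given."""
    if username is None:
        return f"{scheme}://{host}:{port}"
    user = quote(username, safe="")
    pwd = quote(password or "", safe="")
    return f"{scheme}://{user}:{pwd}@{host}:{port}"


# Placeholder host and credentials, for illustration only.
print(build_proxy_url("203.0.113.10", 8080, "user", "p@ss:word"))
# http://user:p%40ss%3Aword@203.0.113.10:8080
```

The result can go straight into the proxies dictionary for both the 'http' and 'https' keys.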
Setting Proxies Globally in Notebook
Sometimes you want everything to go through the proxy. Set environment variables:
import os
os.environ['http_proxy'] = 'http://proxy_ip:proxy_port'
os.environ['https_proxy'] = 'http://proxy_ip:proxy_port'
Now, any library that respects these environment variables (requests, urllib, etc.) will use the proxy.
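Since these variables stick around for the rest of the session, a dead proxy will quietly break every later request. A small pair of helpers for switching the proxy on and off can save some head-scratching. This is only a sketch: the helper names and the placeholder address are assumptions, and it sets both lowercase and uppercase variants because different tools look for different spellings.

```python
import os
import urllib.request

PROXY_VARS = ("http_proxy", "https_proxy", "HTTP_PROXY", "HTTPS_PROXY")


def set_proxy_env(proxy_url):
    """Point every common proxy variable at the same proxy."""
    for name in PROXY_VARS:
        os.environ[name] = proxy_url


def clear_proxy_env():
    """Remove the proxy variables so later requests go direct again."""
    for name in PROXY_VARS:
        os.environ.pop(name, None)


set_proxy_env("http://203.0.113.10:8080")  # placeholder address
# Show what the standard library now detects from the environment.
print(urllib.request.getproxies_environment())
clear_proxy_env()
```

Checking urllib.request.getproxies_environment() is a quick way to confirm the variables were picked up before blaming the proxy itself.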
Rotating Proxies
If you’re scraping like a dingo on a chicken farm, rotate your proxies to dodge bans:
import random
import requests

proxy_list = [
    'http://ip1:port1',
    'http://ip2:port2',
    'http://ip3:port3'
]

def get_random_proxy():
    # Use the same proxy for both schemes so each request exits from one IP
    proxy = random.choice(proxy_list)
    return {'http': proxy, 'https': proxy}

for i in range(10):
    proxies = get_random_proxy()
    # A timeout stops a dead proxy from hanging the notebook
    response = requests.get('https://httpbin.org/ip', proxies=proxies, timeout=10)
    print(response.json())
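Free proxies drop out often, so one possible extension of the loop above is to fall back to another proxy when a request fails. The sketch below is an assumption about how you might structure that, not a standard recipe: fetch_with_rotation and fetch_json are hypothetical names, and the fetch function is passed in so the retry logic can be tried out without touching the network.

```python
import random


def fetch_with_rotation(url, proxy_list, fetch, max_attempts=3):
    """Try up to max_attempts distinct proxies; return (proxy, result) for the first success."""
    candidates = random.sample(proxy_list, k=min(max_attempts, len(proxy_list)))
    last_error = None
    for proxy in candidates:
        proxies = {"http": proxy, "https": proxy}
        try:
            return proxy, fetch(url, proxies)
        except Exception as error:  # broad on purpose: dead proxies fail in many ways
            last_error = error
    raise RuntimeError(f"all {len(candidates)} proxies failed") from last_error


def fetch_json(url, proxies):
    """Fetch a URL through the given proxies with requests and return the JSON body."""
    import requests  # imported here so the helper above stays library-agnostic

    response = requests.get(url, proxies=proxies, timeout=10)
    response.raise_for_status()
    return response.json()


# Usage, once proxy_list holds real proxies:
# proxy, data = fetch_with_rotation("https://httpbin.org/ip", proxy_list, fetch_json)
# print(proxy, data)
```

Sampling without replacement means the same dead proxy is never retried within one call.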
Configuring Proxies in Jupyter Notebook
Much the same as Colab, mate. Here’s the drill:
For requests and urllib
import requests
proxies = {
    'http': 'http://proxy_ip:proxy_port',
    'https': 'http://proxy_ip:proxy_port'
}
response = requests.get('https://httpbin.org/ip', proxies=proxies)
print(response.json())
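If you'd rather stick to the standard library, urllib can be pointed at a proxy with a ProxyHandler. The following is a rough sketch with a placeholder address; the helper names build_proxy_opener and fetch_origin are made up for illustration.

```python
import json
import urllib.request


def build_proxy_opener(proxy_url):
    """Return an opener that sends HTTP and HTTPS requests through proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


def fetch_origin(opener, url="https://httpbin.org/ip", timeout=10):
    """Fetch the URL through the opener and decode the JSON body."""
    with opener.open(url, timeout=timeout) as response:
        return json.load(response)


opener = build_proxy_opener("http://203.0.113.10:8080")  # placeholder address
# print(fetch_origin(opener))  # uncomment once a real proxy is filled in
```

The opener only affects requests made through it, so the rest of the notebook keeps its normal connection.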
For a Kernel-wide Proxy (Jupyter Kernel)
import os
os.environ['HTTP_PROXY'] = 'http://proxy_ip:proxy_port'
os.environ['HTTPS_PROXY'] = 'http://proxy_ip:proxy_port'
For Selenium (Headless Browsers)
If you’re running Selenium in Jupyter (bit of a power move):
from selenium import webdriver

proxy_ip_port = 'proxy_ip:proxy_port'
chrome_options = webdriver.ChromeOptions()
# Route browser traffic through the proxy
chrome_options.add_argument(f'--proxy-server={proxy_ip_port}')
# Run without a display; older Chrome builds use plain --headless
chrome_options.add_argument('--headless=new')
driver = webdriver.Chrome(options=chrome_options)
driver.get('https://httpbin.org/ip')
print(driver.page_source)
driver.quit()
Comparing Proxy Methods
Method | Scope | Tools/Libraries | Use Case |
---|---|---|---|
requests proxies | Per-request | requests, urllib3 | Scraping, API calls |
Env variables | Global (kernel session) | Most libraries | Consistent proxy usage |
Selenium proxy | Browser automation | selenium | Web automation/scraping |
Testing Your Proxy
Always test if your proxy’s working—otherwise, you might be the digital equivalent of bushwhacking in circles.
import requests
proxies = {
    'http': 'http://proxy_ip:proxy_port',
    'https': 'http://proxy_ip:proxy_port'
}
response = requests.get('https://httpbin.org/ip', proxies=proxies)
print("Proxy IP:", response.json())
If the IP matches the proxy, you’re golden.
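To make that check less of an eyeball job, you can compare the response fetched without the proxy against the one fetched through it. This is a minimal sketch, assuming the httpbin.org/ip payload has an "origin" field that may hold several comma-separated addresses when a proxy forwards your real IP; the function name proxy_is_masking is hypothetical.

```python
def proxy_is_masking(direct_payload, proxied_payload):
    """Return True if the direct IP does not show up in the proxied response."""
    direct_ip = direct_payload["origin"].split(",")[0].strip()
    seen_ips = [ip.strip() for ip in proxied_payload["origin"].split(",")]
    return direct_ip not in seen_ips


# Usage, with requests and the proxies dictionary from the example above:
# direct = requests.get("https://httpbin.org/ip", timeout=10).json()
# proxied = requests.get("https://httpbin.org/ip", proxies=proxies, timeout=10).json()
# print("Proxy is masking your IP:", proxy_is_masking(direct, proxied))
```

A False result usually means either the proxy was ignored or it is passing your real address along.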
Common Pitfalls (and Quick Fixes)
Issue | What it Looks Like | How to Fix |
---|---|---|
Connection timeout | Requests hang, no response | Try a different proxy |
407 Proxy Authentication Required (sometimes 403) | Authentication error | Add username:password@ to the proxy URL |
Proxy not working in Colab | No change in IP, errors | Check environment variables |
SSL issues | SSL handshake failed | Try a proxy that supports HTTPS; verify=False turns off certificate checks, so keep it for testing only |
Useful Resources
- ProxyRoller – Free Proxy List
- Python requests documentation
- Jupyter Notebook docs
- Google Colab FAQ
- Selenium Proxy Docs
And there you go—no need to wrestle a croc to get your proxy game on point in Colab or Jupyter. If you need fresh proxies, remember ProxyRoller’s always open and doesn’t bite.