Network Troubleshooting

Network issues are common, their symptoms vary, and the actual cause is often somewhere unexpected. The systematic approach is to narrow down where in the network the problem lies before guessing at fixes.

This page covers the diagnostic toolkit and the workflow.

The systematic approach

When a network problem appears:

Reproduce: confirm the issue. Does it happen reliably or intermittently?
Narrow scope: is it the client, the network path, DNS, or the server?
Test layer by layer: each layer has its own tools, so use the one that matches the layer you suspect
Fix at the right layer: don't change the server when DNS is the problem

Layer-by-layer tools

DNS

dig example.com
nslookup example.com
host example.com

dig is the most flexible:

dig example.com           # A record
dig example.com AAAA      # IPv6
dig example.com MX        # mail
dig +trace example.com    # show full resolution path
dig @8.8.8.8 example.com  # specific resolver

If the name doesn't resolve, DNS is the problem. If it resolves to the wrong IP, suspect a stale cached record (the old TTL has not expired yet) or a misconfigured record.

Reachability

ping <host>
ping6 <host>  # IPv6

Basic round-trip test. If ping fails, either the host is unreachable or something on the path is filtering ICMP. Many cloud networks block ICMP, so a failed ping alone is not conclusive.

ping doesn't test the application, only basic IP connectivity.

Path

traceroute <host>
mtr <host>     # combines ping + traceroute, continuous

Both show each hop on the path, which helps identify where packets are lost or delayed.

mtr is more useful than traceroute for ongoing debugging because it refreshes continuously.

Port reachability

nc -zv <host> <port>     # check if port is open
telnet <host> <port>     # interactive
nmap -p <port> <host>    # more thorough

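If the port is unreachable, suspect a firewall, a security group, or a service that is not listening. An immediate "connection refused" usually means the host is reachable but nothing is listening on that port; a timeout usually means a firewall or security group is silently dropping the packets.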

TCP details

ss -tnp                  # active connections
ss -tlnp                 # listening sockets
ss -s                    # summary
netstat -an | grep ESTAB # legacy version

Shows what's connected to what. Important for diagnosing connection-pool issues, port exhaustion, etc.
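
As a rough sketch of how ss output can point at connection-pool or port-exhaustion problems, the helper below tallies sockets by state. The name count_tcp_states is made up for illustration, and it assumes the state is the first column, as in ss -tan output.

```shell
# Count TCP sockets by state, most frequent first.
# Reads `ss -tan` output on stdin, so it also works on saved output.
count_tcp_states() {
    # Skip the header line; column 1 of `ss -tan` is the socket state.
    awk 'NR > 1 { n[$1]++ } END { for (s in n) print n[s], s }' | sort -rn
}

# Live usage:
#   ss -tan | count_tcp_states
```

An unusually large TIME-WAIT or ESTAB count compared with a normal day is a hint to look at connection reuse and pool sizing.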

HTTP

curl -v https://example.com
curl -I https://example.com    # headers only
curl --resolve host:80:1.2.3.4 ...   # override DNS; the port must match the URL's port (443 for https)

Verbose curl shows the TLS handshake, the request and response headers, and the response body. For HTTP-level issues, this is the workhorse.

curl --trace-ascii out.txt ...  # full trace

Use this for deep debugging: the trace file records all data sent and received.

Packet capture

tcpdump -i any -nn host <host>
tcpdump -i any -nn -w capture.pcap host <host>

Capture and analyze. The -w option writes a .pcap file that opens in Wireshark for visualization. Capturing usually requires root.

Use packet capture when you need to see the actual packets: TCP retransmits, missing handshakes, malformed headers.

TLS

openssl s_client -connect host:443
openssl s_client -connect host:443 -servername sni.example.com

Tests the TLS handshake. Useful for certificate issues, SNI problems, and protocol mismatches.

Common diagnostic flows

"Site is slow"

time curl -o /dev/null https://example.com — total time
curl -w '@curl-format.txt' ... — break down by phase
- time_namelookup (DNS)
- time_connect (TCP)
- time_appconnect (TLS)
- time_starttransfer (TTFB)
Identify which phase is slow; investigate that
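
The contents of curl-format.txt are up to you. The sketch below shows one plausible version plus a hypothetical helper, phase_breakdown, that turns curl's timings into per-phase durations. curl reports each value as seconds elapsed since the start of the request, so the numbers are cumulative. The helper assumes an HTTPS request; for plain HTTP, time_appconnect is 0 and the tls figure is meaningless.

```shell
# Write a format file for curl -w (default name: curl-format.txt).
write_curl_format() {
    cat > "${1:-curl-format.txt}" <<'EOF'
time_namelookup:    %{time_namelookup}\n
time_connect:       %{time_connect}\n
time_appconnect:    %{time_appconnect}\n
time_starttransfer: %{time_starttransfer}\n
time_total:         %{time_total}\n
EOF
}

# Convert cumulative timings (seconds) into per-phase durations (ms).
# Arguments: namelookup connect appconnect starttransfer total
phase_breakdown() {
    awk -v dns="$1" -v tcp="$2" -v tls="$3" -v ttfb="$4" -v total="$5" 'BEGIN {
        printf "dns=%.0fms tcp=%.0fms tls=%.0fms wait=%.0fms transfer=%.0fms\n",
            dns * 1000, (tcp - dns) * 1000, (tls - tcp) * 1000,
            (ttfb - tls) * 1000, (total - ttfb) * 1000
    }'
}

# Usage:
#   write_curl_format
#   curl -s -o /dev/null -w '@curl-format.txt' https://example.com
#   phase_breakdown 0.010 0.030 0.080 0.200 0.250
```

Here "wait" is the time between the end of the TLS handshake and the first response byte, which is mostly server processing time.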

"Can't connect"

dig <host> — DNS works?
ping <host> — IP reachable? (may be blocked)
nc -zv <host> <port> — port open?
curl -v https://<host> — application responds?

If DNS fails, fix DNS. If the port is closed, check the firewall or security group and confirm the service is listening. If the port is open but the application fails, the issue is server-side.
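
The checks above can be chained so that the first failing layer is reported. This is a minimal sketch under a few assumptions: check_connect is a hypothetical name, the host is given as a hostname (not an IP literal), the service speaks HTTPS, and the ping step is skipped because ICMP is often blocked.

```shell
# Walk the checks in order and stop at the first layer that fails.
# Usage: check_connect <host> <port>
check_connect() {
    host=$1
    port=$2

    # 1. DNS: dig must succeed and +short must print at least one answer.
    if ! answers=$(dig +short "$host") || [ -z "$answers" ]; then
        echo "FAIL dns: $host does not resolve"
        return 1
    fi
    echo "ok   dns"

    # 2. Port: -z probes without sending data, -w 5 caps the wait at 5 seconds.
    if ! nc -z -w 5 "$host" "$port" >/dev/null 2>&1; then
        echo "FAIL port: cannot open $host:$port"
        return 1
    fi
    echo "ok   port"

    # 3. Application: -f makes curl exit non-zero on HTTP 4xx/5xx responses.
    if ! curl -sS -f -o /dev/null --max-time 10 "https://$host:$port/"; then
        echo "FAIL app: port is open but the HTTPS request failed"
        return 1
    fi
    echo "ok   app"
}

# Usage:
#   check_connect example.com 443
```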

"Intermittent failures"

Run mtr for several minutes and look for:

Packet loss at specific hop
Latency spikes
Routing changes

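Often the cause is a mid-path network issue outside your immediate infrastructure. Loss at an intermediate hop that does not carry through to the final hop usually reflects ICMP rate limiting on that router, not real packet loss.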
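
For unattended runs, mtr's report mode prints a summary that can be filtered. The sketch below is one way to flag lossy hops; lossy_hops is a hypothetical helper, and it assumes the usual report layout where hop lines start with a field like "3.|--" followed by the host and the loss percentage.

```shell
# Print hops whose loss exceeds a threshold in percent (default 1).
# Reads `mtr --report` output on stdin.
lossy_hops() {
    awk -v max="${1:-1}" '
        # Hop lines look like: "  3.|-- 10.0.0.1   2.0%   100   1.2 ..."
        $1 ~ /^[0-9]+[.][|]--$/ {
            loss = $3
            sub(/%/, "", loss)
            if (loss + 0 > max + 0) print $1, $2, loss "%"
        }'
}

# Live usage (100 probes, numeric addresses only):
#   mtr --report --report-cycles 100 -n <host> | lossy_hops 1
```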

"TLS error"

openssl s_client -connect host:443 -showcerts

Examine the cert chain. Common issues:

Expired cert
Wrong CN/SAN
Missing intermediate certs
Untrusted CA
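
To check the first two items quickly, the leaf certificate can be piped into openssl x509. This is a sketch, not a full validator: cert_summary is a hypothetical name, and it assumes the SNI name is the same as the host you connect to.

```shell
# Print subject, issuer, and validity dates of the server's leaf certificate.
# Usage: cert_summary <host> [port]
cert_summary() {
    host=$1
    port=${2:-443}
    # </dev/null closes stdin so s_client exits after the handshake.
    # x509 reads the first certificate in the output, which is the leaf.
    openssl s_client -connect "$host:$port" -servername "$host" </dev/null 2>/dev/null |
        openssl x509 -noout -subject -issuer -dates
}

# Usage:
#   cert_summary example.com
```

Compare notAfter against the current date for expiry, and the subject against the hostname the client uses.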

Cloud-specific tools

AWS

VPC Flow Logs
VPC Reachability Analyzer
Route 53 Resolver query logging

Kubernetes

kubectl exec into pod, run standard tools
kubectl logs
Service mesh tooling (Istio dashboards, etc.)

Containers

Network namespaces; ip netns
Tools inside the container may be limited; install them as needed
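
When the container image has no diagnostic tools, an alternative to installing them is to run the host's tools inside the container's network namespace with nsenter. The sketch below assumes Docker and root privileges; netns_run is a hypothetical helper name, and nsenter is a swapped-in technique next to the ip netns approach mentioned above.

```shell
# Run a host-installed tool inside a container's network namespace.
# Usage: netns_run <container> <command> [args...]
netns_run() {
    container=$1
    shift
    # Look up the PID of the container's main process.
    pid=$(docker inspect -f '{{.State.Pid}}' "$container") || return 1
    # -t selects the target process, -n enters only its network namespace.
    nsenter -t "$pid" -n "$@"
}

# Usage (as root):
#   netns_run web ss -tlnp
```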

Common pitfalls

Different DNS in different places

The local DNS resolver, an application-level DNS cache, and the VPC resolver may each give a different answer. Query from the same place the failing client runs.
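
One way to spot the disagreement is to ask several resolvers for the same name and compare the answers side by side. This is a minimal sketch; compare_resolvers is a hypothetical name and the resolver addresses in the usage line are placeholders.

```shell
# Print the A-record answers each resolver returns, one line per resolver.
# Usage: compare_resolvers <name> <resolver-ip>...
compare_resolvers() {
    name=$1
    shift
    for resolver in "$@"; do
        # +short prints only the answer data. Sorting keeps round-robin
        # ordering from looking like a different answer.
        answers=$(dig +short "@$resolver" "$name" A | sort | paste -s -d ' ' -)
        printf '%-16s %s\n' "$resolver" "${answers:-<no answer>}"
    done
}

# Usage:
#   compare_resolvers example.com 8.8.8.8 10.0.0.2
```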

Caching obscuring problems

Browser cache, CDN cache, DNS cache. When debugging, work to bypass caches.

Logging at the wrong layer

Web server logs only record requests that reached the server, so they don't show failures earlier in the network path. Application logs may not show TLS issues. Look at the right place for the right symptom.

Time differences

Clock skew between client and server breaks time-based checks such as TLS certificate validity windows and JWT expiration.

MTU issues

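Tunnels and VPNs reduce the effective MTU, so full-size packets get fragmented or dropped. On Linux, ping -M do -s 1472 <host> tests the path MTU: it sends a 1500-byte IPv4 packet (1472 bytes of payload plus 28 bytes of IP and ICMP headers) with fragmentation prohibited.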
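
The 1472 figure comes from subtracting header sizes from the MTU. The sketch below spells out that arithmetic and tries a few candidate sizes. Both function names and the candidate list (1500, 1450, 1400, 1280) are illustrative assumptions, and the flags are those of Linux iputils ping. If ICMP is blocked on the path, every probe fails and the result says nothing about MTU.

```shell
# ICMP payload size that yields an IPv4 packet of exactly <mtu> bytes:
# subtract the 20-byte IPv4 header and the 8-byte ICMP header.
icmp_payload_for_mtu() {
    echo $(( $1 - 20 - 8 ))
}

# Try candidate MTUs, largest first, with fragmentation prohibited (-M do).
# Usage: probe_mtu <host>
probe_mtu() {
    for mtu in 1500 1450 1400 1280; do
        size=$(icmp_payload_for_mtu "$mtu")
        # -c 1 sends one probe, -W 2 waits at most 2 seconds for the reply.
        if ping -c 1 -W 2 -M do -s "$size" "$1" >/dev/null 2>&1; then
            echo "path carries ${mtu}-byte packets"
            return 0
        fi
    done
    echo "no candidate MTU passed"
    return 1
}
```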

Common failure patterns

Guessing instead of measuring: assuming DNS works without checking.
Fixing symptoms, not causes: restarting the app when the network is the problem.
No baseline: not knowing what "normal" looks like.
Logs not preserved: the issue resolves, the logs are gone, and nothing is left to analyze.
Trusting that the network is healthy: sometimes it isn't.