Prodfiler

Agent error running on Kubernetes/GKE

Hey guys, I am trying Prodfiler by install it on our dev/test Google Kubernetes Engine (GKE) via Helm. I followed the installation steps and the agents are up and running.

➜  ~ k get po
NAME                  READY   STATUS    RESTARTS   AGE
pf-host-agent-2s9n7   1/1     Running   0          9m43s
pf-host-agent-2x67h   1/1     Running   0          9m43s
pf-host-agent-676ds   1/1     Running   0          9m43s
pf-host-agent-8cbpc   1/1     Running   0          9m43s
pf-host-agent-cffz9   1/1     Running   0          9m43s
pf-host-agent-gfvmc   1/1     Running   0          5m48s

This is what I am seeing when I check the logs of one of the Pod.

time="2021-09-08T10:05:49.696831575Z" level=info msg="Starting Prodfiler Host Agent (revision head-95189364, build timestamp 1631078186)"
time="2021-09-08T10:05:49.706383112Z" level=info msg="Automatically determining environment and machine ID ..."
time="2021-09-08T10:05:49.709119717Z" level=warning msg="Environment tester (azure) failed: failed to unmarshal JSON response: invalid character '<' looking for beginning of value"
time="2021-09-08T10:05:50.000074029Z" level=warning msg="Environment tester (aws) failed: failed to get aws metadata: EC2MetadataRequestError: failed to get EC2 instance identity document\ncaused by: EC2MetadataError: failed to make EC2Metadata request\n\tstatus code: 404, request id: \ncaused by: <!DOCTYPE html>\n<html lang=en>\n  <meta charset=utf-8>\n  <meta name=viewport content=\"initial-scale=1, minimum-scale=1, width=device-width\">\n  <title>Error 404 (Not Found)!!1</title>\n  <style>\n    *{margin:0;padding:0}html,code{font:15px/22px arial,sans-serif}html{background:#fff;color:#222;padding:15px}body{margin:7% auto 0;max-width:390px;min-height:180px;padding:30px 0 15px}* > body{background:url(//www.google.com/images/errors/robot.png) 100% 5px no-repeat;padding-right:205px}p{margin:11px 0 22px;overflow:hidden}ins{color:#777;text-decoration:none}a img{border:0}@media screen and (max-width:772px){body{background:none;margin-top:0;max-width:none;padding-right:0}}#logo{background:url(//www.google.com/images/branding/googlelogo/1x/googlelogo_color_150x54dp.png) no-repeat;margin-left:-5px}@media only screen and (min-resolution:192dpi){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat 0% 0%/100% 100%;-moz-border-image:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) 0}}@media only screen and (-webkit-min-device-pixel-ratio:2){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat;-webkit-background-size:100% 100%}}#logo{display:inline-block;height:54px;width:150px}\n  </style>\n  <a href=//www.google.com/><span id=logo aria-label=Google></span></a>\n  <p><b>404.</b> <ins>That’s an error.</ins>\n  <p>The requested URL <code>/latest/dynamic/instance-identity/document</code> was not found on this server.  <ins>That’s all we know.</ins>\n"
time="2021-09-08T10:05:50.000156670Z" level=info msg="Environment: gcp, machine ID: 0x9a4b6e82ed1d212c"
time="2021-09-08T10:05:50.000444182Z" level=info msg="Assigned ProjectID: 588"
time="2021-09-08T10:05:51.351731104Z" level=info msg="Start CPU metrics"
time="2021-09-08T10:05:51.351924603Z" level=info msg="Start I/O metrics"
time="2021-09-08T10:05:51.772353107Z" level=info msg="Found fsbase offset: 2728 (via x86_fsbase_write_task)"
time="2021-09-08T10:05:51.772878920Z" level=info msg="Environment variable DOCKER_HOST not set"
time="2021-09-08T10:05:51.776023127Z" level=info msg="eBPF tracer loaded"
time="2021-09-08T10:05:53.611689163Z" level=info msg="Attached tracer program"
time="2021-09-08T10:05:53.612004334Z" level=info msg="Attached sched monitor"
time="2021-09-08T10:06:06.576204096Z" level=error msg="Failed to invoke AddMetrics(): rpc error: code = Internal desc = failed to process request"
time="2021-09-08T10:06:06.576254294Z" level=error msg="Request to collection agent(data.run.prodfiler.com:443) failed: Internal"
time="2021-09-08T10:06:13.659137955Z" level=error msg="Failed to update container metadata: failed to update container metadata for pid 5840: failed to get kubernetes pod metadata for pid 5840: failed to find matching kubernetes pod/container metadata for containerID 'dc8ef1b19a1a3d380fc22e1ea177a8dea5d684c2afe97156a1a761ea8574963d' in 12 pods"
time="2021-09-08T10:06:13.718268242Z" level=error msg="Failed to update container metadata: failed to update container metadata for pid 5840: failed to get kubernetes pod metadata for pid 5840: failed to find matching kubernetes pod/container metadata for containerID 'dc8ef1b19a1a3d380fc22e1ea177a8dea5d684c2afe97156a1a761ea8574963d' in 12 pods"
time="2021-09-08T10:06:13.778609536Z" level=error msg="Failed to update container metadata: failed to update container metadata for pid 5840: failed to get kubernetes pod metadata for pid 5840: failed to find matching kubernetes pod/container metadata for containerID 'dc8ef1b19a1a3d380fc22e1ea177a8dea5d684c2afe97156a1a761ea8574963d' in 12 pods"

So I am wondering whether the agent can’t run/install something due to the GKE/GCP machines. Btw, these are the machines info.

NAME			STATUS   ROLES    AGE     VERSION            INTERNAL-IP    EXTERNAL-IP      	OS-IMAGE                             	KERNEL-VERSION   CONTAINER-RUNTIME
gke-machine-1   Ready    <none>   8m25s   v1.20.8-gke.2100   10.148.x.x     35.x.y.z   			Container-Optimized OS from Google   	5.4.120+         docker://20.10.3
gke-machine-2	Ready    <none>   13h     v1.20.8-gke.2100   10.148.x.x     34.x.y.z    		Container-Optimized OS from Google   	5.4.120+         docker://20.10.3
gke-machine-3	Ready    <none>   16h     v1.20.8-gke.2100   10.148.x.x     34.x.y.z   			Container-Optimized OS from Google   	5.4.120+         docker://20.10.3
gke-machine-4   Ready    <none>   15h     v1.20.8-gke.2100   10.148.x.x     35.x.y.z   			Container-Optimized OS from Google   	5.4.120+         docker://20.10.3
gke-machine-5   Ready    <none>   13h     v1.20.8-gke.2100   10.148.x.x     35.x.y.z    		Container-Optimized OS from Google   	5.4.120+         docker://20.10.3
gke-machine-6   Ready    <none>   9h      v1.20.8-gke.2100   10.148.x.x     35.x.y.z   			Container-Optimized OS from Google   	5.4.120+         docker://20.10.3

Let me know if you need more info. Thanks!

Hello :grinning_face_with_smiling_eyes:

first of all, thanks for using Prodfiler! :heart:

These types of error

time="2021-09-08T10:06:13.778609536Z" level=error msg="Failed to update container metadata: failed to update container metadata for pid 5840: failed to get kubernetes pod metadata for pid 5840: failed to find matching kubernetes pod/container metadata for containerID 'dc8ef1b19a1a3d380fc22e1ea177a8dea5d684c2afe97156a1a761ea8574963d' in 12 pods"

might not necessarily be a signal that something is not working in the host-agent installation.
We might encounter this error when trying to read metadata for a Pod that is being deleted, but for which the process in the kernel still exists.

Are these error messages frequent, and they appear with a combination of container IDs and PIDs?
Did you find them in all host-agent Pods?