Attempting to deploy the Netem service fails after deploying Kubernetes but is OK before

Actually someone ran into similar behaviour lately: discovery/enoslib#70

But in my local setup, making sure that roles = discover_networks(roles, networks) is called before deploying the Netem service solved the issue.

In my case, calling roles = discover_networks(roles, networks) first did not help: the deployment still fails. I solved the problem by applying Netem before calling Kubespray. PS: the following modified version of cli.py/deploy() supports a Netem config embedded in config.yaml:

def deploy(provider, force, conf, env, only_netem):
    config = load_config(conf)

    # Extract the embedded tc (Netem) config and strip it from the
    # provider's config, so the rest of that config is passed on unchanged
    tc = None
    if "tc" in config[provider]:
        tc = dict(config[provider]["tc"])
        config[provider].pop("tc", None)

    t.PROVIDERS[provider](config, force, env=env)
    t.inventory(env=env)

    # Apply Netem before Kubernetes is deployed by t.prepare()
    if tc is not None:
        t.netem(config=tc)

    t.prepare(env=env)
    t.post_install(env=env)
    t.hints(env=env)

However, before I changed the order of the calls, the Netem deployment used to fail mysteriously with an undefined object error. Can you try your deployment on nantes/ecotype and nancy/grvingt?

Hi @kmanaoui,

I have debugged this today. The short story is that the Kubernetes deployment creates an interface named kube-ipvs0 (with a dash). During the deployment of the Netem service, there is a lookup based on the interface names in the Ansible facts. The corresponding fact in Ansible is keyed as ansible_kube_ipvs0 (with underscores). This makes the lookup fail and crashes the Netem deployment.

I'll apply a patch in EnOSlib to be a bit more defensive. After the release you should be able to apply the network configuration wherever you want.
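To illustrate the mismatch, here is a minimal sketch of a lookup that tolerates dashes in interface names. It is not the actual EnOSlib patch: the `facts` dictionary is a hand-written stand-in for Ansible facts, and `fact_key` and `interface_facts` are hypothetical helper names used only for this example.

```python
def fact_key(interface):
    """Build the fact key for an interface name.

    Ansible fact keys use underscores where the interface name has dashes.
    """
    return "ansible_" + interface.replace("-", "_")


def interface_facts(facts, interface):
    """Return the facts of an interface, or None when they are missing."""
    return facts.get(fact_key(interface))


# Hand-written stand-in for the facts of a host after Kubernetes is deployed.
facts = {
    "ansible_interfaces": ["eth0", "kube-ipvs0"],
    "ansible_eth0": {"device": "eth0"},
    "ansible_kube_ipvs0": {"device": "kube-ipvs0"},
}

# A naive lookup ("ansible_" + name) misses the interface with a dash.
missed = [n for n in facts["ansible_interfaces"] if "ansible_" + n not in facts]

# The normalized lookup finds every interface.
resolved = {n: interface_facts(facts, n) for n in facts["ansible_interfaces"]}
```

Returning None for an unknown interface, instead of raising, lets the caller skip interfaces it cannot resolve rather than aborting the whole deployment.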


Nevertheless, we should be careful with the Netem service when traffic is handled by an SDN system (like Calico). Depending on how the traffic is isolated (e.g. encapsulated in VLANs), there might be some surprises. For instance, the traffic limitations may be correct at the host level but ineffective from the application's point of view. So we (you :)) must double-check that the limitations are set correctly for the kind of traffic you are studying.
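One possible way to double-check from the application's point of view is to time TCP connections from inside an application container to a peer service. The sketch below is only an illustration, not part of EnOSlib: `tcp_connect_ms` is a hypothetical helper, and the host and port to probe are whatever service your application exposes. A TCP connect takes roughly one round trip, so the result should reflect the delays configured on the path in both directions.

```python
import socket
import time


def tcp_connect_ms(host, port, samples=5, timeout=5.0):
    """Return the median time, in milliseconds, to open a TCP connection."""
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        # Open the connection and close it right away; only the
        # connection setup time is measured.
        with socket.create_connection((host, port), timeout=timeout):
            pass
        timings.append((time.perf_counter() - start) * 1000.0)
    timings.sort()
    return timings[len(timings) // 2]
```

Running it once on the host and once inside a container, against the same peer, shows whether the emulated latency is also visible to the application traffic.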

mentioned in commit discovery/enoslib@a1d7337e

Hi Matthieu,

Thank you very much for your email. I have started using the new version and it's working great. Yes, I had actually thought about the Calico issue. Fortunately it works in my case, because it is natively using flannel for Kubernetes. Flannel is based on VXLAN encapsulation, but the packets are eventually forwarded using the host IP, so the latency is applied correctly to the application containers too. Thanks again for your great efforts.

Best regards ++

closed
