Production-ready microservice in Rust: 11. OpenTelemetry

Posted 2023-11-19 12:40:00 by sanyi ‐ 7 min read

Forwarding tracing data to SigNoz using OpenTelemetry

In this article I will implement the basics of OpenTelemetry integration. What is OpenTelemetry? According to their site opentelemetry.io:

"OpenTelemetry is a collection of tools, APIs, and SDKs. Use it to instrument, generate, collect, and export telemetry data (metrics, logs, and traces) to help you analyze your software’s performance and behavior."

We already use tracing in this application, but the events are simply written to the standard output. Using OpenTelemetry we can forward this data into various data collectors.

There are multiple open-source options such as Jaeger, Zipkin and SigNoz. I used SigNoz for these tests.

The test setup is quite simple. Clone the SigNoz repository:

git clone -b main https://github.com/SigNoz/signoz.git && cd signoz/deploy/

and run the install.sh in the signoz/deploy directory:

./install.sh

The script will setup a test environment using docker-compose, it takes a few minutes to create all the containers and initialize the database.

When the setup completes, open the administration console and create the default admin user at http://localhost:3301/.

After login you will see the traces of the SigNoz application itself.

SigNoz listens for OpenTelemetry data via the OTLP protocol on port 4317, so our rust application will access it on this url: http://localhost:4317

First I added a few more endpoints to the application: a simple GET /v1/dogs endpoint to list all the dogs from the database and a GET /v1/dogs/:id endpoint to get information about a single dog.

The new handlers in api::handlers::dogs.rs:

pub async fn list(
    State(state): State<Arc<ApplicationState>>,
) -> Result<Json<DogListResponse>, AppError> {
    let dogs = Dog::find().all(state.db_conn.load().as_ref()).await?;
    let n = dogs.len();

    let response = DogListResponse {
        status: Status::Success,
        data: dogs,
    };

    tracing::info!("number of dogs: {}", n);

    Ok(Json(response))
}

pub async fn get(
    State(state): State<Arc<ApplicationState>>,
    Path(dog_id): Path<i32>,
) -> Result<Json<DogGetResponse>, AppError> {
    let dog = Dog::find_by_id(dog_id)
        .one(state.db_conn.load().as_ref())
        .await?;

    let response = DogGetResponse {
        status: Status::Success,
        data: dog,
    };

    Ok(Json(response))
}

The corresponding data structures in api::response::dog.rs:

#[derive(Serialize)]
pub struct DogListResponse {
    pub status: Status,
    pub data: Vec<Model>,
}

#[derive(Serialize)]
pub struct DogGetResponse {
    pub status: Status,
    pub data: Option<Model>,
}

The extended routing in api::v1.rs:

pub fn configure(state: Arc<ApplicationState>) -> Router {
    Router::new()
        .route(
            "/hello",
            get(handlers::hello::hello).with_state(state.clone()),
        )
        .route(
            "/login",
            post(handlers::login::login).with_state(state.clone()),
        )
        .route(
            "/dogs",
            post(handlers::dogs::create)
                .with_state(state.clone())
                .route_layer(middleware::from_fn_with_state(state.clone(), auth)),
        )
        .route("/dogs", get(handlers::dogs::list).with_state(state.clone()))
        .route("/dogs/:id", get(handlers::dogs::get).with_state(state))
}

These endpoints are not authenticated, anyone can retrieve information about the dogs.

Now move on to the OpenTelemetry integration. First, we have to remove the global tracing subscriber from main.rs:

let subscriber = Registry::default()
    .with(LevelFilter::from_level(Level::DEBUG))
    .with(tracing_subscriber::fmt::Layer::default().with_writer(std::io::stdout));

tracing::subscriber::set_global_default(subscriber).expect("Failed to set subscriber");

because we will create a new subscriber in commands::serve.rs and the subscriber cannot be updated, we can set it only once during application initialization.

We have to add a few new dependencies in Cargo.toml and update some existing ones:

[dependencies]
tracing-subscriber = { version = "0.3.18", features = ["registry", "env-filter"] }
tracing-opentelemetry = { version = "0.22" }
tower-http = { version = "0.4.3", features = ["trace"] }
opentelemetry = { version = "0.21.0", features = ["metrics", "logs"] }
opentelemetry_sdk = { version = "0.21.1", features = ["rt-tokio", "logs"] }
opentelemetry-otlp = { version = "0.14.0", features = ["tonic", "metrics", "logs"]  }
opentelemetry-semantic-conventions = { version = "0.13.0" }
opentelemetry-http = "0.10.0"

The version numbers are important, because there are incompatibilities between different versions.

I introduced a new configuration option too, to specify OpenTelemetry collector url in settings.rs:

#[derive(Debug, Deserialize, Default, Clone)]
#[allow(unused)]
pub struct Tracing {
    pub otlp_endpoint: Option<String>,
}

#[derive(Debug, Deserialize, Default, Clone)]
#[allow(unused)]
pub struct Settings {
    #[serde(default)]
    pub config: ConfigInfo,
    #[serde(default)]
    pub database: Database,
    #[serde(default)]
    pub logging: Logging,
    #[serde(default)]
    pub tracing: Tracing,
    #[serde(default)]
    pub token_secret: String,
    #[serde(default)]
    pub token_timeout_seconds: i64,
}

Most of the changes affect the commands::serve.rs module. The main part in the start_tokio method:

global::set_text_map_propagator(TraceContextPropagator::new());

let otlp_endpoint = settings
    .tracing
    .otlp_endpoint
    .clone()
    .unwrap_or("http://localhost:4317".to_string());

let tracer = init_tracer(&otlp_endpoint)?;

let telemetry_layer = tracing_opentelemetry::layer().with_tracer(tracer);
let subscriber = tracing_subscriber::registry()
    .with(LevelFilter::from_level(Level::DEBUG))
    .with(telemetry_layer);

subscriber.init();

let _meter_provider = init_metrics(&otlp_endpoint);
let _log_provider = init_logs(&otlp_endpoint);

We setup trace data propagation, then fetch the otlp_endpoint url from settings, defaulting to http://localhost:4317.

Initialize the tracer, create an OpenTelemetry layer for the tokio tracing library with this tracer and setup the subscriber to use the OpenTelemetry layer for data processing. Finally we initialize the subscriber.

We setup a metrics and a logs provider too, but we are not using them yet, that's why the variable names are prefixed with underscores.

Now the three initialization methods. First, init_tracer:

fn init_tracer(otlp_endpoint: &str) -> Result<sdktrace::Tracer, TraceError> {
    opentelemetry_otlp::new_pipeline()
        .tracing()
        .with_exporter(
            opentelemetry_otlp::new_exporter()
                .tonic()
                .with_endpoint(otlp_endpoint),
        )
        .with_trace_config(
            sdktrace::config().with_resource(Resource::new(vec![KeyValue::new(
                "service.name",
                "shelter-project",
            )])),
        )
        .install_batch(runtime::Tokio)
}

This is a fairly standard OTLP initialization, we only pass an endpoint url and a service.name configuration to the tracer.

Similarly for metrics:

fn init_metrics(otlp_endpoint: &str) -> opentelemetry::metrics::Result<MeterProvider> {
    let export_config = ExportConfig {
        endpoint: otlp_endpoint.to_string(),
        ..ExportConfig::default()
    };
    opentelemetry_otlp::new_pipeline()
        .metrics(runtime::Tokio)
        .with_exporter(
            opentelemetry_otlp::new_exporter()
                .tonic()
                .with_export_config(export_config),
        )
        .with_resource(Resource::new(vec![KeyValue::new(
            opentelemetry_semantic_conventions::resource::SERVICE_NAME,
            "shelter-project",
        )]))
        .build()
}

and logs:

fn init_logs(otlp_endpoint: &str) -> Result<opentelemetry_sdk::logs::Logger, LogError> {
    opentelemetry_otlp::new_pipeline()
        .logging()
        .with_log_config(
            Config::default().with_resource(Resource::new(vec![KeyValue::new(
                opentelemetry_semantic_conventions::resource::SERVICE_NAME,
                "shelter-project",
            )])),
        )
        .with_exporter(
            opentelemetry_otlp::new_exporter()
                .tonic()
                .with_endpoint(otlp_endpoint.to_string()),
        )
        .install_batch(runtime::Tokio)
}

One more thing before we start to test our application: add instrumentations on our dog handler methods:

#[instrument(level = "info", name = "create_dog", skip_all)]
pub async fn create(...) { ... }

#[instrument(level = "info", name = "list_dogs", skip_all)]
pub async fn list(...) { ... }

#[instrument(level = "info", name = "get_dog", skip_all)]
pub async fn get(...) { ... }

Check the linked sources for some missing details, like required use statements!

Now we can compile and start the application:

$ cargo build
$ export SHELTER__DATABASE__URL="postgresql://postgres:[email protected]/shelter"
$ ./target/debug/shelter_main serve

Call the list endpoint a few times to generate some data:

$ curl http://127.0.0.1:8080/v1/dogs | json_pp

Now if you go to SigNoz on http://127.0.0.1:3301/ you can see the results. First select the Services menu to check that you can see the shelter-project application. It may not appear instantly, give it a minute or two and reload the page. When it appears go to the Traces menu and look for the Service Name filter. Clear all checkboxes and select the shelter-project only - all other traces are related to the SigNoz application itself.

Now you will see something like this:

.

entries with request and list_dogs operations.

If you click on one of the request operations you will see that it encapsulates a list_dogs operation. If you click on the list_dogs operation and select the Events tab on the right side you can even see the details of each event, like the exact SQL query executed and our tracing::info! call with the number of the dogs.

.

You can further browse the interface to see metrics like average duration of the operations, rate of the requests, etc.

You can find the sample code on GitHub

Next article »

Tags:
rust microservice dog-shelter opentelemetry signoz