DevTurtle

LangChain and Pydantic to generate responses in JSON format

We have already seen how to use LangChain for creating LLM-based applications and we have tried to integrate it with both Ollama and OpenAI. Now we will try one of the most useful features of this framework: the ability to generate structured responses in JSON format.

One of the main problems of natural language is its unstructured nature, which makes it hard to integrate with classic applications based on deterministic algorithms. It is therefore important to reduce the variability of the generated output by constraining the model to respond according to a standardized template.

The advantages of the JSON format

The JSON (JavaScript Object Notation) format is widely preferred by developers for several reasons:

  • Readability: It is easy to read and understand for both humans and machines.
  • Compatibility: It is supported by most programming languages and platforms.
  • Flexibility: It can represent complex data structures simply and concisely.
  • Integration: It integrates easily with RESTful APIs, NoSQL databases, and many other modern technologies.
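
As a quick illustration of these points, here is a minimal sketch that uses only Python's standard `json` module. The `post` dictionary and its fields are made up for this example; it shows a nested structure surviving a round trip through JSON text unchanged.

```python
import json

# A nested structure (hypothetical data): strings, a list, and an embedded object.
post = {
    "text": "Hello from the beach",
    "tags": ["beach", "holiday"],
    "author": {"name": "anna"},
}

encoded = json.dumps(post)     # Python dict -> JSON string
decoded = json.loads(encoded)  # JSON string -> Python dict

print(encoded)
print(decoded == post)  # the round trip preserves the structure
```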

Define the model with Pydantic

In this tutorial we will use Pydantic, a library that allows you to create strongly typed data models in Python. These models can then be used for data validation and serialization. Pydantic relies on standard Python type annotations to validate data and, where possible, coerce it to the declared type automatically.
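
To make validation and coercion concrete, here is a small sketch. It assumes the `pydantic` package is installed and imported directly; the `Comment` model and its fields are hypothetical and serve only as an illustration.

```python
from pydantic import BaseModel, ValidationError


class Comment(BaseModel):  # hypothetical model, for illustration only
    author: str
    likes: int


# The string "42" is coerced to an integer because of the `likes: int` annotation.
comment = Comment(author="anna", likes="42")
print(comment.likes, type(comment.likes).__name__)

# A value that cannot be converted to an integer raises a ValidationError.
try:
    Comment(author="anna", likes="many")
    rejected = False
except ValidationError:
    rejected = True
print(rejected)
```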

For our example we will try to generate posts for social media that are characterized by text and some tags:

Python
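# Import path used in this guide; recent LangChain releases deprecate this shim
# in favor of importing directly from Pydantic: from pydantic import BaseModel, Field
from langchain_core.pydantic_v1 import BaseModel, Field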

class SocialPost(BaseModel):
    """Posts for social media"""
    tags: str = Field(description="Post tags")
    text: str = Field(description="Plain text of the post")

To define the model of a generic SocialPost we have:

  • added a docstring that describes the purpose of the class;
  • extended the Pydantic BaseModel, which is the base class of the validation framework;
  • used the Field function to define the object attributes with additional metadata (e.g. descriptions, default values, etc.).
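
The docstring and the field descriptions are not just comments: they become part of the JSON schema that Pydantic generates for the class, which is the kind of description a model can be given about the expected output. The sketch below prints that schema. It assumes `pydantic` is imported directly rather than through LangChain, and it does not show the exact request that LangChain builds from the schema.

```python
import json

from pydantic import BaseModel, Field


class SocialPost(BaseModel):
    """Posts for social media"""
    tags: str = Field(description="Post tags")
    text: str = Field(description="Plain text of the post")


# Pydantic v2 exposes model_json_schema(); v1 exposes schema(). Use whichever exists.
if hasattr(SocialPost, "model_json_schema"):
    schema = SocialPost.model_json_schema()
else:
    schema = SocialPost.schema()

# The class docstring and the Field descriptions appear in the printed schema.
print(json.dumps(schema, indent=2))
```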

Use Pydantic models with LangChain

After defining the model we want to use for the JSON output, all that remains is to use it in our LangChain application:

Python
from langchain_openai import ChatOpenAI
from langchain_core.pydantic_v1 import BaseModel, Field


class SocialPost(BaseModel):
    """Posts for social media"""
    tags: str = Field(description="Post tags")
    text: str = Field(description="Plain text of the post")

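# Chat model backed by the OpenAI API (reads the key from the OPENAI_API_KEY environment variable)
llm = ChatOpenAI(model="gpt-3.5-turbo")
# Wrap the model so that its replies are returned as SocialPost instances
structured_llm = llm.with_structured_output(SocialPost)

response = structured_llm.invoke("Can you write a post about a beach holiday?")
# Serialize the returned SocialPost to JSON (printing the object itself shows its fields as a repr instead)
print(response.json(indent=2))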

As you can see, the code is very similar to what we saw in the previous articles of this guide. The difference is that this time we call the with_structured_output method, which returns output that conforms to the format defined by the SocialPost class.
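
Conceptually, a structured-output call ends with the model's reply being checked against the class. The sketch below mimics that last step by hand, with a hard-coded string standing in for the model's reply, so no API call is made. It is an illustration under these assumptions and not LangChain's internal implementation; `raw_reply` is a made-up value, and `pydantic` is assumed to be imported directly.

```python
import json

from pydantic import BaseModel, Field, ValidationError


class SocialPost(BaseModel):
    """Posts for social media"""
    tags: str = Field(description="Post tags")
    text: str = Field(description="Plain text of the post")


# Hard-coded stand-in for a model reply (hypothetical); no API call is made here.
raw_reply = '{"tags": "#beach #holiday", "text": "Sun, sand and sea."}'

# Parse the JSON text, then let Pydantic check it against the SocialPost fields.
post = SocialPost(**json.loads(raw_reply))
print(post.tags)
print(post.text)

# A reply that is missing a required field is rejected instead of passing through silently.
try:
    SocialPost(**json.loads('{"tags": "#beach"}'))
    incomplete_accepted = True
except ValidationError:
    incomplete_accepted = False
print(incomplete_accepted)
```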

By launching the script in the console we can verify that the result is the expected one:

JSON
{
  "tags": "#beach #holiday #vacation",
  "text": "Dreaming of a beach holiday getaway! The sun, the sand, and the sea calling my name. Can't wait to relax and unwind by the ocean."
}
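
Once the output is in this form, a classic application can consume it with ordinary code. As a minimal sketch, the snippet below parses the JSON shown above (hard-coded here so that it runs offline) with the standard `json` module and splits the tags into a list. The `tag_list` variable is our own naming, not something produced by LangChain.

```python
import json

# The JSON output shown above, hard-coded so the example runs without calling the model.
raw = '''{
  "tags": "#beach #holiday #vacation",
  "text": "Dreaming of a beach holiday getaway! The sun, the sand, and the sea calling my name. Can't wait to relax and unwind by the ocean."
}'''

post = json.loads(raw)

# Split the space-separated tags and drop the leading "#" from each one.
tag_list = [tag.lstrip("#") for tag in post["tags"].split()]

print(tag_list)
print(post["text"])
```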

As we have seen, LangChain offers a powerful and flexible way to generate structured responses in JSON format, making it easier to integrate language models into modern applications. If you are interested in the topic, I invite you to keep exploring it on our blog!