Hugging Face Datasets uses Apache Arrow to store and manipulate data, and the library provides a method, add_column, for adding a column to a dataset.
This was released in the latest version of the library as of this post's publication date.
Add New Column
The code is straightforward, with a few minor observations.
When you load a dataset, you get a DatasetDict, which is a dictionary of datasets, so you have to choose a key before you can work with the data. For example, in the code above, the dataset is available under the train key.
Happy coding!!!
Hugging Face Datasets provides a nice interface for loading different types of ML datasets. It comes with a clean API to load, process, and save data.
Install
You can use the following pip command to install the Datasets library.
pip install datasets
Load Data
To load CSV data, use the load_dataset function.
Save/Convert to_csv
To save or convert a dataset to CSV, you can use the to_csv method.
System.Text.Json is one of the most used JSON libraries in .NET. By default it keeps property names as they are, which means Pascal case for typical C# properties, and it offers camel casing as a configuration option.
Let's build our custom naming policy.
Domain Logic
First comes the domain logic: the algorithm that converts a string to snake case.
I took this from an Entity Framework Core project that supports different naming policies.
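To make the conversion concrete, here is a minimal sketch of a snake case converter. It is not the code from the Entity Framework Core project; the class name SnakeCaseConverter and the exact word-splitting rules are assumptions made for this illustration.

```csharp
using System;
using System.Text;

// Hypothetical helper: a simplified snake case conversion, not the EF Core implementation.
public static class SnakeCaseConverter
{
    public static string ToSnakeCase(string name)
    {
        if (string.IsNullOrEmpty(name))
        {
            return name;
        }

        var builder = new StringBuilder(name.Length + 8);
        for (var i = 0; i < name.Length; i++)
        {
            var current = name[i];
            if (!char.IsUpper(current))
            {
                builder.Append(current);
                continue;
            }

            // A new word starts when an upper-case letter follows a lower-case letter or a digit.
            var afterLowerOrDigit = i > 0 && (char.IsLower(name[i - 1]) || char.IsDigit(name[i - 1]));

            // A new word also starts at the last capital of an acronym, as in "HTTPRequest".
            var endsAcronym = i > 0
                && char.IsUpper(name[i - 1])
                && i + 1 < name.Length
                && char.IsLower(name[i + 1]);

            if (afterLowerOrDigit || endsAcronym)
            {
                builder.Append('_');
            }

            builder.Append(char.ToLowerInvariant(current));
        }

        return builder.ToString();
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(SnakeCaseConverter.ToSnakeCase("FirstName"));   // first_name
        Console.WriteLine(SnakeCaseConverter.ToSnakeCase("HTTPRequest")); // http_request
        Console.WriteLine(SnakeCaseConverter.ToSnakeCase("OrderID"));     // order_id
    }
}
```

Keeping the conversion in a plain static method makes it easy to unit test separately from any JSON concerns.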
The next step is to override JsonNamingPolicy.
Now that the snake case policy is available, it's time to set it as the naming policy on JsonSerializerOptions.
To keep the code efficient, JsonSerializerOptions should not be created repeatedly, and the same goes for the naming policy, so we have created a static instance of the naming policy class.
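The pieces above might fit together roughly like this. It is a sketch under assumptions: the names SnakeCaseNamingPolicy, JsonDefaults, and Person are made up for the example, and the conversion is a condensed regular-expression stand-in so that the snippet compiles on its own.

```csharp
using System;
using System.Text.Json;
using System.Text.RegularExpressions;

// Hypothetical policy class; ConvertName is the one member JsonNamingPolicy requires.
public sealed class SnakeCaseNamingPolicy : JsonNamingPolicy
{
    // Single shared instance so the policy is not allocated again and again.
    public static SnakeCaseNamingPolicy Instance { get; } = new SnakeCaseNamingPolicy();

    // Matches the empty position before a capital that starts a new word.
    private static readonly Regex WordBoundary = new Regex(
        "(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])",
        RegexOptions.Compiled);

    public override string ConvertName(string name) =>
        WordBoundary.Replace(name, "_").ToLowerInvariant();
}

public static class JsonDefaults
{
    // Created once and reused, because the options object caches serialization metadata.
    public static readonly JsonSerializerOptions SnakeCase = new JsonSerializerOptions
    {
        PropertyNamingPolicy = SnakeCaseNamingPolicy.Instance
    };
}

// Hypothetical type used only to show the effect of the policy.
public class Person
{
    public string FirstName { get; set; } = "";
    public string LastName { get; set; } = "";
}

public static class Program
{
    public static void Main()
    {
        var person = new Person { FirstName = "Ada", LastName = "Lovelace" };
        var json = JsonSerializer.Serialize(person, JsonDefaults.SnakeCase);
        Console.WriteLine(json); // {"first_name":"Ada","last_name":"Lovelace"}
    }
}
```

The same options instance can be passed to JsonSerializer.Deserialize, so snake case JSON maps back onto the Pascal case properties.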
TL;DR: if you are trying to find out how to inject IConfiguration, you don't need to, because it's already available on the builder.
If you are using .NET 5 or an earlier version, you need to inject the configuration object into the Startup class and use it from there.
With .NET 6, it has become quite easy.
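As a rough illustration, a .NET 6 Program.cs can read configuration straight from the builder. This is a minimal sketch that assumes a project using the Microsoft.NET.Sdk.Web SDK; the key "Todo:TableName" and its fallback value are hypothetical.

```csharp
using Microsoft.AspNetCore.Builder;
using Microsoft.Extensions.Configuration;

var builder = WebApplication.CreateBuilder(args);

// builder.Configuration already implements IConfiguration, so nothing has to be injected.
IConfiguration configuration = builder.Configuration;

// "Todo:TableName" is a made-up key; replace it with a key from your own appsettings.json.
var tableName = configuration["Todo:TableName"] ?? "todo-storage";

var app = builder.Build();

// Tiny endpoint that only shows the value was read.
app.MapGet("/", () => $"Using table {tableName}");

app.Run();
```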
Simple and good-looking code in .NET 6.
In the side hustle I am working on, I use the AWS CloudFormation SDK with Serverless Stack to create DynamoDB tables.
When it comes to coding in C#, I use the DynamoDB object persistence model to interact with the database. The object persistence model uses a C# attribute to specify the table name.
For example:
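A mapped class might look like the sketch below. The class TodoItem, its properties, and the table name "todo-storage" are assumptions for illustration; the attributes come from the AWSSDK.DynamoDBv2 package.

```csharp
using Amazon.DynamoDBv2.DataModel;

// The table name is fixed at compile time through the attribute.
[DynamoDBTable("todo-storage")]
public class TodoItem
{
    // Partition key of the table (hypothetical schema).
    [DynamoDBHashKey]
    public string Id { get; set; } = "";

    [DynamoDBProperty]
    public string Title { get; set; } = "";
}
```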
Serverless Stack, or CloudFormation in this case, creates the table name dynamically based on the environment. For example, in Dev the table is named dev-todo-storage.
Attribute arguments in .NET must be compile-time constants, so we can't replace the name dynamically, and the attribute is mandatory for the object persistence model to work.
Finding a Workaround
After some searching through the DynamoDB SDK code, I came across the DynamoDBOperationConfig class. This class lets you override the table name dynamically, per operation.
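Here is a sketch of how the override could be wired up, assuming version 3 of the AWS SDK for .NET (newer SDK versions may mark some of these members obsolete). TodoItem, TodoRepository, and the stage parameter are hypothetical names.

```csharp
using System.Threading.Tasks;
using Amazon.DynamoDBv2;
using Amazon.DynamoDBv2.DataModel;

// Hypothetical entity; the attribute still carries a compile-time table name.
[DynamoDBTable("todo-storage")]
public class TodoItem
{
    [DynamoDBHashKey]
    public string Id { get; set; } = "";

    public string Title { get; set; } = "";
}

public class TodoRepository
{
    private readonly IDynamoDBContext _context;
    private readonly DynamoDBOperationConfig _config;

    // stage is assumed to come from configuration, for example "dev".
    public TodoRepository(IAmazonDynamoDB client, string stage)
    {
        _context = new DynamoDBContext(client);

        // OverrideTableName replaces the name from the attribute for every call using this config.
        _config = new DynamoDBOperationConfig
        {
            OverrideTableName = $"{stage}-todo-storage"
        };
    }

    public Task SaveAsync(TodoItem item) => _context.SaveAsync(item, _config);

    public Task<TodoItem> LoadAsync(string id) => _context.LoadAsync<TodoItem>(id, _config);
}
```

Building the config once in the constructor keeps the environment-specific name in one place instead of repeating it at every call site.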
A simple solution, but not one that is easy to find by searching.