Simplify the job for your Data Teams using Datatune
Data engineering is hard, and finding the right data engineers to build your data pipelines is even harder.
In every modern company, data flows from dozens of directions: product analytics, billing systems, internal APIs, marketing platforms, cloud storage, user logs, and more.
All of this data is supposed to power dashboards, decision-making, and AI workflows: non-technical teams query the data and view dashboards, while technical teams wrangle with data transformations.
There are plenty of tools that attempt to solve this with agents that generate code or SQL to run on your data. Another way to access the data is to build a RAG pipeline on top of it.
However, there is one problem. These solutions won’t fully understand the data.
Here’s an example:
Take the following customer data, with columns such as Customer Id, First Name, Last Name, Country, Phone 1, Phone 2 and Email:
Index,Customer Id,First Name,Last Name,Company,City,Country,Phone 1,Phone 2,Email,Subscription Date,Website
1,DD37Cf93aecA6Dc,Sheryl,Baxter,Rasmussen Group,East Leonard,Chile,229.077.5154,397.884.0519x718,zunigavanessa@smith.info,2020-08-24,http://www.stephenson.com/
2,1Ef7b82A4CAAD10,Preston,Lozano,Vega-Gentry,East Jimmychester,Djibouti,5153435776,686-620-1820x944,vmata@colon.com,2021-04-23,http://www.hobbs.com/
3,6F94879bDAfE5a6,Roy,Berry,Murillo-Perry,Isabelborough,Antigua and Barbuda,+1-539-402-0259,(496)978-3969x58947,beckycarr@hogan.com,2020-03-25,http://www.lawrence.com/
4,5Cef8BFA16c5e3c,Linda,Olsen,"Dominguez, Mcmillan and Donovan",Bensonview,Dominican Republic,001-808-617-6467x12895,+1-813-324-8756,stanleyblackwell@benson.org,2020-06-02,http://www.good-lyons.com/
5,053d585Ab6b3159,Joanna,Bender,"Martin, Lang and Andrade",West Priscilla,Slovakia (Slovak Republic),001-234-203-0635x76146,001-199-446-3860x3486,colinalvarado@miles.net,2021-04-17,https://goodwin-ingram.com/
6,2d08FB17EE273F4,Aimee,Downs,Steele Group,Chavezborough,Bosnia and Herzegovina,(283)437-3886x88321,999-728-1637,louis27@gilbert.com,2020-02-25,http://www.berger.net/
7,EA4d384DfDbBf77,Darren,Peck,"Lester, Woodard and Mitchell",Lake Ana,Pitcairn Islands,(496)452-6181x3291,+1-247-266-0963x4995,tgates@cantrell.com,2021-08-24,https://www.le.com/
8,0e04AFde9f225dE,Brett,Mullen,"Sanford, Davenport and Giles",Kimport,Bulgaria,001-583-352-7197x297,001-333-145-0369,asnow@colon.com,2021-04-12,https://hammond-ramsey.com/
9,C2dE4dEEc489ae0,Sheryl,Meyers,Browning-Simon,Robersonstad,Cyprus,854-138-4911x5772,+1-448-910-2276x729,mariokhan@ryan-pope.org,2020-01-13,https://www.bullock.net/
10,8C2811a503C7c5a,Michelle,Gallagher,Beck-Hendrix,Elaineberg,Timor-Leste,739.218.2516x459,001-054-401-0347x617,mdyer@escobar.net,2021-11-08,https://arias.com/
We want to filter the rows by location and anonymise the personally identifiable information (PII) of only the women in the data.
prompt = '''
Filter location for the American Continent.
Anonymise personally identifiable information of only women in the data by marking them as anonymised
'''
And this is what the data should look like after the two operations in the prompt:
,Index,Customer Id,First Name,Last Name,Company,City,Country,Phone 1,Phone 2,Email,Subscription Date,Website,Is_American
0,1,DD37Cf93aecA6Dc,ANONYMIZED,ANONYMIZED,Rasmussen Group,East Leonard,Chile,ANONYMIZED,ANONYMIZED,ANONYMIZED,2020-08-24,http://www.stephenson.com/,True
2,3,6F94879bDAfE5a6,Roy,Berry,Murillo-Perry,Isabelborough,Antigua and Barbuda,+1-539-402-0259,(496)978-3969x58947,beckycarr@hogan.com,2020-03-25,http://www.lawrence.com/,True
3,4,5Cef8BFA16c5e3c,ANONYMIZED,ANONYMIZED,"Dominguez, Mcmillan and Donovan",Bensonview,Dominican Republic,ANONYMIZED,ANONYMIZED,ANONYMIZED,2020-06-02,http://www.good-lyons.com/,True
Both operations require understanding each row of the data: filtering by country means the pipeline has to look at the Country field and know which countries are in the Americas, and anonymising women's records means classifying which first names are female.
Traditional code/SQL agents do not look at each row of the data, and RAG is not designed to take tabular data in and return it in the same structure.
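To see why, here is a minimal sketch of what a hand-written pipeline would need for these two operations. The AMERICAN_COUNTRIES and FEMALE_FIRST_NAMES lookup tables are hypothetical and only cover the sample rows; the point is that this knowledge is not in the data, so someone has to supply and maintain it.

```python
import csv
import io

# Hypothetical lookup tables: this knowledge is NOT in the data itself,
# so a hand-written pipeline has to supply it. They only cover the sample rows.
AMERICAN_COUNTRIES = {"Chile", "Antigua and Barbuda", "Dominican Republic"}
FEMALE_FIRST_NAMES = {"Sheryl", "Linda", "Joanna", "Aimee", "Michelle"}
PII_FIELDS = ["First Name", "Last Name", "Phone 1", "Phone 2", "Email"]

# First three rows of the sample data above.
SAMPLE = """Index,Customer Id,First Name,Last Name,Company,City,Country,Phone 1,Phone 2,Email,Subscription Date,Website
1,DD37Cf93aecA6Dc,Sheryl,Baxter,Rasmussen Group,East Leonard,Chile,229.077.5154,397.884.0519x718,zunigavanessa@smith.info,2020-08-24,http://www.stephenson.com/
2,1Ef7b82A4CAAD10,Preston,Lozano,Vega-Gentry,East Jimmychester,Djibouti,5153435776,686-620-1820x944,vmata@colon.com,2021-04-23,http://www.hobbs.com/
3,6F94879bDAfE5a6,Roy,Berry,Murillo-Perry,Isabelborough,Antigua and Barbuda,+1-539-402-0259,(496)978-3969x58947,beckycarr@hogan.com,2020-03-25,http://www.lawrence.com/
"""


def transform(csv_text):
    """Keep rows from the Americas and mask PII for rows with a female first name."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Filter step: depends on knowing which countries are in the Americas.
        if row["Country"] not in AMERICAN_COUNTRIES:
            continue
        # Anonymise step: depends on knowing which first names are female.
        if row["First Name"] in FEMALE_FIRST_NAMES:
            for field in PII_FIELDS:
                row[field] = "ANONYMIZED"
        rows.append(row)
    return rows


result = transform(SAMPLE)
for row in result:
    print(row["First Name"], row["Country"])
```

This works for three rows, but every new country or first name means another entry in the lookup tables, which is the kind of per-row judgement an LLM can supply instead.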
This is where Datatune can help.
Datatune’s agents automatically pick the operations required for the transformation, feed the full data to the LLM, and handle rate limits and context window issues for you, all in a scalable fashion!
This means a software engineer who is new to data engineering can pick up this work and use Datatune Agents to build agentic software.
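For a sense of the plumbing that rate limits and context windows involve, here is a rough sketch of batching rows under a size budget and retrying failed calls with exponential backoff. This is not Datatune's implementation: batch_rows, run_batches and echo_llm are hypothetical names, character count stands in for token count, and RuntimeError stands in for whatever rate-limit error a real LLM client raises.

```python
import time


def batch_rows(rows, max_chars):
    """Group serialized rows into batches of at most max_chars characters.

    A single row longer than max_chars still gets a batch of its own.
    """
    batches, current, size = [], [], 0
    for row in rows:
        # Start a new batch when adding this row would exceed the budget.
        if current and size + len(row) > max_chars:
            batches.append(current)
            current, size = [], 0
        current.append(row)
        size += len(row)
    if current:
        batches.append(current)
    return batches


def run_batches(batches, call_llm, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Send each batch to call_llm, retrying with exponential backoff on failure."""
    results = []
    for batch in batches:
        for attempt in range(max_retries):
            try:
                results.extend(call_llm(batch))
                break
            except RuntimeError:  # stand-in for a rate-limit error
                if attempt == max_retries - 1:
                    raise
                sleep(base_delay * 2 ** attempt)
    return results


def echo_llm(batch):
    # Stand-in for a real model call: returns one output per input row.
    return [f"processed: {row}" for row in batch]


rows = ["1,Sheryl,Chile", "2,Preston,Djibouti", "3,Roy,Antigua and Barbuda"]
outputs = run_batches(batch_rows(rows, max_chars=40), echo_llm)
print(len(outputs))
```

Even this toy version has to decide on batch sizes, retry policy and how to stitch results back together, which is the work the agents take off your hands.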
Here’s the full code example to set up Datatune Agents and perform the example above:
Example: https://github.com/vitalops/datatune/blob/main/examples/data_anonymization.ipynb


