Diptanu Choudhury

@diptanu.bsky.social

CEO @tensorlake.bsky.social. Past: AI Infrastructure at Facebook, LinkedIn, HashiCorp, Netflix

1,976 Followers  |  111 Following  |  69 Posts  |  Joined: 23.10.2024  |  1.9932

Latest posts by diptanu.bsky.social on Bluesky

Job update: a couple of weeks ago, I joined @tensorlake.ai full time. I’m having a lot of fun building the product with @diptanu.bsky.social and the rest of this wonderful team.

We have a few open positions if you’d like to work with us: www.linkedin.com/jobs/search/...

15.09.2025 19:29 | 👍 8    🔁 4    💬 1    📌 1

Some more color on this. What does it mean to help models solve interesting problems with MCP?

MCP is the easiest way to give models access to the gazillion APIs enterprises have already built.

Yes, it’s a security nightmare, but nevertheless it helps teams with the Day 1 problems of 🚢 MVPs.

11.09.2025 17:11 | 👍 3    🔁 0    💬 1    📌 0

@chris.blue your post is spot on about the technical merits (or lack thereof) of MCP. Just sharing perspectives of people I have talked to :)

11.09.2025 16:31 | 👍 1    🔁 0    💬 0    📌 0

I was a skeptic until I heard about banks and hedge funds using MCP to enable models to solve interesting business problems. And yes, OpenAPI + tool calls would have worked just fine, but developers like frameworks and abstractions for building the first MVP. MCP being more prescriptive than OpenAPI helps.

11.09.2025 16:30 | 👍 3    🔁 0    💬 1    📌 1

One of our customers told me today the biggest lift for structured extraction with @tensorlake is that their engineering team can now tweak the schema they want to extract from documents every week as they evolve their insurance platform.

These little things help AI take root in enterprises.

18.07.2025 04:17 | 👍 2    🔁 0    💬 0    📌 0
🚀 Tensorlake Cloud is Here: Document Ingestion and Orchestration for AI Workflows
Tired of brittle pipelines and unreliable document parsing? Tensorlake is the only platform built for mission-critical document ingestion and orchestration i...

Announcing Tensorlake Cloud

Up-leveling Document Ingestion and Workflows for building agentic applications and complex business workflows.

www.youtube.com/shorts/OCv-...

15.05.2025 16:06 | 👍 8    🔁 3    💬 1    📌 2

This would be a problem if you used the data in an agentic application for a real estate business.

This is why Table Structure Understanding models are still the most reliable way to parse tables in business-critical applications.

07.04.2025 04:36 | 👍 2    🔁 0    💬 0    📌 0
Post image

Here's an instance where Gemini Flash 2 misinterpreted a table sourced from a residential property's mold report. It mistakenly attributed mold data from outside the property to a bedroom, as it failed to parse some rows from the third column.

07.04.2025 04:36 | 👍 3    🔁 0    💬 3    📌 0

The emergence of browser agents makes me feel we are lacking a Google for APIs. Instead, developers are making LLMs search for things and turning websites into APIs by making LLMs click on the UI like humans. This approach is going to take a long time to get right, and it is wasteful for a lot of reasons.

24.03.2025 15:45 | 👍 3    🔁 0    💬 1    📌 0
Post image

It works pretty well! I will create an open source repo with some code that ingests all these PDFs and creates datasets for people to work with :)

19.03.2025 05:10 | 👍 2    🔁 0    💬 0    📌 0

. @tensorlake.bsky.social is pretty good! Here is our Document Ingestion API working on a scanned page from the recently released Kennedy assassination classified reports.

Head over to tensorlake.ai for more! (The product is still in early preview)

19.03.2025 05:09 | 👍 6    🔁 1    💬 0    📌 0

Test prompts instantly with our intuitive JSON schema editor and extract complex tables, figures, and charts with ease.

Once your schema is set, run our API at scale (tens of thousands of times per day), ingesting new documents and writing structured data directly into your database or warehouse.

31.01.2025 19:08 | 👍 4    🔁 0    💬 0    📌 0
Video thumbnail

Structured Extraction is essential for AI engineering teams. We are now making it faster and more reliable than ever, whether you're turning PDFs, invoices, or reports into structured data.
Here is a sneak peek into our Structured Extraction engine.

31.01.2025 19:08 | 👍 12    🔁 2    💬 1    📌 0

I don’t live in New York but will come out just for that :)

27.12.2024 21:48 | 👍 7    🔁 0    💬 0    📌 0

Does Dagster have a way of packaging workflows into artifacts along with dependencies? Looks like in your screenshot you are importing requests. Do you build some Docker images out of band and make sure the workflow functions run on those images?

24.12.2024 22:23 | 👍 2    🔁 0    💬 0    📌 0

This means we won’t need a DynamoDB table anymore, right?

19.12.2024 00:03 | 👍 4    🔁 0    💬 1    📌 0

Meant creating an artifact which includes the system and Python dependencies. I have seen some tools copying code into a container with system dependencies, while others pickle the code and put it into a container.

18.12.2024 02:04 | 👍 0    🔁 0    💬 2    📌 0

Python folks: which data/workflow engine has the best developer experience for packaging code? We have looked into Modal, Beam, Airflow, Flyte, AWS Lambda, Prefect, Dagster and Spark. Haven’t seen any approach which is fast, reliable and intuitive.

17.12.2024 16:09 | 👍 10    🔁 2    💬 6    📌 0

Taking a break for 10 days for the first time since December last year! January is going to be great and you will hear about @tensorlake.bsky.social more often :)

14.12.2024 20:05 | 👍 6    🔁 0    💬 0    📌 0

We have been using o1 or Sonnet to solve a problem first, to understand the upper bound of what models are capable of, and then falling back to our internal models or open source models for economy and security. It's been working pretty well. Is this a common workflow?

09.12.2024 20:12 | 👍 4    🔁 1    💬 0    📌 0

I know, man. I poured my heart out on Nomad back in the day. It was a ton of hard work. Thanks for appreciating :)

04.12.2024 04:19 | 👍 1    🔁 0    💬 1    📌 0
Post image

At the HashiCorp re:Invent party, no mention of Nomad and Consul 😭

04.12.2024 03:23 | 👍 5    🔁 0    💬 1    📌 0

Landed in Vegas for re:Invent! Say hello if you are around, would love to chat :)

03.12.2024 00:02 | 👍 2    🔁 0    💬 0    📌 0

Turned on Apple Intelligence this morning. We are a long way from having a personal assistant on the iPhone!

I wish it summarized everything unread from Slack, Gmail, WhatsApp and Messages and came up with a list of things I needed to respond to :)

02.12.2024 15:35 | 👍 3    🔁 0    💬 1    📌 0

This is why you see Anthropic and OpenAI going up the stack: integrating document sources, building agentic frameworks, and building a better chat/workspace product.

01.12.2024 22:58 | 👍 0    🔁 0    💬 0    📌 0

Alibaba has done an amazing job with open source models. At this point, the difference between @Alibaba_Qwen and closed vendors is just the product on top of models.

01.12.2024 22:58 | 👍 1    🔁 0    💬 1    📌 0

Assuming you have a great document segmentation model (which is a big assumption) it's going to do the job better than anything else out there.

01.12.2024 21:41 | 👍 2    🔁 0    💬 0    📌 0

Qwen2VL 72B is just better than every other closed and open source vision model for document understanding.

Like every other vision model, it's still incapable of retaining every single ground truth on dense documents.

01.12.2024 21:41 | 👍 3    🔁 0    💬 1    📌 0
Post image

Throwing the kitchen sink at a small problem. Whenever I work on an Applied AI problem, I work with unconstrained compute to see if we can solve a business problem if money were not a constraint. If there is enough value in solving the problem, economies of scale can kick in later.

30.11.2024 21:21 | 👍 3    🔁 0    💬 0    📌 0

Does NVIDIA have a 2 x H100 SKU, or are cloud vendors slicing up 8 x H100 machines into 4 VMs?

29.11.2024 22:59 | 👍 3    🔁 1    💬 0    📌 0
