ggsql: A Grammar of Graphics for SQL

121 points31 comments2 hours ago
jorin

Really cool project! Would love to see a standard established for representing visualizations in SQL! I built a whole dashboarding tool on top of the idea: https://taleshape.com/shaper/docs/getting-started/ But Shaper takes a more pragmatic approach and just uses built in functionality to describe how to visualize the results. The most value I see with viz as SQL is that it's a great format for LLMs to specify what they want while making it easy to audit and reproduce. Just built a slack bot on top of that concept last week: https://taleshape.com/blog/build-your-own-data-analytics-sla...

anentropic

Maybe I skim read it too fast, but I did not find any clear description in the blog post or website docs of how this relates to SQL databases

I was kind of guessing that it doesn't run in a database, that it's a SQL-like syntax for a visualisation DSL handled by front end chart library.

That appears to be what is described in https://ggsql.org/get_started/anatomy.html

But then https://ggsql.org/faq.html has a section, "Can I use SQL queries inside the VISUALISE clause," which says, "Some parts of the syntax are passed on directly to the database".

The homepage says "ggsql interfaces directly with your database"

But it's not shown how that happens AFAICT

confused

show comments
getnormality

I skimmed the article for an explanation of why this is needed, what problem it solves, and didn't find one I could follow. Is the point that we want to be able to ask for visualizations directly against tables in remote SQL databases, instead of having to first pull the data into R data frames so we can run ggplot on it? But why create a new SQL-like language? We already have a package, dbplyr, that translates between R and SQL. Wouldn't it be more direct to extend ggplot to support dbplyr tbl objects, and have ggplot generate the SQL?

Or is the idea that SQL is such a great language to write in that a lot of people will be thrilled to do their ggplots in this SQL-like language?

EDIT: OK, after looking at almost all of the documentation, I think I've finally figured it out. It's a standalone visualization app with a SQL-like API that currently has backends for DuckDB and SQLite and renders plots with Vegalite. They plan to support more backends and renderers in the future. As a commenter below said, it's supposed to help SQL specialists who don't know Python or R make visualizations.

show comments
kasperset

Will this ever integrate rest of the ggplot2 dependent packages described here: https://exts.ggplot2.tidyverse.org/gallery/ in the near or distant future? Sorry if it already mentioned somewhere.

show comments
jiehong

Outstanding!

This can replace a lot of Excel in the end.

It makes so much sense now that it exists!

rvba

1) does this alllw to export to Excel?

2) how to make manual adjustments?

efromvt

Love the layering approach - that solves a problem I’ve had with other sql/visual hybrids as you move past the basics charts.

data_ders

ok, this is definitely up my alley. color me nerd-sniped and forgive the onslaught of questions.

my questions are less about the syntax, which i'm largely familiar with knowing both SQL and ggplot.

i'm more interested in the backend architecture. Looking at the Cargo.toml [1], I was surprised to not see a visualization dependency like D3 or Vega. Is this intentional?

I'm certainly going to take this for a spin and I think this could be incredible for agentic analytics. I'm mostly curious right now what "deployment" looks like both currently in a utopian future.

utopia is easier -- what if databases supported it directly?!? but even then I think I'd rather have databases spit out an intermediate representation (IR) that could be handed to a viz engine, similar to how vega works. or perhaps the SQL is the IR?!

another question that arises from the question of composability: how distinct would a ggplot IR be from a metrics layer spec? could i use ggsql to create an IR that I then use R's ggplot to render (or vise versa maybe?)

as for the deployment story today, I'll likely learn most by doing (with agents). My experiment will be to kick off an agent to do something like: extract this dataset to S3 using dlt [2], model it using dbt [3], then use ggsql to visualize.

p.s. @thomasp85, I was a big fan of tidygraph back in the day [4]. love how small our data world is.

[1]: https://github.com/posit-dev/ggsql/blob/main/Cargo.toml

[2]: https://github.com/dlt-hub/dlt

[3]: https://github.com/dbt-labs/dbt-fusion

[4]: https://stackoverflow.com/questions/46466351/how-to-hide-unc...

show comments
gh5000

It is conceivable that this could become a duckdb extension, such that it can be used from within the duckdb CLI? That would be pretty slick.

show comments
thomasp85

The new visualisation tool from Posit. Combines SQL with the grammar of graphics, known from ggplot2, D3, and plotnine

show comments
kasperset

Looks intriguing. Brings plotting to Sql instead of “transforming” sql for plotting.

radarsat1

Wow, love this idea.

hei-lima

Really cool!

dartharva

Would be awesome if somehow coupled into Evidence.dev