If you read anything else in this blog, you can see that I like the internals and making sense of how things work under the hood. Otherwise, using the tech on top of it makes less sense. And less fun. I first heard about CockroachDB at GopherCon 2023 in San Diego, where they had a booth. I got a free book š
CockroachDB is a distributed SQL database written in Go. So when I was looking for something to contribute to, this felt like a natural fit. Their repo uses a nice mix of Go and SQL files to define and run their tests. The DB is Postgres-compatible, and while they implement their own storage engine (and more), they try to be as compliant as possible.
Glossary before we begin
- REGCLASS is a PostgreSQL data type that represents a ārelationā (table, sequence, view, etc.) by its internal object ID. When you write
'sc.myseq'::REGCLASS, youāre casting the string name'sc.myseq'into this special type. PostgreSQL/CockroachDB looks up the object by name and converts it to a numeric ID under the hood. Itās commonly used with functions likenextval()that need to reference a sequence. - UDF stands for User-Defined Function - a custom function you write yourself with
CREATE FUNCTION, as opposed to built-in functions. In this case the UDF was written in PL/pgSQL (a procedural SQL language), which is why itās sometimes called a āPL/pgSQL UDF.ā - DOid is a Go struct in CockroachDBās source code that represents a datum (a value) of PostgreSQLās OID type. āDā is the convention for ādatumā in that codebase, and āOidā is the type. An OID (Object Identifier) is just a numeric ID that PostgreSQL uses internally to uniquely identify database objects like tables, sequences, functions, etc.
- Schema-qualified just means a name that includes its schema prefix, like
sc.myseq(schemasc, objectmyseq), as opposed to justmyseqon its own. - search_path is a PostgreSQL setting that tells the database which schemas to search for objects when a schema name is not explicitly provided. For example, if search_path is
("public", "sc"), an unqualifiedmyseqlookup will first look inpublic.myseqand then insc.myseq.
The Bug
As recommended in their contributing guidelines, I went through the good first issue list and found myself an E-starter (their label for issues suited to first-time contributors): https://github.com/cockroachdb/cockroach/issues/167057.
The bug was that when CockroachDB saw 'sc.myseq'::REGCLASS inside a UDF, it correctly used the full name sc.myseq to look up the objectās numeric ID. However, it then threw away the sc. part when storing the name. Later, when it tried to re-look up the object using only myseq, it couldnāt find it if sc wasnāt in the default search path.
Navigating the codebase
The repo is insanely large with millions of lines of Go. The usual strategy for me is to look up the error message. The bug report said
something was failing at CREATE FUNCTION time with relation does not exist, so I searched for that string.
That gave me a manageable list of files. From there I could see which one was on the path for CREATE FUNCTION. One led to another, and within 20 minutes I was reading maybeTrackRegclassDependenciesForViews in pkg/sql/opt/optbuilder/builder.go. When CockroachDB compiles a UDF or view, it needs to record which database objects (tables, sequences, etc.) that function/view depends on. This is important so that, for example, you canāt drop a sequence that a UDF relies on without also dropping the UDF first. This function is responsible for detecting and recording those dependencies specifically for REGCLASS expressions. (Itās worth noting that maybeTrackRegclassDependenciesForViews applies to UDFs too, despite the name.)
Step by step what maybeTrackRegclassDependenciesForViews does:
- Checks if schema dependency tracking is even enabled. If not, it exits early.
- Checks if the expression is of type
REGCLASS. If itās something else, itās irrelevant and it exits. - Skips expressions that contain variables (like function arguments), since those canāt be resolved at compile time. Only constant values can be tracked.
- Evaluates the constant
REGCLASSexpression (e.g.'sc.myseq'::REGCLASS) to get the actual OID (numeric object ID). - Uses that OID to look up the actual data source (table/sequence) in the catalog.
- Appends it to
b.schemaDeps- the list of schema dependencies being collected.
The fix
Hereās the PR that I opened.
The crash was in step 5 - the data source lookup. The old code had two paths: it first called strconv.Atoi(regclass.String()) to check whether the value was stored as a pure numeric OID. If so, it called ResolveDataSourceByID directly, and that path worked fine. The bug was in the else branch, taken when the value was stored as a name.
In that branch, DOid.String() was called to get the objectās name for the lookup. But DOid.String() is designed to return just the resolved object name without schema qualification - correct for display purposes, but wrong here. It returned "myseq" instead of "sc.myseq". The code then passed that bare name to MakeUnqualifiedTableName and tried to resolve it via b.resolveDataSource, which failed because myseq didnāt exist in the default public schema. resolveDataSource ended up panicking.
id, err := strconv.Atoi(regclass.String())
if err == nil {
ds, _, err = b.catalog.ResolveDataSourceByID(b.ctx, cat.Flags{}, cat.StableID(id))
if err != nil {
panic(err)
}
} else {
tn := tree.MakeUnqualifiedTableName(tree.Name(regclass.String()))
ds, _, _ = b.resolveDataSource(&tn, privilege.SELECT)
}
The fix skips that lossy round-trip entirely and just looks up the object directly using the numeric OID, which is always unambiguous.
dOid, ok := regclass.(*tree.DOid)
if !ok {
panic(errors.AssertionFailedf(
"expected *tree.DOid from eval.Expr on REGCLASS expression, got %T", regclass,
))
}
ds, _, err := b.catalog.ResolveDataSourceByID(b.ctx, cat.Flags{}, cat.StableID(dOid.Oid))
if err != nil {
panic(err)
}
I also did some minor refactoring along the way for readability.
Testing
CockroachDB uses a system called logic tests for SQL-level testing. Instead of writing Go test code, you write plain text files that look like a conversation between you and the database. The test runner reads the file line by line and executes each statement, then checks that the output matches whatās written in the file.
There are a few key directives in these test files:
statement ok - run this SQL and expect it to succeed with no output:
statement ok
CREATE SCHEMA sc_62227;
CREATE SEQUENCE sc_62227.myseq;
query I - run this SQL and expect it to return rows. The I means the result is an integer column. After ---- you write the expected output:
query I
SELECT sc_62227.next_id()
----
1
let $variable - capture the result of a query into a variable you can reuse later. This is useful when you canāt predict an exact value (like an auto-assigned OID):
let $seqoid
SELECT 'sc_62227.myseq'::REGCLASS::OID;
Then you can use $seqoid in later statements.
subtest name / subtest end - groups related tests together under a named section, making it easy to run or identify a specific regression test in isolation.
The whole test file (udf_regressions) is just a long flat text file of these blocks, one regression test after another. The test runner is a Go program that parses and executes them, but as a contributor you never have to write Go to add a SQL-level test - you just append a new subtest block to the file. Itās a very readable format, and the test itself essentially doubles as documentation of the expected behavior.
Approved and merged
After the initial PR, I got a request to add another test. I did and got one of the approvals I needed. That was on April 6th. Performance tests failed but they seemed to be related to some GitHub infrastructure flakiness. I couldnāt retry them myself (no permissions), so I waited for quite some time for a response. I followed up on May 5th and was advised to rebase on the latest master. The PR was approved and merged on May 19th. The fix will be included in release 26.3.0. Itās been a really positive experience and Iām looking forward to contributing more.
Comments