Add best practices for statvars for Custom DC #683
kmoscoe merged 87 commits into datacommonsorg:master from
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Summary of Changes

Hello @kmoscoe, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly improves the documentation for Custom Data Commons, particularly focusing on best practices for statistical variable management. The changes aim to provide clearer, more detailed guidance for users on defining, naming, and organizing statistical variables, ensuring better data quality and compatibility for potential contributions to the main Data Commons knowledge graph. It also refines explanations of core concepts like entity types and variable grouping.
Code Review
This pull request improves the documentation for custom Data Commons, focusing on best practices for statistical variables (statvars). The changes clarify when to reuse existing statvars, add naming convention guidelines, and provide better examples. I've found a few areas in the documentation that could be further improved for clarity and correctness, including a typo, a broken link, and a couple of confusing or incorrect examples. My suggestions aim to make the documentation more accurate and easier for users to follow.
beets left a comment
wow, this is a great update. thanks for breaking down all the requirements.
there is currently a limit to the length of dcids in custom dc. i'll ping that group to see where we are with it
might want to wait for ajai to approve this pr too
Before:
Schema.org and the base Data Commons knowledge graph define entity types for just about everything in the world. An _entity type_ is a high-level concept, and is derived directly from a [`Class`](https://datacommons.org/browser/Class){: target="_blank"} type. The most common entity types in Data Commons are place types, such as `City`, `Country`, `AdministrativeArea1`, etc. Examples of other entity types are `Hospital`, `PublicSchool`, `Company`, `BusStation`, `Campground`, `Library`, etc. It is rare that you would need to create a new entity type, unless you are working in a highly specialized domain.

After:
Schema.org and the base Data Commons knowledge graph define entity types for just about everything in the world. An _entity type_ is a high-level concept, and is derived directly from a [`Class`](https://datacommons.org/browser/Class){: target="_blank"} type. Non-place entities are of two types:
- The thing you are measuring, known as the `populationType` in Data Commons. Often this is a `Person`, which is a commonly used population in Data Commons. But it could be something else entirely, like the beds in a hospital, the price of a commodity, Olympic medals won by a country, or the surface area of an ocean.
would be nice if we linked to the pop types for these examples
Not sure I understand what you mean here. Do you mean link to the browser pages for these entities?
yes, we could link to the "Person" browser page. i'm also wondering if we have poptypes for the others in the examples (beds in a hospital, price of a commodity, etc). in other words, i thought the reader might be curious about how those examples would be modeled.
The thing is, it's a bit confusing: the incoming `populationType` arcs section shows statvars (https://screenshot.googleplex.com/ASWnrHqVefUfW5r), while it's the `domainIncludes` section that actually shows the population types (https://screenshot.googleplex.com/BrTds7ghnFBvekP). I think at this point in the doc it's too early to get into these details, which are actually quite confusing.
any update on that?
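As a rough illustration of how a `populationType` shows up on a statistical variable (the "thing you are measuring" in the revised entity-types text), the sketch below renders a `StatisticalVariable` node in Data Commons MCF. It assumes the standard MCF property names `populationType`, `measuredProperty`, and `statType`, and only covers the `Person` case discussed in the thread; it does not claim how the other examples (hospital beds, commodity prices, and so on) are modeled. The helper `statvar_mcf` and its parameters are hypothetical and not part of any Data Commons tool.

```python
from typing import Dict, Optional


def statvar_mcf(
    dcid: str,
    population_type: str,
    measured_property: str,
    stat_type: str = "measuredValue",
    constraints: Optional[Dict[str, str]] = None,
) -> str:
    """Render a StatisticalVariable node as an MCF string.

    Hypothetical helper for illustration only; not a Data Commons API.
    """
    lines = [
        f"Node: dcid:{dcid}",
        "typeOf: dcs:StatisticalVariable",
        # populationType names the kind of thing being measured, e.g. Person.
        f"populationType: dcs:{population_type}",
        f"measuredProperty: dcs:{measured_property}",
        f"statType: dcs:{stat_type}",
    ]
    # Each constraining property (e.g. gender) narrows the population being measured.
    for prop, value in (constraints or {}).items():
        lines.append(f"{prop}: dcs:{value}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(statvar_mcf("Count_Person", "Person", "count"))
    print()
    print(statvar_mcf("Count_Person_Female", "Person", "count",
                      constraints={"gender": "Female"}))
```

Passing `constraints={"gender": "Female"}` keeps the same `Person` population type but narrows it, which is one way a reader could see how a single population type supports many related variables.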
This PR makes the following changes:
Staged at http://bullie.svl.corp.google.com:4000
Next PR will cover similar content for custom entities and enums