# Dataset Files

Sometimes your plugin/applet will need to include specialized data in order to operate. Some examples of data which a plugin might rely on are:

* New Concepts and ReferenceTerms for standardized vocabulary
* New Materials and ManufacturedMaterials which are used/distributed by the application
* New Places or Organizations which support a particular plugin.

## Dataset Files

All data in SanteDB's datastore can be distributed in the form of a data set. A data set file is a special file which contains a series of RIM objects (serialized in HDSI XML format) and instructions on how the RIM object should be treated.

Dataset files should be placed in the **data/** folder of your applet (if you're distributing via applet) or should be copied into the **data/** folder of the SanteDB server on installation (if you're distributing via .NET installer).

The structure of a dataset file is as follows:

```markup
<?xml version="1.0" encoding="utf-8" ?>
<dataset xmlns="http://santedb.org/data" 
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
    id="My Super Cool Dataset">
  <update insertIfNotExists="false" skipIfError="false">
    <!-- Model Object -->
  </update>
  <insert updateIfExists="false" skipIfError="false">
    <!-- Model Object -->
  </insert>
</dataset>
```

The two options for dataset manipulation are:

* **insert -** Used to perform an insert on the database. Installation of the data is considered "FAILED" if the object already exists or is detected to be a duplicate. The @updateIfExists key is only used if an insertion was attempted and a duplicate primary key was detected. It does not check classifiers.
* **update** - Used to perform a lookup by classifier / key and update the object. If an object does not exist, or a classifier key does not exist and @insertIfNotExists is true, then an insert is done. Typically this operation is safer than insert, but is slower.

The following objects are supported on the dataset:

* Concept
* ConceptClass
* ConceptSet
* ReferenceTerm
* CodeSystem
* Place
* Organization
* Material
* ManufacturedMaterial
* ExtensionType
* IdentifierType
* AssigningAuthority

## Generating a Dataset File

You can generate a dataset from existing data in the SanteDB server by using the SanteDB Administration Console (sdbac) application. To do this you will use the `cdr.query` command, which is illustrated below to produce a dataset of all concepts in the FamilyMember concept set:

```markup
cdr.query -r Concept conceptSet.mnemonic=FamilyMember --dataset > output.dataset
```

You can now easily modify the existing concepts from within the dataset.

{% hint style="info" %}
You can use the user interfaces setup your data and then run the above CDR query to generate a related dataset. This makes your data portable to other SanteDB instances.
{% endhint %}


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://help.santesuite.org/developers/applets/distributing-data.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
