3. Projects and Datasets

Goal

Create projects, upload datasets, and validate data integrity before modeling.

Preconditions

  1. User has project create/upload permissions.

  2. Data file is available in CSV or Parquet format.
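Before uploading, a quick local read can catch format problems early. A minimal sketch using pandas; the function name and path handling are illustrative, not part of the platform:

```python
import pandas as pd
from pathlib import Path

def preflight_check(path: str) -> pd.DataFrame:
    """Read a small sample of the file to confirm it parses before upload."""
    suffix = Path(path).suffix.lower()
    if suffix == ".csv":
        # Only the first few rows are needed to confirm the file parses.
        return pd.read_csv(path, nrows=5)
    if suffix == ".parquet":
        # Parquet readers validate schema metadata on read.
        return pd.read_parquet(path)
    raise ValueError(f"Unsupported format: {suffix}; expected .csv or .parquet")
```

If this raises, fix the file locally before attempting the upload at all.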

Create a project

  1. Open Projects.

  2. Click Create Project.

  3. Set:
     - Project name.
     - Optional description.
     - Visibility/team scope, if enabled.

  4. Save and confirm project appears in list.

Upload a dataset

  1. Open Datasets inside the target project.

  2. Click Upload Dataset.

  3. Select file and optional metadata fields.

  4. Wait for the ingestion status to show completion.

  5. Open dataset detail view.

Validate dataset quality

  1. Review schema inference:
     - Column names.
     - Data types.
     - Null counts.

  2. Review row count and duplicate indicators.

  3. Review first rows preview for parsing issues.

  4. Confirm delimiters/date formats were interpreted correctly.
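The review steps above can be cross-checked by loading the source file locally and comparing against what the dataset detail view reports. A minimal pandas sketch (this is a local cross-check, not the platform's own inference logic):

```python
import pandas as pd

def quality_report(df: pd.DataFrame) -> dict:
    """Summarize schema and basic quality signals for a loaded dataset."""
    return {
        "columns": list(df.columns),
        "dtypes": {c: str(t) for c, t in df.dtypes.items()},  # inferred types
        "null_counts": df.isna().sum().to_dict(),             # nulls per column
        "row_count": len(df),
        "duplicate_rows": int(df.duplicated().sum()),         # fully duplicated rows
    }
```

Any disagreement between this report and the UI values points to a parsing difference worth investigating before modeling.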

Functional validation checklist

  1. Dataset artifact is created and visible in table.

  2. Reported row count matches source file expectation.

  3. Critical columns preserve expected types.

  4. Preview values are not shifted/truncated unexpectedly.

  5. Re-opening dataset detail returns same metadata (idempotent view).
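Checklist item 2, the row-count comparison, can be done mechanically on the source side. A sketch for headered CSV files using the standard csv module, which correctly handles quoted fields containing embedded newlines (a naive line count would over-count those):

```python
import csv

def csv_data_rows(path: str) -> int:
    """Count data rows in a headered CSV, excluding the header row."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header
        return sum(1 for _ in reader)
```

Compare the returned count to the row count reported in the dataset detail view.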

Expected result

  1. Project and dataset are ready for downstream workflows.

  2. Dataset version is selectable in Builder/Experiments.

Common errors and recovery

  1. Upload stuck in processing:
     - Retry the upload with a smaller sample.
     - Check server logs for parser errors.

  2. Wrong type detection:
     - Re-upload with cleaned headers and a consistent date format.

  3. Row count mismatch with source:
     - Validate delimiter/quote settings and check for malformed rows.
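The recovery steps for type and delimiter issues amount to re-parsing with explicit settings instead of relying on inference. A hedged pandas sketch; the semicolon delimiter and day-first date format below are examples, not platform defaults:

```python
import pandas as pd

def reparse_csv(path, sep=";", date_cols=None, date_format="%d.%m.%Y"):
    """Re-read a CSV with an explicit delimiter and date format.

    With explicit settings, a mismatch surfaces as a parse error or an
    unexpected dtype rather than silently shifted or truncated columns.
    """
    df = pd.read_csv(path, sep=sep, quotechar='"')
    for col in date_cols or []:
        df[col] = pd.to_datetime(df[col], format=date_format)
    return df
```

If the explicit re-parse succeeds where inference failed, re-upload the file with the same settings (or reformat the source to an unambiguous layout).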

Screenshots

Projects list and create project flow

Projects module with creation and selection flow.

Dataset upload and list view

Dataset registry with upload status and metadata summary.