#Databases #Interview Questions #SQL

Databases

What is a “database”?

A database is an organized collection of information that is structured to be processed by a computing system.

What is a “database management system”?

A database management system (DBMS) is a set of tools, either general-purpose or specialized, that provides the ability to create, access, and manage a database.

The main functions of a DBMS include:

- defining data and managing the database schema;
- manipulating data: inserting, updating, deleting, and retrieving it;
- controlling access and enforcing integrity constraints;
- ensuring reliability: transaction support, backup, and recovery.

What is a “relational data model”?

The relational data model is a logical framework for data representation and a theoretical foundation for constructing relational databases.

The relational data model comprises the following components:

- a structural component: data is represented as relations (tables) made up of tuples (rows) and attributes (columns) defined over domains;
- an integrity component: entity integrity and referential integrity constraints;
- a manipulation component: operations on relations, formalized as relational algebra and relational calculus.

Define the terms “simple”, “composite”, “candidate”, and “alternate” key.

A simple key consists of a single attribute (field). A composite key consists of two or more attributes.

A candidate key is either a simple or composite key that uniquely identifies each record in a dataset. It must satisfy the criterion of non-redundancy: removing any of the fields will cause the set of fields to no longer uniquely identify the record.

From the set of all candidate keys for a dataset, a primary key is chosen, while the remaining keys are referred to as alternate keys.
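A minimal sketch of these key kinds, using Python's built-in sqlite3 module (the table and column names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE person (
        id              INTEGER PRIMARY KEY,       -- simple key, chosen as the primary key
        passport_series TEXT NOT NULL,
        passport_number TEXT NOT NULL,
        name            TEXT NOT NULL,
        UNIQUE (passport_series, passport_number)  -- composite candidate key, here an alternate key
    )
""")
conn.execute("INSERT INTO person VALUES (1, '40 05', '123456', 'Alice')")
# The alternate key identifies the record just as uniquely as the primary key:
row = conn.execute(
    "SELECT name FROM person WHERE passport_series = '40 05' AND passport_number = '123456'"
).fetchone()
print(row[0])  # Alice
```

Inserting a second person with the same passport pair would violate the UNIQUE constraint, which is exactly the non-redundancy property of a candidate key at work.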

What is a “primary key”? What are the criteria for its selection?

A primary key in the relational data model is one of the candidate keys of a relation, chosen as the principal means of identifying its tuples.

If a relation has a single candidate key, it serves as the primary key. If there are multiple candidate keys, one is chosen as the primary, while the others are referred to as “alternate keys”.

Typically, the most convenient candidate key is selected as the primary key. Thus, the primary key is usually chosen to have the smallest size (physical storage) and/or include the fewest attributes. Another criterion for selecting a primary key is the preservation of its uniqueness over time. Therefore, it is preferable to choose a candidate key that is likely to maintain its uniqueness.

What is a “foreign key”?

A foreign key is a subset of attributes of a relation A whose values must match the values of a candidate key (usually the primary key) of another relation B. Foreign keys implement referential integrity between relations.
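A sketch with sqlite3 (names are illustrative; note that SQLite enforces foreign keys only when the pragma is enabled):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks FKs only when this is on
conn.execute("CREATE TABLE faculty (id INTEGER PRIMARY KEY, title TEXT)")
# `faculty_id` in `student` is a foreign key referencing faculty's primary key.
conn.execute("""
    CREATE TABLE student (
        id INTEGER PRIMARY KEY,
        name TEXT,
        faculty_id INTEGER REFERENCES faculty(id)
    )
""")
conn.execute("INSERT INTO faculty VALUES (1, 'Mathematics')")
conn.execute("INSERT INTO student VALUES (1, 'Alice', 1)")      # OK: faculty 1 exists
try:
    conn.execute("INSERT INTO student VALUES (2, 'Bob', 99)")   # no faculty 99
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```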

What is “normalization”?

Normalization is the step-by-step, reversible process of replacing an original database schema with another schema in which the data sets have a simpler and more regular structure and satisfy the normal forms.

Normalization aims to bring the database structure to a state that ensures minimal logical redundancy and is not intended to reduce or increase performance or to change the physical size of the database. The ultimate goal of normalization is to reduce the potential inconsistency of the data stored in the database.

What normal forms exist?

The First Normal Form (1NF) — A relation is in 1NF if all its attributes’ values are atomic (indivisible).

The Second Normal Form (2NF) — A relation is in 2NF if it is in 1NF, and all non-key attributes depend only on the key as a whole, not on any part of it.

The Third Normal Form (3NF) — A relation is in 3NF if it is in 2NF and all non-key attributes do not depend on one another.

The Fourth Normal Form (4NF) — A relation is in 4NF if it is in 3NF and does not contain independent groups of attributes between which there is a “many-to-many” relationship.

The Fifth Normal Form (5NF) — A relation is in 5NF when every non-trivial join dependency in it is defined by a candidate key(s) of that relation.

The Sixth Normal Form (6NF) — A relation is in 6NF when it has no non-trivial join dependencies at all, i.e., it is irreducible: it cannot be decomposed further without loss. Every relation in 6NF is also in 5NF. 6NF was introduced as a generalization of the fifth normal form for temporal databases.

The Boyce–Codd Normal Form (BCNF) — A relation is in BCNF when the determinant of every non-trivial, left-irreducible functional dependency is a candidate key.

The Domain-Key Normal Form (DKNF) — A relation is in DKNF if every constraint imposed on it is a logical consequence of the domain and key constraints imposed on the given relation.
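A small worked example of normalization to 3NF, sketched with sqlite3 (schema and data are illustrative): the department name depends transitively on the employee key via the department key, so it is pulled out into its own relation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Violates 3NF: dept_name depends on dept_id, not directly on emp_id,
    -- so the department name is duplicated for every employee.
    CREATE TABLE employee_flat (emp_id INTEGER, name TEXT, dept_id INTEGER, dept_name TEXT);
    INSERT INTO employee_flat VALUES
        (1, 'Alice', 10, 'Sales'),
        (2, 'Bob',   10, 'Sales');
    -- 3NF decomposition: each fact is stored exactly once.
    CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
    CREATE TABLE employee   (emp_id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER);
    INSERT INTO department SELECT DISTINCT dept_id, dept_name FROM employee_flat;
    INSERT INTO employee   SELECT emp_id, name, dept_id FROM employee_flat;
""")
# The join recovers the original relation losslessly:
rows = conn.execute("""
    SELECT e.emp_id, e.name, d.dept_name
    FROM employee e JOIN department d USING (dept_id)
    ORDER BY e.emp_id
""").fetchall()
print(rows)  # [(1, 'Alice', 'Sales'), (2, 'Bob', 'Sales')]
```

Note that 'Sales' is now stored once, so renaming the department is a single-row update rather than one per employee.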

What is “denormalization”? What is it applied for?

Denormalization of a database is the process of intentionally converting a database into a form that does not conform to normalization rules. This is usually done to improve performance and data retrieval speed, at the cost of increased data redundancy.
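A typical denormalization sketch, using sqlite3 (schema is illustrative): an order total is cached in the orders table so reads avoid an aggregate over the items.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, items_total REAL DEFAULT 0);
    CREATE TABLE order_item (order_id INTEGER, price REAL);
    INSERT INTO orders (id) VALUES (1);
    INSERT INTO order_item VALUES (1, 10.0), (1, 5.0);
    -- Denormalization: cache the aggregate in `orders`, so reads skip the
    -- join; the price is redundancy: the cached value must be kept in sync.
    UPDATE orders
       SET items_total = (SELECT SUM(price) FROM order_item
                          WHERE order_item.order_id = orders.id);
""")
total = conn.execute("SELECT items_total FROM orders WHERE id = 1").fetchone()[0]
print(total)  # 15.0
```

The trade-off is visible here: every change to `order_item` now requires updating `orders.items_total` as well, or the redundant copy drifts out of sync.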

What types of relationships exist in a database? Provide examples.

One-to-one: each university is guaranteed to have one rector (1 university ↔ 1 rector).

One-to-many: every university has several faculties (1 university → many faculties).

Many-to-many: one professor can teach in several faculties, while one faculty can have many professors (many professors ↔ many faculties).
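In the relational model a many-to-many relationship is stored through a junction (link) table; a sketch with sqlite3 (names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE professor (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE faculty   (id INTEGER PRIMARY KEY, title TEXT);
    -- Junction table: each row links one professor to one faculty.
    CREATE TABLE teaches (
        professor_id INTEGER REFERENCES professor(id),
        faculty_id   INTEGER REFERENCES faculty(id),
        PRIMARY KEY (professor_id, faculty_id)
    );
    INSERT INTO professor VALUES (1, 'Ivanov'), (2, 'Petrova');
    INSERT INTO faculty   VALUES (1, 'Physics'), (2, 'Mathematics');
    INSERT INTO teaches   VALUES (1, 1), (1, 2), (2, 2);  -- Ivanov teaches in both
""")
rows = conn.execute("""
    SELECT p.name, f.title FROM teaches t
    JOIN professor p ON p.id = t.professor_id
    JOIN faculty   f ON f.id = t.faculty_id
    ORDER BY p.name, f.title
""").fetchall()
print(rows)
```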

What are “indexes”? Why are they used? What are their advantages and disadvantages?

An index is a database object created to improve the performance of data retrieval.

Data sets can have a large number of records stored in arbitrary order, and searching them by a given criterion through sequential scanning one record at a time can take a long time. An index is formed from the values of one or several fields and pointers to the corresponding records in the dataset, thus achieving a significant speed increase in data retrieval.
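This effect can be observed directly in SQLite, whose EXPLAIN QUERY PLAN distinguishes a full scan from an index search (table and index names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (id INTEGER PRIMARY KEY, user_id INTEGER, msg TEXT)")
conn.executemany("INSERT INTO logs (user_id, msg) VALUES (?, ?)",
                 [(i % 100, "event %d" % i) for i in range(10_000)])
# Without an index this filter scans the whole table; with the index,
# SQLite jumps straight to the matching entries.
conn.execute("CREATE INDEX idx_logs_user ON logs(user_id)")
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM logs WHERE user_id = 42"
).fetchall()
print(plan)  # the plan mentions idx_logs_user rather than a table scan
```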

Advantages:

- significantly faster data retrieval and sorting;
- can enforce uniqueness of values (unique indexes).

Disadvantages:

- additional disk space is required to store the index;
- insert, update, and delete operations become slower, since every index must be maintained.

Indexes are preferable for:

- fields frequently used in search and join conditions;
- fields with high selectivity (many distinct values);
- fields used for sorting.

Using indexes is not advisable for:

- small datasets, where a full scan is cheaper;
- fields that are modified frequently;
- fields with few distinct values (low selectivity).

What types of indexes exist?

By sort order: ascending; descending.

By data source: built on table columns; built on expressions (function-based indexes).

By impact on data source: clustered; non-clustered.

By structure: B-tree/B+-tree; hash; bitmap.

By numerical composition: simple (single-column); composite (multi-column).

By content characteristics: unique; non-unique.

By update mechanism: fully rebuilt; incrementally maintained.

By coverage of indexed content: full (covering the entire dataset); partial/filtered (covering only records that satisfy a condition).

Indexes in clustered (distributed) systems: global; local (per node or partition).

What is the difference between clustered and non-clustered indexes?

With a non-clustered index, the data itself remains physically stored in arbitrary order; only the index is ordered, holding key values together with pointers to the corresponding records. This index type is suitable for frequently modified datasets.

With a clustered index, the data is physically ordered by the index key, which significantly speeds up retrieval, especially for range queries and sequential access. Only one clustered index can be created per dataset.
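SQLite offers a convenient illustration of the distinction (the table names are illustrative): an ordinary table stores rows in rowid order, so any index on user columns is non-clustered, while a WITHOUT ROWID table is physically organized by its primary key, analogous to a clustered index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Ordinary table: rows are stored in insertion (rowid) order; the index on
# `code` is non-clustered -- sorted keys plus pointers back into the table.
conn.execute("CREATE TABLE plain_tbl (code TEXT, val TEXT)")
conn.execute("CREATE INDEX idx_plain_code ON plain_tbl(code)")
# WITHOUT ROWID table: the row data itself is physically organized by the
# primary key, which corresponds to a clustered index.
conn.execute("CREATE TABLE clustered_tbl (code TEXT PRIMARY KEY, val TEXT) WITHOUT ROWID")
rows = [("k%03d" % i, "v%d" % i) for i in range(100)]
conn.executemany("INSERT INTO plain_tbl VALUES (?, ?)", rows)
conn.executemany("INSERT INTO clustered_tbl VALUES (?, ?)", rows)
print(conn.execute("SELECT val FROM clustered_tbl WHERE code = 'k042'").fetchone()[0])  # v42
```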

Does it make sense to index data with a small number of possible values?

Usually not. A guiding rule: an index is beneficial only when the selection condition filters out most of the records. When a field has few distinct values, a large share of the records matches any given value, so retrieving them through the index is more expensive than a full scan; the index then only takes up space and slows down modifications. (An exception is a rare value that queries specifically search for, or specialized index types such as bitmap indexes.)

When is a full scan of the dataset more beneficial than indexed access?

A full scan is performed through multi-block reading. Index access is done through single-block reading. Additionally, when accessing via an index, the index must first be scanned, followed by reading blocks from the dataset. The number of blocks that need to be read from the dataset depends on the clustering factor. If the total cost of all necessary single block reads exceeds the cost of a full multi-block scan, then the full scan is chosen by the optimizer.

Thus, a full scan is typically chosen with weak selectivity of the query predicates and/or weak clustering of the data, or in cases of very small datasets.

What is a “transaction”?

A transaction is an operation on the database that transforms it from one consistent state to another, manifested by changes to the data stored in the database.

Name the main properties of a transaction.

Atomicity ensures that no transaction will be partially committed in the system. Either all its sub-operations will be executed or none at all.

Consistency. A transaction that reaches its normal conclusion and thereby commits its results maintains the consistency of the database.

Isolation. During the execution of a transaction, concurrent transactions should not affect its outcome.

Durability. Regardless of problems at lower levels (e.g., power failures or hardware faults), changes made by a successfully completed transaction must remain saved once the system is back online.
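Atomicity can be sketched with sqlite3 (account names and amounts are illustrative): a failure mid-transfer rolls the whole transaction back, so no half-done change survives.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES (1, 100), (2, 0)")
conn.commit()
try:
    # Transfer 30 from account 1 to account 2 as a single transaction.
    conn.execute("UPDATE account SET balance = balance - 30 WHERE id = 1")
    raise RuntimeError("simulated failure between the two steps")
    # The matching credit is never reached:
    # conn.execute("UPDATE account SET balance = balance + 30 WHERE id = 2")
except RuntimeError:
    conn.rollback()  # atomicity: the half-done transfer is undone
balances = conn.execute("SELECT balance FROM account ORDER BY id").fetchall()
print(balances)  # [(100,), (0,)] -- unchanged
```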

What levels of transaction isolation exist?

In order of increasing isolation (and therefore reliability in data handling, at the cost of concurrency):

- Read Uncommitted — a transaction may see uncommitted changes made by other transactions;
- Read Committed — a transaction sees only changes that have already been committed;
- Repeatable Read — data read by a transaction cannot be changed by others until it finishes, so repeated reads return the same rows;
- Serializable — concurrent transactions produce the same result as if they had been executed one after another.

What problems can arise during concurrent access using transactions?

When transactions are executed concurrently, the following issues may occur:

- lost update — two transactions modify the same data, and one change silently overwrites the other;
- dirty read — a transaction reads uncommitted changes of another transaction;
- non-repeatable read — repeated reads within one transaction return different data because another transaction has modified it;
- phantom read — repeated reads within one transaction return new rows inserted by another committed transaction.

An example of the last case:

Transaction 1                               Transaction 2
                                            SELECT SUM(f2) FROM tbl1;
INSERT INTO tbl1 (f1,f2) VALUES (15,20);
COMMIT;
                                            SELECT SUM(f2) FROM tbl1;

In Transaction 2, a SQL command aggregating all values of field f2 is executed. Then Transaction 1 inserts and commits a new row, so re-executing the same SQL command in Transaction 2 yields a different result. This situation is called a phantom read. It differs from a non-repeatable read in that the change is caused not by updates or deletions of existing rows but by the appearance of new, “phantom” rows.
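The scenario above can be sketched with two sqlite3 connections to the same database file (in SQLite, a SELECT issued outside an explicit transaction sees the latest committed state, which is enough to observe the phantom row):

```python
import os
import sqlite3
import tempfile

# Two connections to the same database file play the two transactions.
path = os.path.join(tempfile.mkdtemp(), "demo.db")
t1 = sqlite3.connect(path)
t2 = sqlite3.connect(path)
t1.execute("CREATE TABLE tbl1 (f1 INTEGER, f2 INTEGER)")
t1.execute("INSERT INTO tbl1 VALUES (1, 10)")
t1.commit()

first = t2.execute("SELECT SUM(f2) FROM tbl1").fetchone()[0]   # Transaction 2 reads
t1.execute("INSERT INTO tbl1 (f1, f2) VALUES (15, 20)")        # Transaction 1 inserts
t1.commit()
second = t2.execute("SELECT SUM(f2) FROM tbl1").fetchone()[0]  # re-read sees the phantom row
print(first, second)  # 10 30
```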
