Linux Interview Questions

Ace your system administration and DevOps interviews with our complete guide to Linux commands, bash scripting, networking, and system troubleshooting.

Interview Questions Database

Master your concepts with 52 hand-picked questions


In Linux, permissions are divided into Read (4), Write (2), and Execute (1) for three groups: User (Owner), Group, and Others.

755 breaks down as:

- User (7): 4+2+1 = Read, Write, Execute (full control)

- Group (5): 4+1 = Read, Execute

- Others (5): 4+1 = Read, Execute

This is commonly used for executable scripts and directories, giving the owner full control while allowing other users to read and execute the file.
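The octal arithmetic can be checked on a throwaway file (a quick sketch; `stat -c` is the GNU coreutils form):

```shell
# Create a temp file, apply 755, and print both the octal and symbolic views.
f=$(mktemp)
chmod 755 "$f"
stat -c '%a %A' "$f"   # prints: 755 -rwxr-xr-x
rm -f "$f"
```

The symbolic string `-rwxr-xr-x` reads as three triplets: owner `rwx` (7), group `r-x` (5), others `r-x` (5).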

`chmod` (change mode): Modifies the read, write, and execute permissions of a file or directory.

Example: `chmod 755 script.sh` or `chmod +x script.sh`

`chown` (change owner): Changes the owner (and optionally the group) of a file.

Example: `chown ubuntu:www-data /var/www/html` changes owner to 'ubuntu' and group to 'www-data'.

`chgrp` (change group): Changes only the group ownership of a file.

Example: `chgrp developers myfile.txt`

A Process is an independent executing program with its own memory space, file descriptors, and system resources. Processes are isolated from each other.

A Thread is a unit of execution within a process. Threads share the same memory space and resources of their parent process, making communication between them faster but also requiring synchronization (to avoid race conditions).

In Linux, both processes and threads are represented as 'tasks' internally. You can create threads using `pthread_create()` in C, or via higher-level abstractions in Python, Java, etc.

There are multiple commands to check this:

1. Using `ss` (modern replacement for netstat):

`ss -tulnp | grep <port_number>`

2. Using `netstat`:

`netstat -tulnp | grep <port_number>`

3. Using `lsof`:

`sudo lsof -i :<port_number>`

Flags: -t (TCP), -u (UDP), -l (listening), -n (no DNS resolution), -p (show process ID). Root/sudo is typically required to see process names.
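A minimal end-to-end check, assuming `python3` and `ss` (iproute2) are available; port 8099 is an arbitrary choice for the demo:

```shell
# Bind a throwaway listener, then confirm ss reports it in LISTEN state.
python3 -c 'import socket,time; s=socket.socket(); s.bind(("127.0.0.1",8099)); s.listen(); time.sleep(10)' &
pid=$!
sleep 1
ss -tln | grep 8099    # the listening socket should appear here
kill "$pid"
```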

You can view the routing table using these commands:

1. `ip route show` or `ip route` (recommended, modern tool).

2. `route -n` (legacy, but shows numeric addresses without DNS lookups).

3. `netstat -rn` (legacy).

The routing table shows which network interface and gateway to use for traffic to different destination networks. The default gateway (0.0.0.0/0) is the route for all traffic that doesn't match a more specific rule.

Hard Link: A direct reference to the inode (the actual data on disk). Multiple filenames can point to the same inode. If you delete the original file, the hard link still works because the data is only erased when all links to it are removed. Hard links cannot cross filesystems.

Soft (Symbolic) Link: A pointer or shortcut to another filename. If the original file is deleted, the symlink becomes 'dangling' (broken). Symlinks can span different filesystems and can point to directories.

Create with: `ln file hardlink` or `ln -s target symlink`
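The difference can be observed directly in a scratch directory (a sketch using coreutils; `stat -c '%i'` prints the inode number):

```shell
cd "$(mktemp -d)"
echo "data" > file
ln file hardlink            # hard link: second name for the same inode
ln -s file symlink          # symlink: a pointer to the name "file"
stat -c '%i' file hardlink  # prints the same inode number twice
rm file
cat hardlink                # still prints "data" — the inode survives
cat symlink 2>/dev/null || echo "symlink is dangling"
```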

The shebang (also called hashbang) is the two-character sequence `#!` at the very first line of a script file, followed by the path to the interpreter.

Example: `#!/bin/bash` or `#!/usr/bin/env python3`

When you execute a script, the Linux kernel reads the first line and uses the specified program to interpret the rest of the file. `#!/usr/bin/env python3` is generally preferred over `#!/usr/bin/python3` because it searches the PATH for the python3 executable, making scripts more portable across different systems.
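A minimal demonstration of the mechanism (the script name and path are made up for the sketch):

```shell
# Write a tiny script whose first line names the interpreter, mark it
# executable, and run it — the kernel hands the file to /bin/sh.
d=$(mktemp -d)
printf '#!/bin/sh\necho "hello from $0"\n' > "$d/hello.sh"
chmod +x "$d/hello.sh"
"$d/hello.sh"    # prints: hello from /tmp/.../hello.sh
```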

To find the largest files in the current directory recursively, use:

`find . -type f -exec du -h {} + | sort -rh | head -5`

Breakdown:

- `find . -type f`: Find all files recursively.

- `-exec du -h {} +`: Get the disk usage of each file in human-readable format.

- `sort -rh`: Sort in reverse order by human-readable sizes (largest first).

- `head -5`: Show only the top 5 results.

Alternatively, for a quick overview of directory sizes: `du -sh * | sort -rh | head -5`
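The pipeline can be verified on a seeded temp tree (file names and sizes here are arbitrary):

```shell
# Create one 100 KiB file and one 1 KiB file, then confirm the
# largest-first ordering of the find | du | sort | head pipeline.
d=$(mktemp -d)
dd if=/dev/zero of="$d/big"   bs=1024 count=100 2>/dev/null
dd if=/dev/zero of="$d/small" bs=1024 count=1   2>/dev/null
find "$d" -type f -exec du -h {} + | sort -rh | head -5   # "big" lists first
```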

The OOM killer is a Linux kernel feature activated when the system runs critically low on memory. To prevent a total crash, it sacrifices one or more processes to free up RAM.

It selects the victim based on an `oom_score` (0-1000). Higher score = more likely to be killed. The score is calculated based on:

1. The amount of memory the process (and its children) uses — higher usage means a higher score.

2. The user-set adjustment in `/proc/<pid>/oom_score_adj` (range -1000 to +1000).

3. Older kernels also weighed runtime, nice value, and root ownership; modern kernels score almost entirely on memory footprint plus the adjustment.

You can protect critical processes by setting `oom_score_adj` to -1000:

`echo -1000 > /proc/<pid>/oom_score_adj`
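The scoring files can be inspected for any process without root (writing a negative adjustment does require root); a read-only sketch on the current shell:

```shell
# The kernel exposes both the adjustment and the computed badness score.
cat /proc/self/oom_score_adj   # usually 0 unless something adjusted it
cat /proc/self/oom_score       # the kernel's current badness score for us
```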

A cron job is a scheduled task in Linux that runs automatically at specified time intervals. The cron daemon reads the crontab (cron table) files to determine what commands to execute.

To edit your crontab: `crontab -e`

Crontab syntax has 5 time fields followed by the command:

```
* * * * * /path/to/script.sh
│ │ │ │ │
│ │ │ │ └── Day of week (0-7, Sun = 0 or 7)
│ │ │ └──── Month (1-12)
│ │ └────── Day of month (1-31)
│ └──────── Hour (0-23)
└────────── Minute (0-59)
```

Example: `0 2 * * * /home/user/backup.sh` runs backup.sh every day at 2:00 AM.
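A few more schedules as a concrete crontab fragment (the script paths are made up for illustration):

```
# m    h  dom mon dow  command
0      2  *   *   *    /home/user/backup.sh          # daily at 02:00
*/15   *  *   *   *    /usr/local/bin/healthcheck.sh # every 15 minutes
0      9  *   *   1-5  /opt/scripts/report.sh        # weekdays at 09:00
```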

A systematic approach to diagnose high CPU:

1. `top` or `htop`: Real-time view of processes sorted by CPU usage. Press 'P' in `top` to sort by CPU.

2. `ps aux --sort=-%cpu | head -10`: List the top 10 CPU-consuming processes.

3. `mpstat -P ALL 1`: Show per-CPU core utilization (from `sysstat` package). Useful to see if load is spread across cores or pinned to one.

4. `sar -u 1 5`: Show CPU usage averaged over 5 intervals of 1 second.

5. `perf top` or `perf record`: Kernel-level profiling to see which functions/code paths are consuming CPU.

6. Check if it's a user process, system call overhead (high %sy), or I/O wait (high %wa) causing the issue.
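Step 2 above is the quickest one-liner to start with (assuming procps-style `ps`):

```shell
# Snapshot of the top CPU consumers right now, header row included.
ps aux --sort=-%cpu | head -5
```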

SSH key-based authentication uses a pair of cryptographic keys:

1. A private key (kept secret on your local machine, never shared).

2. A public key (placed in `~/.ssh/authorized_keys` on the remote server).

When you connect, the client proves possession of the private key by signing a server-supplied challenge; the server verifies the signature against the stored public key. Your identity is proven without a password ever being transmitted.

Why it's more secure:

1. Not vulnerable to brute-force password attacks.

2. Not vulnerable to phishing (there's no password to steal).

3. Private keys can be protected with a passphrase for extra security.

4. Individual access can be revoked by simply removing a public key from `authorized_keys`.

Using `chmod 777` gives read, write, and execute permissions to everyone on the system, which is a massive security risk as any user could modify the deployment script to run malicious code. The proper fix is to grant execute permissions only to the owner (or group) who needs it, using `chmod 755 /opt/scripts/deploy.sh` or `chmod +x` combined with the correct `chown` ownership.

The standard `kill` command sends SIGTERM (15), a polite request to terminate that the process can catch or ignore. I would first check its state with `ps`: if it is in uninterruptible sleep ('D', typically blocked on I/O), even SIGKILL won't take effect until the I/O completes. Otherwise, if it must be stopped immediately, I would use `kill -9 1234` to send SIGKILL, which is handled directly by the kernel and forcefully terminates the process.

I would use `nc` (netcat) or `telnet`. The command `nc -vz <database_IP> 5432` will attempt a TCP handshake and report if the port is open and reachable. If it times out or connection is refused, it indicates a firewall issue or that the database isn't listening.

I would use the `find` command starting from the root directory: `find / -type f -name nginx.conf 2>/dev/null`. The `2>/dev/null` part hides all the 'Permission denied' errors for directories I don't have access to, making the output clean.

I would use `tail` combined with `grep`. The command `tail -f /var/log/app.log | grep 'ERROR'` will continuously output new lines appended to the file, and filter them so only the ones containing the keyword 'ERROR' are displayed on the terminal.
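The filtering half of that pipeline can be demonstrated on a static file (since `tail -f` blocks forever, the sketch uses `tail -n`; the log content is made up):

```shell
# Seed a fake log and extract only the ERROR lines.
d=$(mktemp -d)
printf 'INFO start\nERROR disk full\nINFO done\nERROR timeout\n' > "$d/app.log"
tail -n 100 "$d/app.log" | grep 'ERROR'   # prints the two ERROR lines
```

One practical note: when `grep` writes to a pipe rather than a terminal, adding `--line-buffered` keeps the live `tail -f | grep` output from being delayed by block buffering.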

First, the bash shell parses the input. It checks if `ls` is an alias or built-in, then searches the $PATH directories for an executable named `ls`. Finding `/bin/ls`, the shell uses the `fork()` system call to create a child process, and then `exec()` to replace the child's memory space with the `ls` program, which then makes system calls to read directory contents.

I likely forgot to update the local package repository cache. Before installing new software on a Debian/Ubuntu system, especially a freshly provisioned one, I must run `sudo apt update` to fetch the latest package lists from the configured repositories.

Instead of editing the sudoers file directly in an unsafe way, I would either add them to the explicit sudo group (e.g., `usermod -aG sudo username` on Debian/Ubuntu, or `usermod -aG wheel username` on CentOS/RHEL), or use `visudo` to add a specific rule granting them sudo access only for the systemctl restart commands they need.

No, standard Linux CLI has no concept of a recycle bin; `rm` immediately unlinks the file from the directory tree. While the data might still exist physically on disk until overwritten, native recovery is extremely difficult for casual users and often requires unmounting the drive and using advanced forensic tools like `extundelete` immediately.

The `awk` or `cut` command is best. Using cut: `cut -d',' -f3 users.csv`. Here `-d','` sets the delimiter to a comma, and `-f3` extracts the third field.
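A self-contained run of that command, with a fabricated three-column CSV:

```shell
# Build a sample users.csv and pull out the third (email) column.
d=$(mktemp -d)
printf 'id,name,email\n1,alice,alice@example.com\n2,bob,bob@example.com\n' > "$d/users.csv"
cut -d',' -f3 "$d/users.csv"   # prints: email, alice@example.com, bob@example.com (one per line)
```

The `awk` equivalent would be `awk -F',' '{print $3}' users.csv`, which is more flexible if fields need reformatting.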

High load average with low CPU and RAM usage points to an I/O bottleneck: processes in state 'D' (uninterruptible sleep) count toward the load average while they wait on disk reads/writes or network-storage responses. I would immediately use `iostat -x 1` or `iotop` to identify which specific process or disk is saturating the IOPS.

I would use `tcpdump`. I would run `sudo tcpdump -i eth0 host api.example.com -w capture.pcap` to intercept all traffic on the main interface to and from that specific host. This captures the raw TCP handshakes and payloads, which I can then analyze directly or download to open in Wireshark to prove exactly what is happening at the network layer.

1. Initialize the new disk: `pvcreate /dev/sdb`. 2. Add it to the existing volume group: `vgextend system_vg /dev/sdb`. 3. Extend the logical volume for /var: `lvextend -l +100%FREE /dev/system_vg/var_lv`. 4. Resize the filesystem online (ext4 example): `resize2fs /dev/system_vg/var_lv`. This expands the storage seamlessly without requiring a reboot.

The modern and secure approach is to use Linux Capabilities. I would apply the `CAP_NET_BIND_SERVICE` capability to the application binary using the command `setcap 'cap_net_bind_service=+ep' /path/to/binary`. This grants the binary the specific privilege to bind to low ports without granting it total root access.

I would use a terminal multiplexer like `tmux` or `screen`. I would start a `tmux` session, run the script inside it, and if my SSH connection drops, the session remains active on the server. I can reconnect later and run `tmux attach` to check the progress. Alternatively, for simpler jobs, running it with `nohup ./script.sh &` prevents the SIGHUP signal from terminating it.

This happens when the file is deleted from the filesystem index but a running process (like Nginx or a logging daemon) still holds a lock/file descriptor open to it. The kernel cannot free the disk blocks until the process closes the handler. I would use `lsof +L1` to identify deleted files still held by processes, and then restart or reload that specific service to release the space.
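The underlying mechanism — a deleted file kept alive by an open descriptor — can be reproduced in a few lines (this is the '(deleted)' marker that `lsof +L1` keys on):

```shell
# Hold an fd open, unlink the file, and observe the kernel's view of it.
f=$(mktemp)
exec 3< "$f"             # open file descriptor 3 on the file
rm "$f"                  # unlink it from the directory tree
readlink /proc/$$/fd/3   # path now ends in ' (deleted)' — blocks not yet freed
exec 3<&-                # closing the fd finally releases the disk blocks
```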

I would create a systemd service unit file (e.g., `/etc/systemd/system/myapp.service`). In this file, I'd define `ExecStart=/usr/bin/python3 /opt/app.py`, `Restart=always` (to handle crashes), and `WantedBy=multi-user.target` (to ensure it runs on boot). Finally, I'd run `systemctl daemon-reload`, `systemctl enable myapp`, and `systemctl start myapp`.
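Put together, a minimal unit file might look like this (the app path and description are placeholders):

```
# /etc/systemd/system/myapp.service
[Unit]
Description=My Python app
After=network.target

[Service]
ExecStart=/usr/bin/python3 /opt/app.py
Restart=always

[Install]
WantedBy=multi-user.target
```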

I would check the system logs, specifically looking for the Out-Of-Memory (OOM) killer. I can use `dmesg -T | grep -i oom` or check `/var/log/syslog` (or `/var/log/messages`). If the system ran out of RAM, the kernel's OOM killer will log an entry explicitly stating it sacrificed the Java process to save the system.

First, add the allow rules: `iptables -A INPUT -p tcp --dport 22 -j ACCEPT` and `iptables -A INPUT -p tcp --dport 80 -j ACCEPT`, plus loopback traffic: `iptables -A INPUT -i lo -j ACCEPT`. Next, ensure existing connections don't break: `iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT`. Finally, set the default drop policy: `iptables -P INPUT DROP`.

The DNS resolver settings are typically stored in `/etc/resolv.conf`. I would check this file to ensure there are valid `nameserver` entries configured (e.g., `nameserver 8.8.8.8` or an internal corporate DNS IP). I would also verify that port 53 (UDP/TCP) outbound is not blocked by a firewall.

Core dumps are likely disabled by a ulimit. I would first check `ulimit -c`. If it returns 0, I execute `ulimit -c unlimited` for that session or service to allow core file generation. Additionally, I would check `/proc/sys/kernel/core_pattern` to see where the kernel is piping the core dumps (often to systemd-coredump), and retrieve them using `coredumpctl`.

The server is exhausting its tracking table for sockets in the TIME_WAIT state due to massive amounts of short-lived TCP connections. I would tune sysctl parameters. Specifically, I would edit `/etc/sysctl.conf` to increase `net.ipv4.tcp_max_tw_buckets`. Alternatively, to aggressively recycle sockets, I might enable `net.ipv4.tcp_tw_reuse=1` and apply it with `sysctl -p`.

User limits set in a bash session via `ulimit` do not inherit into background services started by systemd. Systemd ignores `/etc/security/limits.conf` for system services. To fix this, I must edit the specific service unit file (e.g., `systemctl edit nginx`) and add the directive `LimitNOFILE=1000000` to the `[Service]` block, then run `systemctl daemon-reload` and restart it.

Enabling 'HugePages'. By default, Linux allocates memory in small 4KB pages. For a 128GB database, managing millions of tiny pages saturates the CPU's TLB cache, causing misses. By configuring HugePages (e.g., 2MB or 1GB sizes) via `vm.nr_hugepages`, we significantly reduce the number of memory pages the kernel must track, slashing TLB misses and vastly accelerating database performance.

Ext4 distributes backup superblocks across the disk geometry during creation. I would first run `mke2fs -n /dev/sdX1` to perform a dry-run and reveal the specific block locations of the backup superblocks. Then, I would force `e2fsck` to use one of those backups to rebuild the table using `e2fsck -b <backup_block_number> /dev/sdX1`, recovering the filesystem structure.

From VM-C, I would continuously run `arping <Conflicted_IP>`. Because both VMs share the identical IP but have different MAC addresses, the `arping` output will alternate, showing replies coming from completely different MAC addresses for the same IP. Examining the local ARP cache (`ip neigh`) on VM-C would also show 'flap' warnings as the MAC address rapidly changes back and forth.

This is controlled by the `vm.swappiness` kernel parameter (default 60). The kernel aggressively swaps out idle anonymous memory to free up RAM for the filesystem cache. For dedicated databases that manage their own caching in memory, this is detrimental. The fix is to edit `/etc/sysctl.conf` and set `vm.swappiness=1` (or a very low value), forcing the kernel to only swap as an absolute last resort to avert an OOM crash.

I would edit `/etc/ssh/sshd_config`. To disable tunneling, I set `AllowTcpForwarding no` and `X11Forwarding no`. To enforce key-based auth, I set `PasswordAuthentication no`. To prevent direct root access, I set `PermitRootLogin no`. Afterward, I securely restart the sshd service to apply the hardened posture.

From the `grub>` prompt, I must manually define the boot sequence. 1. Identify the boot partition using `ls` (e.g., `(hd0,msdos1)`). 2. Define the root: `set root=(hd0,msdos1)`. 3. Load the kernel: `linux /vmlinuz-<version> root=/dev/mapper/centos-root ro`. 4. Load the initial ramdisk (initrd) which contains drivers to mount the real root: `initrd /initramfs-<version>.img`. 5. Execute `boot`.

I would use `strace` to intercept and record the system calls the binary is making to the kernel. By running `strace -T -tt -f ./application`, I can trace all child processes (`-f`) and see timestamps (`-tt`). The trace will clearly reveal if the process is stuck looping on a `read()` from a blocked socket, attempting `open()` on an unavailable file, or hanging on a `futex()` lock.

The technology is eBPF (Extended Berkeley Packet Filter). I would utilize eBPF-based profiling tools, specifically `memleak` from the BCC (BPF Compiler Collection) toolset. By executing `memleak-bpfcc -p <PID>`, eBPF dynamically instruments the allocation and free paths — user-space functions like malloc/free via uprobes, or kernel allocators via kprobes. It tracks allocations that lack corresponding frees and dumps the exact origin stack traces, all from a secure, sandboxed in-kernel environment with minimal production overhead.

I would utilize `tc` (Traffic Control) to configure the kernel's Queuing Disciplines (QDISC). I would replace the default pfifo_fast with a hierarchical qdisc like HTB (Hierarchical Token Bucket). I create classes to guarantee bandwidth for different ports. Then, I configure `iptables` to mark packets (e.g., mark port 80/443 traffic as Priority 1, port 22 traffic as Priority 2). The `tc` filters then route the marked Web API packets into the high-priority, low-latency queues, aggressively throttling the SFTP traffic to prevent bufferbloat.

This requires 'CPU Isolation' (CPU pinning). At the bootloader level (GRUB), I append `isolcpus=2,3` to the kernel parameters, entirely removing cores 2 and 3 from the general Linux SMP scheduler. The kernel will no longer run routine system processes or interrupts on those cores. Inside the application, I use `sched_setaffinity` (or `taskset`) to explicitly bind the critical matching engine thread to those isolated cores, achieving 100% dedicated CPU cycle availability with zero context-switching overhead.

I must deploy ZFS or Btrfs. Both are advanced next-generation filesystems featuring Copy-on-Write (CoW). CoW allows instantaneous, zero-byte snapshots and clones of the 50GB datasets because new data writes alongside the old data rather than overwriting it immediately. More importantly, they maintain cryptographic checksums of all data and metadata trees natively at the block level, allowing automatic detection and (if mirrored) self-healing of 'bit rot' and silent data corruption.

The two foundational primitives are Namespaces and cgroups (Control Groups). Namespaces provide the strict logical isolation illusion (e.g., PID namespaces make a process think it's PID 1, Network namespaces provide isolated virtual network stacks). Cgroups enforce the physical resource limitations (capping the specific namespace at a maximum of 2GB RAM or 50% of CPU time). Docker merely orchestrates these intrinsic kernel primitives.

The absolute fastest IPC method is POSIX Shared Memory (`shm_open`, `mmap`). Instead of copying data structures back and forth through the kernel space via sockets or pipes, Shared Memory maps a specific region of memory directly into the address spaces of both disparate processes simultaneously. They can directly read and write to the same physical RAM array, operating at native bus speeds. This requires rigorous mutex locking (like POSIX semaphores) to prevent race conditions during concurrent accesses.

I would leverage `Ftrace` or explicitly `bpftrace`. Because the issue is a specific kernel function, I use dynamic Kprobes (Kernel Probes). `bpftrace` allows me to write a simple script that attaches a kprobe at the entry of `ext4_sync_file` (recording a timestamp) and a kretprobe at the exit of the function (calculating the delta). This allows me to aggregate and histogram the exact microsecond execution times of that specific kernel function dynamically in real-time, uncovering the latency distribution.

To capture panics that cannot be written to the local rotational disk, I configure `Kdump` (Kernel Crash Dump). Kdump pre-reserves a small chunk of memory to hold a secondary, isolated 'crash kernel'. When the primary kernel panics, the hardware immediately boots into the crash kernel using `kexec` without a hardware reset. This pristine crash kernel has network drivers loaded and is specifically configured to dump the crashed memory image (vmcore) securely over the network via SSH/NFS to a centralized diagnostic server for post-mortem analysis.

The blockage is caused by SELinux (Security-Enhanced Linux) enforcing Mandatory Access Control (MAC). Although DAC (file permissions) allowed it, the SELinux policy strictly restricts the `httpd_t` security context from binding to random non-standard ports. The defensive correction is NOT to disable SELinux. Instead, I explicitly update the SELinux port policy using `semanage`: `semanage port -a -t http_port_t -p tcp 8090`. This informs the MAC framework that port 8090 is legally authorized for web traffic.

Legacy and complex I/O schedulers like deadline or cfq were engineered to deeply sort and merge read/write requests to minimize the physical movement of the mechanical read head on spinning hard drives. NVMe SSDs have no moving parts and possess massive internal parallelism to ingest millions of IOPS inherently. Operating complex sorting algorithms in the software kernel layer simply introduces CPU overhead and bottlenecks. The optimal path for NVMe is the `none` scheduler (or `kyber`), which totally bypasses software queue sorting and rapidly hands requests directly to the hardware's massive parallel multi-queue (blk-mq) architecture.